The Data Science of Economics and Finance


To help repair the economic and social damage wrought by the coronavirus pandemic, a transformational recovery is needed. The social and economic situation in the world was already shaken by the fall of 2019, when one-fourth of the world’s developed nations were suffering from social unrest.

The coronavirus accelerated those trends, and I expect the aftermath to leave the world in much worse shape. The urgency to reform our societies will be at its highest. Artificial intelligence and data science will be key enablers of that transformation. They have the potential to revolutionize our way of life and create new opportunities.

Author — Dnyanesh Walwadkar

Data Science in Economics and Finance for Decision Makers is a key resource for any financial-market participant, policy-maker, central banker, economist, or decision-maker who needs to understand the impact and opportunities presented by the transformation of digitalisation and fintech. The vast majority of our financial transactions these days involve no physical exchanges but rather the exchange of data that signifies wealth. And so, to understand money, you need to understand data.

The use of data science and artificial intelligence in economics and finance benefits scientists, professionals, and policy-makers by improving the data analysis methodologies available for economic forecasting, and therefore makes our societies better prepared for the challenges of tomorrow.

This blog will explore how the principles and practices of data science can help people when trading securities, giving and receiving credit, and averting fraud, and how social media and recent developments in cryptocurrencies connect to data science. We’ll also explore important procedures in trend analysis, causal inference, and ethics as they apply to data science.

In courtroom cross-examinations and in wedding proposals, the general rule is: never ask a question unless you already know the answer. Now, wouldn’t it be great if you could already know the answer to your financial questions, like whether a particular investment will be profitable, whether a loan application will be accepted, or whether your credit card is safe online? While it’s not possible to be 100 percent certain about the future, data science can provide useful answers to those questions and a host of others in economics, banking, and finance.

Specifically, what do I mean when I say Data Science and Money? Data science is the combination of mathematics and statistics with computer programming in applied settings. Data Science and Money is the application of those data science techniques to solve practical problems in finance, banking, and economics.

  • In this context, recent data science technologies have high potential for improving forecasting and nowcasting in many kinds of economic and financial applications.
  • The vast amount of data available today, in what is often called the Big Data era, opens up a huge number of opportunities for economists and scientists, provided that the data is handled, processed, linked, and analyzed.
  • Where we once forecast economic indexes from few observations and only a few variables, we now have millions of observations and hundreds of variables. Questions that previously could only be answered with a delay of several months or even years can now be addressed nearly in real time. Big data, analysis performed through (deep) machine learning technologies, and the availability of increasingly powerful hardware (cloud computing infrastructures, GPUs, etc.) can integrate and augment the information carried by the publicly available aggregated variables produced by national and international statistical agencies.
  • By working at a finer level of granularity, data science technologies can uncover economic relationships that are often not evident when variables are aggregated over many products, individuals, or time periods. This evolution also brought about the development of FinTech, a newly coined abbreviation for Financial Technology, whose aim is to leverage cutting-edge technologies to compete with traditional financial methods for the delivery of financial services.

Why Apply Data Science to Money?

Number one: you want to be able to identify unseen possibilities for profit, especially in a rapidly evolving and competitive environment. That also lets you increase customer loyalty, find new markets, and so on. Number two: you can identify and quantify financial risk and take steps to reduce it to acceptable levels in an extremely fast-changing environment. Number three: the overall goal is greater profitability, or return on investment, for yourself or, more often, for the people who’ve entrusted you with their financial future. All three of these are excellent reasons to want the extra insight and power of data science when dealing with financial data.

Where does your data actually come from?

There are traditional sources of data for finance. These include economic indicators such as GDP and other macroeconomic measures, the performance of investments over time (for example, the history of a stock), and records of client behavior: what clients have done, whether they have made their payments on time, and so forth. But data science allows you to go beyond that to unconventional sources of data, such as unstructured text, tweets, and other social media posts. It’s a very complicated kind of data, but data science lets you extract extra insight from it for use in financial situations.

In recent years, technological advances have greatly increased the number of devices generating information about human and economic activity (e.g., sensors, monitoring, IoT devices, social networks). These new data sources provide a rich, frequent, and diversified stream of information, from which the state of the economy could be estimated with accuracy and timeliness. Obtaining and analyzing this kind of data is a challenging task because of its size and variety. However, if properly exploited, these new data sources could bring more predictive power than the standard regressors used in traditional economic and financial analysis.

Now, any time you’re working in data science you have to make decisions, and here are a few you have to make with financial data.

  • Number one, always: what is the acceptable level of risk? Specifically, there’s more than one kind of risk: there are false positives and false negatives. With loan default, for instance, a false positive is somebody who you think will default but who would actually make their payments. A false negative is somebody who you think is safe but who actually defaults. These two errors carry different costs, and you have to put a value on each of them separately.
  • You also get to decide about the sources of your data. Do you want existing in-house data, social media data, open data, or third-party data? All of those involve issues of time, cost, and quality. How important is speed, or (near) real-time analysis? That’s going to change the kind of data and the algorithms that you can use. And then, finally, how important is transparency, being able to know what’s actually happening, versus the really impressive precision of black-box models, the machine learning algorithms that are very hard to interpret?
  • These are decisions that you have to make before you get started on a particular data science project.
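To make the point about valuing false positives and false negatives separately a little more concrete, here is a minimal sketch in Python. It picks the rejection threshold for a loan-default score that gives the lowest total cost. The cost figures, the scores, and the function names are all made-up assumptions for illustration, not numbers from any real lender.

```python
# Hypothetical costs: these names and numbers are illustrative assumptions.
COST_FALSE_POSITIVE = 200.0    # profit lost by rejecting a borrower who would have repaid
COST_FALSE_NEGATIVE = 1000.0   # loss from approving a borrower who then defaults


def total_cost(threshold, scored_loans):
    """Total cost of rejecting every loan whose default score is >= threshold.

    scored_loans is a list of (predicted_default_probability, actually_defaulted).
    """
    cost = 0.0
    for score, defaulted in scored_loans:
        predicted_default = score >= threshold
        if predicted_default and not defaulted:
            cost += COST_FALSE_POSITIVE      # false positive
        elif not predicted_default and defaulted:
            cost += COST_FALSE_NEGATIVE      # false negative
    return cost


def best_threshold(scored_loans, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(t, scored_loans))


# Made-up model scores paired with what actually happened.
loans = [
    (0.05, False), (0.10, False), (0.20, False), (0.30, True),
    (0.35, False), (0.50, False), (0.60, True), (0.80, True),
]
thresholds = [0.1, 0.25, 0.4, 0.55, 0.7]
chosen = best_threshold(loans, thresholds)
print(chosen, total_cost(chosen, loans))
```

Change the two cost constants and the preferred threshold moves, which is exactly why the two kinds of risk have to be priced before the project starts.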

Where machines and their algorithms do a lot of the heavy lifting previously left to humans

It’s a common trope in science fiction movies that the machines are here to take the place of the humans, and while this often feeds into dystopian fantasy, there are plenty of places in real life where machines and their algorithms do a lot of the heavy lifting previously left to humans. Securities trading is one place where this change has already happened, with over 75 percent of all trades being made by machines. So let’s talk a little bit about why that’s the case, how it works, and what it means.

First off, we’re talking about algorithmic trading, so you may want to know: what is that?

  • Well, algorithmic trading is the use of computer programs, or specifically the algorithms that process the information, to evaluate and trade securities like stocks and bonds. Another name for this is automated trading systems, and a variation on it is informally called human-in-the-loop trading. The computer can do absolutely everything, including taking the final action, the buy or the sell, or it can do the pre-processing and then hand things over to a human to wrap up.

There are a lot of important benefits to algorithmic trading.

  • Number one is that a computer can look at a lot more information than humans can, so there’s the volume of data.
  • Second, computers can analyze that data and react a lot faster than humans can, so there’s also speed. Put those two together and you have a lot more potential for profitability, as well as the ability to offer more appropriate recommendations to your customers and clients.
  • To do this, you’re going to need some data, and the data for algorithmic trading can look a lot like normal data for trading. You can look at a stock’s historical performance, or at the performance of predictive stocks, that is, other stocks whose changes predict changes in the target stock.
  • You can look at information like company reports or changes in the governmental regulatory environment. You can even look at information from social media. All of these can be combined to give you additional insight into potential changes in the value of a particular security.
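One simple way to check whether one stock’s moves tend to come before another’s is a lagged correlation. The sketch below is only an illustration under stated assumptions: the return series are invented, the target is deliberately built to echo the leader one day later, and `lagged_correlation` is a hypothetical helper name.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(leader, target, lag):
    """Correlate the leader's return on day t with the target's return on day t + lag."""
    if lag == 0:
        return pearson(leader, target)
    return pearson(leader[:-lag], target[lag:])


# Made-up daily returns (in percent). The target echoes the leader one day later.
leader_returns = [0.5, -1.2, 0.8, 0.3, -0.7, 1.1, -0.4, 0.6]
target_returns = [0.1, 0.5, -1.2, 0.8, 0.3, -0.7, 1.1, -0.4]

for lag in (0, 1, 2):
    print(lag, round(lagged_correlation(leader_returns, target_returns, lag), 2))
```

Real markets are far noisier than this toy series, so a strong lagged correlation in historical data is a hint worth testing, not a guarantee.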

Let me give you a quick, successful example

  • The application of Twitter bots to finance.
  • So, for instance, you get information on Twitter. It’s unstructured text, and it requires a little bit of finesse to do something interesting with it from a data science point of view.
  • One company, called Dataminr, mines tweets for financial clues, and they do this with all of the tweets. It’s an enormous amount of data that comes in extraordinarily quickly. They specifically look for mentions of stocks or companies, things that can influence prices. For instance, there have been situations where a single tweet predicted a 25 percent drop in value for certain stocks.
  • For a more specific example, Dataminr found information about Volkswagen’s diesel emissions scandal three days before the stock price plummeted 30 percent. That’s an enormous opportunity to get value.
  • Also, oil and gas traders received alerts about the death of the king of Saudi Arabia more than four hours before crude prices changed on the news. So if you monitor enough things and react quickly enough, you can get a lot of value out of that, and people place a lot of trust in this approach.
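To give a feel for what mining posts for financial clues can look like at its very simplest, here is a toy sketch. It is not how any particular company does it. The alert word list, the example posts, and the function names are all assumptions made up for illustration, and it relies only on the common convention of writing tickers as cashtags such as $ABC.

```python
import re
from collections import Counter

# Cashtags such as $ABC are a common convention for naming tickers in posts.
CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")

# Hypothetical watch list of alarming terms; a real system would use far richer signals.
ALERT_TERMS = {"scandal", "recall", "fraud", "investigation", "lawsuit"}


def scan_post(text):
    """Return (tickers mentioned, alert terms found) for one post."""
    tickers = set(CASHTAG.findall(text))
    words = set(re.findall(r"[a-z]+", text.lower()))
    return tickers, words & ALERT_TERMS


def count_alerts(posts):
    """Count, per ticker, the posts that mention it alongside an alert term."""
    counts = Counter()
    for post in posts:
        tickers, alerts = scan_post(post)
        if alerts:
            counts.update(tickers)
    return counts


# Invented posts for illustration only.
posts = [
    "Regulators open investigation into $ABC emissions tests",
    "$ABC faces scandal over test results, $XYZ unaffected",
    "Lovely weather today, bought more $XYZ",
]
print(count_alerts(posts).most_common())
```

Notice that $XYZ gets counted in the second post even though the post says it is unaffected. That is the kind of finesse unstructured text demands, and why keyword matching is only a starting point.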

Keep in mind that no matter which algorithm you choose for data science problems in economics or finance, there are going to be some potential problems. First and foremost, large data sets are required for accurate prediction. Especially if you’re using something like deep learning, it wants a lot of data: it’s best if you have millions of records and hundreds or possibly thousands of variables to deal with. Also, don’t forget that when you’re doing this kind of machine learning, your models may include unlawful demographic bias.

If there’s information in there about gender, age, or ethnicity, you have to make sure that it doesn’t get included, directly or indirectly, in your model. Also, models tend to lose predictive accuracy over time. They’re better at the beginning, when you develop them and when the circumstances that produced the data are still the same. But as you get a little further down the road, a year, two years, three years, there’s drift. And so you’re going to have to revisit and retune your model every now and then.
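One common way to watch for drift, offered here as an illustration rather than something this post prescribes, is the population stability index (PSI). It compares how a variable or model score was distributed when the model was built with how it is distributed now. The bucket edges, the scores, and the function names below are assumptions made up for the sketch.

```python
from math import log


def bucket_shares(scores, edges):
    """Share of scores falling in each bucket defined by ascending edges."""
    counts = [0] * (len(edges) + 1)
    for s in scores:
        index = sum(s >= e for e in edges)   # number of edges at or below the score
        counts[index] += 1
    # A small floor keeps empty buckets from producing log(0).
    return [max(c / len(scores), 1e-6) for c in counts]


def population_stability_index(baseline, current, edges):
    """PSI = sum over buckets of (current share - baseline share) * ln(current / baseline)."""
    base = bucket_shares(baseline, edges)
    curr = bucket_shares(current, edges)
    return sum((c - b) * log(c / b) for b, c in zip(base, curr))


# Made-up model scores at development time and some time later.
edges = [0.25, 0.5, 0.75]
scores_at_launch = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
scores_later = [0.2, 0.3, 0.6, 0.6, 0.7, 0.8, 0.9, 0.9]

print(round(population_stability_index(scores_at_launch, scores_later, edges), 3))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a sign that the population has shifted enough to justify revisiting the model, but the right cutoff depends on your own data and risk tolerance.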

And then there’s also the issue that the most recently available data may actually not be the most accurate or useful, because it hasn’t been tested or verified yet. So you need to take a very close look at it to make sure you’re providing accurate information to your algorithms.

Predicting repayment from free text

Now, for a fascinating example of how this works, there’s a project on predicting repayment from free text. It comes from a research paper with a beautiful title, When Words Sweat, and the subtitle Identifying Signals for Loan Defaults in the Text of Loan Applications. The study looked at applications for peer-to-peer loans on a network called Prosper, where, in addition to providing a lot of regular information, people also wrote a short application letter about why somebody should lend to them.

The researchers did a word analysis to find out which words were predictive of repayment within a three-year window and which ones were associated with default. The comprehensive list is several pages long, but here are some of the standouts.

  • When the word ‘reinvest’ showed up in a person’s letter, they were over four times more likely to successfully repay.
  • If they wrote ‘after-tax,’ they were 3.1 times more likely to repay. If the word ‘graduate’ or some variation showed up, they were 1.5 times more likely. These all suggest that the person has some kind of financial sophistication.
  • On the other hand, some of the words associated with default included ‘a few months.’ Those people were 3.4 times more likely to default. People who wrote ‘god bless’ were 2.2 times more likely to default. And ‘I promise,’ as in ‘I promise I’ll pay you back,’ was almost double the risk. That’s in part because those phrases have a lot in common with what people write in scams, or when they’re lying and being over-defensive.
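To show the general idea behind numbers like these, here is a minimal sketch that compares the default rate of applications containing a phrase with the rate of those without it. This is a simplified stand-in and not the method of the paper. The applications, outcomes, and function names are invented for illustration.

```python
def default_rate(applications):
    """Fraction of (text, defaulted) applications that defaulted."""
    return sum(defaulted for _, defaulted in applications) / len(applications)


def relative_default_risk(applications, phrase):
    """Default rate with the phrase divided by the default rate without it.

    Sketch only: assumes both groups are non-empty and the second has defaults.
    """
    phrase = phrase.lower()
    with_phrase = [a for a in applications if phrase in a[0].lower()]
    without_phrase = [a for a in applications if phrase not in a[0].lower()]
    return default_rate(with_phrase) / default_rate(without_phrase)


# Invented application letters paired with whether the loan defaulted.
applications = [
    ("I will reinvest the profits in my shop", False),
    ("Need help for a few months, god bless", True),
    ("I promise I will pay you back", True),
    ("Consolidating debt after graduate school", False),
    ("Just a few months until my bonus", True),
    ("A few months of rent, then I reinvest", False),
    ("Car repair loan", True),
    ("Kitchen remodel", False),
]

for phrase in ("a few months", "reinvest"):
    print(phrase, round(relative_default_risk(applications, phrase), 2))
```

With only eight made-up letters the ratios mean nothing statistically. The real study needed a very large number of applications, plus controls for the regular financial information, before ratios like these could be trusted.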

And so, what you find is that data science is able to sort through a huge number of applications and then, by looking at these specific patterns, find things that can be used as signals of the creditworthiness of a specific individual. With that short example in mind, let’s recap.

What are the benefits again of algorithmic evaluation of applications for loans and credit?

Number one, they’re really fast, and that makes a difference. Number two, they let you go beyond standard data like FICO scores to get a more accurate assessment. And third, they allow you to get more money into the hands of people who will use it and repay it wisely.

Real-time fraud detection

Real-time fraud detection is one of the most significant applications of data science in economics, banking, and finance. Let’s begin with the obvious question:

What is fraud?

Well, fraud is a deliberate deception to secure unfair or unlawful gain. If you want more specific information, the FBI describes 22 categories of fraud, including advance fee schemes, credit card fraud, internet fraud, and reverse mortgage scams, among others. So there are a lot of different ways that people can defraud one another. Some of these are more specific to online commerce, and that’s where data science is usually going to come in.

Another question, by the way, is how common is fraud?

Well, if we look specifically at credit card fraud, approximately one-tenth of one percent of transactions are fraudulent, so one out of 1000. It’s a small number.

On the other hand, when you consider that there are about 100 billion credit card transactions per year, then 0.1 percent of those are going to be 100 million fraudulent transactions every year. And that’s not even counting the false positives. So, there’s a lot of action going on there in fraud detection.
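The arithmetic above, and the remark about false positives, can be worked through in a few lines. The transaction volume and fraud rate come from the text. The detector’s sensitivity and false positive rate are assumptions chosen only to illustrate how a tiny base rate plays out.

```python
transactions_per_year = 100_000_000_000   # about 100 billion, from the text
fraud_rate = 0.001                        # one-tenth of one percent, from the text

# Assumed detector quality, chosen only for illustration.
sensitivity = 0.99           # share of fraudulent transactions that get flagged
false_positive_rate = 0.01   # share of legitimate transactions that get flagged

fraudulent = transactions_per_year * fraud_rate
legitimate = transactions_per_year - fraudulent

true_alerts = fraudulent * sensitivity
false_alerts = legitimate * false_positive_rate
precision = true_alerts / (true_alerts + false_alerts)

print(f"fraudulent transactions: {fraudulent:,.0f}")
print(f"false alerts:            {false_alerts:,.0f}")
print(f"share of alerts that are real fraud: {precision:.1%}")
```

Under these assumed rates, false alerts outnumber real fraud roughly ten to one, even though the detector is right 99 percent of the time on each kind of transaction. That is the practical weight behind “not even counting the false positives.”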

As for the benefits of fraud detection, the obvious ones are that less money and less time are lost for consumers, and less money and less work time are lost for lenders. From a more social point of view, there’s the web of trust. It’s really an abstract thing, but the web of trust between lenders, borrowers, and commercial organizations is strengthened.

Now, what kind of data goes into identifying fraud?

  • Well, there are a lot of things that can go into it. Number one is the purchase amount. If you usually keep all your purchases under 20 dollars and suddenly there’s a 5,000 dollar charge, something’s up.
  • The category of purchase. If you never pay for hotels and then suddenly three of them show up, that’s going to be an indication.
  • The time and place: the other side of the world, or the wrong time of day. And frequency: are there a lot of repeated transactions, or are they all attempted at exactly the same time of day?
  • Also, the medium. Is the card present, or was it done online or over the phone? Are they using a laptop? What operating system are they using?
  • There’s also biometrics, things like the speed of typing, the number of mistakes, or distinctive behaviors with a mouse. Potentially there are thousands of variables that can be used, and you may want to rely on them if you think they will improve accuracy within the constraints of time. In fact, that gets us to one of the very big trade-offs here: accuracy versus speed.
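A few of the signals above, namely amount, category, and place, can be turned into a toy rule-based check. This is a deliberately simple sketch and not a production fraud model. The dictionary keys, the z-score limit, and the sample data are assumptions invented for illustration.

```python
from statistics import mean, stdev


def risk_flags(history, transaction, z_limit=3.0):
    """Return the reasons a transaction looks unusual for this cardholder.

    history and transaction use hypothetical keys: amount, category, country.
    """
    flags = []
    amounts = [t["amount"] for t in history]
    spread = stdev(amounts)
    # How many standard deviations above the usual amount is this purchase?
    if spread > 0 and (transaction["amount"] - mean(amounts)) / spread > z_limit:
        flags.append("amount far above usual")
    if transaction["category"] not in {t["category"] for t in history}:
        flags.append("new purchase category")
    if transaction["country"] not in {t["country"] for t in history}:
        flags.append("unfamiliar country")
    return flags


# Invented purchase history: small, local, everyday purchases.
history = [
    {"amount": 12, "category": "groceries", "country": "US"},
    {"amount": 18, "category": "groceries", "country": "US"},
    {"amount": 9, "category": "coffee", "country": "US"},
    {"amount": 15, "category": "groceries", "country": "US"},
    {"amount": 11, "category": "coffee", "country": "US"},
]

suspicious = {"amount": 5000, "category": "hotel", "country": "BR"}
ordinary = {"amount": 14, "category": "coffee", "country": "US"}

print(risk_flags(history, suspicious))
print(risk_flags(history, ordinary))
```

Rules like these run in microseconds, which matters when a decision has to be made while the customer is still at the checkout. Models with thousands of variables may be more accurate but take longer to evaluate.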

Thank You. If it was helpful to you, share, comment, and like.

Dnyanesh Walwadkar
