Dataflows and Data Products in Financial Services


Executive Summary

The volume and velocity of novel data creation are surpassing the rate at which legacy data warehouses can onboard, integrate, and extract insight from it. As a result, innovation is constantly bogged down by inertia, and getting things done becomes a distant aspiration.

What is required is a new way of thinking of data as a product, and assigning responsibility for that product’s creation, curation, and circulation to the people with the most expertise about it.

I’ll attempt to provide a way past these blockers by looking at a model of decentralised responsibility for creating data products in their appropriate domains. We’ll also look at how to scale this process to all the product teams in the business by building advanced analytics teams. Finally, we’ll look at how to keep all of it under control, with governance and monitoring, so it doesn’t collapse under its own increased entropy.

The outcome of this journey would be a company with autonomous teams that can create human-in-the-loop, algorithmic, and scalable decision-making platforms.

NB. my professional experience is mostly within asset management companies, so that will be the vantage point I’ll be taking in this analysis.


Most of the dataflow requirements that traditional asset management companies have dealt with for the last 30–40 years have been stable and well defined: primarily, price information on securities, portfolio holdings, benchmark constituents, risk metrics, and some variety of instrument metadata. There are others, of course, but let’s keep things simple for now.

It’s also fair to say that most investment management data systems are highly focused on batch operations for end-of-day (EOD) reporting, and on Order Management Systems (OMSs).

There’s a high commonality in how investment teams are structured. In the top part of the graph below you can see a typical organisation structure. That is: a CEO; regional CEOs or a Chief Investment Officer (CIO) as direct reports; and then asset class teams of subject matter experts (SMEs) reporting to the latter.

The last decade or so has brought increased visibility to Quantitative Analytics teams as standalone groups within the Investment Vector teams. They are highly focused financial data scientists whose work offers investment insights drawn from large volumes of data. For this reason, each asset class, e.g. Equities (EQ), Fixed Interest (FI), Multi-asset/Solutions (MAS), recruits and builds up its own team of these SMEs.

The primary requirement these teams all eventually share is the availability of current EOD and historical datasets, with a good level of governance and quality control. The fact of the matter is, though, that unless the business was originally constructed as a technology business, technology is in most cases an afterthought, or a buy-vs-build decision from a requirements matrix. There are real, hard budgetary and temporal constraints on how businesses make these early decisions, so I’m not here to throw blame at anyone. So far, these Quant teams have filled the capability gaps that the OMSs have not been able to, and have acted as the glue between the myriad steps involved in the investment management process.

The reality, though, is that most Financial Services companies treat Data & Technology (D&T) as a cost centre: something to be minimised, not the value generator that it is, or should have been. This thinking leads to a proliferation of constraints in decision-making that makes it really hard to get things done.

Regardless of its AUM size, product shelf depth, or global distribution of teams, every asset management company should be able to answer the following cross-cutting questions, and minimise its lead time to decision:

  • What is the Investment Value at Risk (VaR) of the entire company investment book, and what are the top 10 riskiest positions, countries, sectors, or factors contributing to book risk?
  • What is the current Investment revenue VaR, i.e. what puts most revenue at risk? Where is poor performance coming from in the entire investment book? What is the combined effect of underperformance and client concentration?
  • What is the company’s revenue contribution correlation matrix? How much of what we do could all go south together?
  • In light of recent exogenous shock events, like the Russian invasion of Ukraine, what is a truly holistic company view on 1st and 2nd order exposure to Russian holdings/income, and how frequently does your intelligence view update per day/week?
  • Given that many companies have pledged a de-carbonisation of their whole investment books by 2030/50, how comprehensive is the total carbon intensity exposure across their entire investment book? How you maintain that trend, and at the same time avoid missing sustainable opportunities, will probably need an entire new post!
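The first two questions above presuppose a VaR number that can actually be computed across the whole book. As a minimal, purely illustrative sketch of the mechanics, here is one-day historical VaR from a daily P&L series; the P&L figures and the 95% confidence level are hypothetical examples, not a production risk model (which would handle weighting, scaling, and much more):

```python
# Minimal sketch of one-day historical VaR, for illustration only.
# The P&L series below is made up; a real risk engine would aggregate
# positions, factor exposures, and vendor risk data first.

def historical_var(pnl, confidence=0.95):
    """Return historical VaR: the loss threshold that daily P&L
    falls below with probability (1 - confidence)."""
    losses = sorted(pnl)                       # worst outcomes first
    index = int((1 - confidence) * len(losses))
    return -losses[index]                      # report VaR as a positive loss

# Hypothetical daily P&L (in GBP thousands) for a small book
daily_pnl = [12, -5, 7, -20, 3, 15, -8, 4, -2, 9,
             -14, 6, 1, -3, 8, -11, 5, 2, -6, 10]

print(f"95% 1-day VaR: GBP {historical_var(daily_pnl, 0.95)}k")
```

The same function, pointed at a company-wide P&L dataset rather than a single desk’s, is what the cross-cutting questions above implicitly demand.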

In order to offer answers to the questions above, the business needs to scale the analytical resources it employs, and create bespoke business logic that offers an Insight-as-a-Service framework for all these different kinds of decision needs.

This is made extra difficult by the quant SMEs (a major analytical resource) working in isolation, not sharing the burdens of data acquisition, cleansing, and transformation with the other asset class desks. There are legacy reasons for this, of course: the EQ teams tended to have quite different D&T vendors from the FI teams, which formed “natural” silos in how teams collaborated. This behaviour tends to get exacerbated where there is also a history of M&A integrations.

This legacy, “stable” model has been disrupted, in the last 5 years I would say, by the rise of capital inflows into ESG and Sustainable investing within AM companies. Because of its all-encompassing nature, i.e. treating every security as an issuance of a final, disclosing parent company, it has introduced many new datasets commonly required by more and more asset classes.

This is increasing complexity for the business when trying to offer centralised, curated, and easily available current and historical datasets across the investment domains. Quant SME teams are left to their own devices to try and offer rapid cycles of new product creation, in a highly competitive, first-past-the-post market.

In addition to all these constraints within developed markets, there are also supplementary, and most of the time obscure, complexities and constraints when trying to address emerging or more opaque markets, e.g. onshore Chinese markets, sovereign-owned entities, special purpose/financing vehicles, or raw Sustainable-investing data in Private Assets.

All these data landscapes need to be well understood and productionised, before even exploring niche areas, such as alternative and unstructured data, or even blockchain technologies.

Socio-technical change map. Before (above) and after (below)


The strategic policies to address the above diagnosis really boil down to a couple of points.

Knowledge creators and SMEs need to be given the capabilities and agency to innovate on a platform that solves two main constraints: data access and compute.

“Software creates ecosystems. Software has zero marginal costs. Software improves over time. Software offers infinite leverage. Software enables zero transaction costs.” Ben Thompson

The business needs to invest in levers of scalability:

1. Data abstraction via scalable, inexpensive, centralised cloud storage.

2. Work abstraction via software, which gives us business logic fungibility and infinite leverage.

To offer this leverage you need an apt pivot point. That pivot would be the building of a Modern Data Platform.


The specific actions to implement the above policies are as follows:

  1. Place a cloud Datalake at the centre of the company’s data strategy. All core data, i.e. data that is essential for the Investment Value chains, is always landed first in the Datalake before flowing downstream. Historical and EOD data products are curated from these raw datasets.
  2. Standardise your platform toolkit to 1–2 programming languages at most, to gain the benefits of leveraged, collaborative sharing across all SMEs. The industry favourites at this point in time are Python and SQL. The former is more focused on the Investment process/manufacturing via Advanced Analytics workflows, e.g. ML/AI/BigData, and the latter is best used within Business Intelligence and reporting analytics. Both are mature, with a broad and expanding pool of skilled practitioners.
  3. Create a centre of excellence group [the Advanced Analytics team in the graph above] that will take ownership of the following dimensions of this capability:
  • Design and build the Modern Data Platform, based on the partner Datalake provider’s managed services.
  • Develop the acceleration toolkits that provide SME teams with easy, and secure, programmatic data access to the Datalake, e.g. Python/SQL API libraries.
  • Embed temporary Advanced Analytics squads into SME teams that do not yet have the Analytics/Engineering resources to develop their own data products using the acceleration toolkits, in order to help build, and hand over ownership of, the new data products to the newly onboarded data analysts.
  • Finally, train and support all SMEs and knowledge creators that want to build their own data products, or run their own data analysis within the visualisation layer, e.g. explore data products in Power BI/Tableau for MI Reports.
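To make the “acceleration toolkit” idea concrete, here is a sketch of what its Python API surface might look like. Every name here (`DataLakeClient`, `get_product`, the registered dataset) is hypothetical and invented for illustration; a real toolkit would wrap the Datalake provider’s managed services, authentication, and catalogue:

```python
# Illustrative sketch of an acceleration-toolkit API; all names are
# hypothetical. In production, register() would be the curation
# pipeline's job and get_product() would query cloud storage.
from dataclasses import dataclass, field

@dataclass
class DataLakeClient:
    """Thin client exposing curated data products to SME teams."""
    catalog: dict = field(default_factory=dict)

    def register(self, name, rows):
        # Stand-in for the curation pipeline publishing a data product.
        self.catalog[name] = rows

    def get_product(self, name, as_of=None):
        """Fetch a curated data product, optionally filtered to a date."""
        rows = self.catalog[name]
        if as_of is not None:
            rows = [r for r in rows if r["date"] <= as_of]
        return rows

client = DataLakeClient()
client.register("eod_prices", [
    {"date": "2024-01-02", "isin": "GB00B03MLX29", "close": 24.7},
    {"date": "2024-01-03", "isin": "GB00B03MLX29", "close": 24.9},
])
latest = client.get_product("eod_prices", as_of="2024-01-02")
```

The point of such a wrapper is that an SME writes one line to get a governed, curated dataset, instead of re-implementing vendor ingestion and cleansing on their own desk.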

The cost of hiring and equipping such a team can be net zero against the long-term reduction in costs that it can deliver. There are substantial long-term costs in having day-rate-expensive quant SMEs doing low-value work, and in long lead times to market for FS products. This model offers a reduction in both.

Such a team will have its Objectives & Key Results (OKR) fully aligned with the OKRs of the product teams that it supports, either via the acceleration toolkits, embedded data product teams, or training.


One of the challenges of this restructuring is how to align this new Advanced Analytics team to the existing org structure. This team would most naturally report to a Chief Data Officer (CDO) or a Group/Vector CTO. This gives the Engineering teams a seat at the executive table, representing technology to the wider business.

There should be a strong preference for a carrot-over-stick mentality when proposing these changes. A good leader invites people to follow rather than forcing change. Focus on making amuse-bouches rather than foie gras.

The Advanced Analytics team needs to quickly focus on a maximal-value data product, and scope a Minimal Viable Product (MVP). It will then proceed to iteratively create marginal value increments to offer to the consumers. The finished MVP validates the value chain for its consumers. The more useful this MVP is, the more its consumers will champion it back to the executive team.

Measuring success

Focus on maximising the number of consumers of the toolkit, and via that, of the data products curated in the datalake. Carefully curated data products, i.e. a holistic, common, semantic data model across all data vendors, increase the impact you have at the top of the value chain.

Focus on reducing the steps and complexity between the consumers of the data products, i.e. the Quant teams, and the data products they need from the datalake. Offer a polyglot access pattern, i.e. both Python and SQL, for the most popular data products to maximise adoption.
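As a sketch of what a polyglot access pattern means in practice, the snippet below serves the same data product to both audiences: raw SQL for BI/reporting analysts, and a thin Python function for quant workflows. The in-memory sqlite3 database and the `holdings` table are stand-ins, used purely for illustration; a real platform would expose the Datalake’s own SQL engine:

```python
# Illustration of one data product, two access patterns. sqlite3 is a
# stand-in for the Datalake's SQL engine; table and names are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holdings (portfolio TEXT, isin TEXT, weight REAL)")
conn.executemany("INSERT INTO holdings VALUES (?, ?, ?)", [
    ("Alpha", "US0378331005", 0.12),
    ("Alpha", "US5949181045", 0.08),
    ("Beta",  "US0378331005", 0.05),
])

# SQL access pattern (BI / reporting analysts)
sql_rows = conn.execute(
    "SELECT isin, SUM(weight) FROM holdings GROUP BY isin ORDER BY isin"
).fetchall()

# Python access pattern (quant workflows): a thin wrapper returning
# plain dicts instead of raw SQL tuples.
def get_exposures(connection):
    cur = connection.execute(
        "SELECT isin, SUM(weight) FROM holdings GROUP BY isin")
    return {isin: round(total, 4) for isin, total in cur}

exposures = get_exposures(conn)
```

Both paths hit the same curated table, so the numbers a quant sees in Python reconcile with what a report shows in SQL, which is the whole point of the common semantic model.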

Minimise the time to decision. Every time an analyst is asked a question that drives a decision, a timer is set off. The time it takes from that first query to the point where the questioner can make a decision, should be actively monitored and minimised.
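The time-to-decision metric described above can be instrumented very simply. The sketch below keeps the events in memory; the class name, question ids, and timestamps are all illustrative, and a real platform would emit these events to a metrics store:

```python
# Minimal sketch of a time-to-decision log: record when a question
# arrives and when the decision lands, then report the gap.
# All names here are hypothetical.
from datetime import datetime

class DecisionTimer:
    def __init__(self):
        self.open_questions = {}   # question_id -> time question arrived
        self.durations = []        # seconds from question to decision

    def question_asked(self, question_id, at):
        self.open_questions[question_id] = at

    def decision_made(self, question_id, at):
        started = self.open_questions.pop(question_id)
        self.durations.append((at - started).total_seconds())

    def mean_time_to_decision(self):
        return sum(self.durations) / len(self.durations)

timer = DecisionTimer()
timer.question_asked("book-var", datetime(2024, 1, 2, 9, 0))
timer.decision_made("book-var", datetime(2024, 1, 2, 13, 30))
# 4.5 hours elapsed from question to decision
```

Tracked over time, the mean (or better, the distribution) of these durations is the number the platform team should be driving down.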

Finally, a Net Promoter Score should be collected and monitored via pulse surveys at a quarterly/semi-annual frequency. This allows measurement of the platform’s indirect impact on the happiness and simplicity it offers its customers.
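For completeness, NPS is computed from 0–10 survey responses as the percentage of promoters (scores of 9–10) minus the percentage of detractors (scores of 0–6). The sample responses below are made up:

```python
# Net Promoter Score from 0-10 pulse-survey responses:
# % promoters (9-10) minus % detractors (0-6); passives (7-8) ignored.

def nps(scores):
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores))

survey = [10, 9, 9, 8, 7, 7, 6, 5, 9, 10]   # hypothetical quarterly pulse
print(nps(survey))
```

A single quarter’s score matters less than the trend across surveys as the platform matures.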


In this exposition I’ve tried to lay out the view from the “code-face” [sic] with regard to the complexities of offering advanced analytical insights in a field that is rapidly expanding.

Even though the ideas I’m proposing aren’t novel, and others have formulated and explained them much better than I have, my view is nonetheless from the vantage point of the day-to-day operations of large, institutional asset managers. The idea of moving from Data Warehouses (1990+) to Datalakes (2010+) is probably 15–20 years old already, and the idea of a Distributed Data Mesh was introduced in 2019.

There is still, however, a competitive advantage to be exploited here. The Modern Data Platform isn’t offered as a purchasable, end-to-end SaaS product; not yet, anyway! There’s no doubt this is what most fintech companies are working towards, and there are tectonic forces of land grabbing happening as we speak.

However, a large asset management company that can invest the budget to recruit and empower these niche, expert teams can still build this platform in-house. This will put it ahead of competitors waiting to buy it as a SaaS product, and let it get first past the post in harvesting opportunities more effectively.

This is an existential differentiator that will help a business survive the oncoming torrents of brutal competition on product prices and marginal profits.