Financial Services

What’s BERT and how does it benefit Financial Services?

4 min read

In this blog piece, Barry Haddow, head of NLP, introduces us to BERT and how this model allows human expertise and domain-specific intelligence to work together. Find out how we can further use NLP to offer our clients more insightful information and give them a competitive edge within the financial services industry.


If you were to join any academic NLP (Natural Language Processing) conference in the last two years, you might wonder whether you had wandered into some strange Sesame Street-related convention. First there was ELMo, then there was BERT, and then a family of BERT offspring: ALBERT, RoBERTa, CamemBERT (from France, obviously), AlBERTo and UmBERTo (Italian), and so on. BERT (and friends) have been used to improve web search (by Google) as well as many tasks in natural language understanding. So what exactly is BERT, and why is it so useful?


To understand BERT, we need to consider how the field of NLP has been revolutionised by neural networks (aka deep learning) since the early 2010s. Deep learning works by converting the task at hand to a complicated mathematical function (the neural network), which depends on many, many parameters. During "training" we find (learn) good values for these parameters, then we can use this set of parameters to apply the function to new data. For instance, suppose we want a system that can decide whether a film review is positive or negative. We construct a neural network which is able to ingest a review and produce a single number (say +1 or -1), then use a training set of reviews (with sentiment marked) to learn the parameters of this neural network. Once we are happy with the parameter set, we can use our neural network to decide the sentiment of any review.
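To make this concrete, here is a toy version of that review classifier in Python. It is only a sketch: the tiny vocabulary, the four example reviews, and the simple perceptron-style update are illustrative stand-ins for a real neural network trained on many thousands of labelled reviews.

```python
def featurise(review, vocab):
    """Bag-of-words features: one count per vocabulary word."""
    words = review.lower().split()
    return [words.count(w) for w in vocab]

def train(reviews, labels, vocab, epochs=20, lr=0.1):
    """Learn one weight per feature; update only when we misclassify."""
    weights = [0.0] * len(vocab)
    for _ in range(epochs):
        for review, label in zip(reviews, labels):
            x = featurise(review, vocab)
            score = sum(w * xi for w, xi in zip(weights, x))
            pred = 1 if score > 0 else -1
            if pred != label:
                weights = [w + lr * label * xi for w, xi in zip(weights, x)]
    return weights

def predict(review, weights, vocab):
    """+1 for positive sentiment, -1 for negative."""
    x = featurise(review, vocab)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else -1

# A (hypothetical) labelled training set, as described above.
vocab = ["great", "boring", "loved", "awful"]
reviews = ["great film loved it", "boring and awful", "loved the acting", "awful plot"]
labels = [1, -1, 1, -1]
weights = train(reviews, labels, vocab)
```

Once trained, `predict("a great film", weights, vocab)` returns +1: the learned weights are the "parameter set" the paragraph above refers to.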


One detail missing from this, though, is how we actually input the text of a film review (or any text, for that matter) into a mathematical function. Doesn't maths normally work with numbers? Well yes, so that means we need some way of converting words into numbers. That conversion is accomplished by an "embedding" which converts each of the words in the text into a long list of numbers. A good embedding should preserve relationships between words, so not only do we want "France" and "Italy" to have similar embeddings (since they're both names of European countries) but the relationships Rome–Italy and Paris–France should be somehow expressed by the embeddings.
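To illustrate, here is a toy example with hand-made, entirely hypothetical three-dimensional vectors. Real embeddings have hundreds of dimensions and are learnt from data, but the same vector arithmetic applies: similar words get similar vectors, and the Paris–France relationship can be "transferred" to Italy.

```python
import math

# Hypothetical embeddings; the dimensions loosely encode
# "is a country", "is a capital city", "is European".
emb = {
    "france": [1.0, 0.0, 1.0],
    "italy":  [0.9, 0.1, 1.0],
    "paris":  [0.0, 1.0, 1.0],
    "rome":   [0.1, 0.9, 1.0],
    "cash":   [0.0, 0.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

# The two country names are closer to each other than to an unrelated word.
assert cosine(emb["france"], emb["italy"]) > cosine(emb["france"], emb["cash"])

# The analogy: paris - france + italy should land near rome.
query = [p - f + i for p, f, i in zip(emb["paris"], emb["france"], emb["italy"])]
nearest = max((w for w in emb if w != "paris"), key=lambda w: cosine(query, emb[w]))
```

With these vectors, `nearest` comes out as `"rome"`, which is exactly the Rome–Italy / Paris–France relationship the text describes.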


Good embeddings can be learnt as part of the task learning process, but researchers realised early on that better neural networks could be created by using "pre-trained" embeddings. These were learnt using the huge quantities of text available in Wikipedia and in out-of-copyright books (for instance). The idea is that you come up with some auxiliary task and train a neural network to do this task on that large set of data. For instance, you can train a network to predict the next word when given a prefix. The embeddings produced in training this network are general-purpose word embeddings which can be used in any NLP task. People trained sets of these word embeddings and then released them for anyone to download and use.
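A toy version of that next-word auxiliary task can be sketched with simple counts. A real system trains a neural network on billions of words rather than counting pairs in three sentences, but the shape of the task, predict what comes next, is the same.

```python
from collections import Counter, defaultdict

# A tiny (hypothetical) corpus; real pre-training uses Wikipedia-scale text.
corpus = (
    "the bank approved the loan . "
    "the bank raised interest rates . "
    "the river flowed past the town ."
).split()

# Count which word follows each word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Predict the most frequent continuation seen in the corpus."""
    return follows[word].most_common(1)[0][0]
```

Here `predict_next("the")` returns `"bank"`, because that is the most common continuation in this corpus. The useful by-product of training a neural network on this task is not the predictions themselves but the word representations it learns along the way.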


Whilst pre-trained word embeddings turned out to be very useful, they have an obvious flaw. Consider the sentences "I went to the bank to withdraw cash" and "I sat down at the riverbank" – both these sentences contain the word "bank", but it is used in very different ways. Using a single embedding for both these uses of "bank" doesn't seem like a very good idea. And in fact, natural language is full of these variations in meaning, generally much more subtle than this example, and a good set of embeddings should distinguish them. This problem led to the idea of a "contextual embedding".
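One way to see the difference: a static embedding gives "bank" the same vector everywhere, while a contextual one depends on the surrounding words. The sketch below fakes a contextual embedding by simply averaging the (hypothetical) static vectors of nearby words; real contextual models like BERT learn far richer combinations, but the effect is the same: the two "bank"s come out different.

```python
# Hypothetical 2-d static vectors; most words map to a zero "unknown" vector.
static = {
    "bank":  [0.5, 0.5],
    "cash":  [1.0, 0.0],
    "river": [0.0, 1.0],
}
default = [0.0, 0.0]

def contextual(tokens, i, window=3):
    """Toy contextual embedding: average static vectors near position i."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    neighbours = [static.get(t, default) for t in tokens[lo:hi]]
    n = len(neighbours)
    return [sum(v[d] for v in neighbours) / n for d in range(2)]

s1 = "i went to the bank to withdraw cash".split()
s2 = "i sat down at the river bank".split()
v1 = contextual(s1, s1.index("bank"))  # leans towards the "cash" dimension
v2 = contextual(s2, s2.index("bank"))  # leans towards the "river" dimension
```

A static lookup would give `[0.5, 0.5]` for both occurrences of "bank"; here `v1` and `v2` differ, pulled towards "cash" and "river" respectively by their contexts.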


Using contextual embeddings means that words are never considered out of context. We train a neural network on some auxiliary task (for example, predicting the next word, or predicting missing words), then we take the whole network and use it to initialise the training of the task we really care about. In other words, once you have your network trained on one of these auxiliary tasks, getting it to work on, say, film review classification is just a matter of "fine-tuning" it. The neural network learns good representations of English words and sentences in the initial training phase and learns to apply them to the task in the fine-tuning phase.
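The pre-train/fine-tune recipe can be sketched like this. The "pretrained" vectors below are hand-fixed stand-ins for what the first phase would actually learn; the fine-tuning phase then trains only a small classification head on top of those frozen representations, which is why it needs so little labelled data.

```python
# Hypothetical vectors standing in for representations learnt in pre-training.
pretrained = {
    "great": [1.0, 0.2], "loved": [0.9, 0.1],
    "awful": [0.1, 1.0], "boring": [0.2, 0.9],
}

def encode(review):
    """Average pretrained vectors of known words (frozen during fine-tuning)."""
    vecs = [pretrained[w] for w in review.split() if w in pretrained]
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(2)]

def fine_tune(reviews, labels, epochs=20, lr=0.5):
    """Train only a small linear head; the encoder itself is left untouched."""
    head = [0.0, 0.0]
    for _ in range(epochs):
        for review, label in zip(reviews, labels):
            x = encode(review)
            pred = 1 if head[0] * x[0] + head[1] * x[1] > 0 else -1
            if pred != label:
                head = [h + lr * label * xi for h, xi in zip(head, x)]
    return head

# Just two labelled examples are enough, because the encoder already
# "knows" which words resemble each other.
head = fine_tune(["great film", "boring film"], [1, -1])
classify = lambda r: 1 if sum(h * x for h, x in zip(head, encode(r))) > 0 else -1
```

Note that `classify("loved it")` comes out positive even though "loved" never appeared in the fine-tuning data: its pretrained vector sits near "great", so the knowledge transfers.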


BERT was a contextual embedding model released in late 2018 by researchers at Google. It was based on a transformer model, which is a particular type of neural network that is really effective at representing sentences (when trained in the right way). BERT was trained on two different tasks: predicting missing words, and predicting whether one sentence could plausibly follow another in a text. BERT models were made available for download in different sizes to suit your needs, and the released software and examples made it easy to use. This release of BERT spawned a whole industry of creating BERTs for other languages (such as the French CamemBERT) as well as BERTs optimised for efficient deployment (e.g. DistilBERT) and domain-specific BERTs (there's even a FinBERT for financial reports). It even led to a whole field of study, BERTology.
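The first of those two training tasks, masked-word prediction, is easy to sketch at the data-preparation level. BERT actually masks around 15% of the tokens in a sentence and uses a transformer to recover them; here we mask a single word just to show the shape of the training examples.

```python
import random

def make_masked_example(tokens, mask_token="[MASK]", seed=0):
    """Hide one word; the (masked sentence, hidden word) pair is a
    training example for the missing-word prediction task."""
    rng = random.Random(seed)
    i = rng.randrange(len(tokens))
    masked = list(tokens)
    target = masked[i]
    masked[i] = mask_token
    return masked, target

tokens = "the bank approved the loan".split()
masked, target = make_masked_example(tokens)
```

BERT's second task, next-sentence prediction, has the same flavour: pairs of sentences are labelled according to whether the second genuinely followed the first in the source text.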


At Aveni, we are very excited about the possibilities that BERT offers for improving NLP in the finance domain, and for speech analytics. One of the biggest issues we face in this area is the lack of labelled data. If we want to do text classification, entity extraction, sentiment analysis, or any other NLP task, then human-labelled data for our task and our domain is vital but expensive and tedious to collect. Using BERT and friends can drastically reduce the labelling requirements, allowing faster and cheaper development of new NLP-based applications. BERT allows an NLP practitioner to leverage the knowledge in billions of words of text when solving their particular problem.



To learn more about what we offer, visit Aveni Detect

Find us on LinkedIn and Twitter
