What’s BERT and how does it benefit Financial Services?

Written by Barry Haddow

In this blog piece, Barry Haddow, head of NLP, introduces us to BERT and how this model allows human expertise and domain-specific intelligence to work together. Find out how we can further use NLP to offer our clients more insightful information and give them a competitive edge within the financial services industry.


If you had attended any academic NLP (Natural Language Processing) conference in the last two years, you might have wondered whether you had wandered into some strange Sesame Street-related convention. First there was ELMo, then there was BERT, then there was a family of BERT offspring: ALBERT, RoBERTa, CamemBERT (from France, obviously), AlBERTo and UmBERTo (Italian), and so on. BERT (and friends) have been used to improve web search (by Google) as well as many tasks in natural language understanding. So what exactly is BERT, and why is it so useful?


To understand BERT, we need to consider how the field of NLP has been revolutionised by neural networks (aka deep learning) since the early 2010s. Deep learning works by converting the task at hand to a complicated mathematical function (the neural network), which depends on many, many parameters. During “training” we find (learn) good values for these parameters; we can then use this set of parameters to apply the function to new data. For instance, suppose we want a system that can decide whether a film review is positive or negative. We construct a neural network which is able to ingest a review and produce a single number (say +1 or -1), then use a training set of reviews (with sentiment marked) to learn the parameters of this neural network. Once we are happy with the parameter set, we can use our neural network to decide the sentiment of any review.
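
To make the film review example concrete, here is a minimal sketch in Python. It is not a real neural network: it is the simplest possible stand-in, a single unit that scores a review by a weighted sum of word counts, with the weights learnt by gradient descent. The six-review training set, the function names and the training settings are all made up for illustration.

```python
import math

# Hypothetical toy training set: (review text, label), +1 = positive, -1 = negative.
TRAIN = [
    ("a wonderful moving film", +1),
    ("great acting and a great story", +1),
    ("wonderful story", +1),
    ("a dull boring film", -1),
    ("boring story and terrible acting", -1),
    ("terrible dull", -1),
]

def featurise(text, vocab):
    """Turn a review into a vector of word counts over the vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def score(weights, bias, x):
    """The 'mathematical function': a weighted sum of the inputs plus a bias."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

def train(data, epochs=200, lr=0.5):
    """Learn the parameters by gradient descent on the logistic loss."""
    vocab = sorted({w for text, _ in data for w in text.lower().split()})
    weights, bias = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for text, label in data:
            x = featurise(text, vocab)
            margin = label * score(weights, bias, x)
            # Derivative of log(1 + exp(-margin)) with respect to the score.
            g = -label / (1.0 + math.exp(margin))
            weights = [w - lr * g * xi for w, xi in zip(weights, x)]
            bias -= lr * g
    return vocab, weights, bias

def predict(model, text):
    """Return +1 or -1 for a review the model has not seen before."""
    vocab, weights, bias = model
    return 1 if score(weights, bias, featurise(text, vocab)) >= 0 else -1

model = train(TRAIN)
print(predict(model, "a wonderful story"))
print(predict(model, "dull and boring"))
```

Words the model never saw in training (such as "moving" spelt differently, or any new word) are simply ignored here, which is one reason real systems need a better way of representing words.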


One detail missing from this, though, is how we actually input the text of a film review (or any text, for that matter) into a mathematical function. Doesn’t maths normally work with numbers? Well yes, so that means we need some way of converting words into numbers. That conversion is accomplished by an “embedding”, which converts each of the words in the text into a long list of numbers. A good embedding should preserve relationships between words, so not only do we want “France” and “Italy” to have similar embeddings (since they’re both names of European countries) but the relationships Rome–Italy and Paris–France should be somehow expressed by the embeddings.
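
As a rough illustration of what "preserving relationships" can mean, the sketch below uses four-number embeddings written by hand. Real embeddings are learnt from data and typically have hundreds of dimensions whose meaning is not this tidy; the values, the dimension labels and the function names here are assumptions made purely for the example.

```python
import math

# Hypothetical hand-written embeddings.
# Illustrative dimensions: [country-ness, capital-ness, French(-) vs Italian(+), food-ness]
EMB = {
    "France": [1.0, 0.0, -0.3, 0.0],
    "Italy":  [1.0, 0.0,  0.3, 0.0],
    "Paris":  [0.0, 1.0, -0.3, 0.0],
    "Rome":   [0.0, 1.0,  0.3, 0.0],
    "banana": [0.0, 0.0,  0.0, 1.0],
}

def cosine(a, b):
    """Cosine similarity: 1 means same direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Find the word d such that a is to b as c is to d (e.g. France:Paris :: Italy:?)."""
    target = [vb - va + vc for va, vb, vc in zip(EMB[a], EMB[b], EMB[c])]
    candidates = [w for w in EMB if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(EMB[w], target))

print(cosine(EMB["France"], EMB["Italy"]))   # similar: both are countries
print(cosine(EMB["France"], EMB["banana"]))  # unrelated
print(analogy("France", "Paris", "Italy"))
```

In this toy set-up, the step from "France" to "Paris" is the same list of numbers as the step from "Italy" to "Rome", which is one simple way a relationship can be expressed by an embedding.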


Good embeddings can be learnt as part of the task learning process, but researchers realised early on that better neural networks could be created by using “pre-trained” embeddings. These were learnt using the huge quantities of text available in Wikipedia and in out-of-copyright books (for instance). The idea is that you come up with some auxiliary task and train a neural network to do this task on this large set of data. For instance, you can train a network to predict the next word when given a prefix. The embeddings produced in training this network are general-purpose word embeddings which can be used in any NLP task. People trained sets of these word embeddings and then released them for anyone to download and use.


Whilst pre-trained word embeddings turned out to be very useful, they have an obvious flaw. Consider the sentences “I went to the bank to withdraw cash” and “I sat down on the river bank”. Both sentences contain the word “bank”, but it is used in very different ways. Using a single embedding for both these uses of “bank” doesn’t seem like a very good idea. And in fact, natural language is full of these variations in meaning, generally much more subtle than this example, and a good set of embeddings should distinguish them. This problem led to the idea of a “contextual embedding”.
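
The toy sketch below shows the flaw and one way around it. Each word starts with a single fixed vector, so “bank” looks identical in both sentences. We then mix each word's vector with those of its neighbours, weighting each neighbour by how similar it is. This is a stripped-down form of self-attention with no learnt parameters, chosen only for illustration; the two-number vectors, the shortened sentences and the function name are all assumptions, not how any real model is configured.

```python
import math

# Hypothetical static embeddings: [finance-ness, nature-ness].
STATIC = {
    "bank":     [0.5, 0.5],   # ambiguous on its own
    "withdraw": [1.0, 0.0],
    "cash":     [1.0, 0.0],
    "river":    [0.0, 1.0],
    "sat":      [0.0, 0.2],
}

def contextualise(words, index, emb=STATIC):
    """Return a context-dependent vector for words[index].

    Each word in the sentence (including the word itself) contributes its
    static vector, weighted by a softmax over dot products with the query word.
    """
    vecs = [emb[w] for w in words]
    query = vecs[index]
    scores = [sum(q * k for q, k in zip(query, v)) for v in vecs]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[d] for w, v in zip(weights, vecs)) for d in range(len(query))]

# Sentences reduced to the content words we have vectors for.
money = ["withdraw", "cash", "bank"]
water = ["sat", "river", "bank"]

print(STATIC["bank"])            # the same vector in both sentences
print(contextualise(money, 2))   # leans towards finance
print(contextualise(water, 2))   # leans towards nature
```

The same word now gets a different vector depending on the company it keeps, which is the essence of a contextual embedding.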


Using contextual embeddings means that words are never considered out of context. We train a neural network on some auxiliary task, for example predicting the next word or predicting missing words, then we take the whole network and use it to initialise the training of the task we really care about. In other words, once you have your network trained on one of these auxiliary tasks, getting it to work on, say, film review classification is just a matter of “fine-tuning” it. The neural network learns good representations of English words and sentences in the initial training phase and learns to apply them to the task in the fine-tuning phase.


BERT is a contextual embedding model released late in 2018 by researchers at Google. It is based on a transformer model, which is a particular type of neural network that is really effective at representing sentences (when trained in the right way). BERT was trained on two different tasks: predicting missing words, and predicting whether one sentence could plausibly follow another in a text. BERT models were made available for download, in different sizes to suit your needs, and the released software and examples made it easy to use. This release of BERT spawned a whole industry of creating BERTs for other languages (such as the French CamemBERT) as well as BERTs optimised for efficient deployment (e.g. DistilBERT) and domain-specific BERTs (there’s even a FinBERT for financial reports). It even led to a whole field of study, BERTology.
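
To give a feel for the “predicting missing words” task, here is a simplified sketch of how a training example might be built: hide some of the words and keep a record of what was hidden, so the network can be scored on its guesses. The function name and the whitespace tokenisation are assumptions for this example. The real BERT recipe is more involved (it works on sub-word pieces and does not always replace a chosen word with the mask symbol), so treat this as an outline of the idea only.

```python
import random

MASK = "[MASK]"

def make_masked_example(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens.

    Returns (masked_tokens, targets), where targets maps each hidden
    position to the original word the network should predict.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))
    masked = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = masked[p]
        masked[p] = MASK
    return masked, targets

sentence = "I went to the bank to withdraw cash".split()
masked, targets = make_masked_example(sentence)
print(" ".join(masked))
print(targets)
```

Because the hidden words come from the text itself, no human labelling is needed, which is what makes it practical to train on very large collections of text.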


At Aveni, we are very excited about the possibilities that BERT offers for improving NLP in the finance domain, and for speech analytics. One of the biggest issues we face in this area is the lack of labelled data. If we want to do text classification, entity extraction, sentiment analysis, or any other NLP task, then human-labelled data for our task and our domain is vital, but it is expensive and tedious to collect. Using BERT and friends can drastically reduce the labelling requirements, allowing faster and cheaper development of new NLP-based applications. BERT allows an NLP practitioner to leverage the knowledge in billions of words of text in solving their particular problem.


