What’s BERT and how does it benefit Financial Services?

Written by Barry Haddow

In this blog piece, Barry Haddow, Head of NLP, introduces us to BERT and explains how this model allows human expertise and domain-specific intelligence to work together. Find out how we can use NLP to offer our clients more insightful information and give them a competitive edge within the financial services industry.


If you had joined any academic NLP (Natural Language Processing) conference in the last two years, you might have wondered if you had wandered into some strange Sesame Street-related convention. First there was ELMo, then there was BERT, then there was a family of BERT offspring: ALBERT, RoBERTa, CamemBERT (from France, obviously), AlBERTo and UmBERTo (Italian), and so on. BERT (and friends) have been used to improve web search (by Google) as well as many tasks in natural language understanding. So what exactly is BERT, and why is it so useful?


To understand BERT, we need to consider how the field of NLP has been revolutionised by neural networks (aka deep learning) since the early 2010s. Deep learning works by converting the task at hand into a complicated mathematical function (the neural network), which depends on many, many parameters. During “training” we find (learn) good values for these parameters, and we can then use this set of parameters to apply the function to new data. For instance, suppose we want a system that can decide whether a film review is positive or negative. We construct a neural network which can ingest a review and produce a single number (say +1 or -1), then use a training set of reviews (with sentiment marked) to learn the parameters of this neural network. Once we are happy with the parameter set, we can use our neural network to decide the sentiment of any review.
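To make this concrete, here is a deliberately tiny sketch of such a review classifier in Python. It is not a real neural network: it learns just one weight per word, and all the “reviews” are invented for illustration. But the shape of the process is the same — a parameterised function, a labelled training set, and a loop that adjusts the parameters until the function gets the training examples right.

```python
# Tiny sentiment classifier: one learned weight per word, trained on
# four hand-made reviews. All data here is invented for illustration.

TRAIN = [
    ("a wonderful and moving film", +1),
    ("great acting and a great story", +1),
    ("boring plot and terrible acting", -1),
    ("a dull and tedious film", -1),
]

weights = {}  # the "parameters" we learn during training

def score(review):
    """Apply the learned function: sum the weights of the review's words."""
    return sum(weights.get(word, 0.0) for word in review.split())

def predict(review):
    """Map the score to a sentiment label: +1 (positive) or -1 (negative)."""
    return 1 if score(review) >= 0 else -1

# Training: nudge the weights whenever the current parameters get a review wrong.
for _ in range(10):
    for review, label in TRAIN:
        if predict(review) != label:
            for word in review.split():
                weights[word] = weights.get(word, 0.0) + 0.1 * label

print(predict("a wonderful story"))     # → 1  (positive)
print(predict("a tedious boring plot")) # → -1 (negative)
```

Once trained, the same parameter set classifies reviews it has never seen — exactly the “apply the function to new data” step described above, just on a toy scale.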


One detail missing from this, though, is how we actually input the text of a film review (or any text, for that matter) into a mathematical function. Doesn’t maths normally work with numbers? Well yes, so that means we need some way of converting words into numbers. That conversion is accomplished by an “embedding”, which converts each of the words in the text into a long list of numbers. A good embedding should preserve relationships between words, so not only do we want “France” and “Italy” to have similar embeddings (since they’re both names of European countries), but the relationships Rome–Italy and Paris–France should also somehow be expressed by the embeddings.
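The Rome–Italy / Paris–France idea can be sketched with toy vectors. The embeddings below are written by hand purely for illustration (real embeddings have hundreds of dimensions and are learned from text, not hand-crafted); the four dimensions here roughly encode is-a-country, is-a-capital, French, Italian:

```python
# Hand-made toy embeddings; dimensions ≈ (country, capital, French, Italian).
import math

emb = {
    "france":  [1.0, 0.0, 1.0, 0.0],
    "paris":   [0.0, 1.0, 1.0, 0.0],
    "italy":   [1.0, 0.0, 0.0, 1.0],
    "rome":    [0.0, 1.0, 0.0, 1.0],
    "germany": [1.0, 0.0, 0.0, 0.0],
    "berlin":  [0.0, 1.0, 0.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity: how closely two embedding vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Apply the Paris–France relationship to Italy: paris - france + italy ...
query = [p - f + i for p, f, i in zip(emb["paris"], emb["france"], emb["italy"])]

# ... and the nearest remaining word to the result should be "rome".
best = max((w for w in emb if w not in {"paris", "france", "italy"}),
           key=lambda w: cosine(query, emb[w]))
print(best)  # → rome
```

With learned embeddings the same vector arithmetic famously works approximately, which is what we mean by the embeddings “expressing” a relationship.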


Good embeddings can be learnt as part of the task learning process, but researchers realised early on that better neural networks could be created by using “pre-trained” embeddings. These were learnt using the huge quantities of text available in Wikipedia and in out-of-copyright books (for instance). The idea is that you come up with some auxiliary task and train a neural network to do it on this large body of text. For instance, you can train a network to predict the next word when given a prefix. The embeddings produced in training this network are general-purpose word embeddings which can be used in any NLP task. People trained sets of these word embeddings and then released them for anyone to download and use.
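What makes next-word prediction such a convenient auxiliary task is that the “labels” come for free: any unlabelled text supplies them. The counting sketch below (on a made-up three-sentence corpus) shows the task itself — real systems train a neural network on billions of words rather than counting, but the objective is the same:

```python
# Toy "predict the next word" task: count which word follows which in a
# small, made-up corpus, then predict the most frequent follower.
from collections import Counter, defaultdict

corpus = ("the bank approved the loan . the bank raised the rate . "
          "the bank was busy .").split()

follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the most frequently observed next word."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # → "bank" (follows "the" three times here)
```

The crucial point is that no human had to label anything: the next word in the text is its own supervision signal.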


Whilst pre-trained word embeddings turned out to be very useful, they have an obvious flaw. Consider the sentences “I went to the bank to withdraw cash” and “I sat down at the riverbank” – both these sentences contain the word “bank”, but it is used in very different ways. Using a single embedding for both these uses of “bank” doesn’t seem like a very good idea. And in fact, natural language is full of these variations in meaning, generally much more subtle than this example, and a good set of embeddings should distinguish these. This problem led to the idea of a “contextual embedding”.


Using contextual embeddings means that words are never considered out of context. We train a neural network on some auxiliary task, for example predicting the next word or predicting missing words, then we take the whole network and use it to initialise the training of the task we really care about. In other words, once you have your network trained on one of these auxiliary tasks, getting it to work on, say, film review classification is just a matter of “fine-tuning” it. The neural network learns good representations of English words and sentences in the initial training phase, and learns to apply them to the task in the fine-tuning phase.
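The two-phase recipe can be sketched in miniature. In the toy below, phase 1 (pretraining) is stood in for by fixed word vectors we pretend were already learned from a large corpus (their values, like dimension 0 ≈ positive tone and dimension 1 ≈ negative tone, are hypothetical); phase 2 (fine-tuning) then trains only a tiny classifier head on four labelled examples, reusing those representations:

```python
# "Pretrained" 2-dim word vectors, made up for illustration.
pretrained = {
    "wonderful": [1.0, 0.0], "great": [0.9, 0.1],
    "boring":    [0.1, 0.9], "dull":  [0.0, 1.0],
    "film":      [0.5, 0.5], "plot":  [0.5, 0.5],
}

def represent(text):
    """Represent a text as the average of its words' pretrained vectors."""
    vecs = [pretrained[w] for w in text.split() if w in pretrained]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def predict(text):
    """Classify with the learned head: +1 (positive) or -1 (negative)."""
    return 1 if sum(w * xi for w, xi in zip(head, represent(text))) >= 0 else -1

# Fine-tuning: learn only the 2 head weights; the pretrained vectors stay fixed.
head = [0.0, 0.0]
train = [("wonderful film", 1), ("great plot", 1),
         ("boring film", -1), ("dull plot", -1)]

for _ in range(20):
    for text, label in train:
        if predict(text) != label:
            head = [w + 0.1 * label * xi
                    for w, xi in zip(head, represent(text))]

print(predict("great film"))  # → 1
```

Because most of the knowledge lives in the pretrained representations, the fine-tuning phase needs only a small amount of labelled data — the property that makes this recipe so attractive in practice.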


BERT was a contextual embedding model released in late 2018 by researchers at Google. It is based on a transformer model, a particular type of neural network that is really effective at representing sentences (when trained in the right way). BERT was trained on two different tasks: predicting missing words, and predicting whether one sentence could plausibly follow another in a text. BERT models were made available for download, in different sizes to suit your needs, and the released software and examples made it easy to use. This release of BERT spawned a whole industry of creating BERTs for other languages (such as the French CamemBERT), as well as BERTs optimised for efficient deployment (e.g. DistilBERT) and domain-specific BERTs (there’s even a FinBERT for financial reports). It even led to a whole field of study, BERTology.
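The “predicting missing words” task is what makes BERT bidirectional: unlike next-word prediction, it sees the context on both sides of the gap. The counting sketch below (on a made-up corpus) captures only that flavour — BERT itself uses a large transformer, not counts:

```python
# Toy flavour of BERT's masked-word task: guess a masked word from both
# its left and right neighbours, using counts over a made-up corpus.
from collections import Counter

corpus = ("i went to the bank to withdraw cash . "
          "she went to the bank to deposit cash . "
          "he sat by the river to fish .").split()

def fill_mask(left, right):
    """Return the word most often seen between `left` and `right`."""
    counts = Counter(corpus[i] for i in range(1, len(corpus) - 1)
                     if corpus[i - 1] == left and corpus[i + 1] == right)
    return counts.most_common(1)[0][0]

print(fill_mask("the", "to"))  # → "bank" (seen twice; "river" once)
```

Note how the guess depends on both neighbours at once — it is this two-sided view of context that lets BERT give “bank” different representations in different sentences.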


At Aveni, we are very excited about the possibilities that BERT offers for improving NLP in the finance domain, and for speech analytics. One of the biggest issues we face in this area is the lack of labelled data. If we want to do text classification, entity extraction, sentiment analysis, or any other NLP task, then human-labelled data for our task and our domain is vital, but expensive and tedious to collect. Using BERT and friends can drastically reduce the labelling requirements, allowing faster and cheaper development of new NLP-based applications. BERT allows an NLP practitioner to leverage the knowledge in billions of words of text in solving their particular problem.



To learn more about what we offer, visit Aveni Detect

Find us on LinkedIn and Twitter
