What’s BERT and how does it benefit Financial Services?

Written by Barry Haddow

In this blog piece, Barry Haddow, head of NLP, introduces us to BERT and explains how this model allows human expertise and domain-specific intelligence to work together. Find out how we use NLP to offer our clients more insightful information and give them a competitive edge within the financial services industry.


If you were to join any academic NLP (Natural Language Processing) conference in the last two years, then you might be wondering if you had wandered into some strange Sesame Street-related convention. First, there was ELMo, then there was BERT, then there was a family of BERT offspring: ALBERT, RoBERTa, CamemBERT (from France, obviously), AlBERTo and UmBERTo (Italian), and so on. BERT (and friends) have been used to improve web search (by Google) as well as many tasks in natural language understanding. So what exactly is BERT, and why is it so useful?


To understand BERT, we need to consider how the field of NLP has been revolutionised by neural networks (aka deep learning) since the early 2010s. Deep learning works by converting the task at hand into a complicated mathematical function (the neural network), which depends on many, many parameters. During “training” we find (learn) good values for these parameters; we can then use this set of parameters to apply the function to new data. For instance, suppose we want a system that can decide whether a film review is positive or negative. We construct a neural network which is able to ingest a review and produce a single number (say +1 or -1), then use a training set of reviews (with sentiment marked) to learn the parameters of this neural network. Once we are happy with the parameter set, we can use our neural network to decide the sentiment of any review.
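
To make this concrete, here is a minimal sketch of that film-review recipe, assuming the PyTorch library and using made-up toy reviews and a tiny vocabulary; a real system would use a far larger network and training set.

```python
# A toy sketch of the film-review example, assuming PyTorch is installed.
# The vocabulary, reviews, and labels here are illustrative, not real data.
import torch
import torch.nn as nn

vocab = {"great": 0, "awful": 1, "boring": 2, "loved": 3}

def bag_of_words(review):
    # Convert a review into a fixed-length vector of word counts.
    vec = torch.zeros(len(vocab))
    for word in review.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1
    return vec

# Training set: reviews with sentiment marked (1 = positive, 0 = negative).
reviews = ["loved it great film", "awful and boring"]
labels = torch.tensor([1.0, 0.0])

# The "complicated mathematical function": here just a single linear layer.
model = nn.Linear(len(vocab), 1)
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

inputs = torch.stack([bag_of_words(r) for r in reviews])
for _ in range(100):  # "training": find good values for the parameters
    optimiser.zero_grad()
    loss = loss_fn(model(inputs).squeeze(1), labels)
    loss.backward()
    optimiser.step()

# Apply the learned function to a new review (close to 1 means positive).
print(torch.sigmoid(model(bag_of_words("great film, loved it"))))
```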


One detail missing from this, though, is how we actually input the text of a film review (or any text, for that matter) into a mathematical function. Doesn’t maths normally work with numbers? Well yes, so that means we need some way of converting words into numbers. That conversion is accomplished by an “embedding”, which converts each of the words in the text into a long list of numbers. A good embedding should preserve relationships between words: not only do we want “France” and “Italy” to have similar embeddings (since they’re both names of European countries), but the relationships Rome–Italy and Paris–France should also somehow be expressed by the embeddings.
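
That analogy idea can be sketched with made-up three-dimensional vectors (real embeddings are learned from data and have hundreds of dimensions):

```python
# Illustrative toy embeddings; real ones are learned, not hand-written.
import numpy as np

embedding = {
    "france": np.array([0.9, 0.1, 0.3]),
    "italy":  np.array([0.8, 0.2, 0.3]),   # similar to "france"
    "paris":  np.array([0.9, 0.1, 0.9]),
    "rome":   np.array([0.8, 0.2, 0.9]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Similar words get similar vectors...
print(cosine(embedding["france"], embedding["italy"]))

# ...and relationships are expressed as vector offsets:
# paris - france should be close to rome - italy.
print(cosine(embedding["paris"] - embedding["france"],
             embedding["rome"] - embedding["italy"]))
```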


Good embeddings can be learnt as part of the task learning process, but researchers realised early on that better neural networks could be created by using “pre-trained” embeddings. These were learnt from the huge quantities of text available in Wikipedia and in out-of-copyright books (for instance). The idea is that you come up with some auxiliary task, then train a neural network to do this task on this large set of data. For instance, you can train a network to predict the next word when given a prefix. The embeddings produced in training this network are general-purpose word embeddings which can be used in any NLP task. People trained sets of these word embeddings and then released them for anyone to download and use.
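
For instance, the GloVe vectors (one well-known set of released word embeddings, though not named in this post) can be downloaded and queried through the gensim library; a rough sketch:

```python
# A sketch of using released pre-trained word embeddings, assuming the
# gensim library; api.load fetches the vectors on first use (a large file).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

# Words that appear in similar contexts get similar embeddings.
print(vectors.most_similar("france", topn=3))

# The classic analogy test: paris - france + italy should land near rome.
print(vectors.most_similar(positive=["paris", "italy"],
                           negative=["france"], topn=1))
```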


Whilst pre-trained word embeddings turned out to be very useful, they have an obvious flaw. Consider the sentences “I went to the bank to withdraw cash” and “I sat down at the riverbank” – both these sentences contain the word “bank”, but it is used in very different ways. Using a single embedding for both these uses of “bank” doesn’t seem like a very good idea. In fact, natural language is full of these variations in meaning, generally much subtler than this example, and a good set of embeddings should distinguish them. This problem led to the idea of a “contextual embedding”.


Using contextual embeddings means that words are never considered out of context. We train a neural network on some auxiliary task (for example, predicting the next word, or predicting missing words), then we take the whole network and use it to initialise the training of the task we really care about. In other words, once you have your network trained on one of these auxiliary tasks, getting it to work on, say, film review classification is just a matter of “fine-tuning” it. The neural network learns good representations of English words and sentences in the initial training phase, and learns to apply them to the task in the fine-tuning phase.
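
In code, the fine-tuning recipe looks roughly like the sketch below, which assumes the Hugging Face transformers library (a common way of working with models like BERT, though not mentioned in this post); a real setup would add a labelled dataset, a training loop, and evaluation.

```python
# A minimal sketch of the fine-tuning idea, assuming the Hugging Face
# transformers library. Loading the model initialises it with pre-trained
# weights; a new, untrained classification head is added on top, and all
# parameters are then updated ("fine-tuned") on labelled data.
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # e.g. positive / negative

batch = tokenizer(["A wonderful, heartfelt film."],
                  return_tensors="pt", padding=True)

# From here, a standard supervised training loop on labelled reviews
# updates the whole network, not just the new classification head.
outputs = model(**batch)
print(outputs.logits)
```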


BERT was a contextual embedding model released in late 2018 by researchers at Google. It was based on a transformer model, a particular type of neural network that is really effective at representing sentences (when trained in the right way). BERT was trained on two different tasks: predicting missing words, and predicting whether one sentence could plausibly follow another in a text. BERT models were made available for download, in different sizes to suit your needs, and the released software and examples made it easy to use. This release of BERT spawned a whole industry of creating BERTs for other languages (such as the French CamemBERT) as well as BERTs optimised for efficient deployment (e.g. DistilBERT) and domain-specific BERTs (there’s even a FinBERT for financial reports). It even led to a whole field of study: BERTology.
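
You can try the missing-word task directly with a released BERT model; a small sketch, again assuming the Hugging Face transformers library, which ties back to the “bank” example above:

```python
# A sketch of BERT's masked-word task via the Hugging Face pipeline API.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
# BERT predicts plausible words for the [MASK] slot, with scores.
for prediction in fill("I went to the [MASK] to withdraw cash."):
    print(prediction["token_str"], prediction["score"])
```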


At Aveni, we are very excited about the possibilities that BERT offers for improving NLP in the finance domain, and for speech analytics. One of the biggest issues we face in this area is the lack of labelled data. If we want to do text classification, entity extraction, sentiment analysis, or any other NLP task, then human-labelled data for our task and our domain is vital, but expensive and tedious to collect. Using BERT and friends can drastically reduce the labelling requirements, allowing faster and cheaper development of new NLP-based applications: BERT allows an NLP practitioner to leverage the knowledge in billions of words of text in solving their particular problem.


To learn more about what we offer, visit Aveni Detect
