How low-resource Natural Language Processing is making Speech Analytics accessible to industry


Natural Language Processing (NLP) is a subset of artificial intelligence that enables machines to understand, process and analyse natural language in the way that humans do. The machine analyses data, interprets it, measures sentiment and draws the intended inference from it. The data used for Natural Language Processing (and other forms of machine learning) may be labelled. Labelled data is data with predefined tags that provide information the machine can learn from. Learning from such data is called supervised learning. A simple example of labelled data is a set of customer records with labels indicating that the strings of letters containing an ‘@’ symbol are email addresses, the two-digit numbers are ages, the images are passport photos, and so on. With unlabelled data there are no such tags, and the machine has to categorise or cluster the data by finding attributes with similar patterns. This process is known as unsupervised learning.


Natural Language Processing has achieved remarkable progress in the past decade on the basis of neural models. Using large amounts of labelled data can help achieve state-of-the-art performance for tasks such as sentiment detection, Named Entity Recognition (NER), Natural Language Inference (NLI) or question-answering. For these tasks, the labels or tags would be, for example, the sentiment of a review, or the people, places or organisations mentioned in the text. However, the dependence on labelled data prevents NLP models from being applied to low-resource settings, because of the time, money, and expertise that are often required to label large amounts of textual data.


We’re going to take a look at recent advances in NLP, which allow deep learning models to learn from very few examples. This is crucial for speech analytics where labelled examples are often in very short supply. 


Pre-train and Fine-tune


In the last few years, a new paradigm – pre-train and fine-tune – has emerged, which allows us to leverage large amounts of unlabelled data for NLP. The premise is that perhaps it’s better to first learn to model the language itself, then once we have a model that understands the language, we can share this knowledge with the many different tasks we care about by fine-tuning it on small amounts of labelled data. Language modelling is a machine learning task where the model needs to learn how to predict a missing word given the context of the rest of the sentence. This is a generic task with abundant naturally occurring data and can be used to pre-train such a generic model.
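To make the "predict a missing word" task concrete, here is a minimal sketch that replaces the neural network with simple counts. It assumes a tiny made-up corpus and only looks at the words immediately to the left and right of the gap; the names `train_context_counts` and `predict_masked` are illustrative, not part of any library. Real language models learn from far more context and far more text, but the training signal is the same: the text itself supplies the answer, so no human labelling is needed.

```python
from collections import Counter, defaultdict

MASK = "[MASK]"


def tokenize(sentence):
    """Lower-case and split on whitespace, adding sentence boundary markers."""
    words = [t if t == MASK else t.lower() for t in sentence.split()]
    return ["<s>"] + words + ["</s>"]


def train_context_counts(sentences):
    """Count which words occur between each (left neighbour, right neighbour) pair."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(1, len(tokens) - 1):
            counts[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    return counts


def predict_masked(counts, sentence):
    """Return the word most often seen in the masked slot's context, or None."""
    tokens = tokenize(sentence)
    i = tokens.index(MASK)
    seen = counts.get((tokens[i - 1], tokens[i + 1]))
    return seen.most_common(1)[0][0] if seen else None


# Unlabelled text is all we need: the "label" is the hidden word itself.
corpus = [
    "the customer called the bank",
    "the customer emailed the bank",
    "the customer called the adviser",
    "the adviser called the customer",
]
counts = train_context_counts(corpus)
print(predict_masked(counts, "the customer [MASK] the bank"))
```

In this toy corpus, "called" appears twice between "customer" and "the" while "emailed" appears once, so "called" is the prediction.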


Arguably, the model that kick-started this trend was the Bidirectional Encoder Representations from Transformers (BERT) model. BERT is a transformer-based machine learning technique for pre-training developed by Google. The model takes input sentences where some words are masked out, and the task is to predict the masked words. What really set BERT apart was the ease of fine-tuning: it is designed so that this is easy to do for lots of different tasks. You can download BERT pre-trained on a large English corpus like the BooksCorpus, add a task-specific “head” onto it to create a new architecture for your task, and then fine-tune it on your labelled data. This approach has led to huge improvements over the previous state of the art, providing a nice off-the-shelf solution to standard problems.
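The idea of putting a small task-specific head on top of a pre-trained model can be sketched without any deep learning library. The example below is a toy under loud assumptions: `PRETRAINED_VECTORS` is a hand-written stand-in for a pre-trained encoder (it is not BERT), and only the head is trained while the "encoder" stays frozen. Real BERT fine-tuning usually updates the encoder weights as well. What the sketch does show is why a handful of labelled examples can be enough when the representation already carries useful knowledge.

```python
import math

# Hypothetical stand-in for a pre-trained encoder: fixed 2-d word vectors.
PRETRAINED_VECTORS = {
    "great": (1.0, 0.2), "helpful": (0.9, 0.1), "friendly": (0.8, 0.3),
    "rude": (-0.9, 0.2), "slow": (-0.7, 0.1), "awful": (-1.0, 0.3),
}

# A very small labelled dataset (1 = positive, 0 = negative).
FEW_SHOT_EXAMPLES = [
    ("great service", 1),
    ("rude agent", 0),
    ("helpful adviser", 1),
    ("slow response", 0),
]


def encode(text):
    """Frozen 'encoder': average the vectors of the words it knows."""
    vecs = [PRETRAINED_VECTORS[w] for w in text.lower().split()
            if w in PRETRAINED_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))


def train_head(examples, epochs=200, lr=0.5):
    """Fit a logistic-regression head on the frozen encoder outputs with SGD."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = encode(text)
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            err = p - label  # gradient of the log loss w.r.t. the logit
            w = [w[i] - lr * err * x[i] for i in range(2)]
            b -= lr * err
    return w, b


def predict(head, text):
    """Return 1 (positive) or 0 (negative) for a new piece of text."""
    w, b = head
    x = encode(text)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


head = train_head(FEW_SHOT_EXAMPLES)
print(predict(head, "friendly and helpful"), predict(head, "awful and slow"))
```

The words "friendly" and "awful" never appear in the four labelled examples; the head generalises to them only because the stand-in encoder already places them near similar words.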


Let’s look at some other ways of using pre-trained language models:




Recently, researchers realised that an alternative paradigm would be to make the final task look more like language modelling. This would mean that a fine-tuning step is no longer needed. It would also mean that we’re potentially able to perform new downstream tasks with little or no labelled data. This paradigm is called pre-train and prompt.



The first pre-train and prompt paper, which showed the potential of this approach, was published in 2020 by Google (Raffel et al. 2020). The authors suggested a unified approach to transfer learning in Natural Language Processing, with the goal of setting a new state of the art in the field. To this end, they treated every NLP problem as a “text-to-text” problem. Such a framework allows the same model, objective, training procedure, and decoding process to be used for different tasks, including summarisation, sentiment analysis, question answering, and machine translation. The researchers call their model the Text-to-Text Transfer Transformer (T5) and train it on a large corpus of web-scraped data to get state-of-the-art results on several NLP tasks. The way to make all NLP tasks text-to-text is to select appropriate prompts, so that the pre-trained LM itself can be used to predict the desired output, sometimes even without any additional task-specific training. This allows few-shot (learning from only a few examples of labelled data) and even zero-shot (generalising to a new task with no examples of labelled data) behaviour.




In this example, a prompting function turns the input into a sentence containing a slot, Z, that the language model needs to fill in; in this case, we would expect the prediction for Z to indicate a positive sentiment. This allows us to use the language model directly for a specific task, here sentiment detection.
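A prompting function of this kind can be sketched in a few lines. Everything named here is an assumption made for illustration: the template wording, the answer words in `ANSWER_TO_LABEL`, and especially `toy_score`, which is a crude stand-in for a real language model's score of a filled-in sentence. In practice the score would come from a pre-trained model.

```python
import re

# Hypothetical template: {x} is the input text, {z} is the slot to fill.
TEMPLATE = "{x} Overall, it was a {z} experience."

# Hypothetical mapping from answer words to task labels.
ANSWER_TO_LABEL = {"great": "positive", "terrible": "negative"}

# Cue words used only by the toy scorer below.
POSITIVE_WORDS = {"great", "helpful", "love", "friendly"}
NEGATIVE_WORDS = {"terrible", "rude", "slow", "awful"}


def prompt(x, z="[Z]"):
    """Prompting function: wrap the input in a sentence with a slot Z."""
    return TEMPLATE.format(x=x, z=z)


def toy_score(text):
    """Stand-in for a language model: prefers text whose cue words agree."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return abs(pos - neg)


def classify(x, score_fn):
    """Fill the slot with each candidate answer and keep the best-scoring one."""
    best_answer = max(ANSWER_TO_LABEL, key=lambda z: score_fn(prompt(x, z)))
    return ANSWER_TO_LABEL[best_answer]


print(prompt("The adviser was helpful."))
print(classify("The adviser was helpful.", toy_score))
print(classify("The call was slow and the agent was rude.", toy_score))
```

No classifier is trained here: the task is expressed entirely as "which word best completes this sentence?", which is the kind of question a language model is already trained to answer.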


There are many different possible tasks that language models can perform: 



Here we can see examples of different prompts for different tasks. T5 was applied to several benchmarks and surpassed previous state-of-the-art results across many different individual Natural Language Processing tasks. T5 caused great interest in prompting and since then various improvements and challenges have been identified. 
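As a rough sketch of what "everything is text-to-text" means in practice, the snippet below converts examples from different tasks into plain (input text, target text) pairs. The prefixes are written in the style of those used for T5, but the exact strings, the `TEMPLATES` table and the `to_text_to_text` helper are assumptions for illustration rather than the model's required format.

```python
# Hypothetical task templates; each task becomes "prefix + input" -> "target".
TEMPLATES = {
    "translation": "translate English to German: {text}",
    "summarisation": "summarize: {text}",
    "sentiment": "sentiment: {text}",
    "question_answering": "question: {question} context: {context}",
}


def to_text_to_text(task, target, **fields):
    """Turn one labelled example into an (input text, target text) pair."""
    return {"input": TEMPLATES[task].format(**fields), "target": target}


examples = [
    to_text_to_text("translation", "Das ist gut.", text="That is good."),
    to_text_to_text("sentiment", "positive", text="The adviser was very helpful."),
    to_text_to_text(
        "question_answering",
        "Google",
        question="Who developed BERT?",
        context="BERT was developed by Google.",
    ),
]

for example in examples:
    print(example["input"], "->", example["target"])
```

Because every task now has the same shape, a single model with a single training objective can be used for all of them.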


Finding good prompts is difficult, and recent work has focussed on finding them automatically. Another active research question is how and when to train a model with prompts. 


Working with large language models is also challenging. Large language model size has been increasing roughly 10x every year for the last few years. This road leads to diminishing returns, higher costs, more complexity, and new risks. Downsizing efforts are also underway in the Natural Language Processing community, using transfer learning techniques such as knowledge distillation, which trains a smaller student model to learn from the original (teacher) model. The student model can then be used for more efficient inference, e.g. DistilBERT (Sanh et al. 2019).
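The core of knowledge distillation is a loss that pushes the student's output distribution towards the teacher's. The sketch below shows one common, generic form of that loss: the KL divergence between temperature-softened distributions. It is not DistilBERT's full training objective, which combines several terms, and the function names and example logits are made up for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)
    return kl * temperature ** 2


teacher_logits = [4.0, 1.0, 0.5]          # made-up teacher outputs for 3 classes
close_student = [3.5, 1.2, 0.4]           # student that roughly agrees
far_student = [0.5, 4.0, 1.0]             # student that disagrees

print(distillation_loss(close_student, teacher_logits))
print(distillation_loss(far_student, teacher_logits))
```

The loss is zero when the student reproduces the teacher exactly and grows as the two distributions diverge, so minimising it trains the smaller model to imitate the larger one.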


Data Augmentation


Data augmentation is a set of techniques to artificially increase the amount of data by generating new copies from existing data. This includes making small changes to data or using deep learning models to generate new data points. Data augmentation techniques make machine learning models more robust by creating variations that the model may see in the real world. It is widely used in image processing, but augmenting textual data is more difficult because of the complexity and structure of language. Common methods for data augmentation in Natural Language Processing are:


Token level augmentation 


Takes existing data and creates new examples by adding variety at the word level. Common augmentations would be synonym replacement, word insertion, word swap and word deletion. 
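The four word-level operations can be sketched as follows. The `SYNONYMS` table is a tiny hand-written placeholder; in practice synonyms would come from a thesaurus or word embeddings. A seeded random generator is passed in so that results are reproducible.

```python
import random

# Hypothetical synonym table used for replacement and insertion.
SYNONYMS = {"quick": ["fast", "speedy"], "help": ["assistance", "support"]}


def synonym_replacement(tokens, rng):
    """Replace every word that has a known synonym with a random synonym."""
    return [rng.choice(SYNONYMS[t]) if t in SYNONYMS else t for t in tokens]


def random_insertion(tokens, rng):
    """Insert a synonym of one of the sentence's words at a random position."""
    tokens = list(tokens)
    candidates = [t for t in tokens if t in SYNONYMS]
    if candidates:
        new_word = rng.choice(SYNONYMS[rng.choice(candidates)])
        tokens.insert(rng.randrange(len(tokens) + 1), new_word)
    return tokens


def random_swap(tokens, rng):
    """Swap the positions of two randomly chosen words."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, rng, p=0.2):
    """Drop each word with probability p, always keeping at least one word."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() > p]
    return kept or [rng.choice(tokens)]


rng = random.Random(0)
original = "thanks for the quick help today".split()
for augment in (synonym_replacement, random_insertion, random_swap, random_deletion):
    print(augment.__name__, "->", " ".join(augment(original, rng)))
```

Each augmented sentence keeps the label of the original, which is why these operations are usually applied sparingly: too many edits can change the meaning.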


Sentence level augmentation 


Takes existing data and creates new examples by replacing whole sentences. A popular method here is back translation, where, for example, an English sentence is translated into German and then translated back into English. Another method is applying paraphrasing models to the original texts. Of particular interest to us, in the context of prompting, is that we can use a large pre-trained language model to generate new examples from prompts built on existing instances. GPT3Mix (Yoo et al. 2021) is a prompt-based approach that doesn’t require fine-tuning: a prompt is constructed using a few sample sentences from the task-specific training data together with a description of the data. A large pre-trained language model (GPT-3) then generates new sentences influenced by the sample sentences.



Here we show an example taken from their paper on automatically generating training data for the sentiment detection task. The authors report a substantial improvement over baselines such as back translation. 
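The general shape of such a prompt can be sketched in code. The template below is an assumption written for illustration and is not the exact wording used in the GPT3Mix paper; `build_augmentation_prompt` and `parse_generated` are hypothetical helper names, and the call to the language model itself is left out.

```python
import re


def build_augmentation_prompt(task_description, examples, labels):
    """Build a prompt from a data description and a few labelled samples."""
    lines = [
        task_description,
        "Sentiment is one of: " + ", ".join(labels) + ".",
        "",
    ]
    for text, label in examples:
        lines.append(f"Review: {text} (Sentiment: {label})")
    lines.append("Review:")  # the language model continues from here
    return "\n".join(lines)


def parse_generated(completion):
    """Extract (text, label) from a completion like 'Nice staff. (Sentiment: positive)'."""
    match = re.match(r"\s*(.+?)\s*\(Sentiment:\s*(\w+)\)", completion)
    if not match:
        return None
    return match.group(1), match.group(2).lower()


samples = [
    ("The adviser explained everything clearly.", "positive"),
    ("I waited forty minutes and nobody answered.", "negative"),
]
prompt_text = build_augmentation_prompt(
    "Each item in the following list contains a customer review and its sentiment.",
    samples,
    ["positive", "negative"],
)
print(prompt_text)

# A made-up completion, standing in for what a language model might return.
print(parse_generated(" The staff were lovely. (Sentiment: positive)"))
```

Because the model writes both the new sentence and its label, each parsed completion can be added directly to the training set, ideally after a quality check.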


Aveni’s perspective


At Aveni Labs, we’re experimenting with and leveraging these approaches to produce models that can be trained using very little labelled data. We use prompting to create more labelled data, and use data augmentation to expand our labelled dataset. Our expert understanding of these methods means that we can deliver production-ready models with far less data than was previously possible for machine learning solutions. Instead of needing thousands of training examples, we can build classifiers that work with only hundreds, or in some cases even tens, of real training examples. With this approach, we’re able to build superior models that need less human supervision, have excellent transcription accuracy, and offer greater functionality, for example, accurately identifying vulnerable customers.


We work at the forefront of Artificial Intelligence and Natural Language Processing. Our world-class NLP engineers have employed these techniques and approaches to build our product, Aveni Detect, which lets you analyse 100% of customer interactions to power business improvement.

