AWS Only
ComplyAdvantage Automates Sanctions Screening Using Firemind’s RAG-Large Language Model Solution
ComplyAdvantage needed to automate the extraction of relationships between sanctioned entities and their associated individuals or organisations from unstructured data sources. Firemind implemented a Retrieval-Augmented Generation (RAG) solution leveraging Large Language Models (LLMs) and AWS services to automate this process, enhancing speed and accuracy.
AWS ONLY ·
ComplyAdvantage Automates Sanctions Screening Using Firemind’s RAG-Large Language Model Solution
ComplyAdvantage needed to automate the extraction of relationships between sanctioned entities and their associated individuals or organisations from unstructured data sources. Firemind implemented a Retrieval-Augmented Generation (RAG) solution leveraging Large Language Models (LLMs) and AWS services to automate this process, enhancing speed and accuracy.
At a glance
ComplyAdvantage is a leading technology company that offers AI-driven compliance solutions. They specialise in Anti-Money Laundering (AML) monitoring, Politically Exposed Persons (PEP) screening, and real-time sanctions tracking. Their services help financial institutions and various industries manage compliance risks effectively and efficiently.
Challenge
Manually extracting relationships from unstructured sources, such as web pages and PDF documents, was time-consuming and limited the ability to track sanctions-related data efficiently.
Solution
Firemind implemented a RAG-Large Language Model solution that automatically extracts connections between sanctioned entities and related individuals or organisations from HTML and PDF documents.
Services Used
Amazon Kendra
Amazon Bedrock
Amazon S3
AWS Lambda
Outcomes
Automated data extraction and task processing
Data accuracy and reliability dramatically improved
4.2 minute automated extraction time
Business challenges
Manual extraction of sanctioned entity relationships from unstructured data
ComplyAdvantage was facing significant inefficiencies in identifying relationships between sanctioned entities and associated individuals or organisations. While they had access to structured data sources, such as government lists, many of the crucial connections, like family members or linked organisations, were only available in unstructured sources, including corporate websites, news articles, and PDFs. The manual process of scraping and extracting this information required considerable time and resources, severely limiting their ability to provide real-time updates.
This manual approach meant that ComplyAdvantage could only process a limited number of entities per day, which was insufficient given the volume of data and the speed at which new sanctions are imposed. They needed an automated solution to enhance the speed, accuracy, and scale of their sanctions screening process, ensuring they could provide their clients with the most up-to-date information.
“It was about finding an AWS Partner that understood our ethos and values. It’s been really refreshing talking to Firemind about the the project with clear communication and without the usual jargon and abbreviations.”
Pete Kilbane, Commercial Director — MRC
Solution
Automating entity relationship extraction with RAG-LLM
To address ComplyAdvantage’s need for scalable and automated sanctions screening, Firemind developed a Proof of Concept (POC) leveraging AWS services and a Retrieval-Augmented Generation (RAG) approach. The solution was designed to extract relationships between sanctioned entities and their associated individuals or organisations from unstructured data sources, such as HTML web pages and PDF documents. The core of the solution was built using a combination of Amazon Kendra, Amazon Bedrock, and AWS Lambda functions.
The process started with ComplyAdvantage providing a dataset of web-scraped HTML and PDF documents related to 25 sanctioned entities. Using Amazon Kendra, Firemind indexed the documents and set up search queries to retrieve relevant chunks of text containing potential relationships. These retrieved chunks were processed by pre-engineered prompts that were designed specifically to extract key details like family connections or associated organisations. The prompts were passed to Amazon Bedrock, which utilised a Large Language Model (LLM) to analyse the text and extract the required relationships.
This extracted information was structured in a JSON format and stored in Amazon S3, allowing ComplyAdvantage to access the data in a format aligned with their internal systems. AWS Lambda orchestrated the entire process, ensuring smooth data processing and scalability. This automated solution significantly reduced the manual effort needed to extract critical sanctions-related connections, enabling ComplyAdvantage to process data faster and more efficiently.
Automated extraction:
The RAG-Large Language Model solution reduced manual effort by automating the extraction of relationships from web pages and PDFs. This automation allowed the system to handle large volumes of unstructured data, making the process more efficient and minimising the need for human intervention. The model could extract key connections, such as familial relationships or organisational roles, directly from documents.
Improved efficiency:
Extraction time was reduced to 4.2 minutes per entity, enabling faster processing of sanctions-related data. By automating the relationship extraction process, the solution significantly reduced the time it took to analyse documents, allowing ComplyAdvantage to process a much higher volume of entities in less time compared to manual methods.
Model Spotlight
Claude 3 Sonnet
We chose Claude 3 Sonnet for its ability to handle complex reasoning and text analysis, essential for extracting relationships between sanctioned entities, such as family connections and organisational roles, from large datasets of PDFs and HTML files.
Sonnet reduced extraction time to 4.2 minutes per entity, greatly improving efficiency. In addition, it was well-suited for the Retrieval-Augmented Generation (RAG) solution, enabling the model to gather and extract different types of information by understanding the content of the files. Its multi-language support allowed for comprehensive analysis across various jurisdictions, making it versatile for the project’s global scope.
While other models like Claude 3 Haiku were considered, Sonnet’s advanced reasoning capabilities, particularly its ability to understand and extract organisational positions and personal relationships, made it the best choice for delivering precise and reliable results in this compliance-focused use case.
Why Firemind
“Overall, the project was successful and demonstrated a valuable outcome in the use of a LLM-RAG solution to identify and extract connections (names and relationships) related to sanctioned entities from unstructured data sources."
Firemind’s deep expertise in leveraging AWS services was a key factor in their selection. By conducting a comprehensive assessment of Comply Advantage’s systems and processes, Firemind was able to develop a tailored solution that leveraged innovative technologies like generative AI and large language models.
252seconds
Execution time
The average execution time per entity is 4.2 minutes which is significantly faster than the time that a human researcher would spend to extract this information.
Added value
We constructed focused prompts based on a thorough exploration of the sources and the relationship keywords shared as well as retrieved the texts of interest based on the seed names from the vector database, and parsed the text to identify the connection names and types related to the sanctioned entity using an LLM.
Get in touch
Want to learn more?
Seen a specific case study or insight and want to learn more? Or thinking about your next project? Drop us a message below!