The Most Advanced AI Model in Modern Society: LLM (Large Language Model)

Authored by Donghwan Lee, AI Team Lead, and Scarlett Bae, AI Specialist

Looking at the historical development of AI models, the evolution from early rule-based systems to today’s highly advanced models like GPT-3 and GPT-4 is truly remarkable. AI began with simple rule-based approaches, progressed through machine learning, and reached deep learning, thanks to advances in technology and improved access to large-scale data. Today, AI has become an essential part of many industries.

Among the recent advancements in AI, the most prominent technology is undoubtedly the Large Language Model (LLM). LLMs are deep neural network models trained on massive text datasets, enabling them to understand and generate human language. These models are capable of learning from datasets containing billions or even trillions of words, allowing them to grasp complex relationships and context within text.

While LLMs are often viewed as just one branch of AI, they can still feel abstract and difficult to understand. Mathematically, they can be thought of as “complex systems made up of numerous nonlinear regression models”—which is essentially what a deep learning model is. When combined with a transformer architecture, which excels at capturing long-range dependencies in text, these systems evolve into large language models capable of engaging in natural human-like conversation.

[Figure 1. Neural Network Model and Nonlinear Regression Model]
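To make this view concrete, a single network layer can be written as a nonlinear regression of its input, and a deep model as a composition of many such regressions. The formulas below are an illustrative simplification in TeX notation; a transformer adds attention mechanisms on top of this basic structure.

    % A single layer: a nonlinear regression of its input
    h^{(1)} = \sigma\left( W^{(1)} x + b^{(1)} \right)

    % A deep network: a composition of many such regressions
    \hat{y} = \sigma\left( W^{(L)} \, \sigma\left( \cdots \, \sigma\left( W^{(1)} x + b^{(1)} \right) \cdots \right) + b^{(L)} \right)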

Limitations of LLMs and How to Overcome Them

The strengths of Large Language Models (LLMs) are clear. With exceptional natural language processing capabilities, they can be applied across a wide range of fields—from document analysis and customer support to content creation. Their flexibility and scalability, powered by vast amounts of learned data, are second to none.

However, despite these advantages, LLMs have notable limitations. One such issue is hallucination—the generation of incorrect or fabricated information. Others include a lack of domain-specific expertise and challenges in performing sophisticated reasoning. These limitations pose significant risks when applying LLMs in fields like actuarial science, where accuracy and reliability are critical. Results based on inaccurate information can compromise financial soundness, lead to regulatory violations, and damage trust.

To address these concerns, research into methodologies that clearly present fact-based knowledge and logical structures is essential before adopting LLMs in high-stakes environments.


Solution 1: RAG (Retrieval-Augmented Generation)

One of the most promising approaches to overcoming LLM limitations is Retrieval-Augmented Generation (RAG). RAG allows LLMs to retrieve information from external knowledge sources in real time (Retrieval), augment the model's input with the retrieved material (Augmentation), and then generate more accurate outputs (Generation). Unlike traditional LLMs, which rely solely on pre-trained parameters, RAG significantly enhances accuracy and trustworthiness by referencing credible external sources.

Here's a closer look at how RAG works, with a brief code sketch after the list:

  • Query: The user inputs a question or request.

  • Retrieval: The system searches a knowledge base for semantically relevant information based on the query.

  • Augmentation: Retrieved data is combined with the original query as input to the answer-generation model.

  • Generation: The model uses this enriched input to generate a final response.
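The sketch below walks through these four steps in plain Python. It is a minimal illustration, not a production pipeline: the knowledge base is a hard-coded list, relevance is scored by simple word overlap rather than embeddings, and the generation step is left as a placeholder where a real system would call an LLM.

    from collections import Counter

    # Query: the user's question.
    query = "How is the premium reserve calculated for the whole life product?"

    # A tiny in-memory knowledge base standing in for a real document store.
    knowledge_base = [
        "The premium reserve for the whole life product is the expected present value "
        "of future benefits minus the expected present value of future net premiums.",
        "Lapse assumptions are reviewed annually by the actuarial team.",
        "Claims for the cancer rider are settled within 10 business days.",
    ]

    def score(doc: str, question: str) -> int:
        """Crude relevance score: shared lowercase words (real systems use embeddings)."""
        doc_words = Counter(doc.lower().split())
        return sum(doc_words[w] for w in set(question.lower().split()))

    # Retrieval: pick the most relevant passages for the query.
    top_docs = sorted(knowledge_base, key=lambda d: score(d, query), reverse=True)[:2]

    # Augmentation: combine the retrieved context with the original query.
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n".join(f"- {d}" for d in top_docs)
        + f"\n\nQuestion: {query}\nAnswer:"
    )

    # Generation: a real system would send `prompt` to an LLM here; we simply print it.
    print(prompt)

In practice, the keyword overlap would be replaced by an embedding-based vector search over a document store, but the four-step structure stays the same.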

In conclusion, RAG provides a structured way to reduce hallucinations and is a highly effective and realistic framework. It is not just a patch for LLM weaknesses but a foundational technology for building trustworthy AI systems. This makes RAG especially suitable for actuarial work, where precision and credibility are paramount.

[Figure 2. RAG Flow and Retrieval Methodology]

Solution 2: Dataset – Document Formatting

Above all, the most critical factor in the success of any AI project is the dataset. The quality of the data directly determines the performance of the AI system. In the insurance industry, there are already cases where companies have invested heavily in AI systems but failed to achieve the expected results.

One key reason is document formatting. Many documents containing insurance company data are not created in a machine-readable format. This doesn’t simply refer to typos or grammatical errors—it means the structure of the document is often not optimized for AI to understand. Since document quality is essential for improving actuarial productivity through AI, it's crucial to address three outdated documentation practices that need reform.

First: Move away from PDF-based documentation

PDFs are designed for print, not for machine interpretation. While they are visually clear for human readers, their structure is often ambiguous for machines. There have been attempts to analyze PDFs using technologies like OCR (Optical Character Recognition) and Vision Transformers, but these approaches still face limitations in accuracy and require significant time and cost for pre- and post-processing.

In contrast, formats such as .docx, .tex, .html, and .md (Markdown) are text-based, globally recognized standards that AI can parse accurately. Notably, Microsoft's open-source "MarkItDown" project, which converts common office formats into Markdown, is actively developed by a global community of contributors and is well suited to a wide range of insurance documentation needs.
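As a minimal sketch of such a conversion, the snippet below uses the markitdown Python package to turn a Word document into Markdown text. The file name is hypothetical, and the exact API may differ between package versions.

    # Convert an office document to Markdown so that an LLM can parse it reliably.
    # Assumes the package is installed (pip install markitdown); the file name is hypothetical.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert("product_specification.docx")

    # The resulting Markdown text can be indexed for retrieval or passed to an LLM.
    print(result.text_content)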

If your organization is still using non-standard or country-specific word processors, there’s a high risk of falling behind in the shift toward AI integration. It’s essential to either develop in-house tools that can convert these documents into machine-readable formats or initiate a company-wide transition to standard formats as soon as possible.

Second: Use LaTeX or KaTeX Instead of Image-Based Equations

In actuarial work, complex mathematical formulas frequently appear. Yet many documents still embed these formulas as images. The problem? AI systems cannot read image-based equations. While OCR technology can provide some recognition, it often lacks accuracy and increases processing costs.

The clear solution is to use TeX-based syntax, such as LaTeX or KaTeX, for writing equations. Even if a formula looks visually correct, if it lacks proper internal syntax, AI won’t be able to interpret it—this is a classic “garbage in, garbage out” scenario. KaTeX, in particular, is highly recommended. It renders quickly in web browsers and is easy to learn even for non-technical users, making it ideal for organization-wide adoption.
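As an illustrative example, the prospective net premium reserve for a whole life policy, the kind of formula that is often pasted into documents as an image, takes a single line of TeX that both humans and machines can read:

    % Prospective net premium reserve at duration t for whole life insurance issued at age x
    {}_{t}V_{x} = A_{x+t} - P_{x}\,\ddot{a}_{x+t}

KaTeX renders this same source directly in the browser, so one text string serves the human reader, the search index, and the model.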

Third: Avoid Formatting Entire Documents with Tables

Some documents use tables throughout to manage layout. While this may appear neat to human readers, for AI, it’s nearly equivalent to an encrypted file. Tables obscure the semantic structure of a document—such as headings, paragraphs, and sections—making it extremely difficult for AI to grasp the context.

Especially when titles, subtitles, and explanations are all placed inside table cells, AI struggles to differentiate and understand the document’s core message. Instead, use semantic formatting tools built into word processors—like heading styles, paragraphs, and bullet points. This not only improves AI readability but also enhances searchability and long-term maintainability of documents.
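The payoff is easy to demonstrate. The short sketch below, a simplified pure-Python example using an invented document, splits a heading-structured Markdown file into labelled sections that can be searched or fed to a retrieval system; content buried in layout tables offers no such handles.

    # Illustrative sketch: a document written with real headings can be split into
    # labelled sections for search or retrieval; layout tables provide no such structure.
    import re

    document = """\
    # Cancer Insurance Product Manual
    ## Coverage
    Pays a lump sum on first diagnosis of cancer.
    ## Reserve Method
    Reserves follow the prospective net premium method.
    """

    # Split on Markdown headings, keeping each heading together with its body text.
    parts = re.split(r"(?m)^\s*(#+ .*)$", document)
    sections = [(parts[i].strip(), parts[i + 1].strip())
                for i in range(1, len(parts) - 1, 2)]

    for heading, body in sections:
        print(heading, "->", body[:60])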

Insurance companies possess vast data assets. But unless this data is presented in a machine-readable format, its value cannot be realized.

The shift to AI is not simply a matter of adopting new technologies. It requires a strategic transformation—standardizing information structure and creating documents that are comprehensible to both humans and machines.

Now is the time to review your internal documentation practices. Eliminate PDFs and image-based formulas, and adopt AI-friendly document structures. The true starting point of AI implementation is not the algorithm, but the document.

Solution 3: Ontology

In actuarial science, building an ontology is a critical step toward reliable AI adoption. An ontology clearly defines and structures key concepts and terminology so that AI systems can understand and process information consistently. In the actuarial domain, ontologies significantly enhance data interoperability and structural understanding, enabling more accurate and timely decision-making.

An ontology systematically defines the concepts and relationships within a specific domain—such as insurance product structures, actuarial/statistical/financial techniques, legal and accounting regulations, and internal company rules and manuals. When this structured knowledge is embedded into a Knowledge Graph, it empowers large language models (LLMs) to respond with higher precision, better contextual understanding, and improved reasoning across related information.

For instance, if a user asks how to calculate the reserve for a specific insurance product, the LLM can leverage the Knowledge Graph to synthesize relevant regulations, mathematical methods, and similar product cases to generate a reliable response. At the same time, it can visually present which concepts and data points the response is based on—enhancing both transparency and user trust.

[Figure 3. Example of a Knowledge Graph for a Hypothetical Cancer Insurance Product]
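A minimal sketch of such a knowledge graph is shown below using the networkx library. The product name, nodes, and relations are invented for illustration; a production ontology would be far richer and would typically live in a dedicated graph store.

    # Illustrative sketch of a tiny knowledge graph for a hypothetical cancer
    # insurance product; all node and relation names are invented for this example.
    import networkx as nx

    kg = nx.DiGraph()
    kg.add_edge("CancerCare Plan", "Cancer Diagnosis Benefit", relation="provides")
    kg.add_edge("CancerCare Plan", "Prospective Reserve Method", relation="reserved_with")
    kg.add_edge("Prospective Reserve Method", "IFRS 17", relation="governed_by")
    kg.add_edge("CancerCare Plan", "Morbidity Table 2023", relation="priced_with")

    # A RAG-style system can traverse the graph to collect the facts behind an answer.
    product = "CancerCare Plan"
    for _, target, data in kg.out_edges(product, data=True):
        print(f"{product} --{data['relation']}--> {target}")

An LLM integrated with such a graph can follow edges like these to gather the regulations, methods, and assumptions behind an answer, and can expose that path to the user for transparency.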

To apply these technologies in practice, close collaboration is essential between actuaries, data scientists, and AI engineers. A phased approach to building ontologies and knowledge graphs is critical. Equally important is the development of automated techniques for extracting and updating relationships, as well as designing integrated systems that connect large language models (LLMs) with knowledge graphs.

Donghwan Lee, Head of the AI Lab at RNA Analytics, emphasized, “LLMs have the potential to dramatically improve the efficiency and accessibility of actuarial work, but ensuring their reliability is crucial for safe adoption.” He added, “High-quality data, standardized document structures, ontologies, and knowledge graphs are the key elements to overcoming AI’s current limitations and driving real transformation in actuarial processes.”

AI adoption in actuarial science is no longer just experimental. It is evolving into a strategic shift—one that enables true automation and greater information accuracy through well-structured knowledge frameworks and integrated system design.

 

RNA Analytics