Large Language Models (LLMs) like ChatGPT have revolutionized how we interact with machines. They serve as companions, answer everyday questions, and assist in writing documents like this one. However, LLMs are expensive to train and maintain, which limits their development to a few well-resourced companies. Beyond cost, they have several notable drawbacks:

  1. Time Cutoff: ChatGPT was initially trained on data available up to 2021, so it cannot answer questions like, “Who won the match between France and Austria in Euro 2024?” Periodic fine-tuning with the latest information is needed to address such questions.
  2. Private Data: ChatGPT is pretrained on publicly available web data and cannot access or answer questions based on private documents. This limitation poses challenges for organizations that need to sift through large volumes of internal company documents.
  3. Source Attribution: Users cannot always verify where the information ChatGPT provides comes from. Certain platforms, such as perplexity.ai, cite their sources, which enhances user trust and confidence in the generated content.

Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) addresses these drawbacks by retrieving documents relevant to a user's query at answer time and supplying them to the model as additional context. This approach combines document retrieval with the generative capabilities of LLMs or other generative models. The potential benefits for businesses are significant, as this method can unlock numerous opportunities and use cases.
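To make the retrieve-then-generate idea concrete, here is a minimal sketch of a RAG pipeline in pure Python. It is illustrative only: the bag-of-words similarity stands in for learned embeddings, the document list stands in for a vector database, and the prompt is what would be handed to an LLM for the final generation step (all names here are my own, not from any particular library).

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words term counts as a stand-in for a learned embedding vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank all documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    # Augment the prompt with retrieved context; an LLM would complete it.
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Spain won Euro 2024, beating England 2-1 in the final.",
    "The company travel policy allows economy flights under 6 hours.",
    "Euro 2020 was won by Italy on penalties against England.",
]
print(build_prompt("Who won Euro 2024?", docs))
```

A production system would swap in dense embeddings and approximate nearest-neighbor search, but the shape is the same: retrieve relevant context, then let the generative model answer grounded in it.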

I will also explore broader questions about RAG drawn from the scientific literature and from solutions developed by other companies. I am excited to pursue this avenue.