[Evening Read] Would feeding more passages from RAG into a long-context LLM improve precision?

Mohamed Nabeel
2 min read · Oct 14, 2024


RAG (retrieval-augmented generation) empowers LLMs to use external information sources by selecting the most relevant passages from a large corpus.
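To make the retrieval step concrete, here is a minimal sketch of top-k passage selection. The toy 3-dimensional vectors and passage texts are illustrative stand-ins — in a real pipeline the embeddings would come from a retriever such as e5 (or the scores from BM25).

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, corpus, k):
    # Score every passage against the query and keep the top-k.
    ranked = sorted(corpus, key=lambda p: cosine(query_vec, p["vec"]), reverse=True)
    return ranked[:k]

# Toy corpus; vectors are hypothetical embeddings.
corpus = [
    {"text": "RAG pipelines retrieve passages", "vec": [0.9, 0.1, 0.0]},
    {"text": "Unrelated cooking recipe",        "vec": [0.0, 0.2, 0.9]},
    {"text": "Long-context LLMs read more",     "vec": [0.7, 0.3, 0.1]},
]
query = [1.0, 0.0, 0.0]
top = retrieve(query, corpus, k=2)
```

The question the paper asks is what happens as `k` grows large enough to fill a long context window.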

Would feeding in more and more data from RAG to long-context LLMs improve performance?

Not always! It turns out that performance (precision) improves up to a certain amount of retrieved data, but beyond that point it starts to decline.

Figure: As more data is retrieved, RAG performance eventually goes down. Note that e5 is a stronger retriever than BM25. (source: paper)

Specifically, the researchers found that as the amount of retrieved data increased, the model's recall continued to improve while its precision fell.

In other words, the long-context LLM had a better chance of seeing the right passage somewhere in its context, but a worse chance of separating it from the irrelevant ones.
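This recall-up, precision-down tradeoff can be seen with a toy precision@k / recall@k calculation. The 0/1 relevance flags below are hypothetical, but they capture the typical pattern: relevant passages cluster near the top of the ranking, so retrieving more mostly adds irrelevant ones.

```python
def precision_recall_at_k(ranked_relevance, k, total_relevant):
    # ranked_relevance: 0/1 relevance flags for retrieved passages, best-first.
    hits = sum(ranked_relevance[:k])
    return hits / k, hits / total_relevant

# Hypothetical ranking with 3 relevant passages in total.
ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for k in (2, 5, 10):
    p, r = precision_recall_at_k(ranked, k, total_relevant=3)
    print(f"k={k:2d}  precision={p:.2f}  recall={r:.2f}")
```

As `k` grows from 2 to 10, recall climbs from 0.67 to 1.00 while precision drops from 1.00 to 0.30 — more context, but a noisier one.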

A key cause of this behavior is that as you retrieve more information, RAG tends to include more irrelevant passages that nonetheless look relevant to the retriever. These are called hard negatives.

How robust are current long-context LLMs to hard negatives?

Figure: robustness to hard negatives (source: paper) — increasing the number of hard negatives leads to a decline in RAG answer accuracy.

In light of hard negatives, can we improve the performance of RAG with long-context LLMs?

The researchers propose three methods:

  1. Retrieval reordering — a training-free method that mitigates the “lost-in-the-middle” phenomenon. Think about it: human readers also tend to focus on the first and last paragraphs. Similarly, placing the highest-ranked passages at the beginning and end of the prompt improves RAG performance.
  2. Data-augmented fine-tuning — expose the LLM to both good passages and bad ones (hard negatives) at fine-tuning time so that it learns to differentiate between them.
  3. Intermediate reasoning step during fine-tuning — the LLM is provided with labeled reasoning paragraphs to guide its learning.
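Retrieval reordering (method 1) is simple enough to sketch. One way to realize it — an assumption about the exact scheme, not a quote of the paper's implementation — is to alternate the ranked passages between the two ends of the prompt, leaving the weakest ones in the middle where “lost-in-the-middle” hurts least.

```python
def reorder_for_long_context(passages_best_first):
    # Training-free reordering sketch: long-context LLMs attend most
    # reliably to the start and end of the prompt, so alternate the
    # ranked passages between the two ends. The weakest passages end
    # up in the middle.
    front, back = [], []
    for i, passage in enumerate(passages_best_first):
        (front if i % 2 == 0 else back).append(passage)
    return front + back[::-1]

ranked = ["p1", "p2", "p3", "p4", "p5"]  # best first, as returned by the retriever
reordered = reorder_for_long_context(ranked)
print(reordered)  # ['p1', 'p3', 'p5', 'p4', 'p2']
```

Note that the two strongest passages (`p1`, `p2`) now sit at the very start and very end of the context.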

So, as a practitioner, which method would be ideal for me? In general, RAG performance improved from method 1 to method 3, but so did the cost: retrieval reordering is training-free, while the other two require fine-tuning.

Multi-step reasoning could further improve RAG performance.

A key takeaway is that more retrieved context in RAG is not always better in the world of long-context LLMs!

Reference:

