Coverage-Aware Prediction of Multi-Hop RAG Answer Correctness on FRAMES

Authors

  • Callum Hughes, Computer Science, University of Bristol, Bristol, BST, UK

DOI:

https://doi.org/10.63575/CIA.2026.40202

Keywords:

retrieval-augmented generation, FRAMES, multi-hop question answering, retrieval coverage, strict correctness, BM25, TF-IDF, calibration

Abstract

Multi-hop retrieval-augmented generation (RAG) can retrieve context that appears relevant while omitting an article needed to complete the reasoning chain. This study examines whether that evidence deficit can be detected before answer generation. The evaluation covers all 824 questions in the public FRAMES test file, which provides gold answers, reasoning labels, and relevant Wikipedia links. Because the file identifies articles but does not include article bodies, a closed-world title corpus is constructed from every parsed gold link. Four deterministic retrievers are compared: BM25 over normalized titles, word-level TF-IDF, character-level TF-IDF, and a fixed hybrid. Coverage is defined as the fraction of a question's parsed gold article set returned in the top-k. Strict coverage-gated correctness is one only when the full set is present; it therefore measures retrieval readiness rather than neural-reader accuracy. At k=10, the hybrid reaches mean coverage of 0.547 and strict correctness of 0.225; at k=25, the values rise to 0.590 and 0.273. Performance declines sharply as article count grows. In five-fold cross-validation, logistic regression using observable query and ranking signals predicts strict correctness at k=10 with ROC-AUC 0.796. Adding benchmark hop-count metadata raises ROC-AUC to 0.856. The findings show that incomplete multi-hop evidence is both a central retrieval bottleneck and a predictable risk signal that can guide additional retrieval, decomposition, or abstention before generation.
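The two retrieval metrics defined in the abstract can be illustrated with a minimal sketch. It assumes each question comes with a ranked list of article titles and a non-empty set of gold titles; the function names `coverage_at_k` and `strict_correct_at_k` and the toy titles are hypothetical, chosen only for illustration, and are not taken from the paper's code.

```python
from typing import Iterable, Sequence


def coverage_at_k(ranked_titles: Sequence[str], gold_titles: Iterable[str], k: int) -> float:
    """Fraction of the gold article set that appears in the top-k ranked titles."""
    gold = set(gold_titles)
    if not gold:
        raise ValueError("gold article set must be non-empty")
    top_k = set(ranked_titles[:k])
    return len(gold & top_k) / len(gold)


def strict_correct_at_k(ranked_titles: Sequence[str], gold_titles: Iterable[str], k: int) -> int:
    """1 only when every gold article is in the top-k, else 0 (coverage-gated)."""
    gold = set(gold_titles)
    if not gold:
        raise ValueError("gold article set must be non-empty")
    return int(gold.issubset(ranked_titles[:k]))


# Toy example (hypothetical titles): a three-article question whose third
# gold article is never retrieved, so coverage stays partial at any k.
ranked = ["A", "B", "C", "D"]
gold = {"A", "D", "E"}

partial_cov = coverage_at_k(ranked, gold, k=4)          # 2 of 3 gold articles found
partial_strict = strict_correct_at_k(ranked, gold, k=4)  # full set not present
full_strict = strict_correct_at_k(ranked, {"A", "B"}, k=2)  # both gold articles found
```

Under this reading, strict correctness is a pass/fail gate on retrieval: a question with two of its three gold articles retrieved has coverage of about 0.67 but still scores zero, which is why the strict figures sit well below the mean coverage values.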

Published

2026-07-07

How to Cite

[1]
Callum Hughes, “Coverage-Aware Prediction of Multi-Hop RAG Answer Correctness on FRAMES”, Journal of Computing Innovations and Applications, vol. 4, no. 2, pp. 16–30, Jul. 2026, doi: 10.63575/CIA.2026.40202.