Metadata-Aware Multi-Hop RAG Retrieval Quality Prediction with Graph and Attention Features
DOI: https://doi.org/10.63575/CIA.2026.40204
Keywords: retrieval-augmented generation, multi-hop retrieval, metadata-aware retrieval, evidence-path prediction, graph features, attention features, calibration
Abstract
Multi-hop retrieval-augmented generation can fail even when individual passages appear relevant because the retrieved set omits a bridge document or combines evidence that does not form a complete reasoning path. This paper presents MAGAF, a metadata-aware graph and attention feature framework for predicting retrieval sufficiency before answer generation. MAGAF represents the top-ranked context through source, category, entity, and date agreement; weighted cross-document graph cohesion; retrieval-score gaps; and attention-style summaries of score concentration. The experimental pipeline was evaluated on a controlled MultiHop-RAG-compatible collection containing 2,556 queries and 609 documents, with non-null evidence paths spanning two to four documents. Five retrievers and five predictors were compared under a strict complete-evidence target at top four. Hybrid-MetaGraph retrieval achieved Recall@4 of 0.501 and CompleteRecall@4 of 0.320, improving CompleteRecall@4 by 0.110 over TF-IDF. The calibrated MAGAF predictor achieved AUROC 0.884, F1 0.765, Brier score 0.119, and expected calibration error 0.072. Bootstrap 95% confidence intervals were 0.854-0.915 for AUROC and 0.712-0.809 for F1. The results show that metadata agreement, graph cohesion, and score dispersion provide complementary signals for deciding whether a multi-document context is sufficiently complete for downstream generation.
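The gap between Recall@4 and CompleteRecall@4 reported above can be illustrated with a minimal sketch. It assumes that Recall@k is the per-query fraction of gold evidence documents found in the top k, and that CompleteRecall@k is 1 only when every gold evidence document appears in the top k. These definitions are an interpretation of the "strict complete-evidence target", not formulas stated in this section, and the function names and document identifiers are hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int = 4) -> float:
    """Fraction of gold evidence documents that appear in the top-k results."""
    if not gold:
        raise ValueError("gold evidence set must be non-empty")
    return len(set(ranked[:k]) & gold) / len(gold)


def complete_recall_at_k(ranked: Sequence[str], gold: Set[str], k: int = 4) -> float:
    """1.0 only if every gold evidence document is in the top-k, else 0.0."""
    if not gold:
        raise ValueError("gold evidence set must be non-empty")
    return 1.0 if gold <= set(ranked[:k]) else 0.0


# Hypothetical query whose evidence path spans three documents.
ranked = ["d7", "d2", "d9", "d4", "d1"]  # retriever output, best first
gold = {"d2", "d4", "d1"}                # gold evidence path

partial = recall_at_k(ranked, gold)            # two of three gold docs in the top 4
complete = complete_recall_at_k(ranked, gold)  # bridge document d1 is ranked fifth
```

In this example two of the three evidence documents are retrieved, so the per-query recall is 2/3, but the complete-evidence indicator is 0 because the third document falls just outside the top four. Averaged over queries, this is why a complete-evidence score can sit well below ordinary recall for the same retriever.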


