An Empirical Comparison of Few-Shot Example Selection Strategies for In-Context Learning on Public Reasoning and QA Benchmarks

Xuanyi Fu; Fanyi Zhao

doi:10.63575/CIA.2025.30209

Authors

Xuanyi Fu, M.S.E. in Computer Science, Johns Hopkins University, MD, USA
Fanyi Zhao, Computer Science, Stevens Institute of Technology, NJ, USA

DOI:

https://doi.org/10.63575/CIA.2025.30209

Keywords:

in-context learning, few-shot prompting, demonstration selection, reasoning benchmarks.

Abstract

In-context learning allows large language models to adapt to a new task by conditioning on a small set of labelled demonstrations placed inside the prompt, and a growing body of work shows that the choice of demonstrations can shift task accuracy by more than ten absolute points. Four families of selection strategies dominate current practice: random sampling, similarity-based retrieval, diversity-based coverage, and complexity-based ranking. Their relative strengths across task types have not been examined inside a single controlled grid. This work offers an empirical comparison of six representative strategies drawn from these four families on four widely used public benchmarks (GSM8K, MMLU, BIG-Bench Hard, and CommonsenseQA) with two open-weight instruction-tuned backbones (Llama-3-8B-Instruct and Mistral-7B-Instruct-v0.2), plus a robustness check on StrategyQA. Every strategy is evaluated under the same shot budget and prompt template, and stability is quantified across random seeds. No single strategy dominates across tasks: similarity-based retrieval excels on commonsense QA, complexity-based ranking leads on multi-step arithmetic and algorithmic reasoning, a similarity-plus-diversity hybrid delivers the most stable average accuracy, and the gap between the best and worst strategies is moderate at 3.1 points. These findings support a task-aware view of demonstration selection and suggest that selection can be tuned at the task-type level.

Author Biography

Fanyi Zhao, Computer Science, Stevens Institute of Technology, NJ, USA
