An Empirical Evaluation of Prompt Injection Detection and Refusal-Usefulness Tradeoffs Using the deepset/prompt-injections Dataset

Daniel Brooks; Samuel Turner

doi:10.63575/CIA.2025.30205

Authors

Daniel Brooks Data Science, University of Huddersfield, Huddersfield, UK Author
Samuel Turner Computer Science, Clemson University, SC, USA Author

DOI:

https://doi.org/10.63575/CIA.2025.30205

Keywords:

prompt injection, jailbreaks, large language models, text classification, abstention, security evaluation

Abstract

Prompt injection is a leading security risk for large language model (LLM) applications because adversaries can embed instructions that override system intent, exfiltrate hidden prompts, or trigger unsafe tool use. This paper presents a fully empirical evaluation of prompt-injection defenses on the public deepset/prompt-injections dataset (662 labeled prompts; 399 benign, 263 injection/attack) using the official train/test split. We compare lightweight detectors that can be deployed as an input gate: a keyword-based rule system, word-level TF-IDF with Logistic Regression (LR), character-level TF-IDF with LR, calibrated linear Support Vector Machines (SVMs), and Complement Naive Bayes. We report attack success rate (ASR), detection F1, and a refusal rate–usefulness tradeoff. On the test split, the best detector is a character TF-IDF + calibrated linear SVM with F1=0.901 and ROC-AUC=0.977, substantially outperforming keyword rules (F1=0.125). When used as a refusal gate, the same family of character models reduces ASR from 1.000 (no gate) to 0.117 at 92.9% usefulness (1 - benign refusal) under a low-false-positive operating point derived from benign-score quantiles on the training split. Error analysis shows that most remaining bypasses are short, multilingual, or typo-heavy injections, indicating that robust defenses require character-level generalization and abstention tuning. Overall, our results quantify the operational tradeoffs between security and usability and provide reproducible baselines for prompt injection detection research.

Author Biography

Samuel Turner, Computer Science, Clemson University, SC, USA

An Empirical Evaluation of Prompt Injection Detection and Refusal-Usefulness Tradeoffs Using the deepset/prompt-injections Dataset

Authors

DOI:

Keywords:

Abstract

Author Biography

Downloads

Published

Issue

Section

License

How to Cite

Menu

Counter

Contact