An Empirical Evaluation of Prompt Injection Detection and Refusal-Usefulness Tradeoffs Using the deepset/prompt-injections Dataset

Authors

  • Daniel Brooks Data Science, University of Huddersfield, Huddersfield, UK Author
  • Samuel Turner Computer Science, Clemson University, SC, USA Author

DOI:

https://doi.org/10.63575/CIA.2025.30205

Keywords:

prompt injection, jailbreaks, large language models, text classification, abstention, security evaluation

Abstract

Prompt injection is a leading security risk for large language model (LLM) applications because adversaries can embed instructions that override system intent, exfiltrate hidden prompts, or trigger unsafe tool use. This paper presents a fully empirical evaluation of prompt-injection defenses on the public deepset/prompt-injections dataset (662 labeled prompts; 399 benign, 263 injection/attack) using the official train/test split. We compare lightweight detectors that can be deployed as an input gate: a keyword-based rule system, word-level TF-IDF with Logistic Regression (LR), character-level TF-IDF with LR, calibrated linear Support Vector Machines (SVMs), and Complement Naive Bayes. We report attack success rate (ASR), detection F1, and a refusal rate–usefulness tradeoff. On the test split, the best detector is a character TF-IDF + calibrated linear SVM with F1=0.901 and ROC-AUC=0.977, substantially outperforming keyword rules (F1=0.125). When used as a refusal gate, the same family of character models reduces ASR from 1.000 (no gate) to 0.117 at 92.9% usefulness (1 - benign refusal) under a low-false-positive operating point derived from benign-score quantiles on the training split. Error analysis shows that most remaining bypasses are short, multilingual, or typo-heavy injections, indicating that robust defenses require character-level generalization and abstention tuning. Overall, our results quantify the operational tradeoffs between security and usability and provide reproducible baselines for prompt injection detection research.

Author Biography

  • Samuel Turner, Computer Science, Clemson University, SC, USA

     

     

     

Published

2025-07-17

How to Cite

[1]
Daniel Brooks and Samuel Turner, “An Empirical Evaluation of Prompt Injection Detection and Refusal-Usefulness Tradeoffs Using the deepset/prompt-injections Dataset”, Journal of Computing Innovations and Applications, vol. 3, no. 2, pp. 66–84, Jul. 2025, doi: 10.63575/CIA.2025.30205.