Detecting Semantic Mismatches in XBRL Tag Mapping for SEC 10-K Filings: A Text Comparison and Historical Consistency Analysis

Authors

  • Dun Liang, Business Analytics, Fordham University, New York, USA
  • Zijie Chen, Computer Engineering, University of Toronto Master, Toronto, Canada
  • Chuanli Wei, Computer Science, University of Southern California, CA, USA

DOI:

https://doi.org/10.63575/CIA.2026.40113

Keywords:

XBRL tag mapping, semantic mismatch detection, SEC financial reporting, text similarity analysis

Abstract

The accuracy of eXtensible Business Reporting Language (XBRL) tag mapping in SEC financial filings directly affects the reliability of automated financial analysis conducted by millions of investors through the EDGAR system. This study investigates semantic mismatches between financial statement line-item labels and their corresponding XBRL taxonomy elements in 10-K annual reports filed with the U.S. Securities and Exchange Commission. Drawing on SEC Financial Statement Data Sets and the XBRL US Data Quality Committee (DQC) validation rule library, this research analyzes custom tag usage patterns across filer categories and industry sectors over the period 2014–2024, with cross-sectional tabulation of filer-category rates at selected benchmark years (2014, 2017, 2019, and 2020) and aggregate trend data through 2024 drawn from SEC Office of Structured Disclosure publications. A tiered text comparison approach combining lexical similarity scoring (TF-IDF and BM25) with domain-specific contextual features is applied to evaluate the semantic alignment between reported line items and assigned taxonomy tags. Cross-period consistency analysis and SIC-code industry peer benchmarking are employed to identify anomalous tag selection changes that may indicate data quality degradation rather than substantive business changes. The findings reveal persistent heterogeneity in custom tag rates across industries and filer sizes, with specific tag mapping patterns that warrant targeted validation checkpoints. The proposed lightweight verification methods are designed for integration into existing disclosure management workflows without requiring complex computational infrastructure.
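The lexical similarity step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only a plain TF-IDF cosine score between a reported line-item label and candidate taxonomy element names, and it omits BM25 and the domain-specific contextual features. The smoothed IDF formula, the CamelCase tokenizer, and the function names `tokenize`, `tfidf_vectors`, `cosine`, and `rank_tags` are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split CamelCase element names and plain labels into lowercase word tokens."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return re.findall(r"[a-z]+", spaced.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) using an assumed smoothed IDF."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_tags(label, candidate_tags):
    """Rank candidate element names by TF-IDF cosine similarity to the label."""
    vectors = tfidf_vectors([label] + list(candidate_tags))
    scores = [(tag, cosine(vectors[0], vec))
              for tag, vec in zip(candidate_tags, vectors[1:])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical line-item label and a small illustrative candidate set.
    candidates = ["Goodwill", "InventoryNet", "AccountsReceivableNetCurrent"]
    for tag, score in rank_tags("Accounts receivable, net", candidates):
        print(f"{tag}: {score:.3f}")
```

In a workflow of the kind the abstract describes, a low best score for the tag actually assigned to a line item would flag that mapping for manual review; any threshold for doing so would need to be calibrated on real filing data.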

Author Biography

  • Chuanli Wei, Computer Science, University of Southern California, CA, USA

Published

2026-02-13

How to Cite

[1]
Dun Liang, Zijie Chen, and Chuanli Wei, “Detecting Semantic Mismatches in XBRL Tag Mapping for SEC 10-K Filings: A Text Comparison and Historical Consistency Analysis”, Journal of Computing Innovations and Applications, vol. 4, no. 1, pp. 154–163, Feb. 2026, doi: 10.63575/CIA.2026.40113.