An Empirical Study on the Accuracy of Large Language Models in API Documentation Understanding: A Cross-Programming Language Analysis
DOI: https://doi.org/10.63575/

Keywords: Large Language Models, API Documentation, Code Understanding, Cross-Language Analysis

Abstract
This study presents a comprehensive empirical evaluation of the ability of Large Language Models (LLMs) to understand API documentation across multiple programming languages. We systematically assess the accuracy and consistency of five prominent LLMs (GPT-4, GPT-3.5, Claude-3, Llama-2, and CodeT5) in interpreting API documentation for Java, Python, JavaScript, and C++. Our evaluation framework combines automated metrics with human evaluation protocols to measure understanding accuracy, completeness, and cross-language consistency. Results indicate substantial variation in LLM performance across programming languages, with accuracy scores ranging from 67.3% to 89.7%. The study reveals that syntax complexity, documentation structure, and linguistic patterns substantially influence LLM comprehension. These findings provide critical insights for improving LLM-based code assistance tools and for establishing guidelines for effective API documentation design in multi-language development environments.