This work presents a comparative analysis of text complexity across domains using scale-free metrics. We quantify linguistic complexity via Heaps' exponent $\beta$ (vocabulary growth), Taylor's exponent $\alpha$ (word-frequency fluctuation scaling), compression rate $r$ (redundancy), and entropy. Our corpora span three domains: legal documents (statutes, cases, and deeds) as a specialized domain, general natural language texts (literature and Wikipedia), and AI-generated (GPT) text. We find that legal texts exhibit slower vocabulary growth (lower $\beta$) and higher term consistency (higher $\alpha$) than general texts. Within the legal domain, statutory codes have the lowest $\beta$ and highest $\alpha$, reflecting strict drafting conventions, while cases and deeds show higher $\beta$ and lower $\alpha$. In contrast, GPT-generated text exhibits statistics that align more closely with general language patterns. These results demonstrate that legal texts have domain-specific structures and complexities that current generative models do not fully replicate.
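For concreteness, recall the standard forms of the two scaling laws named above: Heaps' law posits that the number of distinct word types $V(N)$ among the first $N$ tokens grows as $V(N) \propto N^{\beta}$, and Taylor's law posits that, when a text is split into fixed-length segments, the standard deviation $\sigma$ of a word's per-segment count scales with its mean $\mu$ as $\sigma \propto \mu^{\alpha}$ across word types. The sketch below shows one way $\beta$ and $r$ could be estimated under these definitions; it is a minimal illustration, not the paper's actual pipeline, and the log-spaced sampling, the zlib compressor, and the file name corpus.txt are assumptions made here.

```python
import math
import zlib

def heaps_beta(tokens, n_points=20):
    """Estimate Heaps' exponent beta: the least-squares slope of
    log V(N) against log N, where V(N) is the number of distinct
    word types among the first N tokens. Assumes a reasonably long
    token list; an illustrative sketch, not the paper's estimator."""
    seen = set()
    vocab = []  # vocab[i] = V(i + 1)
    for tok in tokens:
        seen.add(tok)
        vocab.append(len(seen))
    # Sample positions evenly on a log scale so the regression is not
    # dominated by the long tail of large N.
    idxs = sorted({int(len(tokens) ** (k / n_points)) for k in range(1, n_points + 1)})
    xs = [math.log(i) for i in idxs]
    ys = [math.log(vocab[i - 1]) for i in idxs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

def compression_rate(text):
    """Compression rate r = compressed size / original size; lower r
    means more redundancy. zlib is a stand-in compressor here."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

if __name__ == "__main__":
    # "corpus.txt" is a hypothetical whitespace-tokenizable input file.
    text = open("corpus.txt", encoding="utf-8").read()
    tokens = text.split()
    print(f"Heaps' beta:        {heaps_beta(tokens):.3f}")
    print(f"compression rate r: {compression_rate(text):.3f}")
```

Taylor's $\alpha$ can be estimated analogously, by regressing $\log \sigma$ on $\log \mu$ over word types after segmenting the text into fixed-length windows.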
