Various metrics have been studied for whether they can characterize different kinds of data. In natural language, for example, researchers have examined metrics that identify a text's author, language, or genre. One such metric is Yule’s K, which is equivalent to Rényi’s second-order (plug-in) entropy. Yule’s K converges to a value that depends on the kind of data rather than on its size. We explore such metrics among various statistics related to the scaling properties of real data, and we compare different kinds of data, such as music, programming language source code, and natural language.
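The link between Yule’s K and Rényi’s second-order entropy can be made concrete with a short computation. The following is a minimal sketch in Python. It assumes the data is already a list of tokens (words, notes, or source-code tokens) and uses the common form K = 10^4 (S2 − N)/N², where N is the number of tokens and S2 is the sum of squared type frequencies. With p_w = f_w/N, this gives K/10^4 = Σ p_w² − 1/N. Both K and the plug-in entropy H_2 = −log Σ p_w² are therefore functions of the same quantity Σ p_w². The function names (`yules_k`, `renyi2_plugin`, `renyi2_from_k`, `k_over_prefixes`) are hypothetical helpers written for illustration. They are not code from the referenced paper.

```python
import math
from collections import Counter


def yules_k(tokens):
    """Yule's K = 10^4 * (S2 - N) / N^2.

    N is the number of tokens and S2 is the sum over word types of the squared
    frequency (equivalently sum_m m^2 V(m, N) over the frequency spectrum).
    """
    counts = Counter(tokens)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one token")
    s2 = sum(f * f for f in counts.values())
    return 1e4 * (s2 - n) / (n * n)


def renyi2_plugin(tokens):
    """Plug-in Renyi entropy of order 2: -log(sum_w p_w^2), with p_w = f_w / N."""
    counts = Counter(tokens)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one token")
    return -math.log(sum((f / n) ** 2 for f in counts.values()))


def renyi2_from_k(k, n):
    """Recover the plug-in H_2 from K via the identity K / 10^4 = sum_w p_w^2 - 1/N."""
    return -math.log(k / 1e4 + 1 / n)


def k_over_prefixes(tokens, step):
    """Yule's K on prefixes of length step, 2*step, ... as a rough constancy check."""
    return [(n, yules_k(tokens[:n])) for n in range(step, len(tokens) + 1, step)]


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the log".split()
    k = yules_k(text)
    print(f"K            = {k:.1f}")
    print(f"H2 (plug-in) = {renyi2_plugin(text):.4f}")
    print(f"H2 from K    = {renyi2_from_k(k, len(text)):.4f}")  # same value as the line above
```

On the toy sequence `["a", "b"] * 50` with `step=10`, `k_over_prefixes` returns K = 4000 for the first 10 tokens and K = 4900 for all 100 tokens. The value approaches 10^4 · Σ p_w² = 5000 as the −1/N term shrinks. Whether K levels off in the same way on real music, source code, or natural language is the empirical question the study investigates.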

References

  • Kumiko Tanaka-Ishii and Shunsuke Aihara. Computational Constancy Measures of Texts—Yule’s K and Rényi’s Entropy. Computational Linguistics, 41(3): 481–502, 2015.
