Clustering is a fundamental technique in machine learning and data mining, offering a powerful lens to understand self-organizing patterns in the real world. At its core, clustering is inherently information-theoretic:…
language modeling
We apply an information-theoretic perspective to reconsider generative document retrieval (GDR), in which a document x∈X is indexed by t∈T, and a neural autoregressive model is trained to map queries Q to T. GDR…
The correlation dimension of natural language is measured by applying the Grassberger-Procaccia algorithm to high-dimensional sequences produced by a large-scale language model. This method, previously studied only in a Euclidean…
The bitcoin price crash at the beginning of 2018 was caused by various social factors. The influence of news wire stories and social media was especially crucial because of the…
The theory of econophysics reveals the scaling properties of price, which explains why market crashes much more frequently than expected. A challenge of financial market modeling is to characterize the…
For mathematical models of language, their potential, limitations, and ways of improvement are investigated in terms of whether they reproduce the complex properties of language. The nature of linguistic structure…
State-of-the-art word embedding methods represent a word with a single vector and presume a linear vector space, which does not easily incorporate nonlinearity that is necessary to, for example, represent…