Tai-Danae Bradley; Yiannis Vlassopoulos - Language Modeling with Reduced Densities

compositionality:13514 - Compositionality, November 30, 2021, Volume 3 (2021) - https://doi.org/10.32408/compositionality-3-4

Authors: Tai-Danae Bradley; Yiannis Vlassopoulos

This work originates from the observation that today's state-of-the-art statistical language models are impressive not only for their performance but also, quite crucially, because they are built entirely from correlations in unstructured text data. This observation prompts a fundamental question that lies at the heart of this paper: What mathematical structure exists in unstructured text data? We put forth enriched category theory as a natural answer. We show that sequences of symbols from a finite alphabet, such as those found in a corpus of text, form a category enriched over probabilities. We then address a second fundamental question: How can this information be stored and modeled in a way that preserves the categorical structure? We answer this by constructing a functor from our enriched category of text to a particular enriched category of reduced density operators. The latter category leverages the Loewner order on positive semidefinite operators, which can further be interpreted as a toy example of entailment.
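
The abstract rests on two concrete ingredients: conditional continuation probabilities of text sequences (the enrichment over probabilities) and the Loewner order on positive semidefinite operators (the toy notion of entailment). The Python sketch below illustrates both on a tiny corpus. The corpus, the diagonal form of the operators, and all function names here are illustrative assumptions made for this page; they are a simplified stand-in, not the authors' construction, which obtains genuine reduced density operators via a partial trace.

```python
import numpy as np

# A four-text toy corpus (an illustrative assumption, not from the paper).
corpus = ["red rose", "red rose", "red ruby", "blue rose"]

def continuation_prob(prefix, extension, texts):
    """Empirical probability that a text starting with `prefix`
    in fact starts with `extension` (i.e. that `extension` extends `prefix`)."""
    if not extension.startswith(prefix):
        return 0.0
    n_prefix = sum(t.startswith(prefix) for t in texts)
    n_ext = sum(t.startswith(extension) for t in texts)
    return n_ext / n_prefix if n_prefix else 0.0

# The [0,1]-enrichment: hom(x, y) is the probability that y extends x.
print(continuation_prob("red", "red rose", corpus))   # 2/3
print(continuation_prob("red", "blue rose", corpus))  # 0.0

def toy_operator(prefix, texts, basis):
    """A simplified diagonal stand-in for a reduced density operator:
    the diagonal carries the continuation distribution of `prefix`."""
    return np.diag([continuation_prob(prefix, b, texts) for b in basis])

basis = sorted(set(corpus))
rho_red = toy_operator("red", corpus, basis)
rho_all = toy_operator("", corpus, basis)   # operator for the empty prefix

def loewner_leq(a, b, tol=1e-9):
    """a <= b in the Loewner order iff b - a is positive semidefinite."""
    return bool(np.all(np.linalg.eigvalsh(b - a) >= -tol))

# Toy entailment check: weighted by the probability of the prefix "red",
# its operator sits below the empty prefix's operator in the Loewner order.
p_red = continuation_prob("", "red", corpus)   # 3/4
print(loewner_leq(p_red * rho_red, rho_all))   # True
```

In this diagonal toy model the Loewner comparison reduces to a componentwise inequality of continuation distributions; the paper's reduced density operators carry off-diagonal structure as well, which is what makes the functorial statement nontrivial.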


Volume: Volume 3 (2021)
Published on: November 30, 2021
Keywords: Computer Science - Computation and Language, Computer Science - Machine Learning, Mathematics - Category Theory, Quantum Physics

