Memory3: Language Modeling with Explicit Memory
The authors formalize explicit memory by proposing $\text{Memory}^3$, a model equipped with three memory formats: implicit memory (model parameters), working memory (context key-values), and explicit memory.
The explicit memories can be seen as retrievable model parameters, externalized knowledge, or sparsely-activated neural circuits.
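To make the "retrievable key-values" view concrete, below is a minimal sketch, not the paper's implementation: explicit memories are assumed to be precomputed key-value pairs retrieved from an external store and attended to alongside the context's own working-memory key-values. The function name `attend_with_explicit_memory` and all shapes are illustrative assumptions; causal masking and multi-head details are omitted.

```python
# Sketch: attend over retrieved explicit-memory key-values concatenated
# with the context's working-memory key-values (single head, no masking).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_with_explicit_memory(q, ctx_k, ctx_v, mem_k, mem_v):
    """q: (T, d) queries for the current context.
    ctx_k, ctx_v: (T, d) working memory (the context's own key-values).
    mem_k, mem_v: (M, d) explicit memory (key-values retrieved from an
    external store, e.g. pre-encoded reference chunks)."""
    k = np.concatenate([mem_k, ctx_k], axis=0)   # (M + T, d)
    v = np.concatenate([mem_v, ctx_v], axis=0)   # (M + T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (T, M + T)
    return softmax(scores) @ v                   # (T, d)

# Toy usage with random tensors.
d, T, M = 8, 4, 6
rng = np.random.default_rng(0)
out = attend_with_explicit_memory(
    rng.normal(size=(T, d)), rng.normal(size=(T, d)), rng.normal(size=(T, d)),
    rng.normal(size=(M, d)), rng.normal(size=(M, d)))
print(out.shape)  # (4, 8)
```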
The explicit memory is introduced to alleviate the issue of knowledge traversal: the LLM wastefully invokes all of its parameters (and thus all of its knowledge) every time it generates a token. As an analogy, it would be unreasonable for a human to recall everything they have ever learned whenever they write a single word.
The combined cost of LLM training and inference can be seen as the cost of encoding knowledge from text data into various memory formats, plus the cost of reading from these memories during inference:
\[\sum_{\text{knowledge } k} \min_{\text{format } m} \Big[ \text{cost}_{\text{write}}(k, m) + n_k \cdot \text{cost}_{\text{read}}(k, m) \Big]\]
where $n_k$ is the number of times knowledge $k$ is read during inference. It is worth noting that knowledge can be encoded in a spectrum of formats, roughly ordered from cheap-to-write but costly-to-read to the reverse:
plain text (RAG) → explicit memory → model parameters
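To make the cost formula concrete, here is a hedged numeric sketch: for each piece of knowledge, choose the format that minimizes write cost plus $n_k$ times read cost. The `FORMATS` table and its numbers are invented purely for illustration; only the qualitative ordering (plain text is cheap to write but costly to read, model parameters the opposite) follows the text above.

```python
# Illustrative cost model: cost_write(k, m) + n_k * cost_read(k, m),
# minimized over memory formats m for each piece of knowledge k.
FORMATS = {
    # format: (write_cost, read_cost) in arbitrary, made-up units
    "plain_text_rag":   (1.0, 50.0),   # cheap to encode, costly to re-read each use
    "explicit_memory":  (10.0, 5.0),   # middle ground
    "model_parameters": (100.0, 0.5),  # expensive training, cheap recall
}

def best_format(n_reads: int) -> tuple[str, float]:
    """Return the cheapest format and its total cost for knowledge read n_reads times."""
    costs = {m: w + n_reads * r for m, (w, r) in FORMATS.items()}
    name = min(costs, key=costs.get)
    return name, costs[name]

for n in (0, 3, 300):
    print(n, best_format(n))
# Rarely used knowledge stays in plain text; frequently used knowledge
# earns its place in the model parameters, with explicit memory in between.
```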
This is part of why LLM training consumes so much data and energy [121, 77]. The authors want to rescue LLMs from this condition by equipping them with an explicit memory mechanism as efficient as that of humans.
While reading this paper, I realized that the definitions and theory in this work are ill-defined, in that the notions of knowledge and knowledge subgraphs carry little concrete meaning.
For me, the readability of this work was not enough to keep going, so I stopped reading here.