Depre

All the papers...

End-to-End Memory Networks

Parameter-Efficient Transfer Learning for NLP

Injecting Domain Knowledge in Language Models for Task-oriented Dialogue Systems

Memorizing Transformers

LoRA: Low-Rank Adaptation of Large Language Models

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Locating and Editing Factual Associations in GPT

Mass-Editing Memory in a Transformer (MEMIT)

Memory-Based Model Editing at Scale

AttCAT: Explaining transformers via attentive class activation tokens

Quantifying attention flow in transformers

Transformer interpretability beyond attention visualization

Scaling Transformer to 1M tokens and beyond with RMT

Poisoning Language Models During Instruction Tuning

Bag of Tricks for Training Data Extraction from Language Models

Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust

CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks

The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence

Regarding the contribution of generative models to scientific discovery.

StyleDrop: Text-To-Image Generation in Any Style

Non-autoregressive modeling; StyleDrop builds on Muse and fine-tunes it for a target style.

Muse: Text-To-Image Generation via Masked Generative Transformers

Non-autoregressive modeling: Muse predicts masked image tokens in parallel and refines them over a small number of steps, rather than decoding one token at a time (see the sketch below).
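
A minimal sketch of this confidence-based parallel decoding loop (my own illustration in Python/NumPy, not code from the Muse paper; the codebook size, sequence length, step count, and the random stand-in model are placeholder assumptions):

import numpy as np

# Muse/MaskGIT-style parallel decoding sketch: all image tokens start as [MASK];
# each refinement step predicts every masked position at once and keeps only the
# most confident predictions, following a cosine unmasking schedule.

MASK = -1
VOCAB_SIZE = 16   # hypothetical codebook size
SEQ_LEN = 8       # hypothetical number of image tokens
NUM_STEPS = 4     # number of parallel refinement steps

rng = np.random.default_rng(0)

def predict_logits(tokens):
    """Stand-in for the masked transformer: per-position logits over the codebook.
    A real model would condition on the text prompt and the unmasked tokens."""
    return rng.normal(size=(len(tokens), VOCAB_SIZE))

def parallel_decode():
    tokens = np.full(SEQ_LEN, MASK)
    for step in range(1, NUM_STEPS + 1):
        num_masked = int((tokens == MASK).sum())
        if num_masked == 0:
            break
        logits = predict_logits(tokens)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        preds = probs.argmax(axis=-1)
        conf = probs.max(axis=-1)
        conf[tokens != MASK] = -np.inf  # never re-select already-fixed tokens
        # Cosine schedule: how many tokens may remain masked after this step.
        remain = int(np.floor(SEQ_LEN * np.cos(np.pi / 2 * step / NUM_STEPS)))
        num_to_fix = max(num_masked - remain, 1)
        fix_idx = np.argsort(-conf)[:num_to_fix]
        tokens[fix_idx] = preds[fix_idx]
    return tokens

print(parallel_decode())  # eight codebook indices, filled in over 4 steps

The schedule commits only a few tokens early, when most of the image is still masked, and fixes the rest in the final steps; Muse itself adds text conditioning and a separate super-resolution stage, which are omitted here.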

A toy model of universality: Reverse engineering how networks learn group operations

A representation is a homomorphism that assigns a weight matrix to each group element, mapping input vectors to output vectors. The paper studies universality, i.e., whether different models trained on the same task learn similar features, by reverse engineering one-layer transformers trained on group operations; the claim is that these transformers learn group-theoretic automata.
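
For reference, the defining property the note relies on (standard representation theory, not specific to the paper): a representation assigns a matrix to every group element so that composition in the group matches matrix multiplication,

$$ \rho : G \to \mathrm{GL}(d, \mathbb{R}), \qquad \rho(g_1 g_2) = \rho(g_1)\,\rho(g_2) \quad \text{for all } g_1, g_2 \in G. $$

A standard example is the cyclic group $\mathbb{Z}_n$, represented by 2-D rotation matrices:

$$ \rho(k) = \begin{pmatrix} \cos\tfrac{2\pi k}{n} & -\sin\tfrac{2\pi k}{n} \\ \sin\tfrac{2\pi k}{n} & \cos\tfrac{2\pi k}{n} \end{pmatrix}, \qquad \rho(j)\,\rho(k) = \rho\!\big((j + k) \bmod n\big). $$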

Acquisition of chess knowledge in AlphaZero

Emergent world representations: Exploring a sequence model trained on a synthetic task