Depre

All the papers...

End-to-end memory networks.

Parameter-Efficient Transfer Learning for NLP

Info : [Github], Efficiency
PEFT [github] : State-of-the-art Parameter-Efficient Fine-Tuning (PEFT) methods

Injecting Domain Knowledge in Language Models for Task-oriented Dialogue Systems

Memorizing Transformers

LoRA: Low-Rank Adaptation of Large Language Models

Info :

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Locating and Editing Factual Associations in GPT

Info : FactualGPT

Mass Editing Memory in a Transformer (MEMIT)

Memory-Based Model Editing at Scale

AttCAT: Explaining Transformers via Attentive Class Activation Tokens

Quantifying attention flow in transformers

Transformer interpretability beyond attention visualization

Scaling Transformer to 1M tokens and beyond with RMT

On Provable Copyright Protection for Generative Models

Poisoning Language Models During Instruction Tuning

Bag of Tricks for Training Data Extraction from Language Models

Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust

CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks

Info : [pdf], CLIP, Network Dissection

The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence

2023.07

On the contribution of generative models to scientific discovery.

StyleDrop: Text-To-Image Generation in Any Style

2023.06 [pdf]

Non-autoregressive modeling, Muse + Style

Muse: Text-To-Image Generation via Masked Generative Transformers

2023.01 [pdf]

Non-autoregressive modeling

A toy model of universality: Reverse engineering how networks learn group operations

2023.07

A representation is a homomorphism that assigns each group element a matrix (here playing the role of a weight matrix), which maps an input vector to an output vector. The paper deals with universality, i.e. whether different models learn similar features, and studies it with one-layer transformers, since transformers learn group-theoretic automata.
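The homomorphism property can be made concrete with a small sketch. It assumes the cyclic group Z_n under addition mod n and its permutation-matrix representation, chosen here only for illustration; the helper names `rho` and `is_homomorphism` are hypothetical and are not taken from the paper.

```python
import numpy as np


def rho(a: int, n: int) -> np.ndarray:
    """Permutation matrix representing 'add a modulo n' (illustrative helper)."""
    # Rolling the identity's rows by a sends basis vector e_j to e_{(j + a) % n}.
    return np.roll(np.eye(n, dtype=int), a, axis=0)


def is_homomorphism(n: int) -> bool:
    """Check rho(a) @ rho(b) == rho((a + b) % n) for every pair in Z_n."""
    return all(
        np.array_equal(rho(a, n) @ rho(b, n), rho((a + b) % n, n))
        for a in range(n)
        for b in range(n)
    )


if __name__ == "__main__":
    n = 5
    e0 = np.eye(n, dtype=int)[:, 0]
    # The matrix for the element 2 maps the vector for 0 to the vector for 2.
    print(rho(2, n) @ e0)
    print(is_homomorphism(n))
```

Composing two group elements corresponds to multiplying their matrices, which is the sense in which each element acts as a matrix mapping an input vector to an output vector.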

Acquisition of chess knowledge in AlphaZero

2022.11

Emergent world representations: Exploring a sequence model trained on a synthetic task

2022