A study on controlling the generated output by feeding additional inputs into the transformer.
Given a transformer architecture and external memories, we link the internal representations of the transformer to the external memories to fetch task information and adapt language generation. We consider two types of models.
Our goal is to construct the representation space so that the adapted LLM utilizes the memory and achieves higher performance where task knowledge is required.
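As a rough illustration of this linking, the sketch below (PyTorch) lets the transformer's hidden states attend over an external memory bank to fetch task information. The slot count, head count, and the `MemoryReader` name are assumptions for illustration, not the actual architecture.

```python
# Minimal sketch: hidden states read from an external memory via cross-attention.
# The memory size and module layout are assumptions, not the exact implementation.
import torch
import torch.nn as nn

class MemoryReader(nn.Module):
    def __init__(self, d_model: int, n_slots: int, n_heads: int = 8):
        super().__init__()
        # External memory: a learnable bank of task-knowledge slots.
        self.memory = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
        # Hidden states act as queries over the memory slots.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model) internal representation of the transformer.
        mem = self.memory.unsqueeze(0).expand(hidden.size(0), -1, -1)
        fetched, _ = self.cross_attn(query=hidden, key=mem, value=mem)
        return fetched  # memory-based representation, same shape as hidden
```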
We train an additional memory to memorize the contents so that the knowledge is stored in the model parameters. To do this, we need three data types in total:

- The context information
We inject a memory module into the GPT layers to tune the model. In the knowledge memorization phase, the model parameters are either frozen or trained.
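A hedged sketch of one way this injection and the freeze/train switch could look with Hugging Face GPT-2. The bottleneck-adapter design and the hook-based injection are illustrative assumptions, not the exact memory module.

```python
# Sketch: inject a small per-block module into GPT-2 and freeze the base model
# during the knowledge memorization phase. The adapter design is an assumption.
import torch.nn as nn
from transformers import GPT2LMHeadModel

class MemoryAdapter(nn.Module):
    """Small bottleneck module added after each GPT-2 block (stand-in for the memory)."""
    def __init__(self, d_model: int, d_mem: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_mem)
        self.up = nn.Linear(d_mem, d_model)
        self.act = nn.GELU()

    def forward(self, hidden):
        # Residual correction on top of the block's hidden states.
        return hidden + self.up(self.act(self.down(hidden)))

model = GPT2LMHeadModel.from_pretrained("gpt2")
d_model = model.config.n_embd

# Inject one adapter per transformer block via a forward hook.
adapters = nn.ModuleList([MemoryAdapter(d_model) for _ in model.transformer.h])
for block, adapter in zip(model.transformer.h, adapters):
    def hook(module, inputs, outputs, adapter=adapter):
        # GPT2Block returns a tuple whose first element is the hidden states.
        return (adapter(outputs[0]),) + outputs[1:]
    block.register_forward_hook(hook)

# Knowledge memorization phase: freeze the base model, train only the memory modules.
for p in model.parameters():
    p.requires_grad = False
for p in adapters.parameters():
    p.requires_grad = True
```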
The hidden representation of the encoder is adapted to the memory-based representation.
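A minimal sketch, assuming a simple learned gate, of how the encoder's hidden representation could be adapted toward the memory-based representation fetched by a reader like the one above; the gating scheme is an assumption, not the documented method.

```python
# Sketch: gated interpolation between encoder hidden states and the memory readout.
import torch
import torch.nn as nn

class MemoryGate(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, hidden: torch.Tensor, fetched: torch.Tensor) -> torch.Tensor:
        # hidden, fetched: (batch, seq_len, d_model)
        g = torch.sigmoid(self.gate(torch.cat([hidden, fetched], dim=-1)))
        # Interpolate between the original encoder states and the memory-based states.
        return g * fetched + (1.0 - g) * hidden
```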
| model | size | EM | F1 | checkpoint |
|---|---|---|---|---|
| T5 | small | 75.39 | 83.68 | marry_go_round |
| T5 | base | 81.41 | 88.80 | smell_of_sense |
| T5 | large | 84.28 | 90.76 | red_pool |
| T5 | 3B | 86.78 | 92.80 | bad_man |
| T5 v1_1 | small | | | |
| T5 v1_1 | base | | | |
| T5 v1_1 | large | | | |
| T5 v1_1 | 3B | | | |
| T5 FLAN | small | | | |
| T5 FLAN | base | | | |
| T5 FLAN | large | | | |
| T5 FLAN | 3B | | | |
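For reference, assuming the EM and F1 columns follow the usual extractive-QA definitions (exact string match and token-level overlap), they can be computed roughly as below; official evaluation scripts additionally normalize articles and punctuation.

```python
# Sketch of SQuAD-style EM and token-level F1 (normalization simplified).
from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip().lower() == gold.strip().lower())

def f1_score(pred: str, gold: str) -> float:
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```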