The emergence of induction heads during GPT training.
Optogenetics combines genetic and optical methods to precisely target and manipulate specific neurons in the brain, enabling detailed studies of neural circuits and brain function.
By selectively activating or inhibiting specific neurons, researchers can map the connections and functions of different neural circuits and understand how they contribute to behavior and cognition.
Optogenetics also allows real-time control of behavior by manipulating neural circuits; this has been used to study processes such as learning, memory, fear, reward, and social behavior.
clamping (causal manipulations of activations throughout training, analogous to the optogenetic interventions above)
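Concretely, clamping can be implemented as a forward hook that overwrites an activation at every training step. Below is a minimal PyTorch sketch; the toy network, the hook target, and the clamp value of zero are illustrative assumptions, not the setup from the figures.

```python
import torch
import torch.nn as nn

def make_clamp_hook(value: float):
    """Forward hook that pins a module's output to a constant, so all
    downstream computation (and the loss) is built on the intervened
    activation."""
    def hook(module, inputs, output):
        return torch.full_like(output, value)  # overwrite the activation
    return hook

# Toy two-layer network standing in for two transformer layers.
net = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

# Register before the training loop and leave in place, so the clamp
# holds throughout training.
handle = net[0].register_forward_hook(make_clamp_hook(0.0))

x = torch.randn(4, 8)
loss = net(x).sum()
loss.backward()  # later layers adapt to the clamp; net[0] gets no gradient
handle.remove()  # lift the intervention afterwards
```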
One metric is the difference in attention weights from the query token to the two label tokens: $A_{correct} - A_{incorrect}$. For example, in the sequence “A 0 B 1 A”, the query is the final “A” at position 5, the correct label “0” is at position 2, and the incorrect label “1” is at position 4, so the metric is $A_{5,2} - A_{5,4}$.
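As a sketch of how this metric could be computed, assuming access to a head's post-softmax attention pattern as a [seq, seq] tensor (the random pattern and the 0-indexed positions below are illustrative):

```python
import torch

def attention_difference(attn, query_pos=4, correct_pos=1, incorrect_pos=3):
    """A_correct - A_incorrect for one head.

    attn: [seq, seq] post-softmax attention pattern (rows = query positions).
    Defaults are the 0-indexed positions for "A 0 B 1 A": the query is the
    final "A" (index 4), "0" is at index 1, and "1" is at index 3.
    """
    return attn[query_pos, correct_pos] - attn[query_pos, incorrect_pos]

# Illustrative usage with a random pattern standing in for a real head:
attn = torch.softmax(torch.randn(5, 5), dim=-1)
print(attention_difference(attn))
```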
At the position of the last token, the previous-token activations in the first layer can be ablated by setting the corresponding attention weight and value vectors to zero. In the “A 0 B 1 A” example, this removes what the final “A” reads from the token before it.
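A hedged sketch of this zero-ablation, assuming direct access to a first-layer head's attention pattern and value vectors (the tensor names, shapes, and random inputs are assumptions for illustration):

```python
import torch

def ablate_previous_token(attn, values, query_pos):
    """Zero-ablate the previous-token pathway for one query position.

    attn: [seq, seq] attention pattern; values: [seq, d_head] value vectors.
    Zeroing both the attention weight and the value vector means the query
    position receives nothing from the token before it.
    """
    attn, values = attn.clone(), values.clone()
    attn[query_pos, query_pos - 1] = 0.0  # zero attention weight
    values[query_pos - 1] = 0.0           # zero values
    return attn, values

# "A 0 B 1 A": the last token is index 4; ablate what it reads from index 3.
attn = torch.softmax(torch.randn(5, 5), dim=-1)
vals = torch.randn(5, 16)
attn_abl, vals_abl = ablate_previous_token(attn, vals, query_pos=4)
```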
Specifically, we note that different Layer 2 heads exhibit phase changes at different times (but all eventually learn to solve the task). Head 3 is the “quickest to learn” in Figure 4b, which may also be why it becomes the strongest when training with all heads (Figure 3b): perhaps the heads race to minimize the loss, similar to the mechanism in Saxe et al. (2022).
Once Head 3 emerges, though, the ordering of phase changes in Figure 4b does not match the order of emergence in the full network. Reverse-S phase changes in the loss are often due to multiple interacting components.