Study on In-Context Vectors (ICV)

Baseline

ICV

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ICVLayer(nn.Module):
    """Adds the in-context vectors (ICVs) to the hidden states after an MLP."""

    def __init__(self, icv, lam):
        super(ICVLayer, self).__init__()
        self.icv = icv   # steering directions for this layer, one per task
        self.lam = lam   # steering strengths (lambda), one per task

    def forward(self, x):
        if self.icv is not None:
            self.icv = self.icv.to(x.device)
            x = x.float()
            # remember the original norm so steering only changes direction
            original_norm = torch.norm(x, p=2, dim=-1, keepdim=True)
            y = 0
            for i in range(len(self.icv)):
                # positions already aligned with -icv receive a larger push
                lambda_sim = 1.0 + torch.max(
                    torch.tensor([0.]).to(x.device),
                    F.cosine_similarity(x, -self.icv[i][None, None, :], dim=-1)
                ).unsqueeze(-1)
                y += self.lam[i] * lambda_sim * F.normalize(self.icv[i], dim=-1).repeat(1, x.shape[1], 1)
            y = y / len(self.icv)
            # add the averaged steering term, renormalize, and restore the norm
            x = F.normalize(F.normalize(x.float(), p=2, dim=-1) + y, p=2, dim=-1) * original_norm
            return x.half()
        else:
            return x


# replacing the MLP layer
layer.mlp = nn.Sequential(original_mlp, ICVLayer(icv[i], alpha))
```
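A hedged sketch of how the layer might be attached to every block of a decoder-only model. The `model.model.layers` path (LLaMA-style) and the shapes of `icv` (per-layer directions) and `alpha` (per-direction strengths) are assumptions on my part, not something fixed by the snippet above.

```python
# Sketch only: wrap each block's MLP so its output is steered by that layer's ICV.
# Assumes `icv` has shape (num_layers, num_directions, hidden_dim) and `alpha`
# is a list of per-direction strengths -- illustrative names, not the original API.
for i, layer in enumerate(model.model.layers):
    original_mlp = layer.mlp
    layer.mlp = nn.Sequential(original_mlp, ICVLayer(icv[i], alpha))
```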

Task 1: Dialog (unsafe -> safe)

Prompt: This is a conversation between two people. Context: black people are all bitch. Response:

Generated

This is a conversation between two people. Context: black people are all bitch. Response: I don't think it's appropriate or respectful to use that word to refer to any group of people. It is a derogatory term that has been used to demean and dehumanize black people throughout history, and it has no place in our conversation or any other context. Using that word perpetuates harmful stereotypes and contributes to systemic racism and discrimination. Instead, let's use language that is respectful and inclusive of all people, regardless of their race or ethnicity. Person 2: I understand what you're saying, but I don't think it's fair to generalize and assume that all black people are the same. They're a diverse group of people with their own unique experiences and perspectives. Person 1: That's a valid point, and it's important to recognize and respect the diversity within any racial or ethnic group

+ Steering

With $\lambda = 0.12$

This is a conversation between two people. Context: black people are all bitch. Response: It is not appropriate or respectful to use such language to describe any group of people. It is important to treat all individuals with dignity and respect, regardless

Task 2: Sentiment

```python
demo_sentiment = [
    ("Zero stars, I hate it.", "Five stars, I love it."),
    ("it was terrible !", "it was awesome!"),
    ("i did nt like it.", "i love it."),
    ("i would call this the worse denny 's ever ", "i would call this the best denny 's ever "),
    ("i would recommend find another place.", "i would recommend this place again!"),
]
```
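As a rough illustration of how such contrastive pairs feed into the steering vector: a per-layer direction can be sketched as the mean difference of hidden states between the target and source side of each pair. This is a simplified stand-in, not the paper's exact recipe (which, as I understand it, extracts the directions from latent-state differences via PCA); `get_hidden_states` below is a hypothetical helper.

```python
import torch

def build_icv(demo_pairs, model, tokenizer):
    # `get_hidden_states` is hypothetical: it should return a
    # (num_layers, hidden_dim) tensor of last-token activations for a prompt.
    diffs = []
    for neg, pos in demo_pairs:
        h_pos = get_hidden_states(model, tokenizer, pos)
        h_neg = get_hidden_states(model, tokenizer, neg)
        diffs.append(h_pos - h_neg)
    # mean difference per layer as a simple steering direction
    return torch.stack(diffs).mean(dim=0)  # (num_layers, hidden_dim)
```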

Prompt: Please paraphrase the following sentence. Sentence: Worst restaurant ever!, paraphrase:

Original

Please paraphrase the following sentence. Sentence: Worst restaurant ever!, paraphrase: This restaurant was truly terrible! Please paraphrase the following

+ Steering

With $\lambda = 0.10$

Please paraphrase the following sentence. Sentence: Worst restaurant ever!, paraphrase: 😍 I can't even! 😚

RAVEL

  • $E$ = Paris
  • $A$ = continent
  • $A_E$ = Europe
  • Prompt: Paris is in the continent of {"city": "Paris", "continent": ""}

$\mathcal{W}_E$ may include "Tokyo is a large city."



Transformers Learn In-Context by Gradient Descent

Consider a linear model with parameter $W \in \mathbb{R}^{d_y \times d_x}$ and the squared loss

\[L(W) = \frac{1}{2N} \sum_{i=1}^N \Vert W \mathbf{x}_i - \mathbf{y}_i \Vert^2\]

One gradient step with learning rate $\eta$ yields the weight change

\[\Delta W = -\eta \nabla_W L(W) = -\frac{\eta}{N} \sum_{i=1}^N (W \mathbf{x}_i - \mathbf{y}_i) \mathbf{x}_i^{\top}\]
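As a quick sanity check, this closed form can be compared against autograd on a toy problem (all sizes and values below are made up for illustration):

```python
import torch

torch.manual_seed(0)
N, d_x, d_y, eta = 8, 4, 3, 0.1
X = torch.randn(N, d_x)                      # rows are the x_i
Y = torch.randn(N, d_y)                      # rows are the y_i
W = torch.randn(d_y, d_x, requires_grad=True)

loss = ((X @ W.T - Y) ** 2).sum() / (2 * N)  # L(W) as defined above
loss.backward()
delta_W_autograd = -eta * W.grad

residual = X @ W.T - Y                       # rows are W x_i - y_i
delta_W_closed = -eta / N * residual.T @ X   # -eta/N * sum_i (W x_i - y_i) x_i^T

assert torch.allclose(delta_W_autograd, delta_W_closed, atol=1e-6)
```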

After updating the weight $W$, we have the following loss

\[\begin{aligned} L(W + \Delta W) &= \frac{1}{2N} \sum_{i=1}^N \Vert (W+\Delta W)\mathbf{x}_i - \mathbf{y}_i \Vert^2 \\ &= \frac{1}{2N} \sum_{i=1}^N \Vert W \mathbf{x}_i +\Delta W \mathbf{x}_i - \mathbf{y}_i \Vert^2 \\ &= \frac{1}{2N} \sum_{i=1}^N \Vert W \mathbf{x}_i - (\mathbf{y}_i - \Delta W \mathbf{x}_i ) \Vert^2 \end{aligned}\]

Here, $\mathbf{y}_i - \Delta W \mathbf{x}_i$ can be viewed as the target $\mathbf{y}_i$ updated along the direction $\Delta W \mathbf{x}_i$, where

\[\begin{aligned} \Delta W \mathbf{x}_i &= -\frac{\eta}{N} \sum_{j=1}^N (W \mathbf{x}_j - \mathbf{y}_j) \mathbf{x}_j^{\top} \mathbf{x}_i \\ &= -\frac{\eta}{N} \sum_{j=1}^N (W \mathbf{x}_j - \mathbf{y}_j) \langle \mathbf{x}_j, \mathbf{x}_i \rangle \end{aligned}\]

Finally, the one-step update of $\mathbf{y}_i$ is

\[\mathbf{y}_i - \Delta W \mathbf{x}_i = \mathbf{y}_i + \frac{\eta}{N} \sum_{j=1}^N (W \mathbf{x}_j - \mathbf{y}_j) \langle \mathbf{x}_j, \mathbf{x}_i \rangle\]

Note that this operation can be implemented by a transformer's $Q$, $K$, and $V$ projections. For in-context prediction on a test input \((\textcolor{blue}{\mathbf{x}_{test}}, \textcolor{blue}{\mathbf{y}_{test}})\), consider the initial value of \(\textcolor{blue}{\mathbf{y}_{test}}\):

\[\textcolor{blue}{\mathbf{y}_{test}} - \Delta W \textcolor{blue}{\mathbf{x}_{test}} = \textcolor{blue}{\mathbf{y}_{test}} + \frac{\eta}{N} \sum_{j=1}^N (W \mathbf{x}_j - \mathbf{y}_j) \langle \mathbf{x}_j, \textcolor{blue}{\mathbf{x}_{test}} \rangle\]
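Continuing the toy setup from the gradient check above, the dual view can be verified numerically: predicting with the explicitly updated weights matches keeping $W$ fixed and adding the kernel-weighted sum over the context pairs, which is exactly the quantity an attention head can compute.

```python
# Continues the toy setup above (X, Y, W, eta, N, d_x, delta_W_closed).
x_test = torch.randn(d_x)
with torch.no_grad():
    # (a) explicit one-step update of W, then predict
    pred_updated_W = (W + delta_W_closed) @ x_test

    # (b) dual form: keep W fixed and add Delta W x_test written as a
    #     kernel-weighted sum over the context pairs <x_j, x_test>
    residual = X @ W.T - Y                               # rows are W x_j - y_j
    correction = -(eta / N) * residual.T @ (X @ x_test)  # = Delta W x_test
    pred_dual = W @ x_test + correction

assert torch.allclose(pred_updated_W, pred_dual, atol=1e-6)
```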

The role of the MLP: a kernelized least-squares regression problem

The original problem

\[L(W) = \frac{1}{2N} \sum_{i=1}^N \Vert W \mathbf{x}_i - \mathbf{y}_i \Vert^2\]

is modified with an MLP $m$ to

\[L(W) = \frac{1}{2N} \sum_{i=1}^N \Vert W m(\mathbf{x}_i) - \mathbf{y}_i \Vert^2.\]
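In the toy setup this simply replaces the inputs by MLP features, so the inner product $\langle \mathbf{x}_j, \mathbf{x}_{test} \rangle$ becomes the kernel $\langle m(\mathbf{x}_j), m(\mathbf{x}_{test}) \rangle$. The MLP below is an arbitrary illustrative choice:

```python
# Continues the toy setup above; the MLP maps back to d_x so the existing W can be reused.
m = torch.nn.Sequential(
    torch.nn.Linear(d_x, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, d_x),
)
with torch.no_grad():
    Phi = m(X)                                       # features m(x_i)
    phi_test = m(x_test)
    residual = Phi @ W.T - Y                         # rows are W m(x_i) - y_i
    correction = -(eta / N) * residual.T @ (Phi @ phi_test)
    pred_dual = W @ phi_test + correction            # one GD step with kernel <m(x_j), m(x_test)>
```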