# Reasons to Reject? Aligning Language Models with Judgments

<document-info>
- tags: #論文筆記
- date: 2024/07/05
</document-info>

This paper presents the first systematic study of aligning LLMs with language feedback (**judgments**) and proposes the Contrastive Unlikelihood Training (CUT) framework.

Experiments show that with only 1,317 training examples, CUT outperforms the 175B DaVinci003. Further analysis suggests that **judgments** hold greater potential for LLM alignment than RL rewards.

## Problem Setting

Assume a set of **instruction**-**response**-**judgment** triplets $(x, y, j)$, where the instruction $x = [x_1, \ldots, x_M]$, the **response** $y = [y_1, \ldots, y_N]$, and the **judgment** $j = [j_1, \ldots, j_Q]$ are token sequences of lengths $M$, $N$, and $Q$, respectively. The **response** may contain flaws or be considered fully satisfactory. The **judgment** provides an analysis of the response's strengths and weaknesses, and may be drafted by humans or by AI models. The goal of aligning LLMs with **judgments** is for the LLM to retain the appropriate behaviors mentioned in the strengths and, more importantly, to address the weaknesses so as to prevent future misbehavior.
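
To make the setup concrete, here is a minimal sketch of one such triplet; the texts and field names are invented for illustration and are not taken from the paper.

```python
# Hypothetical (x, y, j) triplet; the texts are made up for illustration.
triplet = {
    "instruction": "Summarize the plot of 'Hamlet' in two sentences.",  # x
    "response": "Hamlet is a comedy about a cheerful Danish prince.",   # y (flawed)
    "judgment": (                                                       # j
        "The genre is wrong: Hamlet is a tragedy, and the prince is "
        "melancholic rather than cheerful."
    ),
}
```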

### Possible Solutions
#### Forward Prediction
Predict the response and then its judgment, in sequence.

$$
\mathcal{L} = -\frac{1}{N}\sum_t \log p(y_t \mid y_{<t},x)-\frac{1}{Q} \sum_t \log p(j_t \mid j_{<t},y,x)
$$
> In forward prediction, learning to generate judgments does not necessarily translate into better response generation, because the response is generated before the judgment.
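
A rough PyTorch sketch of this objective, assuming some language model has already produced next-token logits for the response and judgment positions; the toy tensors below are placeholders for those outputs.

```python
import torch
import torch.nn.functional as F

def sequence_nll(logits, target_ids):
    """Mean token-level negative log-likelihood of target_ids under logits.

    logits:     (T, vocab) next-token logits at each target position
    target_ids: (T,) gold token ids
    """
    return F.cross_entropy(logits, target_ids)  # already averaged over T

# Toy shapes only; a real LM would produce these logits autoregressively.
vocab = 100
y_ids = torch.randint(0, vocab, (8,))   # response tokens y_1..y_N
j_ids = torch.randint(0, vocab, (5,))   # judgment tokens j_1..j_Q
y_logits = torch.randn(8, vocab)        # logits for p(y_t | y_<t, x)
j_logits = torch.randn(5, vocab)        # logits for p(j_t | j_<t, y, x)

# Forward prediction: learn the response, then the judgment conditioned on it.
loss = sequence_nll(y_logits, y_ids) + sequence_nll(j_logits, j_ids)
print(loss.item())
```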
#### Imitation Learning from Language Feedback (ILF)
Ask the LLM to revise its output according to the **judgment**, which yields an improved **response** $\hat{y}$.

$$\hat{y} = \text{LLM}(x,y,j)$$

There are two ways to learn from the improved response $\hat{y}$ (a sketch follows below):

- **ILF-MLE**
$$
\mathcal{L}_i^{mle} = -\frac{1}{N} \sum_t \log p(\hat{y}_t \mid \hat{y}_{<t},x)
$$

- **ILF-DPO**

$$
\mathcal{L}_i^{dpo} = \text{DPO}(x,y,\hat{y})
$$

> The indirect use of judgments in ILF limits its ability to identify and correct the weaknesses pointed out by the judgment.
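
A minimal sketch of the ILF pipeline. The `refine_with_llm` call is a hypothetical placeholder for the prompted refinement step $\hat{y} = \text{LLM}(x,y,j)$; ILF-MLE then applies ordinary NLL to the refined response, while ILF-DPO would pass the pair $(x, y, \hat{y})$ to a standard DPO preference loss (not reproduced here).

```python
import torch
import torch.nn.functional as F

def refine_with_llm(x, y, j):
    """Placeholder for the refinement step y_hat = LLM(x, y, j).
    In practice this is a prompted call to the model being aligned."""
    return "a revised response that addresses the judgment"  # hypothetical output

def ilf_mle_loss(yhat_logits, yhat_ids):
    """ILF-MLE: plain token-level NLL of the refined response y_hat given x.

    yhat_logits: (T, vocab) next-token logits for y_hat
    yhat_ids:    (T,) token ids of y_hat
    """
    return F.cross_entropy(yhat_logits, yhat_ids)

# ILF-DPO would instead contrast y_hat (preferred) against y (dispreferred)
# with a standard DPO preference loss over the pair (x, y, y_hat).
```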
#### Hindsight
The LLM generates the response $y$ conditioned on the sequence $[x, j]$.
$$
\mathcal{L}_h = -\frac{1}{N}\sum_t \log p (y_t \mid y_{<t},x,j)
$$

> Hindsight takes unsatisfactory responses as targets for maximum likelihood estimation, which inevitably increases the risk of generating unsatisfactory responses.
## Contrastive Unlikelihood Training
Contrastive Unlikelihood Training (CUT) is a fine-tuning framework for aligning LLMs with **judgments**. Its core idea is to contrast response generation under different conditions, so as to determine which behaviors the LLM should keep and exactly what needs to be adjusted.

Appropriate content is handled with maximum likelihood estimation (MLE); inappropriate content is handled with Unlikelihood Training (UT).

### Incorporating Judgments into Alignment

![image](./cut.png)

If the response to an instruction meets human expectations ($x \rightarrow y$), we call it **aligned**. Otherwise, a **judgment** points out the errors in the **response**.

Suppose the task is to generate a **response** that satisfies the **judgment**, which we denote as $[x, j] \rightarrow y$.

Based on this idea, three types of alignment data are constructed (a sketch follows the list):

- Align-P: the combination of $x$ and $y$ is satisfactory.
- Align-N: the LLM made some mistakes when generating the **response**, so a **judgment** $j$ is needed to point them out.
- Misalign: the genuine negative **judgment** ($j^-$) in Align-N is replaced with a fake positive **judgment** ($j^+$).
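
A minimal sketch of how the three data types could be assembled from a toy example; the texts, including the generic positive judgment, are invented for illustration.

```python
# Hypothetical toy example; the texts are made up for illustration.
x = "Translate 'bonjour' into English."
y_good, y_bad = "Hello.", "Goodbye."
j_neg = "The translation is wrong: 'bonjour' means 'hello', not 'goodbye'."
j_pos = "The response follows the instruction correctly."  # fake/generic positive judgment

align_p  = {"x": x, "j": j_pos, "y": y_good}  # satisfactory response
align_n  = {"x": x, "j": j_neg, "y": y_bad}   # flawed response + genuine negative judgment
misalign = {"x": x, "j": j_pos, "y": y_bad}   # same flawed response + fake positive judgment
```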

### Learning from the Contrast

#### Align-N vs. Misalign
The set $U$ records the positions $t$ that respond more strongly to $j^-$; these tokens are considered inappropriate:
$$
U = \{ t \mid p(y_t \mid y_{<t}, x, j^{-}) - \lambda \cdot p(y_t \mid y_{<t}, x, j^{+}) > 0 \}
$$

We want the appropriate tokens ($t \notin U$) to have high likelihood, while the inappropriate tokens ($t \in U$) are penalized:
$$
\begin{array}{l}
\mathcal{L}_1 = -\frac{1}{N}(\sum_{t \notin U} \log p(y_t \mid y_{<t},x) \\
\hspace{1.3cm} +\sum_{t \in U} \alpha p(y_t \mid y_{<t},x,j^-)^\gamma \log(1-p(y_t \mid y_{<t},x)))
\end{array}
$$
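
A rough PyTorch sketch of this loss. The hyperparameter defaults and tensor names are placeholders, not the paper's settings; in practice the three probability tensors would come from three forward passes of the same model on the Align-N input, the Misalign input, and the instruction-only input.

```python
import torch

def cut_l1(p_neg, p_pos, p_x, gold_ids, lam=1.1, alpha=1.0, gamma=2.0, eps=1e-8):
    """Sketch of the Align-N vs. Misalign contrast (loss L_1).

    p_neg: (T, V) token probabilities p(y_t | y_<t, x, j^-)  -- Align-N pass
    p_pos: (T, V) token probabilities p(y_t | y_<t, x, j^+)  -- Misalign pass
    p_x:   (T, V) token probabilities p(y_t | y_<t, x)       -- instruction-only pass
    gold_ids: (T,) token ids of the flawed response y
    """
    t = torch.arange(gold_ids.numel())
    pn, pp, px = p_neg[t, gold_ids], p_pos[t, gold_ids], p_x[t, gold_ids]

    in_u = (pn - lam * pp) > 0                       # token set U (inappropriate tokens)
    mle = torch.log(px[~in_u] + eps).sum()           # keep appropriate tokens likely
    ut = (alpha * pn[in_u] ** gamma                  # judgment-weighted unlikelihood penalty
          * torch.log(1.0 - px[in_u] + eps)).sum()   # push inappropriate tokens down
    return -(mle + ut) / gold_ids.numel()

# Toy call with random "probabilities" just to show the expected shapes.
def toy_probs(T=6, V=50):
    return torch.softmax(torch.randn(T, V), dim=-1)

loss = cut_l1(toy_probs(), toy_probs(), toy_probs(), torch.randint(0, 50, (6,)))
print(loss.item())
```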

#### Align-P vs. Align-N
Align-P and Align-N share the same representation $[x,j]\rightarrow y$, but when only the instruction is considered ($x \rightarrow y$), only Align-P holds.

We first let the model learn the direct $x\rightarrow y$ relation; when the semantic relation $x\rightarrow y$ does not hold, a **judgment** must intervene to describe the type of error in $y$:

$$
\begin{array}{l}
\mathcal{L}_2 = - \frac{\mathbb{1}(x \rightarrow y)}{N} \sum_t \log p(y_t \mid y_{<t},x) \\
\hspace{1.3cm}-\frac{1-\mathbb{1}(x \rightarrow y)}{N} \sum_t \log p(y_t \mid y_{<t},j,x)
\end{array}
$$
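
A minimal sketch of this term, assuming the caller already knows whether the example is Align-P (indicator 1) or Align-N (indicator 0); the function and argument names are made up for illustration.

```python
import torch
import torch.nn.functional as F

def cut_l2(logits_x, logits_xj, gold_ids, aligned):
    """Sketch of the Align-P vs. Align-N term (loss L_2).

    logits_x:  (T, V) next-token logits conditioned on x only
    logits_xj: (T, V) next-token logits conditioned on [x, j]
    aligned:   True if x -> y holds (Align-P), False otherwise (Align-N)
    """
    if aligned:
        # 1(x -> y) = 1: learn the direct x -> y mapping
        return F.cross_entropy(logits_x, gold_ids)
    # 1(x -> y) = 0: the judgment must mediate generation of y
    return F.cross_entropy(logits_xj, gold_ids)
```

`F.cross_entropy` averages over the $T$ target positions, which matches the $1/N$ factor in the formula.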

Finally, we combine the two losses: $\mathcal{L}_{cut} = \mathcal{L}_1 + \mathcal{L}_2$

## Experiments

#### Instruction Following
![image](./exp1.png)

> For AlpacaEval, we report the winning rate of the responses generated by our models against DaVinci003 using GPT4 as the judge.
#### CUT Applied to Different Models
![image](./exp2.png)