Thanks for sharing the great code!

In your armo-rm stage 2 code (https://github.com/RLHFlow/RLHF-Reward-Modeling/blob/main/armo-rm/stage-2_train.py), line 80 applies softmax to the output at dim=1, as shown below:

def forward(self, x: torch.FloatTensor) -> torch.FloatTensor:
    # Apply the linear layers with ReLU and dropout
    for i, layer in enumerate(self.layers):
        x = layer(x)
        if i < len(self.layers) - 1:
            x = F.relu(x)
            if self.dropout_prob > 0:
                x = F.dropout(x, p=self.dropout_prob, training=self.training)
    # Apply softmax with temperature scaling
    x = F.softmax(x / self.temperature, dim=1)  # This line
    return x * self.logit_scale[0]
But in practice the shape of x here has 3 dimensions (i.e., (batch_size, 2, 19); the 2 is chosen & rejected), so I wonder whether dim=1 here should be changed to dim=-1, so that the softmax is applied over the last dimension of 19 gating weights during training?

Btw, did you use this code for armo-rm when crafting RLHFlow/ArmoRM-Llama3-8B-v0.1? When I run your code as-is (i.e., with softmax at dim=1), almost all 19 weights look weird (they do not sum to 1 and are almost identical), and I cannot reproduce the high reward score shown on the RewardBench leaderboard.
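For reference, a minimal sketch (not from the repo; it just uses the hypothetical (batch_size, 2, 19) shape described above) of how the two choices of dim differ:

import torch
import torch.nn.functional as F

# Hypothetical gating logits: (batch, chosen/rejected, 19 gating weights)
logits = torch.randn(4, 2, 19)

w_dim1 = F.softmax(logits, dim=1)   # normalizes across dim 1, i.e. the chosen/rejected pair
w_last = F.softmax(logits, dim=-1)  # normalizes across the last dim of 19 gating weights

print(w_dim1.sum(dim=1)[0])   # all ones: the mass is split over the 2-way pair
print(w_dim1.sum(dim=-1)[0])  # each row of 19 weights generally does not sum to 1
print(w_last.sum(dim=-1)[0])  # each row of 19 weights sums to 1 (up to float error)

If the chosen and rejected rows carry (near-)identical gating logits, softmax over dim=1 would also push every weight toward 0.5, which could explain why the 19 values look almost identical.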
SWY666 changed the title from "Potential Gating Network Coding Error (on sigmoid) in armo-rm Stage 2" to "Potential Gating Network Coding Error (on softmax) in armo-rm Stage 2" on Jan 31, 2025.