
🐛 fix input text issue #97

Merged: prashantgupta24 merged 2 commits into main from input-text-fix on Aug 19, 2024

Conversation

@prashantgupta24 (Contributor) commented on Aug 16, 2024

Description

Fixes the adapter for the recent change in vLLM: since the output contains the input text, vLLM no longer passes in the prompt with the result.
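
For context, a minimal Python sketch of the approach (illustrative only; the names and signatures below are hypothetical, not the adapter's actual code): the original prompt text is kept from the request and prepended to the generated text when input_text is requested, instead of relying on vLLM to return the prompt with the result.

from dataclasses import dataclass

@dataclass
class GenerationResponse:
    text: str
    generated_token_count: int
    input_token_count: int
    stop_reason: str

def build_response(prompt: str, generated_text: str, include_input_text: bool,
                   generated_tokens: int, input_tokens: int,
                   stop_reason: str) -> GenerationResponse:
    # vLLM now returns only the newly generated text, so when the caller asks
    # for input_text the prompt kept from the request is prepended here.
    text = prompt + generated_text if include_input_text else generated_text
    return GenerationResponse(text, generated_tokens, input_tokens, stop_reason)

# Mirrors the single-request test below:
resp = build_response(
    prompt="Once upon a time,",
    generated_text=" there was a little girl who loved to read.",
    include_input_text=True,
    generated_tokens=20,
    input_tokens=6,
    stop_reason="MAX_TOKENS",
)
print(resp.text)  # Once upon a time, there was a little girl who loved to read.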

How Has This Been Tested?

Tested by copying the code over to the dev_pod locally and running requests.

Single request

grpcurl -plaintext -proto proto/generation.proto -d \
  '{
    "model_id": "dummy-model-name",
    "requests": [
      {
        "text": "Once upon a time,"
      }
    ],
    "params": {
      "method": "GREEDY",
      "stopping": {
        "max_new_tokens": 20
      },
      "response": {
          "input_text": true
        }
    }
  }' \
  localhost:8033 fmaas.GenerationService/Generate
{
  "responses": [
    {
      "generatedTokenCount": 20,
      "text": "Once upon a time, there was a little girl who loved to read. She loved to read so much that she would read",
      "inputTokenCount": 6,
      "stopReason": "MAX_TOKENS"
    }
  ]
}

2 requests

❯ grpcurl -plaintext -proto proto/generation.proto -d \
  '{
    "model_id": "dummy-model-name",
    "requests": [
      {
        "text": "Once upon a time,"
      },
      {
        "text": "When I was little,"
      }
    ],
    "params": {
      "method": "GREEDY",
      "stopping": {
        "max_new_tokens": 20
      },
      "response": {
          "input_text": true
        }
    }
  }' \
  localhost:8033 fmaas.GenerationService/Generate
{
  "responses": [
    {
      "generatedTokenCount": 20,
      "text": "Once upon a time, there was a little girl who loved to read. She loved to read so much that she would read",
      "inputTokenCount": 6,
      "stopReason": "MAX_TOKENS"
    },
    {
      "generatedTokenCount": 20,
      "text": "When I was little, I was a big fan of the movie, The Wizard of Oz. I loved the movie",
      "inputTokenCount": 6,
      "stopReason": "MAX_TOKENS"
    }
  ]
}

Streaming

❯ grpcurl -plaintext -proto proto/generation.proto -d \
  '{
    "model_id": "dummy-model-name",
    "request": [
      {
        "text": "Once upon a time,"
      }
    ],
    "params": {
      "method": "GREEDY",
      "stopping": {
        "max_new_tokens": 4
      },
      "response": {
          "input_text": true
        }
    }
  }' \
  localhost:8033 fmaas.GenerationService/GenerateStream
{
  "text": "Once upon a time,",
  "inputTokenCount": 6
}
{
  "generatedTokenCount": 1,
  "text": " there"
}
{
  "generatedTokenCount": 2,
  "text": " was"
}
{
  "generatedTokenCount": 3,
  "text": " a"
}
{
  "generatedTokenCount": 4,
  "text": " little",
  "stopReason": "MAX_TOKENS"
}

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

@codecov-commenter commented on Aug 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.98%. Comparing base (537a42b) to head (76f6594).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #97      +/-   ##
==========================================
+ Coverage   55.87%   55.98%   +0.11%     
==========================================
  Files          24       24              
  Lines        1482     1486       +4     
  Branches      266      268       +2     
==========================================
+ Hits          828      832       +4     
  Misses        579      579              
  Partials       75       75              


@maxdebayser (Contributor) left a comment

LGTM

@tjohnson31415 (Contributor) commented:

I tried out this change and noticed that the BOS token is included in the text output:

{
  "responses": [
    {
      "generatedTokenCount": 20,
      "text": "\u003cs\u003eleys, to flare upon the\ndarkness of their condition.  But, the time was not come yet; and\nevery wind that blew over France shook the rags of the scarecrows\nin vain, for the birds, fine of song and feather, took no warning.\n\nThe wine-shop was a corner shop, better than most others in its\nappearance and degage, but less comfortable inside.  It was, for the\nneighbourhood, a",
      "inputTokenCount": 85,
      "stopReason": "MAX_TOKENS"
    }
  ]
}
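
(Editorial aside, not part of the original comment.) A minimal sketch of why the leading \u003cs\u003e (BOS) token shows up when the input text is reconstructed by decoding the prompt token IDs; the tokenizer checkpoint below is assumed purely for illustration:

# Llama-style tokenizers prepend a BOS token (<s>) when encoding, and a plain
# decode() keeps it unless special tokens are skipped, so rebuilding the input
# text from the prompt token IDs re-introduces "<s>" at the start.
from transformers import AutoTokenizer  # assumes the transformers package is available

tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

prompt_ids = tok("Once upon a time,").input_ids             # first id is the BOS token
print(tok.decode(prompt_ids))                               # <s> Once upon a time,
print(tok.decode(prompt_ids, skip_special_tokens=True))     # Once upon a time,

Prepending the original prompt string, as this PR ended up doing, sidesteps the issue entirely.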

@prashantgupta24 (Contributor, Author) commented:

@tjohnson31415 thanks a lot for the suggestion to add the prompt directly instead of going down the complicated route of decoding the tokens!

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
The assumption is that this remains a valid way to correlate the request and the response, since vLLM does it that way

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
@tjohnson31415 (Contributor) left a comment

LGTM!

@prashantgupta24 added this pull request to the merge queue on Aug 19, 2024
Merged via the queue into main with commit 282bfc9 on Aug 19, 2024
3 checks passed
@prashantgupta24 deleted the input-text-fix branch on August 19, 2024 at 23:03