
🐛 fix input text issue #97

Merged: prashantgupta24 merged 2 commits into main from input-text-fix on Aug 19, 2024

Conversation

@prashantgupta24 (Contributor) commented on Aug 16, 2024

Description

Fixes the adapter for the recent change in vLLM: since the output contains the input text, vLLM no longer passes in the prompt with the result.
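
For context, a minimal Python sketch of the approach (illustrative only; the names and signatures below are hypothetical, not the adapter's actual code): the original prompt text is kept from the request and prepended to the generated text when input_text is requested, instead of relying on vLLM to return the prompt with the result.

from dataclasses import dataclass

@dataclass
class GenerationResponse:
    text: str
    generated_token_count: int
    input_token_count: int
    stop_reason: str

def build_response(prompt: str, generated_text: str, include_input_text: bool,
                   generated_tokens: int, input_tokens: int,
                   stop_reason: str) -> GenerationResponse:
    # vLLM now returns only the newly generated text, so when the caller asks
    # for input_text the prompt kept from the request is prepended here.
    text = prompt + generated_text if include_input_text else generated_text
    return GenerationResponse(text, generated_tokens, input_tokens, stop_reason)

# Mirrors the single-request test below:
resp = build_response(
    prompt="Once upon a time,",
    generated_text=" there was a little girl who loved to read.",
    include_input_text=True,
    generated_tokens=20,
    input_tokens=6,
    stop_reason="MAX_TOKENS",
)
print(resp.text)  # Once upon a time, there was a little girl who loved to read.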

How Has This Been Tested?

Tested by copying the code over to the dev_pod locally and running requests.

Single request

grpcurl -plaintext -proto proto/generation.proto -d \
  '{
    "model_id": "dummy-model-name",
    "requests": [
      {
        "text": "Once upon a time,"
      }
    ],
    "params": {
      "method": "GREEDY",
      "stopping": {
        "max_new_tokens": 20
      },
      "response": {
          "input_text": true
        }
    }
  }' \
  localhost:8033 fmaas.GenerationService/Generate
{
  "responses": [
    {
      "generatedTokenCount": 20,
      "text": "Once upon a time, there was a little girl who loved to read. She loved to read so much that she would read",
      "inputTokenCount": 6,
      "stopReason": "MAX_TOKENS"
    }
  ]
}

2 requests

❯ grpcurl -plaintext -proto proto/generation.proto -d \
  '{
    "model_id": "dummy-model-name",
    "requests": [
      {
        "text": "Once upon a time,"
      },
      {
        "text": "When I was little,"
      }
    ],
    "params": {
      "method": "GREEDY",
      "stopping": {
        "max_new_tokens": 20
      },
      "response": {
          "input_text": true
        }
    }
  }' \
  localhost:8033 fmaas.GenerationService/Generate
{
  "responses": [
    {
      "generatedTokenCount": 20,
      "text": "Once upon a time, there was a little girl who loved to read. She loved to read so much that she would read",
      "inputTokenCount": 6,
      "stopReason": "MAX_TOKENS"
    },
    {
      "generatedTokenCount": 20,
      "text": "When I was little, I was a big fan of the movie, The Wizard of Oz. I loved the movie",
      "inputTokenCount": 6,
      "stopReason": "MAX_TOKENS"
    }
  ]
}

Streaming

❯ grpcurl -plaintext -proto proto/generation.proto -d \
  '{
    "model_id": "dummy-model-name",
    "request": [
      {
        "text": "Once upon a time,"
      }
    ],
    "params": {
      "method": "GREEDY",
      "stopping": {
        "max_new_tokens": 4
      },
      "response": {
          "input_text": true
        }
    }
  }' \
  localhost:8033 fmaas.GenerationService/GenerateStream
{
  "text": "Once upon a time,",
  "inputTokenCount": 6
}
{
  "generatedTokenCount": 1,
  "text": " there"
}
{
  "generatedTokenCount": 2,
  "text": " was"
}
{
  "generatedTokenCount": 3,
  "text": " a"
}
{
  "generatedTokenCount": 4,
  "text": " little",
  "stopReason": "MAX_TOKENS"
}

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

@codecov-commenter commented on Aug 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.98%. Comparing base (537a42b) to head (76f6594).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #97      +/-   ##
==========================================
+ Coverage   55.87%   55.98%   +0.11%     
==========================================
  Files          24       24              
  Lines        1482     1486       +4     
  Branches      266      268       +2     
==========================================
+ Hits          828      832       +4     
  Misses        579      579              
  Partials       75       75              


@maxdebayser (Contributor) left a comment

LGTM

@tjohnson31415 (Contributor) commented:

I tried out this change and noticed that the BOS token is included in the text output:

{
  "responses": [
    {
      "generatedTokenCount": 20,
      "text": "\u003cs\u003eleys, to flare upon the\ndarkness of their condition.  But, the time was not come yet; and\nevery wind that blew over France shook the rags of the scarecrows\nin vain, for the birds, fine of song and feather, took no warning.\n\nThe wine-shop was a corner shop, better than most others in its\nappearance and degage, but less comfortable inside.  It was, for the\nneighbourhood, a",
      "inputTokenCount": 85,
      "stopReason": "MAX_TOKENS"
    }
  ]
}
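
(Editorial aside, not part of the original comment.) A minimal sketch of why the leading \u003cs\u003e (BOS) token shows up when the input text is reconstructed by decoding the prompt token IDs; the tokenizer checkpoint below is assumed purely for illustration:

# Llama-style tokenizers prepend a BOS token (<s>) when encoding, and a plain
# decode() keeps it unless special tokens are skipped, so rebuilding the input
# text from the prompt token IDs re-introduces "<s>" at the start.
from transformers import AutoTokenizer  # assumes the transformers package is available

tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

prompt_ids = tok("Once upon a time,").input_ids             # first id is the BOS token
print(tok.decode(prompt_ids))                               # <s> Once upon a time,
print(tok.decode(prompt_ids, skip_special_tokens=True))     # Once upon a time,

Prepending the original prompt string, as this PR ended up doing, sidesteps the issue entirely.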

@prashantgupta24 (Contributor, Author) commented:

@tjohnson31415 thanks a lot for the suggestion to add the prompt directly instead of going down the complicated route of decoding the tokens!

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
The assumption is that this remains a valid way to correlate the request and the response, since vLLM does it that way

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
@tjohnson31415 (Contributor) left a comment

LGTM!

@prashantgupta24 added this pull request to the merge queue on Aug 19, 2024
Merged via the queue into main with commit 282bfc9 on Aug 19, 2024
3 checks passed
@prashantgupta24 deleted the input-text-fix branch on August 19, 2024 at 23:03