[Issue]: Prompt Tuning documentation is missing required argument (--config) #1012

sbhuttan · 2024-08-23T13:55:40Z

Do you need to file an issue?

I have searched the existing issues and this bug is not already filed.
My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the issue

Running the prompt tuning command as shown in the documentation's examples fails with a "missing required argument" error.

Error - python -m graphrag.prompt_tune: error: the following arguments are required: --config
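
The message above has the shape that Python's standard argparse module prints when an option declared as required is left out. The sketch below reproduces it with a hypothetical stand-in parser; it is not GraphRAG's actual source. The names build_parser and parse_or_report are made up for illustration, and only the program name and the --config flag are taken from the error text.

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the prompt tuning CLI; only --config is modeled.
    parser = argparse.ArgumentParser(prog="python -m graphrag.prompt_tune")
    parser.add_argument("--config", required=True, help="Path to settings.yaml")
    return parser


def parse_or_report(argv):
    """Return (namespace, "") on success or (None, stderr_text) on failure."""
    captured = io.StringIO()
    try:
        # argparse writes usage and the error to stderr, then raises SystemExit.
        with contextlib.redirect_stderr(captured):
            return build_parser().parse_args(argv), ""
    except SystemExit:
        return None, captured.getvalue()


if __name__ == "__main__":
    _, error_text = parse_or_report([])  # no --config supplied
    print(error_text)
```

Passing ["--config", "settings.yaml"] to parse_or_report instead of an empty list returns a namespace and no error text.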

Steps to reproduce

Initialize GraphRAG
Update env file and settings.yaml
Run prompt tuning using the examples or parameters given in the documentation.

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: azure_openai_chat
  model: gpt-4o
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: https://openai-demo-0824.openai.azure.com
  api_version: 2024-02-15-preview
  # organization: <organization_id>
  deployment_name: gpt-4o
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: azure_openai_embedding
    model: text-embedding-ada-002
    api_base: https://openai-demo-0824.openai.azure.com/
    api_version: 2024-02-15-preview
    # organization: <organization_id>
    deployment_name: text-embedding-ada-002
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
  


chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
    
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

Logs and screenshots

Additional Information

GraphRAG Version: V0.3.1
Operating System: Win 11
Python Version: 3.12.3
Related Issues:

kkkkken33 · 2024-08-26T05:52:19Z

Same problem. Has it been solved?

sbhuttan · 2024-08-26T13:13:11Z

You need to include the --config argument and pass it the path to your settings.yaml.
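
A minimal sketch of the corrected invocation, assuming settings.yaml sits in the current working directory and GraphRAG is installed in the active Python environment. The helper names build_prompt_tune_command and run_prompt_tune are hypothetical; only the module name graphrag.prompt_tune and the --config flag come from this thread.

```python
import subprocess
import sys
from pathlib import Path


def build_prompt_tune_command(config_path):
    """Build the argument list for prompt tuning with the required --config flag."""
    return [
        sys.executable,          # the interpreter running this script
        "-m",
        "graphrag.prompt_tune",  # module name taken from the error message
        "--config",
        str(Path(config_path)),  # path to settings.yaml
    ]


def run_prompt_tune(config_path):
    """Run the command and return the CompletedProcess (assumes graphrag is installed)."""
    return subprocess.run(build_prompt_tune_command(config_path), check=False)


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(build_prompt_tune_command("settings.yaml")))
```

Calling run_prompt_tune("settings.yaml") would launch the tuner itself. Any other flags from the documentation can be appended to the returned list before running it.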

AlonsoGuevara · 2024-08-26T21:16:02Z

Doc update merged. Thanks for the report.

sbhuttan added the triage label Aug 23, 2024

natoverse added the documentation label and removed the triage label Aug 23, 2024

AlonsoGuevara self-assigned this Aug 23, 2024

AlonsoGuevara mentioned this issue Aug 23, 2024 in "Add missing config parameter for prompt tuning docs" #1017 (merged)

AlonsoGuevara closed this as completed Aug 26, 2024
