QADataGenerator doesn't generate QA pairs in non-English languages #34099

Closed
pamelafox opened this issue Feb 1, 2024 · 4 comments
Labels
AI customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team.

Comments

@pamelafox

  • Package Name: azure-ai-generative
  • Package Version: 1.0.0b2
  • Operating System: Linux/Debian
  • Python Version: 3.11

Describe the bug

When we use the QADataGenerator(model_config=openai_config) with Brazilian Portuguese texts, the QA pairs generated are always in English.

I believe that is a known issue, but it needs to be documented clearly, as customers expect LLM-based tools to work in non-English languages.

To Reproduce

Pass in Portuguese text to this code:

    result = qa_generator.generate(
        text=text,
        qa_type=QAType.LONG_ANSWER,
        num_questions=2,
    )

And get English answers instead.

Expected behavior

Portuguese answers.
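
To confirm the mismatch across many generated pairs without reading each one, a rough language check can help. The following is only a minimal sketch: `guess_language` and `english_pairs` are hypothetical helpers (not part of azure-ai-generative), the stopword sets are small and hand-picked, and it assumes the generated pairs have already been extracted into a list of (question, answer) tuples.

```python
import re

# Small, hand-picked stopword sets. Illustrative only, not exhaustive.
PORTUGUESE_STOPWORDS = {
    "de", "que", "não", "uma", "para", "com", "os", "em",
    "é", "da", "um", "por", "como", "qual", "são", "na",
}
ENGLISH_STOPWORDS = {
    "the", "of", "and", "is", "to", "in", "that", "what",
    "for", "with", "are", "how", "which", "was", "does",
}


def guess_language(text):
    """Return "pt", "en", or "unknown" based on stopword counts (heuristic)."""
    # Split into runs of letters; accented characters are kept.
    words = re.findall(r"[^\W\d_]+", text.lower())
    pt_hits = sum(word in PORTUGUESE_STOPWORDS for word in words)
    en_hits = sum(word in ENGLISH_STOPWORDS for word in words)
    if pt_hits == en_hits:
        return "unknown"
    return "pt" if pt_hits > en_hits else "en"


def english_pairs(qa_pairs):
    """Return the (question, answer) pairs whose combined text looks English.

    Assumes qa_pairs is a list of (question, answer) string tuples.
    """
    return [
        (question, answer)
        for question, answer in qa_pairs
        if guess_language(f"{question} {answer}") == "en"
    ]
```

Running `english_pairs` over the pairs generated from Portuguese input would list the ones that came back in English. A stopword count is crude, so short or mixed-language text may come back as "unknown".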

@l0lawrence

Hi @pamelafox, thank you for the feedback. I'm forwarding your request to @azureml-github, and they will get back to you ASAP.

@diondrapeck

@pamelafox Thank you for bringing this to our attention. We will update the reference documentation to make the language limitations clear.

@diondrapeck diondrapeck self-assigned this Feb 2, 2024
@EMjetrot

EMjetrot commented Dec 2, 2024

Hi everyone - Is there any progress on making QADataGenerator multilingual? I would very much appreciate it if that were possible :)

@diondrapeck

@EMjetrot - the azure-ai-generative package was deprecated earlier this year, so no updates will be made to any of its classes. Instead, it's recommended that you use the azure-ai-evaluation package. The analogous class in that library is the Simulator (and its subclasses), which does support multiple non-English languages. Here's an example.
