refactor BaseEnvironment into SingleStep and MultiStep environments #1810

Draft. Wants to merge 38 commits into master.

hallerite
Collaborator

Description

At the moment, BaseEnvironment is very general. Splitting it into SingleStep and MultiStep environments lets each implementation be simpler and more efficient. In a typical RL setup with LLMs, the environment is effectively single-step: we sample a question, get an answer, compute a reward based on that answer, and then start a new episode. This case is the priority for the Loong project.
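As a rough illustration of the single-step pattern (sample a question, get an answer, compute a reward, start a new episode), here is a minimal sketch. It is not the code in this PR: the class name `SingleStepSketch`, the `reset`/`step` methods, the `question`/`answer` keys and the `exact_match` reward are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


def exact_match(answer: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the answer matches the reference, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


@dataclass
class SingleStepSketch:
    """Hypothetical single-step environment: one question, one answer, one reward."""

    dataset: List[Dict[str, str]]
    reward_fn: Callable[[str, str], float]
    seed: int = 0

    def __post_init__(self) -> None:
        # A seeded RNG keeps the sampling order reproducible.
        self._rng = random.Random(self.seed)
        self._current: Optional[Dict[str, str]] = None

    def reset(self) -> str:
        """Sample a datapoint and return its question as the observation."""
        self._current = self._rng.choice(self.dataset)
        return self._current["question"]

    def step(self, answer: str) -> Tuple[float, bool]:
        """Score the answer; the episode always ends after this single step."""
        if self._current is None:
            raise RuntimeError("call reset() before step()")
        reward = self.reward_fn(answer, self._current["answer"])
        self._current = None  # force a reset() before the next episode
        return reward, True
```

Because every episode ends after one `step`, such an environment needs no per-episode state beyond the sampled datapoint.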

MultiStep environments are those that do not follow this pattern, such as Chess or Tic Tac Toe, where an episode spans several actions. They are more complicated and should be handled separately. This PR will provide implementations of both the SingleStep and MultiStep environments.
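To show what a multi-step episode involves, here is a minimal Tic Tac Toe sketch in which state persists across several `step` calls and the episode ends only on a win or a draw. It is an illustration under assumptions, not the MultiStep implementation planned for this PR: the class name `TicTacToeSketch`, the method names and the reward convention (1.0 to the player who completes a line, 0.0 otherwise) are hypothetical.

```python
from typing import List, Optional, Tuple

# All eight winning lines on a 3x3 board, indexed 0..8 row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


class TicTacToeSketch:
    """Hypothetical multi-step environment: an episode spans many moves."""

    def __init__(self) -> None:
        self.board: List[str] = [" "] * 9
        self.player = "X"
        self.done = False

    def reset(self) -> List[str]:
        """Start a new game and return a copy of the empty board."""
        self.board = [" "] * 9
        self.player = "X"
        self.done = False
        return list(self.board)

    def _winner(self) -> Optional[str]:
        for a, b, c in WIN_LINES:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def step(self, cell: int) -> Tuple[List[str], float, bool]:
        """Place the current player's mark; return (board, reward, done)."""
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        if not 0 <= cell < 9 or self.board[cell] != " ":
            raise ValueError(f"illegal move: {cell}")
        self.board[cell] = self.player
        reward = 0.0
        if self._winner() is not None:
            self.done = True
            reward = 1.0  # reward goes to the player who just moved
        elif " " not in self.board:
            self.done = True  # draw: board is full, no winner
        else:
            self.player = "O" if self.player == "X" else "X"
        return list(self.board), reward, self.done
```

Unlike the single-step case, the environment has to track whose turn it is, validate each action against the current state, and decide when the episode terminates.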

This PR builds upon #1801.

This PR closes #1736.

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have read the CONTRIBUTION guide (required)
  • I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
  • I have checked if any dependencies need to be added or updated in pyproject.toml and uv lock
  • I have updated the tests accordingly (required for a bug fix or a new feature)
  • I have updated the documentation if needed
  • I have added examples if this is a new feature

If you are unsure about any of these, don't hesitate to ask. We are here to help!

apokryphosx and others added 30 commits March 7, 2025 13:48
initialized from HF/Pytorch/JSON/list of Dicts,
remove the need for setup call and subsequently
cleanup
instead of strings and add seed for reproducibility
between simply skipping invalid datapoints in a
seed dataset and throwing an exception
seed dataset to ensure they are defined before the
other functions are
getitem and cast len(data) to a Sized to pass mypy
tests
…of Seed Dataset to ensure proper validation & add additional logging to init from JSON
call helper functions for each type of data
initialising with PyTorch Datasets
properly cover all 4 initialization functions &
add tests for sampling
@hallerite
Collaborator Author

TO DO:

  • Merge master into branch
  • Update Tests
  • Add MultiStepEnv

hallerite self-assigned this on Mar 11, 2025
Development

Successfully merging this pull request may close these issues.

Base Module Environment is too general