DAOS-16355 client: pydaos.torch module (#15475) #15536

Merged
jolivier23 merged 4 commits into google/2.6 from mjmac/DAOS-16355-google-2.6
Dec 3, 2024

Conversation


@mjmac mjmac commented Nov 25, 2024

Introduce the pydaos.torch module, which allows DAOS POSIX containers
to be used as a data source for the PyTorch framework through the
pydaos.torch.Dataset and pydaos.torch.IterableDataset classes.

Signed-off-by: Denis Barakhtanov <dbarahtanov@enakta.com>
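
The description names two dataset flavours, which mirror PyTorch's map-style (indexable) and iterable-style (streamed) datasets. This page does not show the constructor arguments of pydaos.torch.Dataset or pydaos.torch.IterableDataset, so the sketch below does not use them: it illustrates the two access patterns with hypothetical stand-in classes (`FileDataset`, `FileIterableDataset`) that read from a plain local directory instead of a DAOS container.

```python
import os
import tempfile


class FileDataset:
    """Map-style stand-in: random access through __len__ and __getitem__."""

    def __init__(self, root):
        # Collect every regular file under root in a stable, sorted order.
        self.paths = sorted(
            os.path.join(dirpath, name)
            for dirpath, _dirs, files in os.walk(root)
            for name in files
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        with open(self.paths[index], "rb") as f:
            return f.read()


class FileIterableDataset:
    """Iterable-style stand-in: samples are streamed through __iter__."""

    def __init__(self, root):
        self.root = root

    def __iter__(self):
        for dirpath, dirs, files in os.walk(self.root):
            dirs.sort()  # make the traversal order deterministic
            for name in sorted(files):
                with open(os.path.join(dirpath, name), "rb") as f:
                    yield f.read()


def demo():
    # Build three one-byte sample files and read them back both ways.
    with tempfile.TemporaryDirectory() as root:
        for i in range(3):
            with open(os.path.join(root, f"sample{i}.bin"), "wb") as f:
                f.write(bytes([i]))
        mapped = FileDataset(root)
        streamed = list(FileIterableDataset(root))
        return len(mapped), mapped[1], streamed
```

The map-style form suits shuffled sampling by index, while the iterable form suits sequential streaming when the sample count is not known up front.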

github-actions bot commented Nov 25, 2024

Ticket title is 'pydaos.torch modules'
Status is 'Open'
https://daosio.atlassian.net/browse/DAOS-16355

@mjmac mjmac changed the title from "mjmac/DAOS 16355 google 2.6" to "DAOS-16355 client: pydaos.torch module (#15475)" Nov 25, 2024
@daosbuild1
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15536/1/execution/node/387/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15536/1/execution/node/385/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15536/1/execution/node/295/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15536/1/execution/node/273/log

@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15536/1/execution/node/317/log

This patch marks all pool and container handles in the child
processes after fork as if they had been created with g2l.
This prevents unintended interactions if one of the child
processes closes a handle.
The marking is done by iterating through all the pool and
container handles, which the hhash code did not previously support.

This patch also:
- adds support for fork to pydaos.
- introduces daos_reinit() to be called after fork.
- fixes IL to set the atfork callback when no extra event queues
  are used.
- removes support for creating an event queue for each pydaos
  put/get operation. This makes the global event queue the only
  option. This should probably be moved to a per-thread event
  queue in the future.

Signed-off-by: Johann Lombardi <johann.lombardi@gmail.com>
Disable call to pthread_atfork and daos_reinit() in pydaos
until DAOS-16637 is understood.

Signed-off-by: Johann Lombardi <johann.lombardi@gmail.com>
@mjmac mjmac force-pushed the mjmac/DAOS-16355-google-2.6 branch from 117c94f to e50b3c3 Compare November 25, 2024 18:29
techbasset and others added 2 commits December 2, 2024 22:29
Modifications to utils/node_local_test.py to better control which
tests are run and which aren't.

* Added ability to specify multiple --test arguments
* Added ability to exclude specific tests via --exclude-test arg
* Added ability to "parameterize" test names to specify particular
  variants
* Added ability to exclude previously unnamed tests

POSIX tests were run with two variants: cached and uncached. Test
names can now optionally include "_caching_on" and "caching_off"
suffixes to control which version to run or not run. Names without
suffixes are automatically expanded.

Note that previously some tests used the "_with_caching"
suffix, so this will change some reported test names.
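
The selection rules described above can be sketched as follows. This is an illustration, not the code in utils/node_local_test.py: only the `--test` and `--exclude-test` flag names come from the commit message, while `expand`, `select`, and the assumption that both suffixes carry a leading underscore and that every test has both variants are hypothetical.

```python
import argparse

# Assumption: both variant suffixes start with an underscore.
VARIANTS = ("_caching_on", "_caching_off")


def expand(name):
    """Return the variant names that a (possibly unsuffixed) name stands for."""
    if name.endswith(VARIANTS):
        return [name]
    return [name + suffix for suffix in VARIANTS]


def select(all_tests, argv):
    """Return the variant names to run, in declaration order."""
    parser = argparse.ArgumentParser()
    # action="append" lets each flag be given more than once.
    parser.add_argument("--test", action="append", default=[])
    parser.add_argument("--exclude-test", action="append", default=[])
    args = parser.parse_args(argv)

    wanted = {v for name in args.test for v in expand(name)}
    excluded = {v for name in args.exclude_test for v in expand(name)}
    variants = [v for test in all_tests for v in expand(test)]
    # With no --test given, everything is wanted unless excluded.
    return [
        v for v in variants
        if (not wanted or v in wanted) and v not in excluded
    ]
```

For example, `select(["read", "write"], ["--test", "read", "--exclude-test", "read_caching_off"])` keeps only the cached variant of `read`.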

Use of the needs_dfuse_with_opt and needs_dfuse decorators made
interposing on test lists a bit difficult; needs_dfuse_with_opt now
keeps track of all tests it and needs_dfuse interpose on and which
(caching) variants each test uses; at runtime, a list of test
exclusions is checked to see which and how many variants actually
need to be run.

Manually tested with a few variations of command line invocations.

Signed-off-by: Nicholas Murphy <ncmurphy@google.com>
Introduce the pydaos.torch module, which allows DAOS POSIX containers
to be used as a data source for the PyTorch framework through the
pydaos.torch.Dataset and pydaos.torch.IterableDataset classes.

Signed-off-by: Denis Barakhtanov <dbarahtanov@enakta.com>
@mjmac mjmac force-pushed the mjmac/DAOS-16355-google-2.6 branch from e50b3c3 to 178c4b9 Compare December 2, 2024 22:29
@mjmac mjmac requested a review from jolivier23 December 3, 2024 03:01
@mjmac mjmac requested a review from jolivier23 December 3, 2024 17:15
@mjmac
Contributor Author

mjmac commented Dec 3, 2024

@jolivier23: This needs to be force-landed due to the NLT issue on Ubuntu.

@jolivier23 jolivier23 merged commit 583d9f5 into google/2.6 Dec 3, 2024
62 of 67 checks passed
@jolivier23 jolivier23 deleted the mjmac/DAOS-16355-google-2.6 branch December 3, 2024 17:16
6 participants