Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor transmute() on top of new "create" module #90

Merged
merged 11 commits into from
Aug 14, 2024
Merged

Conversation

dholth
Copy link
Contributor

@dholth dholth commented Jul 20, 2024

Description

I wrote this so that I could consider creating .conda from scratch using a TarFile subclass that was backed by files on the filesystem instead of a pre-existing TarFile. It also decomposes transmute to use the same streaming data structure as the rest of this library.

We also add the encoding= fix here, since it is simple and would not otherwise merge cleanly.

CHANGELOG.md Outdated Show resolved Hide resolved
@dholth dholth requested a review from jaimergp July 31, 2024 16:12
@dholth dholth linked an issue Jul 31, 2024 that may be closed by this pull request
2 tasks
@dholth dholth linked an issue Jul 31, 2024 that may be closed by this pull request
2 tasks
@dholth dholth requested review from jezdez and a team and removed request for jaimergp July 31, 2024 20:00
@dholth dholth changed the title refactor transmute() using new transmute_stream() function refactor transmute() on top of new .conda creation functions Jul 31, 2024
@dholth dholth changed the title refactor transmute() on top of new .conda creation functions refactor transmute() on top of new conda_fmt module Jul 31, 2024
Copy link

@marcoesters marcoesters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see, the refactor is equivalent to the old code. Adding a default encoding will hopefully fix the occasional encoding errors that were reported.

self.info_tar = info_tar
self.is_info = is_info

def addfile(self, tarinfo, fileobj=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could delegate to two other TarFile instances (info, pkg) and never call self.addfile()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think two TarFile instances are preferrable, even if a slight memory overhead it would reduce the risk of introducing some file access edge case with a custom tarfile subclass.

Copy link
Contributor Author

@dholth dholth Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File access? OTOH this subclass prevents us from assigning an unused file handle to the theoretical third wrapper TarFile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I don't have strong feelings about it

]

intersphinx_mapping = {"python": ("https://docs.python.org/3", None)}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automatically links our documentation to the documentation of any Python standard library member we reference, either as subclass or in a type annotation.

conda_package_streaming/conda_fmt.py Outdated Show resolved Hide resolved
Copy link
Member

@jezdez jezdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions and suggestions

conda_package_streaming/conda_fmt.py Outdated Show resolved Hide resolved
conda_package_streaming/conda_fmt.py Outdated Show resolved Hide resolved


@contextmanager
def conda_builder(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def conda_builder(
def conda_v2_file(

Technically this creates a v2 version of the conda format, the first having been the tar.bz2 file format.

Copy link
Contributor Author

@dholth dholth Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO .conda is the conda format and tar.bz2 is just tar.bz2; why look backward

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure, the tar.bz2 format isn't a conda file format?

self.info_tar = info_tar
self.is_info = is_info

def addfile(self, tarinfo, fileobj=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think two TarFile instances are preferrable, even if a slight memory overhead it would reduce the risk of introducing some file access edge case with a custom tarfile subclass.

conda_package_streaming/conda_fmt.py Outdated Show resolved Hide resolved
@dholth dholth changed the title refactor transmute() on top of new conda_fmt module refactor transmute() on top of new "create" module Aug 2, 2024
@dholth dholth self-assigned this Aug 7, 2024
@dholth dholth requested a review from jezdez August 14, 2024 14:46
Comment on lines -30 to -41
# increase to reduce speed and increase compression (levels above 19 use much
# more memory)
ZSTD_COMPRESS_LEVEL = 19
# increase to reduce compression and increase speed
ZSTD_COMPRESS_THREADS = 1

CONDA_PACKAGE_FORMAT_VERSION = 2

# Account for growth from "2 GB of /dev/urandom" to not exceed ZIP64_LIMIT after
# compression
CONDA_ZIP64_LIMIT = zipfile.ZIP64_LIMIT - (1 << 18) - 1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a last thought, could you check if those constants are used in any code in github search? Might be worth keeping them around

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Search turns up empty. We already import the two ZSTD_ constants here, only removing the package version and esoteric zip64 limit.

@dholth dholth merged commit 19fd0e4 into main Aug 14, 2024
8 checks passed
@dholth dholth deleted the streaming-transmute branch August 14, 2024 16:22
@dholth dholth requested review from a team and removed request for a team August 14, 2024 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

API for creating .conda files on the fly Extracting packages may fail with non-ASCII characters in file name
5 participants