Read Egg Question #1489

Closed
andersonfrailey opened this issue Jul 21, 2017 · 6 comments

@andersonfrailey
Collaborator

Could someone explain how the read_egg_csv and read_egg_json functions in utils.py work? I've been reading over the documentation for pkg_resources but I still haven't been able to figure out how it all works. Thanks in advance.

@martinholmer
Collaborator

@andersonfrailey asked:

Could someone explain how the read_egg_csv and read_egg_json functions in utils.py work? I've been reading over the documentation for pkg_resources but I still haven't been able to figure out how it all works. Thanks in advance.

As you've probably figured out, Tax-Calculator reads in several CSV and JSON files both when executing from source code and when executing from a conda package. Those files are in the source code tree and are placed in the package (as an "egg", or at least this is my possibly incorrect understanding of how a Python/conda expert would put it) using information in the tax-calculator/MANIFEST.in file.

So, when reading one of those named files, tax-calculator code first checks for the file in the source code tree and, if it is there, reads it. If it can't find it in that directory location, then it assumes the code is running from the conda taxcalc package and calls either the read_egg_csv or read_egg_json function.
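The check-then-fall-back logic described above can be sketched roughly as follows. This is only an illustration, not the actual utils.py code: the function name read_json_file, the source_dir argument, and the read_from_package callable (standing in for a read_egg_json-style function) are all hypothetical.

```python
import json
import os


def read_json_file(fname, source_dir, read_from_package):
    """Read fname from source_dir if present, else from the installed package.

    read_from_package is a hypothetical callable standing in for a
    read_egg_json-style function; it receives the bare file name.
    """
    path = os.path.join(source_dir, fname)
    if os.path.isfile(path):
        # Running from the source code tree: read the file directly.
        with open(path) as jfile:
            return json.load(jfile)
    # File not found on disk: assume the code runs from the installed package.
    return read_from_package(fname)
```

Passing the package reader in as an argument is just a convenience for this sketch; it lets the fallback path be exercised without building a conda package.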

If you want to see how the two pkg_resources methods work together to read the file from inside the conda taxcalc package, open the setuptools documentation for pkg_resources in your browser and search for the two methods used in the read_egg_* functions: Requirement.parse and resource_stream.

Why the read_egg_* code uses the Requirement.parse method to get a Requirement object to pass to the resource_stream method (rather than passing in the package name), I have no idea. That kind of issue is above my pay grade. There may be an important reason for doing it that way, or it may be the personal preference of the original developer of these functions.

The documentation says this:

In the following methods [including resource_stream], the package_or_requirement argument may be either a Python package/module name (e.g. foo.bar) or a Requirement instance.

Reading this might lead one to think the read_egg_* code could be simplified a bit.
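To make the two calling styles concrete, here is a minimal sketch of both, assuming the distribution and the importable package are both named taxcalc. The function names and the resource_name_for helper are hypothetical and are not the code in utils.py. The one difference to watch for, per the pkg_resources documentation, is that the resource name is interpreted relative to the distribution root when a Requirement is passed, but relative to the package itself when a package name is passed.

```python
import json


def resource_name_for(package, fname, relative_to_distribution):
    """Build the '/'-separated resource name that pkg_resources expects."""
    # Relative to the distribution root, the package directory is included;
    # relative to the package itself, only the file name is needed.
    return package + '/' + fname if relative_to_distribution else fname


def read_json_via_requirement(fname, package='taxcalc'):
    """Requirement style: resource name is relative to the distribution root."""
    import pkg_resources  # lazy import: pkg_resources comes from setuptools
    requirement = pkg_resources.Requirement.parse(package)
    stream = pkg_resources.resource_stream(
        requirement, resource_name_for(package, fname, True))
    with stream:
        return json.loads(stream.read().decode('utf-8'))


def read_json_via_package_name(fname, package='taxcalc'):
    """Package-name style: resource name is relative to the package."""
    import pkg_resources  # lazy import: pkg_resources comes from setuptools
    stream = pkg_resources.resource_stream(
        package, resource_name_for(package, fname, False))
    with stream:
        return json.loads(stream.read().decode('utf-8'))
```

Whether the second form behaves identically inside the conda package is exactly what the experiment suggested below would have to confirm.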

@andersonfrailey, if you're interested in this topic, it would be a good learning experience for us all for you to try the alternative coding approach: implement the simpler code on a development branch, make a conda taxcalc package on your local computer (using the tax-calculator/conda.recipe/install_local_taxcalc_package.sh script), and test the simpler code by using the local taxcalc package outside of the tax-calculator source code tree. An example of what you could do with the taxcalc package is this:

$ tc cps.csv 2020 --tables

If you get the same table output with the current and the simpler read_egg_* code, then it shows that the code can be simplified a bit. And, even more importantly, conducting this experiment will leave you in a more knowledgeable position to improve the docstring for each of these two read_egg_* functions. That would be appreciated by future developers, who will not need to ask questions like the one you asked in issue #1489.

@andersonfrailey
Collaborator Author

@martinholmer, thanks for the detailed response. I will dig into the pkg_resources methods a bit and see if I can get the simpler code to run.

@martinholmer
Collaborator

@andersonfrailey said:

I will dig into the pkg_resources methods a bit and see if I can get the simpler code to run.

Thanks. Your experimentation will help all of us understand better the read_egg_* logic.

@talumbau
Member

Hi @andersonfrailey, here's some good background info for your research:

ospc-org/ospc.org#501

In particular, I recall that this Stack Overflow post was a helpful pointer in determining how to solve the problem of reading data files from an installed package.

http://stackoverflow.com/questions/6028000/python-how-to-read-a-static-file-from-inside-a-package

Good luck!

@martinholmer
Collaborator

@andersonfrailey, What's your view of the status of issue #1489?

Perhaps your question was not answered in an authoritative manner. I already admitted your question was beyond my Python knowledge base.

But given there have been no comments added to #1489 for three weeks, do you think there are reasons to keep this issue open?

@andersonfrailey
Collaborator Author

@martinholmer I've gotten enough information from this to look further into my question on my own. Closing.


3 participants