Transform and load dependencies from setup.cfg #718

Merged
merged 8 commits into from
Nov 11, 2021

Conversation

olivia-hong
Contributor

@olivia-hong olivia-hong commented Nov 5, 2021

Previously, requirements.txt was the only file cartography loaded dependencies from.
This PR adds support for setup.cfg, where many Python libraries declare their dependencies,
specifically under install_requires, setup_requires, and extras_require.
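
For illustration only, here is a minimal sketch of how requirement strings could be pulled out of those three setup.cfg locations with the standard-library `configparser`. The function name `read_setup_cfg_requirements` is hypothetical and this is not the code from the PR; it assumes the common "dangling list" layout (one requirement per indented line) and does not handle semicolon-separated lists.

```python
import configparser
from typing import List


def read_setup_cfg_requirements(setup_cfg_text: str) -> List[str]:
    """Collect requirement strings from install_requires, setup_requires, and extras_require."""
    # Disable interpolation so a literal '%' in the file is not treated as a template.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg_text)

    raw_values: List[str] = []
    if parser.has_section("options"):
        for key in ("install_requires", "setup_requires"):
            raw_values.append(parser.get("options", key, fallback=""))
    if parser.has_section("options.extras_require"):
        # Each extra (for example "test") maps to its own multi-line list.
        raw_values.extend(value for _, value in parser.items("options.extras_require"))

    requirements: List[str] = []
    for raw in raw_values:
        for line in raw.splitlines():
            line = line.strip()
            if line:
                requirements.append(line)
    return requirements
```

The returned strings (such as `neo4j>=1.7`) would still need to be parsed into a name and version specifier before being loaded.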

Also tested locally by running cartography, ensuring the data was loaded correctly, and checking that I could query dependencies from setup.cfg.

Note: the GH dependency graph unfortunately does not support setup.cfg ingestion at the moment. Although setup.cfg support has been implemented in dependabot, the two projects are completely separate, with no timeline for a merge at the moment.

@olivia-hong olivia-hong force-pushed the oh/add-setup-cfg-dependency-support branch 2 times, most recently from a5ebad1 to e590707 Compare November 5, 2021 22:02
@olivia-hong olivia-hong marked this pull request as ready for review November 5, 2021 22:09
@olivia-hong olivia-hong force-pushed the oh/add-setup-cfg-dependency-support branch from 4e397a7 to 08a9d77 Compare November 9, 2021 16:06
@crockeo crockeo left a comment

some initial thoughts

Comment on lines +249 to +250
Performs data transformations for the requirements.txt file in a GitHub repo, if available.
:param req_file_contents: Dict: The contents of the requirements.txt file.

gotta love the random cleanup 🙂, very "leave it better than you found it"

})

def _transform_setup_cfg_requirements(
setup_cfg_contents: Dict,

thoughts on using typing-extensions to get TypedDict and defining which contents are in this Dict? or alternatively making a quick NamedTuple which contains the contents?

totally optional, this is just me pushing my static typing agenda.

Contributor Author

The contents are whatever is returned from the graphql API. Right now that's a single key value pair where the value is the entire setup.cfg file as a string, which I don't think is worth defining static typing for.
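
For reference, the typed alternative raised in this thread might look roughly like the sketch below. `FileContents` and `extract_text` are hypothetical names, and the single `text` key is an assumption based on the description above and on how the requirements.txt contents are accessed elsewhere in this PR. On Python versions before 3.8, `TypedDict` would come from typing-extensions instead of `typing`.

```python
from typing import Optional, TypedDict


class FileContents(TypedDict):
    """Hypothetical shape of the GraphQL result: the whole file as one string."""
    text: str


def extract_text(contents: Optional[FileContents]) -> Optional[str]:
    """Return the file text, or None when the file is missing or empty."""
    if contents and contents.get("text"):
        return contents["text"]
    return None
```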

@olivia-hong olivia-hong force-pushed the oh/add-setup-cfg-dependency-support branch 3 times, most recently from 8fab58e to 5fe59b7 Compare November 9, 2021 21:02
@olivia-hong olivia-hong force-pushed the oh/add-setup-cfg-dependency-support branch from 5fe59b7 to 4bb75c3 Compare November 9, 2021 21:03
crockeo
crockeo previously approved these changes Nov 9, 2021
@crockeo crockeo left a comment

code changes look good to me. let's also get a review from cartography owners as discussed over slack.

crockeo
crockeo previously approved these changes Nov 9, 2021
@crockeo crockeo left a comment

re-approving after change and merge

aneeshusa
aneeshusa previously approved these changes Nov 10, 2021
Comment on lines +129 to +130
_transform_requirements_txt(repo_object['requirements'], repo_object['url'], transformed_requirements_files)
_transform_setup_cfg_requirements(repo_object['setupCfg'], repo_object['url'], transformed_requirements_files)
Contributor

What happens if a dep is listed in both places (e.g. no bounds in setup.cfg but pinned in requirements.txt)?
What do we want to have happen (e.g. what would be most useful for our query patterns/the https://github.com/lyft/cartography/blob/master/docs/usage/samplequeries.md sample queries)?
This is likely worth a test case.

Contributor Author

@olivia-hong olivia-hong Nov 10, 2021

If it's listed in both places and has different specifiers for each usage, cartography will create two separate nodes, which I think makes sense rather than any sort of "merging" logic. This allows users to query what version(s) are being used or perhaps find out that they are specifying a dependency in multiple files when it's not needed. Added a test case for this.
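
To make the "two separate nodes" behaviour concrete, here is a small sketch under stated assumptions. The helper names and the `name|specifier` id scheme are hypothetical illustrations, not cartography's actual implementation, and the simplified regex ignores extras and environment markers.

```python
import re
from typing import Dict, List, Optional, Tuple

# Simplified pattern: a package name followed by an optional version specifier.
_REQ_PATTERN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def split_requirement(line: str) -> Tuple[str, Optional[str]]:
    """Split 'okta==0.9.0' into ('okta', '==0.9.0'); the specifier is None if absent."""
    match = _REQ_PATTERN.match(line)
    if match is None:
        raise ValueError(f"Unparseable requirement: {line!r}")
    name = match.group(1).lower()
    specifier = match.group(2).replace(" ", "") or None
    return name, specifier


def build_dependency_nodes(requirement_lines: List[str]) -> Dict[str, Dict[str, Optional[str]]]:
    """Key each node by name plus specifier, so differing specifiers stay distinct."""
    nodes: Dict[str, Dict[str, Optional[str]]] = {}
    for line in requirement_lines:
        name, specifier = split_requirement(line)
        node_id = f"{name}|{specifier}" if specifier else name
        nodes[node_id] = {"name": name, "specifier": specifier}
    return nodes
```

With this scheme, an unbounded `okta` from setup.cfg and a pinned `okta==0.9.0` from requirements.txt produce two nodes, while the same requirement listed identically in both files collapses into one.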

Contributor

That SGTM, thanks! Just wanted to check that we didn't have one "overwrite" the other

@olivia-hong olivia-hong dismissed stale reviews from aneeshusa and crockeo via 6a00e95 November 10, 2021 16:31
:param repo_url: str: The URL of the GitHub repo.
:param out_requirements_files: Output array to append transformed results to.
:return: Nothing.
"""
if req_file_contents and req_file_contents.get('text'):
text_contents = req_file_contents['text']
reqs_list = text_contents.split("\n")

Would prefer this named the same as the param in the function you call below

@FAYiEKcbD0XFqF2QK2E4viAHg8rMm2VbjYKdjTg
Contributor

Re testing, have you run this on a few setup.cfgs from some of our bigger projects?

Contributor

Looks good; a few minor questions but good to go.

@olivia-hong
Contributor Author

Re testing, have you run this on a few setup.cfgs from some of our bigger projects?

I tested on repos whose setup.cfg files had requirements across install_requires, setup_requires, and extras_require.

@olivia-hong olivia-hong merged commit f42d655 into master Nov 11, 2021
@olivia-hong olivia-hong deleted the oh/add-setup-cfg-dependency-support branch November 11, 2021 19:07
4 participants