Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better reporting of why files are included. #1120

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

cainesi
Copy link
Contributor

@cainesi cainesi commented Jun 22, 2021

Allow better reporting of why files are being included in the frozen application. Still work in progress. Ultimately this code should also allow dynamic links to be handled better on Darwin, and avoid including multiple copies of certain files.

(Having darwintools.py and darwintools2.py is temporary -- when this is all done, we should be able to get rid of the original darwintools.py.)

Adds a "--why-report" option to build_exe command that prints a report of why each file has been included.

@marcelotduarte
Copy link
Owner

Hi Ian. When I see you working on such a big code I get happy and worried. It's very difficult for me to follow your code, test it, see if it's not breaking other things.
I would ask you to follow what is in the contributing docs.
Of course, since you've done a lot, it's not right for me to ask you to send atomic patches, but as little as possible.
For example:

  • darwintools2 can be sent in a PR only with a declaration of its future usefulness.
  • filetracker, do the same.
  • the whyreport option must be a PR that justifies it, which implements it.
  • As for the freezer changes, I need to be more atomic, so I can follow your reasoning and go testing and seeing that nothing has broken.

Furthermore, I am implementing the use of pathlib.Path as a default, because the code when it comes to path/file is hard to read and those normpath/normcase seem like a nightmare.
It's simplifying a lot. As a result, a _norm_path function will be unnecessary, for example.
Another thing I don't know is if I understand the use, to_relative, it must be an idea just like the Path.relative_to.
Another thing is to think more about the code to be good for the macOS part. One of the reasons I wanted to split into 3 Freezers and you did it really well was because the 3 systems have some glaring differences and keeping them together was tricky. So, if you add the filetracker for Linux and Windows (I think I don't understand it yet), it will hinder me to follow my ideas that I have to improve these 2 parts of the freezer.

@cainesi
Copy link
Contributor Author

cainesi commented Jul 24, 2021

Hi Marcelo.

Sorry for the scary PR. It's still in progress, and I've been too busy at work to look at it for the last month or so, but plan to get back to it when I have a chance.

Let me try to explain what is going on, since I think the changes (particularly in freezer.py) are probably less scary then they look at first sight, and I think it should be relatively easy to confirm that the changes do not break the rest of the freeze process.

The key changes are in freezer.py. Those changes are:

  1. Add self._file_tracker to the Freezer. This is a FileTracker object that tracks information about the files that are being copied, and why, so that a report can be printed at the end (if desired). It does not affect anything else in the freezing process.
  2. Replacing all the calls to _copy_file() with calls to _record_file_copy(). _record_file_copy() does two things: calls _copy_file() (i.e., exactly what the code originally did), and also calls self._file_tracker.mark_file(), to record that the file has been copied. (Actually, there is one spot in DarwinFreezer where there is a direct call to _file_copy(), because of some difficulty with how darwintools.py works now. That will hopefully get cleaned-up in a subsequent PR.)
  3. There was one spot (in _write_modules) where files were instead being copied by shutil.copytree(). For that, I created a new function _record_tree_copy() that goes through the directory tree and copies the files one-by-one using _record_file_copy(). I needed to do that so that the FileTracker would know all the individual files being copied.

(freezer.py also has some changes to variable names and to the import statements, which are only there to make things clearer and stop my IDE from reporting warnings--I could put those changes into a separate PR if you want.)

The FileTracker object just collects information and prints a report, so does not have any effect on the rest of the freeze process. darwintools2.py is a simplified version of darwintools.py for use in the FileTracker object, so it also does not affect anything else in the freeze process. (It is not actually necessary to have darwintools2.py at this point, so I could leave it out of this PR for now, and include it in a subsequent PR with some Darwin-specific fixes.)

FYI, my longer-term goal (not in this PR) is to use the FileTracker to resolve dynamically linked libraries on Darwin, which I think would solve some of the issues that have been coming up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants