Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip is_dirty check from matlab tools #48

Merged
merged 7 commits into from
Aug 14, 2023
Merged

Skip is_dirty check from matlab tools #48

merged 7 commits into from
Aug 14, 2023

Conversation

jeanconn
Copy link
Contributor

@jeanconn jeanconn commented Aug 13, 2023

Description

Skip is_dirty check from matlab tools. The gitpython is_dirty appears to use a lock that touches files in the index. This conflicts with SVN management of the chandra_models repo (and marks the repo as modified for matlab users). That behavior is undesireable. This PR skips the is_dirty check if THERMAL_MODELS_FOR_MATLAB_TOOLS_SW is set.

Fixes #44

Interface impacts

Testing

Unit tests

  • Mac

Independent check of unit tests by [REVIEWER NAME]

  • [PLATFORM]:

Functional tests

Made a new unit test for functional testing.

I directly confirmed in matlab testing that using ska_helpers 0.10.3 and doing 'ska_helpers.chandra_models.get_data("chandra_models/xija/aca/aca_spec.json")'
marks the chandra_models-3.49 directory with an svn red exclamation mark, and doing the same operation with this branch in the pyexec python path does not (leaves it with the svn green check).

@jeanconn jeanconn changed the title WIP: Skip is_dirty check from matlab tools Skip is_dirty check from matlab tools Aug 13, 2023
@jeanconn jeanconn marked this pull request as ready for review August 13, 2023 13:46
@taldcroft
Copy link
Member

This seems generally fine. @jeanconn - do you understand the root cause of the index being touched? I ask because I am unable to reproduce the problem in my Windows VM in ska3. But I do note that my git in that env seems to be not corrupted in the sense that git status from the command line works fine, whereas with a corrupted install that command returns no output for me.

This is more-or-less repeating your earlier point that this might be an opportunity to fix some underlying problems in the MATLAB Ska core installation.

@jeanconn
Copy link
Contributor Author

I thought the root cause of the index being touched is that gitpython is configured such that it is doing the same thing as "git status" and using optional locks and thus does modify the index when is_dirty is used. I found git status did the same thing in #44 (comment) unless --no-optional-locks was configured but I was strictly looking at the svn status not a shasum. As such, I didn't think this was caused by broken git -- I thought this was expected behavior. I'll try to find a non-svn way to show the behavior in windows.

@jeanconn
Copy link
Contributor Author

jeanconn commented Aug 13, 2023

This is just with my windows miniconda git

$ git --version
git version 2.34.1.windows.1

change to thermal models repo

user@DESKTOP-E8FD0KK MINGW64 ~/Documents/MATLAB/FOT_Tools/Thermal_Models/chandra_models-3.49 ((3.49))
$ which git
/c/Users/unksm/miniconda3/Library/bin/git

user@DESKTOP-E8FD0KK MINGW64 ~/Documents/MATLAB/FOT_Tools/Thermal_Models/chandra_models-3.49 ((3.49))
$ shasum .git/index
862f863f233440d6a644745c0cc20fe41b8f9f57 *.git/index

user@DESKTOP-E8FD0KK MINGW64 ~/Documents/MATLAB/FOT_Tools/Thermal_Models/chandra_models-3.49 ((3.49))
$ git status
Not currently on any branch.
nothing to commit, working tree clean

user@DESKTOP-E8FD0KK MINGW64 ~/Documents/MATLAB/FOT_Tools/Thermal_Models/chandra_models-3.49 ((3.49))
$ shasum .git/index
5b731090061f26f1fc3e71b79dbd57a5dda8820d *.git/index

user@DESKTOP-E8FD0KK MINGW64 ~/Documents/MATLAB/FOT_Tools/Thermal_Models/chandra_models-3.49 ((3.49))
$ git status
Not currently on any branch.
nothing to commit, working tree clean

user@DESKTOP-E8FD0KK MINGW64 ~/Documents/MATLAB/FOT_Tools/Thermal_Models/chandra_models-3.49 ((3.49))
$ shasum .git/index
5b731090061f26f1fc3e71b79dbd57a5dda8820d *.git/index

It looks like it is "stable" at that shasum after getting status in this directory, so I suppose for an overall underlying reason for an issue that probably speaks to @jskrist's thought that maybe doing something else before checking in the SVN would make sense "If it is, perhaps @matthewdahmer could make sure the aca model is always run before he commits the thermal models to the SVN repository to resolve this issue.". I don't know what state information the index file stores or what state the chandra_models-3.49 repo is in and it is a binary file so it isn't really trivial to ask the file. If I just fetch a clean chandra_models repo from git and run "git status" on that, there are no changes to the index. I just attached the index of chandra_models-3.49 zipped in this comment.

index.zip

@jeanconn
Copy link
Contributor Author

So I think we want to fix git in the matlab installation such that "git clone" works, but I don't think that fix would fix this.

@jskrist
Copy link
Member

jskrist commented Aug 13, 2023

I'm wondering now if the index file stores a hash of the location of the repo in it. That path would point to the creators path, then get updated when a status is run on the repo? That would explain the behavior I've seen so far.

@jeanconn
Copy link
Contributor Author

I don't see a full path in there just looking for strings in the index (before or after git status).

@jeanconn
Copy link
Contributor Author

From various googlings, looking at https://learn.microsoft.com/en-us/archive/msdn-magazine/2017/august/devops-git-internals-architecture-and-index-files and the text "Next are five 4-byte values (device, inode, mode, user id and group id) of file-attribute metadata related to the host OS." it would not surprise me if git wrote back some data related to user and host into the file with git status and then maybe just doesn't report it in a useful fashion with "git ls-files -s" which is what I'm looking at for git's display of the index.

@taldcroft
Copy link
Member

@jeanconn - one thing that strikes me is your git version of 2.34.1.windows.1. From what I can see the current ska3-matlab should have ska3-core=2023.1, which specifies git version 2.39.1:

https://github.com/sot/skare3/blob/2023.1/pkg_defs/ska3-core/meta.yaml

I'm not sure if this version issue is at all related to index problems, but it seems worth understanding.

@taldcroft
Copy link
Member

I'm wondering now if the index file stores a hash of the location of the repo in it.

Bingo, this looks quite promising since I'm able to reproduce it on Mac and Windows.

$ cd ~/git
$ cp -rp chandra_models chandra_models-copy
$ cd chandra_models-copy
$ ls -l .git/index
$ md5 .git/index
-rw-r--r--  1 aldcroft  staff  6508 Jul 11 10:06 .git/index
MD5 (.git/index) = b3e5f7335cbb528cd0bae3bb863f4e78

$ git status

# same ls and md5 commands now
-rw-r--r--  1 aldcroft  staff  6508 Aug 14 05:47 .git/index
MD5 (.git/index) = 282cea338c0b41f8178cc82292f2dfae

And as @jeanconn noted this appears to be stable after the first update.

@jskrist's thought that maybe doing something else before checking in the SVN would make sense

YES! It looks like a simple git status before the SVN checkin should be sufficient?

@taldcroft
Copy link
Member

And I've been going back and forth about whether we should still merge this PR even when the process issue gets fixed.

On the one hand @jeanconn noted that the SVN checksum should provide visibility into any issue in the chandra_models repo. On the other hand, if the repo really is dirty, then running any flight tools using the dirty repo is extremely bad (hence the original idea of always raising an exception).

Thoughts?

@jeanconn
Copy link
Contributor Author

Regarding "I'm not sure if this version issue is at all related to index problems, but it seems worth understanding." Sorry, I should have used more words. When I said "This is just with my windows miniconda git", I meant, that was with a git in git-bash unrelated to the matlab python environment. So the index issue was reproducible without needing to have the same git as in the matlab python environment (which you can get from svn).

@jeanconn
Copy link
Contributor Author

jeanconn commented Aug 14, 2023

Regarding "YES! It looks like a simple git status before the SVN checkin should be sufficient?", I don't think we've learned that?

IF it looked like we got the same checksum after git status, I'd think that was a possibility, but I haven't seen that we have a matching stable checksum yet. So if the issue is "full path / inode" or "host user" that is the thing saved in the index, then running git status before svn checkin is still going to solve the problem only for the user who does that status and check in.

@jeanconn
Copy link
Contributor Author

jeanconn commented Aug 14, 2023

On the one hand @jeanconn noted that the SVN checksum should provide visibility into any issue in the chandra_models repo. On the other hand, if the repo really is dirty, then running any flight tools using the dirty repo is extremely bad (hence the original idea of always raising an exception).

We've got a little bit of the problem now that we're training the fot users to ignore the SVN warning they are seeing. So that isn't great. Overall I don't see that it is likely that a fot user will have a custom chandra models without being explicit about it.

But if it is a concern, like I said in the other thread, "git status --no-optional-locks" didn't write back to the index for me in my first experiments, but there was no equivalent to that flag for "is_dirty" and is_dirty did not respect the environment variable to do that. So another option would be to do this check using subprocess calling "git status" and parsing its output for example. I don't think it is quite worth it. (update - and I can't find --no-optional-locks in the git-bash git, so maybe it isn't available on windows git anyway?)

Another option would be to update the matlab tools do more than warn on the thermal models svn being modified.

@jskrist
Copy link
Member

jskrist commented Aug 14, 2023

I agree with @jeanconn that this the index file will likely get updated for each individual host/user, even if we run a git status before committing it to SVN. One total "hack" we could potentially do is to:

  1. copy the index file
  2. run the git command
  3. replace the modified index file with the original copy

wrap that series of steps into a helper function and comment it well.

We could also rely on SVN to warn users that the chandra models is modified, instead of doing that check via python, but as @jeanconn mentioned, we are currently training people to ignore this warning, so some retraining would be necessary.

@jeanconn
Copy link
Contributor Author

In our meeting today @taldcroft had the good question -- "but does the PR code work to get the version without updating the index". I hadn't checked the returned version so just went back in to do that and it looks OK.

>> pyexec('model, info = ska_helpers.chandra_models.get_data("chandra_models/xija/aca/aca_spec.json")')
>> pyexec('print(info)')
PYPROC: {'call_args': {'file_path': 'chandra_models/xija/aca/aca_spec.json', 'version': None, 'repo_path': 'None', 'require_latest_version': False, 'timeout': 5, 'read_func': 'None', 'read_func_kwargs': None}, 'version': '3.49', 'commit': '1efcc5806e914d14155ebfe69ccd61df123a5701', 'data_file_path': 'C:\\Users\\unksm\\Documents\\MATLAB\\FOT_Tools\\Thermal_Models\\chandra_models-3.49\\chandra_models\\xija\\aca\\aca_spec.json', 'repo_path': 'C:\\Users\\unksm\\Documents\\MATLAB\\FOT_Tools\\Thermal_Models\\chandra_models-3.49', 'md5': '0f300771293ac64b232f7ed82c73a6e0', 'CHANDRA_MODELS_REPO_DIR': None, 'CHANDRA_MODELS_DEFAULT_VERSION': None, 'THERMAL_MODELS_DIR_FOR_MATLAB_TOOLS_SW': 'C:\\Users\\unksm\\Documents\\MATLAB\\FOT_Tools\\Thermal_Models\\chandra_models-3.49'}
>> 

Copy link
Member

@taldcroft taldcroft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this looks good now. The only thing missing is documentation in the doc string of how the get_data function changes behavior slightly if THERMAL_MODELS_DIR_FOR_MATLAB_TOOLS_SW is defined.

@jeanconn
Copy link
Contributor Author

I was thinking this for docs in get_data. Does this read OK to you @taldcroft ?

    ``THERMAL_MODESL_DIR_FOR_MATLAB_TOOLS_SW`` is used to define the chandra_models repository
    location when running in the MATLAB tools software environment.  If this environment
    variable is set, then the git is_dirty() check of the chandra_models directory is skipped
    as the chandra_models repository is verified via SVN in the MATLAB tools software environment.
    Users in the FOT Matlab tools should exercise caution if using locally-modified files
    for testing, as the version information reported by this function in that case will not
    be correct.

Copy link
Member

@taldcroft taldcroft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a lien on adding the doc update in the comment.

@taldcroft
Copy link
Member

Looks good @jeanconn

@jeanconn jeanconn merged commit 4e495e1 into master Aug 14, 2023
2 checks passed
@jeanconn jeanconn deleted the less-dirty branch August 14, 2023 19:58
@javierggt javierggt mentioned this pull request Sep 6, 2023
@javierggt javierggt mentioned this pull request Sep 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Checking if the git repo is dirty makes the SVN repo dirty on FOT Windows setup
3 participants