Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Follow symlinks and exclude hidden directories #36

Merged
merged 10 commits into from
Jul 11, 2020
Merged

[WIP] Follow symlinks and exclude hidden directories #36

merged 10 commits into from
Jul 11, 2020

Conversation

albertmichaelj
Copy link
Contributor

This pull request adds two things. First, it makes the archive download process follow symlinks. I user symlinks to give the users of my JupyterHub deployment shared folders, and I want them to easily be able to extract all symlinked folders as well. This is the default behavior of using the zip command line utility, so I think it should also be the default behavior for this extension (though I'm not sure about other compression utilities).

The second behavior is more controversial. I exclude hidden directories. I personally think this is improved behavior because I have students use nbgitpuller to pull down content. When they download that folder to their own computer, they don't need the git folders. However, I'm not sure that this should be the default behavior for all possible use cases.

Ideally, I think both of these behaviors should be a setting with the default behavior being follow symlinks and download hidden files and folders. However, while I understand the Python code well, I have no experience with javascript (typescript?), and I can't figure out how to add the settings. If someone could work with me to add the settings component (or give me some directions), I think that would make this pull request make the most sense.

If I had boolean variables follow_symlinks and download_hidden (or similar), my code could very easily be updated in order to respect those settings.

@hadim
Copy link
Contributor

hadim commented Jul 1, 2020

Thanks for the PR.

I am fine having the default behavior excluding hidden directories as long as there is an option to disable it. And same for following symlinks.

That being said I am trying to integrate this extension upstream (see jupyter-server/jupyter_server#251) and it will probably take some time since jupyter_server is still in the early stage.

In the meantime, I don't have a lot of bandwidth to help with this PR. I'm sorry about that.

I am also a Python dev (much more than a JS dev). If you have some time to dedicate to JLab extension development, the doc is very helpful: https://jupyterlab.readthedocs.io/en/stable/developer/extension_dev.html

@albertmichaelj
Copy link
Contributor Author

This makes sense. I have also contributed to the NBGitPuller extension, and in that extension, they set environmental variables to determine settings. Given that these are settings that do not impact the user visible portion, does it make sense to do something similar and just set environmental variables to override the default behavior?

If not, could you confirm my understanding? Is the only way to get the config from the javascript portion to the Python portion via a url argument? If so, I can probably figure out some way to hack the javascript to get variables from the settings into the url that is triggered when the downloadArchiveRequest is called, but there is no guarantee that I will follow best practices. This is literally the first time I've written any javascript at all.

@hadim
Copy link
Contributor

hadim commented Jul 2, 2020

I am not sure env variables are the best way to do that since the jupyter server could be running somewhere else than the machine running the frontend (better ask a jupyter dev for this).

Yes, the only way to send a "config" (or parameters) to the backend from the frontend is using URL arguments. The extension already uses it to send the archive format and a token (see https://github.com/hadim/jupyter-archive/blob/77ee922b5daba406814a5c9f0fca69730f08e9ed/src/index.ts#L54).

If you want to give it a first try maybe you can hardcode the arguments in the JS at the moment and I'll help you add it properly as a JLab setting later:

fullurl.searchParams.append("archiveToken", token(20));
fullurl.searchParams.append("archiveFormat", archiveFormat);
fullurl.searchParams.append("followSymlinks", true);
fullurl.searchParams.append("downloadHidden", false);

Or you can even skip entirely the JS to start with and only work on the backend Python extension adding your logic there (see https://github.com/hadim/jupyter-archive/blob/77ee922b5daba406814a5c9f0fca69730f08e9ed/jupyter_archive/handlers.py#L95).

Good luck!

@albertmichaelj
Copy link
Contributor Author

@hadim Thanks for working with me on this! I'll work on getting a more complete pull request together (though based on how far down the typescript rabbit hole I go, I'm not sure if I will figure out how to pull settings in). I'll at least get it to work with hard coded values, and I'll create schema for the new setting variables.

It may take a day or two to find the time.

@albertmichaelj
Copy link
Contributor Author

Sorry for the delay in getting back to this, but it took me a little longer than I expected to find the time. I think I have everything working correctly. I've created settings where you can turn on or off following symlinks and downloading hidden directories using the Advanced Settings Editor. It behaves exactly as expected.

I also got excited and I updated the test_download in the tests to test for these things. The tests seem to be working correctly.

I made the choice of throwing up a 400 error if a URL tries to pass invalid parameters for followSymlinks and downloadHidden. The only valid parameters are "true" and "false" (the settings editor checks to make sure it's one of those), but if someone hand crafted a URL, they would get a 400 error if they put something else in. In the python code, I set a default value of "true" for followSymlinks and "false" for downloadHidden. These default values are the default if they are not encoded in the URL string. I'm not sure if this is correct or not. If you want to require that the URL string always has these parameters (which it should given that the frontend will always send them with this code), then we can remove the default value.

I am not at all convinced that my code is following best practices, so please feel free to criticize and make suggestions. I am also happy to change whatever to get this included. This functionality is exactly what I've been needing for my use case for JupyterLab, and I'm very happy to push it forward.

@hadim
Copy link
Contributor

hadim commented Jul 11, 2020

Awesome work @albertmichaelj. I reviewed it and I everything looks ok to me. I am fine with the default you've made and the 404 error.

I think it's best to allow the URL args to NOT have those options and use default in that case.

I will merge this and release a new version. The conda package should come soon after that.

FYI: this extension might get integrated into jupyter_server: jupyter-server/jupyter_server#252. I'll try to add those options in it.

Thanks again!

@hadim hadim merged commit 2fa8904 into jupyterlab-contrib:master Jul 11, 2020
@hadim
Copy link
Contributor

hadim commented Jul 11, 2020

And thanks for the tests too! This is very important and useful.

@albertmichaelj
Copy link
Contributor Author

Wonderful! Thanks! Let me know if you need anything from me related to the jupyter_server stuff!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants