Hyperlink in markdown cell to pdf document stopped working #3652

Open
kdeleeuw11 opened this issue May 31, 2018 · 16 comments

Comments

@kdeleeuw11

I have a large number of Jupyter Notebooks, and in many of them I have hyperlinks to locally stored PDF documents. Today on my iMac the links stopped working. When I click on a link, a new tab opens with the proper address, but the page is just black. When I do this on my MacBook with exactly the same Jupyter Notebook, it works OK. Up to yesterday I had no problems. I have tried a number of things to resolve this. Among others, I normally work with Google Chrome, so I switched to Safari, but I had the same problem there. When I open the PDF in either Chrome or Safari from Finder, it works fine. So it looks like a Jupyter Notebook issue. When I follow the hyperlink in the notebook, I get the following entry in the log file:
[I 21:56:01.222 NotebookApp] 302 GET /notebooks/Cookbooks/Git%20%26%20GitHub/books/Pro_Git.pdf (::1) 1.01ms

I get the same entry on MacBook where it works ok.

A screenshot of the page after trying to load the PDF is attached:
[Screenshot: screen shot 2018-05-31 at 10 03 17 pm]

@takluyver
Member

Any messages in the browser's Javascript console?

@kdeleeuw11
Author

I found this in the Javascript console:
Failed to load 'http://localhost:8888/files/Cookbooks/Git%20%26%20GitHub/books/Pro_Git.pdf' as a plugin, because the frame into which the plugin is loading is sandboxed.

This must be the cause of the problem. I have no idea how to address this. Can you help?

@bryango

bryango commented Aug 8, 2018

Same issue! No idea what's happening... Tried launching another simple HTTP server, PDF links worked just fine there, so it shouldn't be a browser issue. PDF.js extension (firefox) works fine though.

jupyter-troubleshoot attached:
jupyter-troubleshoot.log

@bryango

bryango commented Aug 14, 2018

@takluyver I have zero experience in web development, but after some googling, I believe it's some kind of cross origin request issue... This PR: #3341 seems to be related?
@kdeleeuw11 Have you found any solution to this? PDF documents really matter to me too.

@kdeleeuw11
Author

I got it to work in Google Chrome by installing the PDF Viewer extension. I am not very technical and I have no idea why it initially stopped working in Google Chrome and Safari. But at least I have it working again. Google Chrome is my default browser.

@bryango

bryango commented Oct 6, 2018

@takluyver Now I'm confident that this issue is indeed caused by #3341. After manually removing the lines added in #3341 from my conda installation ([...]/anaconda3/lib/python3.7/site-packages/notebook), my PDF links work perfectly again.

FYI, these are the lines I removed:

Subject: [PATCH] UN-patch #3341

---
 base/handlers.py  | 7 -------
 files/handlers.py | 7 -------
 2 files changed, 14 deletions(-)

diff --git a/base/handlers.py b/base/handlers.py
index e3fbddc..72677c9 100644
--- a/base/handlers.py
+++ b/base/handlers.py
@@ -640,13 +640,6 @@ class Template404(IPythonHandler):
 class AuthenticatedFileHandler(IPythonHandler, web.StaticFileHandler):
     """static files should only be accessible when logged in"""
 
-    @property
-    def content_security_policy(self):
-        # In case we're serving HTML/SVG, confine any Javascript to a unique
-        # origin so it can't interact with the notebook server.
-        return super(AuthenticatedFileHandler, self).content_security_policy + \
-                "; sandbox allow-scripts"
-
     @web.authenticated
     def get(self, path):
         if os.path.splitext(path)[1] == '.ipynb' or self.get_argument("download", False):
diff --git a/files/handlers.py b/files/handlers.py
index 7973fd6..b942149 100644
--- a/files/handlers.py
+++ b/files/handlers.py
@@ -26,13 +26,6 @@ class FilesHandler(IPythonHandler):
     a subclass of StaticFileHandler.
     """
 
-    @property
-    def content_security_policy(self):
-        # In case we're serving HTML/SVG, confine any Javascript to a unique
-        # origin so it can't interact with the notebook server.
-        return super(FilesHandler, self).content_security_policy + \
-               "; sandbox allow-scripts"
-
     @web.authenticated
     def head(self, path):
         self.get(path, include_body=False)
-- 
2.18.0

@takluyver
Member

This works correctly for me in Firefox, but fails in Chromium with the error Failed to load 'http://localhost:8889/(...).pdf' as a plugin, because the frame into which the plugin is loading is sandboxed.

It is sandboxed, and quite deliberately so. And you're right that #3341 is where the sandboxing was introduced. This is a security measure, so we can't just disable it again. If you're interested, I'd suggest someone research what relaxations of the sandbox would be needed to let Chrome display a PDF.

CSP sandboxing docs: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/sandbox
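
To make the header concrete: the patch quoted above appends `; sandbox allow-scripts` to whatever policy the handler already sends. The following is a minimal Python sketch of reading such a header value. The base policy `frame-ancestors 'self'` and the helper names `parse_csp` and `sandbox_flags` are assumptions for illustration, not the notebook server's own code.

```python
def parse_csp(header):
    """Split a Content-Security-Policy header value into {directive: [values]}."""
    policy = {}
    for part in header.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        name, *values = tokens
        # If a directive is repeated, keep only the first occurrence.
        policy.setdefault(name.lower(), values)
    return policy


def sandbox_flags(header):
    """Return the sandbox allow-* flags, or None if there is no sandbox directive."""
    return parse_csp(header).get("sandbox")


# Assumed base policy, plus the suffix taken from the patch in this thread.
header = "frame-ancestors 'self'" + "; sandbox allow-scripts"
flags = sandbox_flags(header)
print(flags)  # ['allow-scripts']
```

An empty list would mean a bare `sandbox` directive (everything restricted), while `None` would mean the response is not sandboxed at all.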

@matanox

matanox commented Dec 24, 2018

I think this is also the case when trying to display a PDF inline in a notebook a la

from IPython.display import IFrame
IFrame("foo.pdf", width=900, height=800)

Could be nice if this worked again even in Chrome.

@bryango

bryango commented Dec 24, 2018

> This works correctly for me in Firefox, but fails in Chromium with the error Failed to load 'http://localhost:8889/(...).pdf' as a plugin, because the frame into which the plugin is loading is sandboxed.
>
> It is sandboxed, and quite deliberately so. And you're right that #3341 is where the sandboxing was introduced. This is a security measure, so we can't just disable it again. If you're interested, I'd suggest someone research what relaxations of the sandbox would be needed to let Chrome display a PDF.
>
> CSP sandboxing docs: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/sandbox

@takluyver I suppose that as a security measure this is somehow meaningful, but since we already allow the kernel (e.g. python) to do anything to the filesystem, isn't it sort of pointless to have this kind of sandboxing? 😜

I do hope this bug can be resolved sooner. Sometimes PDF.js extension feels too clumsy for me... Unfortunately I don't have the necessary expertise to contribute, but I was able to (kind of?) circumvent this by reading the PDF as binary from the python kernel, then embedding it with a server side PDF.js engine - which is even clumsier, but at least I don't have to ask every one of my collaborators to install a PDF.js extension. 😉

@matanster If you really want PDF in your ipynb, you can try something like this. 😂
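
The first half of the workaround described above (reading the PDF as binary from the Python kernel) could look roughly like the sketch below. It only covers turning the bytes into a base64 payload; handing that payload to a PDF.js viewer is not shown. The function names and the in-memory sample bytes are hypothetical, and this is not necessarily how the original workaround was written.

```python
import base64
from pathlib import Path


def pdf_bytes_to_data_uri(raw):
    """Encode raw PDF bytes as a base64 data URI string."""
    if not raw.startswith(b"%PDF-"):
        # Every PDF file starts with this magic marker.
        raise ValueError("input does not look like a PDF")
    payload = base64.b64encode(raw).decode("ascii")
    return "data:application/pdf;base64," + payload


def pdf_file_to_data_uri(path):
    """Read a PDF from disk (inside the kernel) and encode it."""
    return pdf_bytes_to_data_uri(Path(path).read_bytes())


# Stand-in bytes so the sketch runs without a real file on disk.
sample = b"%PDF-1.4\n% placeholder content\n"
uri = pdf_bytes_to_data_uri(sample)
print(uri.split(",", 1)[0])  # data:application/pdf;base64
```

Because the bytes travel through the kernel rather than through the `/files/` handler, the sandboxed response is never involved. Whether a given browser will render such a payload directly is a separate question.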

@kav2k

kav2k commented Feb 11, 2019

(previous post was wrong and was deleted)

Relevant Chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=413851
Note that it's currently WontFix.

It boils down to "there's nothing in the standard to allow plugins to operate in sandbox; there's no allow-plugins rule".

It seems like Chrome and Firefox take different approaches to handling this. Chrome just straight up disallows it.
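
As a rough illustration of the "no allow-plugins rule" point, the sketch below checks a set of flags against the sandbox tokens as I understand them from the MDN page linked earlier in this thread. The token list is an assumption and may be incomplete or out of date, and `unknown_tokens` is a hypothetical helper.

```python
# Sandbox tokens as listed on MDN (assumed; may be incomplete).
KNOWN_SANDBOX_TOKENS = {
    "allow-downloads",
    "allow-forms",
    "allow-modals",
    "allow-orientation-lock",
    "allow-pointer-lock",
    "allow-popups",
    "allow-popups-to-escape-sandbox",
    "allow-presentation",
    "allow-same-origin",
    "allow-scripts",
    "allow-top-navigation",
    "allow-top-navigation-by-user-activation",
    "allow-top-navigation-to-custom-protocols",
}


def unknown_tokens(flags):
    """Return the flags that are not recognised sandbox tokens."""
    return sorted(set(flags) - KNOWN_SANDBOX_TOKENS)


print(unknown_tokens(["allow-scripts", "allow-plugins"]))  # ['allow-plugins']
print(any("plugin" in token for token in KNOWN_SANDBOX_TOKENS))  # False
```

So adding a made-up `allow-plugins` flag to the header would not help: a browser would simply not recognise it.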

@takluyver
Member

> since we already allow the kernel (e.g. python) to do anything to the filesystem, isn't it sort of pointless to have this kind of sandboxing? 😜

The model we've got is that code you deliberately run can do anything (within the context of where the kernel runs), but opening a file should never be able to execute arbitrary code on your system. People don't expect that opening a document (whether that's a notebook, an HTML page, or a PDF) can start running code outside a sandbox. See also: word macro viruses.

The technical implication of this is that any pages served by the notebook server where we don't entirely control the content must either be sandboxed (so they can't talk to kernels) or sanitised (so they can't run Javascript).

We sanitise untrusted notebooks, because the notebook page has to be able to talk to the kernel. But sanitisation is tricky, edge cases can be missed (we had a CVE because of an interaction between our sanitisation engine and jQuery), and it breaks a lot of rich content. So we sandbox when serving (non-notebook) files - they can run Javascript, but the browser's cross-origin security mechanisms stop them talking to kernels.
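
A simplified model of why the sandbox stops served files from talking to kernels: a sandboxed document without `allow-same-origin` is given a unique opaque origin (serialised as `null`), so it no longer shares the notebook server's origin. The sketch below is only an illustration of that rule, not browser or notebook code. The URLs `demo.ipynb` and `report.html` and the function `effective_origin` are hypothetical.

```python
from urllib.parse import urlsplit


def effective_origin(url, sandbox_flags=None):
    """Approximate the origin a browser assigns to a document.

    sandbox_flags is None for an unsandboxed response, otherwise the
    list of allow-* flags from the CSP sandbox directive.
    """
    if sandbox_flags is not None and "allow-same-origin" not in sandbox_flags:
        # Sandboxed without allow-same-origin: opaque origin.
        return "null"
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


server = "http://localhost:8888"
notebook_page = effective_origin("http://localhost:8888/notebooks/demo.ipynb")
served_file = effective_origin(
    "http://localhost:8888/files/report.html", ["allow-scripts"]
)

print(notebook_page == server)  # True
print(served_file == server)    # False
```

In this model, the notebook page keeps the server's origin (and is therefore sanitised instead), while a file served with `sandbox allow-scripts` can run scripts but is treated as cross-origin.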

@zhyiyu

zhyiyu commented Jan 29, 2021

> I got it to work in Google Chrome by installing the PDF Viewer extension. I am not very technical and I have no idea why it initially stopped working in Google Chrome and Safari. But at least I have it working again. Google Chrome is my default browser.

The error message I got (for Google Chrome) is ERR_BLOCKED_BY_CLIENT.

I installed an extension called PDF Viewer and it now works (though not perfectly).

Hope this issue can be fixed soon.

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/what-is-the-best-method-of-importing-pdf-files-into-a-notebook/8324/1

@jliu1999

>> I got it to work in Google Chrome by installing the PDF Viewer extension. I am not very technical and I have no idea why it initially stopped working in Google Chrome and Safari. But at least I have it working again. Google Chrome is my default browser.
>
> The error message in my case (Google Chrome) is ERR_BLOCKED_BY_CLIENT. I installed an extension called PDF Viewer and it now works. The only discomfort I have is that you cannot refresh on the PDF viewing page.
>
> Hope this issue can be fixed soon.

Thanks, it's working now.

@gdbassett

I have a similar problem.

I generate HTML files using an Airflow (workflow automation) workflow. I access those HTML files through Jupyter, with the goal of triggering additional workflows by API. Unfortunately, the sandbox prevents this.

Would it be possible to get a 'trust' button on HTML files as well to remove the sandbox?

@gdbassett

This also creates problems with linking back to other files on the jupyter server because the request can't carry the auth tokens through and the auth isn't allowed in the iframe. A 'trust' button to remove the iframe and sandbox would be very helpful.

9 participants