Basic Auth not working #1495
Comments
lychee ignores all response headers and robots.txt because we're not indexing the page. The problem must be elsewhere. Can you save the html into a file and use that as the input?
If that works, it's because the website doesn't serve the HTML to lychee. In that case you can try curl as a user agent. If that doesn't work, lychee might have issues parsing the URLs from the document. In that case, could you post an excerpt of the HTML file?
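As an independent sanity check that the saved HTML file contains links at all, something like the following could be used. This is only a sketch built on Python's standard `html.parser`; the names `LinkCollector` and `extract_links` are made up for illustration, and this is not how lychee itself extracts links.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return every non-empty href found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    # Hypothetical input: a file saved from the page in question
    with open("index.html", encoding="utf-8") as f:
        for link in extract_links(f.read()):
            print(link)
```

If this prints the expected links but lychee reports none for the same file, the problem is more likely in link extraction than in fetching the page.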
Hi @mre, thanks for the quick response!
Locally
I tried it on the index.html file locally, with the exact command you provided, and it yields:
The errors come from the static HTML using the password-protected URLs, so this is the result I was expecting from the local version.
cURL user-agent
I tried using lychee with the cURL user agent as described, but it still yields:
But using curl like this works and returns the HTML correctly:
So the way I see it, cURL itself is working, but lychee isn’t, even when using cURL as a user agent.
Basic auth problem?
I suspect that lychee's basic auth is the source of the problem. I tested on other staging sites that had basic auth and they all came back empty-handed. The output is the same as when I omit
Other tools work
I used linkchecker as an alternative, as it also provides basic-auth functionality, and it works correctly on the same URL:
Either I’m using lychee’s
Oh, right, I should have read your initial message more carefully. Basic auth syntax is actually:
We should probably add a documentation page or document the syntax here: https://lychee.cli.rs/troubleshooting/network-errors/
@mre I have the exact same issue. I'm using the basic auth param in the right order, and tried both with https:// and without. Works fine on sites without auth.
Indeed. I tried it myself and it doesn't work as advertised; sorry for the inconvenience. Here's what I did:
import http.server
import socketserver
import base64
import os

# Set username and password for basic auth
USERNAME = 'testuser'
PASSWORD = 'testpass'


class BasicAuthHandler(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        # Check for Authorization header
        auth_header = self.headers.get('Authorization')
        if auth_header is None:
            # No credentials sent: challenge the client and stop here
            self.send_response(401)
            self.send_header('WWW-Authenticate', 'Basic realm="Test realm"')
            self.end_headers()
            return
        elif auth_header.startswith('Basic '):
            # Verify credentials (split on the first colon only,
            # so passwords containing ':' still parse)
            credentials = base64.b64decode(auth_header[6:]).decode('utf-8')
            username, password = credentials.split(':', 1)
            if username == USERNAME and password == PASSWORD:
                # Serve the requested file
                return http.server.SimpleHTTPRequestHandler.do_GET(self)
        # Wrong credentials or unsupported auth scheme
        self.send_response(401)
        self.end_headers()


# Create a simple HTML file with links
html_content = """
<!DOCTYPE html>
<html>
<body>
<h1>Test Links</h1>
<ul>
<li><a href="https://www.example.com">Example.com</a></li>
<li><a href="https://www.google.com">Google.com</a></li>
<li><a href="https://www.github.com">GitHub.com</a></li>
</ul>
</body>
</html>
"""

# Write the HTML content to a file
with open('index.html', 'w') as f:
    f.write(html_content)

# Set up and start the server
PORT = 8000
Handler = BasicAuthHandler
with socketserver.TCPServer(("", PORT), Handler) as httpd:
    print(f"Serving at port {PORT}")
    print(f"Username: {USERNAME}")
    print(f"Password: {PASSWORD}")
    httpd.serve_forever()

Then I started the server
and then I ran lychee
I saw an error on the Python server:
curl works as expected
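For reference, the test server above only accepts a request whose `Authorization` header is `Basic ` followed by the base64 encoding of `username:password`. A minimal sketch of building and parsing that header value follows; the helper names `basic_auth_header` and `parse_basic_auth` are hypothetical and exist only for this illustration.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_auth(header: str) -> Tuple[str, str]:
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic auth header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only: the password may itself contain ':'
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    # Same test credentials as the server above
    print(basic_auth_header("testuser", "testpass"))
```

Comparing this value with the header that actually arrives at the server (for example by logging `self.headers` in the handler) would show whether a client sends the credentials at all.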
So, something is off. Either it doesn't work at all, or I forgot how to use it. It's strange, because we have tests for it: lychee/lychee-bin/tests/cli.rs Lines 1370 to 1429 in 53d234d
That said, the tests could be better. We don't have any negative tests (e.g. when the credentials are not provided) and we also don't check the HTTP status code, which should be 200 in case of success and 401 in case of error.
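To make the idea of positive and negative checks concrete, here is a rough, self-contained Python sketch. It is not lychee's Rust test suite: it only illustrates the three cases (valid, missing, and wrong credentials) against a throwaway local server, and the names `AuthHandler`, `status_for`, and `run_checks` are assumptions made up for this example.

```python
import base64
import http.server
import threading
import urllib.error
import urllib.request

USERNAME, PASSWORD = "testuser", "testpass"  # illustrative test credentials


class AuthHandler(http.server.BaseHTTPRequestHandler):
    """Answer 200 for the expected Basic credentials, 401 otherwise."""

    def do_GET(self):
        token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
        if self.headers.get("Authorization") == f"Basic {token}":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="Test realm"')
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the check output quiet


def status_for(url, credentials=None):
    """Return the HTTP status for a GET, optionally sending Basic auth."""
    request = urllib.request.Request(url)
    if credentials is not None:
        token = base64.b64encode(":".join(credentials).encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    # An empty ProxyHandler avoids routing the local request through a proxy
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def run_checks():
    """Start a local server on a free port and collect one status per case."""
    server = http.server.HTTPServer(("127.0.0.1", 0), AuthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    try:
        return {
            "valid": status_for(url, (USERNAME, PASSWORD)),
            "missing": status_for(url),
            "wrong": status_for(url, (USERNAME, "wrong")),
        }
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_checks())
```

The point of the design is that each case asserts on a status code, so a client that silently drops the credentials would show up as a 401 in the "valid" case instead of passing unnoticed.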
I found this thread after trying lychee on several of my sites while deploying them to a staging environment; every time, it returned no results.
Basic auth is a must in these situations, so I was happy that lychee supports it.
As an additional complication, the CMS we’re using sends an X-Robots-Tag: none response header in staging environments, as staging is deliberately not supposed to be indexed. Is that something that lychee supports, or does it ignore that header? From the messages in the above thread, I could not find out whether robots.txt is currently ignored.
Right now, I get the following response:
The format I’m using is:
There are about 500 links on that page. I verified with curl that basic auth is working correctly: it returns the HTML response.