
UnicodeEncodeError when Downloading Novels from Arabic Sources with Non-ASCII Characters in URLs #2308

Closed
LSXAxeller opened this issue Mar 20, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@LSXAxeller

Describe the bug

"When attempting to download a novel from an Arabic source containing Arabic characters in the URL, a UnicodeEncodeError is raised with the message 'ascii' codec can't encode characters in position XX-XX: ordinal not in range(128). This error occurs due to the presence of non-ASCII characters in the URL.

Example novel links causing the issue:

Log:

 ! Error: 'ascii' codec can't encode characters in position 28-32: ordinal not in range(128)
<class 'UnicodeEncodeError'>
File "lncrawl\core\scraper.py", line 306, in get_soup
    response = self.get_response(url, **kwargs)
  File "lncrawl\core\scraper.py", line 201, in get_response
    return self.__process_request(
  File "lncrawl\core\scraper.py", line 107, in __process_request
    kwargs["headers"] = {
  File "lncrawl\core\scraper.py", line 108, in <dictcomp>
    str(k).encode("ascii"): str(v).encode("ascii")

This error originates from attempting to encode non-ASCII characters into ASCII during the scraping process.
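
To illustrate the root cause, here is a minimal standalone reproduction (illustrative only, not taken from the project's code) showing that ASCII cannot represent the Arabic characters in the URL while UTF-8 can:

url = "https://kolnovel.com/series/القوس-المحنون-كول/"

try:
    url.encode("ascii")
except UnicodeEncodeError as err:
    # e.g. 'ascii' codec can't encode characters ...: ordinal not in range(128)
    print(err)

# UTF-8 can represent any Unicode character, so this succeeds.
print(url.encode("utf-8"))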


Let us know

App source: EXE
App version: 3.5.0
Your OS: Windows 11 23H2 22631.2506

@LSXAxeller LSXAxeller added the bug Something isn't working label Mar 20, 2024
@LSXAxeller LSXAxeller changed the title Fix this bug UnicodeEncodeError when Downloading Novels from Arabic Sources with Non-ASCII Characters in URLs Mar 20, 2024
@LSXAxeller (Author)

I've resolved the issue by updating the code in lncrawl\core\scraper.py. Specifically, I modified line 108 from:

str(k).encode("ascii"): str(v).encode("ascii")

to:

str(k).encode("utf-8"): str(v).encode("utf-8")

This change ensures that headers are encoded using UTF-8 instead of ASCII. I'm keeping this issue open in case anyone encounters a similar error or wants to investigate further.
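
For reference, based on the traceback above, the patched header-encoding step in __process_request presumably looks roughly like the sketch below. Only the encode calls are confirmed by the traceback; the setup and the iteration over kwargs["headers"].items() are assumptions for illustration.

# Illustrative setup; in the real code, kwargs comes from the request call.
kwargs = {"headers": {"Referer": "https://kolnovel.com/series/القوس-المحنون-كول/"}}

# Sketch of the patched dict comprehension in lncrawl\core\scraper.py
# (reconstructed from the traceback; the source of k and v is assumed).
kwargs["headers"] = {
    str(k).encode("utf-8"): str(v).encode("utf-8")
    for k, v in kwargs["headers"].items()
}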

@zGadli (Contributor) commented Mar 20, 2024

I'll make a new PR for this; there's no need to keep this issue open.

@zGadli (Contributor) commented Mar 20, 2024

Can you test the change with the links? It doesn't work for me with either link.

@LSXAxeller (Author)

> Can you test the change with the links? It doesn't work for me with either link.

It's working fine for me. I forgot to mention that I made the change in the pip version, not the EXE, since the EXE version extracts a fresh scraper.py on each launch.

C:\Users\RI>python -m lncrawl -s https://kolnovel.com/series/القوس-المحنون-كول/ --format epub
================================================================================
                          [#] Lightnovel Crawler v3.5.0
                  https://github.com/dipu-bd/lightnovel-crawler
--------------------------------------------------------------------------------

-> Press  Ctrl + C  to exit

Retrieving novel info...

[#] القس&المجنون&Kol
24 volumes and 2372 chapters found.
- https://kolnovel.com/series/القوس-المحنون-كول/

? Enter output directory: C:\Users\RI\Lightnovels\Master of Gu - Reverted Insanity
? Which chapters to download? Everything! (2372 chapters)
? 2372 chapters selected Continue
? How many files to generate? Pack everything into a single file
Chapters:   2%|█                                                                   | 37/2372 [00:20<21:06,  1.84item/s]

If the links don't work, you may need a VPN. Also, copy the links directly from the issue rather than after opening them in a new browser tab, since the site redirects to the new domains kolnovel.org (instead of kolnovel.com) and ar-novel.com (instead of arnovel.me).
