Fixed '[Errno 36] File name too long' issue making it impossible to save comment scrapes with long titles. #19
Overview
Summary
Added a _check_len() function to the Export.NameFile() class to ensure generated filenames are not too long (which would cause an error when writing scrapes to files). Added a _check_len() call to Subreddit, Redditor, and comments scraping. These changes should prevent scrapes from failing due to overly long generated filenames.
Motivation/Context
When the comment scrape option is used, scrapes are automatically written to a file whose name includes the title of the comment thread. If the thread title is long (140+ characters), the generated filename can be too long, which raises an error and the scrape fails to write to the designated file. In short, when scraping comment threads with long titles, you would wait for the scrape to finish only to find that your data was lost because of a bad filename.
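To make the idea concrete, here is a minimal sketch of a length check applied to a title before it is placed in a filename. It is not the code from this PR: the function names `check_len` and `build_filename`, the 100-character cap, and the `--` truncation marker are all assumptions chosen for illustration. Only the `c-` prefix and `-RAW.json` suffix are taken from the error log in this description.

```python
def check_len(name: str, max_len: int = 100, marker: str = "--") -> str:
    """Return name unchanged if it fits, else cut it so the result is max_len long.

    max_len and marker are hypothetical defaults, not values from the PR.
    """
    if len(name) <= max_len:
        return name
    # Reserve room for the marker so the result never exceeds max_len.
    return name[: max_len - len(marker)] + marker


def build_filename(title: str) -> str:
    """Assemble a comment-scrape filename in the 'c-<title>-RAW.json' shape."""
    return f"c-{check_len(title)}-RAW.json"
```

Truncating the title before the prefix and suffix are attached keeps the file extension intact, which would not be guaranteed if the finished filename were cut instead.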
New Dependencies
None
Issue Fix or Enhancement Request
Not applicable
Type of Change
Breaking Change
Not applicable (I have included some scrape logs for reference anyways)
List All Changes That Have Been Made
- _check_len() function to the Export.NameFile() class
- _check_len() call to all scrape types right before incorrect char validation
How Has This Been Tested?
* Comment URL was for a post with a lengthy title (> 140 chars).
  + Ran:
    python3 Urs.py -c https://www.reddit.com/r/AskReddit/comments/j5jb71/how_do_you_deal_with_an_overly_friendly_neighbor/ 0 --json
* Output:
  [2020-10-05 22:49:19,876] [CRITICAL]: AN ERROR HAS OCCURED WHILE EXPORTING SCRAPED DATA.
  [2020-10-05 22:49:19,877] [CRITICAL]: [Errno 36] File name too long: '../scrapes/10-05-2020/c-How do you deal with an overly friendly neighbor who asks too many questions about your life when you happen to be outdoors at the same time_-RAW.json'
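The maximum length of a single filename depends on the filesystem holding the output directory, so the threshold at which this error appears can differ between machines. As a rough sketch (not part of this PR), the limit can be queried with `os.pathconf` on Unix-like systems. The helper name `name_fits` and the 255-byte fallback are assumptions for illustration.

```python
import os


def name_fits(directory: str, filename: str, fallback: int = 255) -> bool:
    """Report whether filename is within the name-length limit for directory.

    Falls back to a hypothetical 255-byte limit when the platform or
    filesystem cannot report one.
    """
    try:
        limit = os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, ValueError, OSError):
        # os.pathconf is missing on some platforms, or the query is unsupported.
        limit = fallback
    if limit <= 0:
        limit = fallback
    # Filesystem limits are usually counted in bytes, not characters.
    return len(filename.encode("utf-8")) <= limit
```

A check like this could be run against the scrapes directory before a long scrape starts, so that an oversized filename is detected before any data is collected.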
Test Configuration
Python version: 3.8.2
Running on Linux Mint 20 Ulyana
Dependencies
astroid==2.4.1
attrs==19.3.0
certifi==2020.4.5.1
chardet==3.0.4
colorama==0.4.3
coverage==5.1
idna==2.9
isort==4.3.21
lazy-object-proxy==1.4.3
mccabe==0.6.1
more-itertools==8.3.0
packaging==20.4
pluggy==0.13.1
praw==7.0.0
prawcore==1.3.0
prettytable==0.7.2
py==1.8.1
pylint==2.5.2
pyparsing==2.4.7
pytest==5.4.3
pytest-cov==2.10.0
requests==2.23.0
six==1.14.0
toml==0.10.0
update-checker==0.17
urllib3==1.25.9
wcwidth==0.2.4
websocket-client==0.57.0
wrapt==1.12.1
Checklist
Tip: You can check off items by writing an "x" in the brackets, e.g. [x].