URS v3.2.1

@JosephLai241 released this 28 Mar 04:53

Release date: March 28, 2021

Summary

  • Structured comments export now includes comments at every level of nesting.
    • Structured comments are now the default export format. Exporting to raw format requires including the --raw flag.
  • Tons of metadata have been added to all scrapers. See the Full Changelog section for the complete list of new attributes.
  • Credentials.py has been deprecated in favor of .env to avoid hard-coding API credentials.
  • Added more terminal eye candy - Halo has been implemented to spice up the output.

Full Changelog

Added

  • User interface
    • Added Halo to spice up the output while maintaining minimalism.
  • Source code
    • Created a comment Forest and accompanying CommentNode.
      • The Forest contains methods for inserting CommentNodes, including a depth-first search algorithm to do so.
    • Subreddit.py has been refactored and submission metadata has been added to scrape files:
      • "author"
      • "created_utc"
      • "distinguished"
      • "edited"
      • "id"
      • "is_original_content"
      • "is_self"
      • "link_flair_text"
      • "locked"
      • "name"
      • "num_comments"
      • "nsfw"
      • "permalink"
      • "score"
      • "selftext"
      • "spoiler"
      • "stickied"
      • "title"
      • "upvote_ratio"
      • "url"
    • Comments.py has been refactored and submission comments now include the following metadata:
      • "author"
      • "body"
      • "body_html"
      • "created_utc"
      • "distinguished"
      • "edited"
      • "id"
      • "is_submitter"
      • "link_id"
      • "parent_id"
      • "score"
      • "stickied"
    • Major refactor for Redditor.py on top of adding additional metadata.
      • Additional Redditor information has been added to scrape files:
        • "has_verified_email"
        • "icon_img"
        • "subreddit"
        • "trophies"
      • Additional Redditor comment, submission, and multireddit metadata has been added to scrape files:
        • subreddit objects are nested within comment and submission objects and contain the following metadata:
          • "can_assign_link_flair"
          • "can_assign_user_flair"
          • "created_utc"
          • "description"
          • "description_html"
          • "display_name"
          • "id"
          • "name"
          • "nsfw"
          • "public_description"
          • "spoilers_enabled"
          • "subscribers"
          • "user_is_banned"
          • "user_is_moderator"
          • "user_is_subscriber"
        • comment objects will contain the following metadata:
          • "type"
          • "body"
          • "body_html"
          • "created_utc"
          • "distinguished"
          • "edited"
          • "id"
          • "is_submitter"
          • "link_id"
          • "parent_id"
          • "score"
          • "stickied"
          • "submission" - contains additional metadata
          • "subreddit_id"
        • submission objects will contain the following metadata:
          • "type"
          • "author"
          • "created_utc"
          • "distinguished"
          • "edited"
          • "id"
          • "is_original_content"
          • "is_self"
          • "link_flair_text"
          • "locked"
          • "name"
          • "num_comments"
          • "nsfw"
          • "permalink"
          • "score"
          • "selftext"
          • "spoiler"
          • "stickied"
          • "subreddit" - contains additional metadata
          • "title"
          • "upvote_ratio"
          • "url"
        • multireddit objects will contain the following metadata:
          • "can_edit"
          • "copied_from"
          • "created_utc"
          • "description_html"
          • "description_md"
          • "display_name"
          • "name"
          • "nsfw"
          • "subreddits"
          • "visibility"
      • interactions are now sorted in alphabetical order.
    • CLI
      • Flags
        • --raw - Export comments in raw format instead (structured format is the default)
    • Created a new .env file to store API credentials.
  • README
    • Added new bullet point for The Forest Markdown file.
  • Tests
    • Added a new test for the Status class in Global.py.
  • Repository documents
    • Added "The Forest".
      • This Markdown file is just a place where I describe how I implemented the Forest.
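
The Forest and CommentNode described above can be sketched as follows. This is a minimal illustration, not the code that ships in URS: the class layout, the `replies` attribute, and the `_find` helper are assumptions made for the example. It relies only on the "id" and "parent_id" fields listed in the comment metadata, and on Reddit's convention of prefixing a parent ID with "t1_" for a comment or "t3_" for a submission.

```python
from typing import Dict, List, Optional


class CommentNode:
    """One comment plus its direct replies (hypothetical minimal shape)."""

    def __init__(self, metadata: Dict[str, str]) -> None:
        self.id = metadata["id"]
        self.parent_id = metadata["parent_id"]
        self.body = metadata.get("body", "")
        self.replies: List["CommentNode"] = []


class Forest:
    """Comment trees hanging off a root node that stands in for the submission."""

    def __init__(self, submission_id: str) -> None:
        self.root = CommentNode({"id": submission_id, "parent_id": ""})

    def _find(self, node: CommentNode, target_id: str) -> Optional[CommentNode]:
        # Depth-first search: check this node, then descend into each reply.
        if node.id == target_id:
            return node
        for reply in node.replies:
            found = self._find(reply, target_id)
            if found is not None:
                return found
        return None

    def insert(self, new_node: CommentNode) -> bool:
        # Drop the "t1_" / "t3_" prefix to get the bare ID of the parent.
        target_id = new_node.parent_id.split("_", 1)[-1]
        parent = self._find(self.root, target_id)
        if parent is None:
            return False  # the parent has not been inserted (yet)
        parent.replies.append(new_node)
        return True


# Illustrative data only; the IDs and bodies are made up.
forest = Forest("abc")
top = CommentNode({"id": "c1", "parent_id": "t3_abc", "body": "top-level"})
reply = CommentNode({"id": "c2", "parent_id": "t1_c1", "body": "reply"})
orphan = CommentNode({"id": "c3", "parent_id": "t1_missing", "body": "orphan"})
results = [forest.insert(node) for node in (top, reply, orphan)]
```

In this sketch a comment can only be inserted once its parent is already in the Forest, so comments would need to be processed parent-first.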

Changed

  • User interface
    • Submission comments scraping parameters have changed due to the improvements made in this pull request.
      • The structured format is now the default.
        • Users will have to include the new --raw flag to export to raw format.
      • Both structured and raw formats can now scrape all comments from a submission.
  • Source code
    • The submission comments JSON file's structure has been modified to fit the new submission_metadata dictionary. "data" is now a dictionary that contains the submission metadata dictionary and scraped comments list. Comments are now stored in the "comments" field within "data".
    • Exporting Redditor or submission comments to CSV is now forbidden.
      • URS will ignore the --csv flag if it is present while trying to use either scraper.
    • The created_utc field for each Subreddit rule is now converted to readable time.
    • requirements.txt has been updated.
      • As of v1.20.0, numpy has dropped support for Python 3.6, which means Python 3.7+ is required for URS.
        • .travis.yml has been modified to exclude Python 3.6. Added Python 3.9 to test configuration.
        • Note: Older versions of Python can still be used by downgrading to numpy<=1.19.5.
    • Reddit object validation block has been refactored.
      • A new reusable module has been defined at the bottom of Validation.py.
    • Urs.py no longer pulls API credentials from Credentials.py as it is now deprecated.
      • Credentials are now read from the .env file.
    • Minor refactoring within Validation.py to ensure an extra Halo line is not rendered on failed credential validation.
  • README
    • Updated the Comments section to reflect new changes to comments scraper UI.
  • Repository documents
    • Updated How to Get PRAW Credentials.md to reflect new changes.
  • Tests
    • Updated CLI usage and examples tests.
    • Updated c_fname() test because submission comments scrapes now follow a different naming convention.
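
The new layout of the submission comments JSON file, as described under Source code above, might look roughly like the sketch below. Only the "data" and "comments" field names come from these notes. The "submission_metadata" key is assumed to match the dictionary name mentioned above, all values are placeholders, and any other top-level fields the real file may contain are omitted.

```python
import json

# Placeholder values; field names are taken from the metadata lists above.
submission_metadata = {
    "id": "abc123",
    "title": "Example title",
    "num_comments": 2,
}

comments = [
    {"id": "c1", "parent_id": "t3_abc123", "body": "A top-level comment"},
    {"id": "c2", "parent_id": "t1_c1", "body": "A reply"},
]

# "data" holds both the submission metadata and the scraped comments list.
scrape = {
    "data": {
        "submission_metadata": submission_metadata,  # assumed key name
        "comments": comments,
    }
}

serialized = json.dumps(scrape, indent=4)
```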

Deprecated

  • User interface
    • Specifying 0 comments no longer exports all comments exclusively in raw format. The export now defaults to structured format.
  • Source code
    • Deprecated many global variables defined in Global.py:
      • eo
      • options
      • s_t
      • analytical_tools
    • Credentials.py has been replaced with the .env file.
    • The LogError.log_login decorator has been deprecated due to the refactor within Validation.py.
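
To illustrate the move from Credentials.py to .env, here is a minimal sketch of reading KEY=VALUE pairs from a .env-style file using only the standard library. These notes do not say how URS itself loads the file, and the variable names in the sample (CLIENT_ID, USERNAME) are hypothetical, so treat this as an illustration of the idea, not the project's implementation.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical file contents; real credential names may differ.
sample = 'CLIENT_ID="abc"\n# a comment line\n\nUSERNAME=someone\n'
credentials = parse_env(sample)
```

Keeping credentials in a separate file like this means they can be excluded from version control instead of being hard-coded in a Python module.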