Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: adapter new html format on nodejs.org/dist #728

Merged
merged 1 commit into from
Nov 14, 2024
Merged

Conversation

fengmk2
Copy link
Member

@fengmk2 fengmk2 commented Nov 14, 2024

Summary by CodeRabbit

  • New Features

    • Enhanced file fetching capabilities with updated logic to handle a broader range of file formats and attributes.
    • Introduced a new index HTML page for Node.js distribution version 18.15.0, listing downloadable files with metadata.
  • Bug Fixes

    • Improved regex for matching HTML anchor tags to accurately capture additional file types and structures.
  • Tests

    • Added new test cases for the fetch() method to verify functionality against the Node.js distribution version 18.15.0.

Copy link
Contributor

coderabbitai bot commented Nov 14, 2024

Walkthrough

The changes involve modifications to the NodeBinary class's fetch method to enhance its ability to handle varying HTML formats and file attributes. A new test case has been added to validate the functionality of fetching data from a specific Node.js distribution subdirectory. Additionally, a new HTML index file for Node.js version 18.15.0 has been introduced, providing a structured list of downloadable files with associated metadata.

Changes

File Change Summary
app/common/adapter/binary/NodeBinary.ts Updated the fetch method's regex for HTML anchor tags, modified variable declarations and logic for handling directories and file attributes.
test/common/adapter/binary/NodeBinary.test.ts Added a new test case for the fetch() method to verify functionality with the Node.js distribution version 18.15.0 subdirectory.
test/fixtures/nodejs.org/site/v18.15.0/index.html Created a new HTML file as an index for the Node.js distribution version 18.15.0, including links to various downloadable files.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant NodeBinary
    participant HTTPClient
    participant HTMLParser

    User->>NodeBinary: fetch()
    NodeBinary->>HTTPClient: Request HTML from Node.js distribution
    HTTPClient-->>NodeBinary: HTML Response
    NodeBinary->>HTMLParser: Parse HTML for file links
    HTMLParser-->>NodeBinary: Extracted file data
    NodeBinary-->>User: Return file data
Loading

🐰 "In the land of code where bunnies hop,
New regex and tests, we never stop!
With files galore and links to find,
Our fetch method's now one of a kind!
So let’s celebrate with a joyful cheer,
For version eighteen point fifteen is here!" 🐇


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.46%. Comparing base (d6f0e1d) to head (a9e6b9f).
Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #728      +/-   ##
==========================================
- Coverage   96.46%   96.46%   -0.01%     
==========================================
  Files         188      188              
  Lines       18803    18879      +76     
  Branches     2463     2481      +18     
==========================================
+ Hits        18139    18211      +72     
- Misses        664      668       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
app/common/adapter/binary/NodeBinary.ts (3)

25-32: Improve format documentation clarity

The comments showing HTML format examples use a future date (2024). Consider:

  1. Using actual dates from nodejs.org
  2. Structuring the comments to clearly separate old vs new formats
  3. Adding JSDoc-style documentation explaining the format differences

37-42: LGTM! Consider adding type annotation

The directory handling logic is correct. The use of basename properly handles both relative and absolute paths.

Consider adding a type annotation for clarity:

-      let name = m[1];
+      let name: string = m[1];

42-46: Consider parsing size values with units

The size handling could be improved to parse values with units (KB, B) into a standardized format.

Consider adding a helper function:

function parseSize(sizeStr: string): number | null {
  if (sizeStr === '-') return null;
  const match = sizeStr.trim().match(/^([\d.]+)\s*([KMG]B|B)?$/i);
  if (!match) return null;
  const [, value, unit] = match;
  const baseValue = parseFloat(value);
  const multipliers = { B: 1, KB: 1024, MB: 1024 * 1024, GB: 1024 * 1024 * 1024 };
  return baseValue * (unit ? multipliers[unit.toUpperCase()] : 1);
}
test/common/adapter/binary/NodeBinary.test.ts (1)

87-103: Maintain consistent assertion style across tests

The new test uses assert.equal() while other tests use the assert() with direct comparison. Consider updating the assertions to match the existing style for consistency:

-          assert.equal(item.date, '-');
-          assert.equal(item.isDir, true);
-          assert.equal(item.size, '-');
+          assert(item.date === '-');
+          assert(item.isDir === true);
+          assert(item.size === '-');
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ac4709a and a9e6b9f.

📒 Files selected for processing (3)
  • app/common/adapter/binary/NodeBinary.ts (2 hunks)
  • test/common/adapter/binary/NodeBinary.test.ts (1 hunks)
  • test/fixtures/nodejs.org/site/v18.15.0/index.html (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • test/fixtures/nodejs.org/site/v18.15.0/index.html
🔇 Additional comments (2)
app/common/adapter/binary/NodeBinary.ts (2)

49-57: LGTM! Clean implementation

The item construction is well-structured and properly handles optional configuration.


33-33: Verify regex pattern robustness

The regex pattern handles both old and new formats, but let's verify it handles all edge cases correctly.

✅ Verification successful

Regex pattern is correctly handling all edge cases

The regex pattern has been thoroughly verified through the test cases which cover:

  • Both old and new HTML formats from nodejs.org
  • Directory entries with "-" size and various date formats
  • File entries with different size formats (e.g., "3.2 KB", "20416228")
  • Various date formats including "-" and "DD-MMM-YYYY HH:MM"
  • Both relative ("/dist/v18.15.0/file") and direct ("file") href formats
  • Zero-size file exclusion

The test suite demonstrates the pattern successfully parses:

  • Root directory listings
  • Version subdirectories (v16, v18)
  • Nightly build directories
  • Different file types (docs, binaries, checksums)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Test regex pattern against actual nodejs.org directory listings
# to ensure it handles all possible formats correctly.

# Fetch a sample of actual HTML content from different Node.js versions
curl -s "https://nodejs.org/dist/latest/" > latest.html
curl -s "https://nodejs.org/dist/latest-v18.x/" > v18.html
curl -s "https://nodejs.org/dist/latest-v16.x/" > v16.html

# Test the regex pattern against actual content
cat latest.html v18.html v16.html | grep -P '<a href="[^\"]+?"[^>]*?>[^<]+?</a>\s+?((?:[\w\-]+? \w{2}\:\d{2})|\-)\s+?([\d\.\-\s\w]+)'

# Clean up
rm latest.html v18.html v16.html

Length of output: 308


Script:

#!/bin/bash
# Let's analyze the regex pattern usage in the codebase to understand its context and test cases

# First, find where this regex is used
rg -l "re = /<a href=\"" --type ts

# Then look for any test files that might validate this regex
fd "test|spec" -e ts -e js | xargs rg "NodeBinary|href"

# Look for the complete implementation to understand the parsing logic
ast-grep --pattern 'const re = /<a href=' -A 10

Length of output: 2044


Script:

#!/bin/bash
# Let's examine the test file to understand the expected behavior and test cases
cat test/common/adapter/binary/NodeBinary.test.ts

# Also get the full implementation context
cat app/common/adapter/binary/NodeBinary.ts

Length of output: 14300

matchDir = true;
}
if (item.name === 'SHASUMS256.txt') {
assert.equal(item.date, '04-Nov-2024 17:29');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Test fixtures contain future dates that need updating

The future dates are coming from the test fixtures file (test/fixtures/nodejs.org/site/v18.15.0/index.html). These dates are used to mock HTTP responses in tests. While the tests will pass today, they will start failing when we reach and pass these dates (October/November 2024).

Recommendations:

  • Update the fixture file to use past dates (e.g., March 2023)
  • Update the corresponding test assertions in NodeBinary.test.ts
  • Consider using relative dates or date mocking utilities for more maintainable tests
🔗 Analysis chain

Verify test dates: Future dates detected

The test uses dates from November and October 2024, which are in the future. This could cause test failures when these dates become past dates. Consider:

  1. Verifying if these dates are correct
  2. Using past dates for more reliable testing
  3. Using date assertions that are resilient to time passage

Also applies to: 100-100

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the dates in the test match the actual nodejs.org distribution dates

# Fetch the actual page to verify dates
curl -sI "https://nodejs.org/dist/v18.15.0/SHASUMS256.txt" | grep "Last-Modified:"
curl -sI "https://nodejs.org/dist/v18.15.0/node-v18.15.0-win-x64.zip" | grep "Last-Modified:"

Length of output: 172


Script:

#!/bin/bash
# Let's examine the test file to understand the context of these dates
cat test/common/adapter/binary/NodeBinary.test.ts

Length of output: 11115


Script:

#!/bin/bash
# Let's check the fixture file to understand if these dates are part of test fixtures
cat test/fixtures/nodejs.org/site/v18.15.0/index.html

Length of output: 5069

if (item.name === 'SHASUMS256.txt') {
assert.equal(item.date, '04-Nov-2024 17:29');
assert.equal(item.isDir, false);
assert.equal(item.size, '3.2 KB');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

🛠️ Refactor suggestion

Test coverage for size format parsing is essential

Based on the implementation and test fixtures, the review comment is valid. The NodeBinary implementation directly uses the size string from the HTML without proper parsing or validation, while the test now expects formatted sizes ("3.2 KB", "29 MB"). This change needs proper test coverage as:

  1. The implementation in NodeBinary.ts simply uses m[3].trim() without any size format parsing
  2. The nodejs.org fixture shows various size formats (B, KB, MB) that need to be handled
  3. Current tests don't verify edge cases in size parsing

Key areas needing test coverage:

  • Various size units (B, KB, MB, GB)
  • Decimal places handling (e.g., "3.2 KB", "8.6 MB")
  • Edge cases (e.g., "478 KB", "310 B")
  • Invalid formats
🔗 Analysis chain

Add test coverage for size format parsing

The test introduces human-readable sizes ("3.2 KB", "29 MB") while other tests use byte counts. This suggests a change in the NodeBinary.fetch() implementation to handle formatted sizes. Consider adding test cases for:

  1. Edge cases in size formatting (e.g., "1.0 KB", "0.9 MB")
  2. Various size units (B, KB, MB, GB)
  3. Invalid size formats

Also applies to: 102-102

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the size format change is consistent across the nodejs.org distribution site

# Search for different size format patterns in the HTML files
echo "Searching for size formats in HTML files..."
fd -e html -x rg -l '\d+(\.\d+)?\s*(B|KB|MB|GB)' {} \;

# Check if any files still use byte counts
echo "Checking for byte count formats..."
fd -e html -x rg -l '\d{4,}(?!\s*[KMGT]?B)' {} \;

Length of output: 3464


Script:

#!/bin/bash
# Let's check the implementation and test files to understand the size format handling

# Search for size-related code in NodeBinary implementation
echo "Searching for size handling in NodeBinary implementation..."
ast-grep --pattern 'size'

# Check all test files for size assertions
echo "Checking test files for size assertions..."
fd -e test.ts -x rg -A 2 'assert.*size' {} \;

# Look for size format parsing implementation
echo "Looking for size format parsing implementation..."
rg -l "size.*format|format.*size|parseSize|formatSize"

Length of output: 13444


Script:

#!/bin/bash
# Let's check the NodeBinary implementation for size formatting logic

# Check NodeBinary implementation
echo "Checking NodeBinary implementation..."
rg -A 10 'size.*=.*m\[3\]' app/common/adapter/binary/NodeBinary.ts

# Check the test fixture to understand the format change
echo "Checking nodejs.org test fixture..."
cat test/fixtures/nodejs.org/site/v18.15.0/index.html

Length of output: 5622

@fengmk2 fengmk2 merged commit 914b59c into master Nov 14, 2024
12 of 13 checks passed
@fengmk2 fengmk2 deleted the adapter-html branch November 14, 2024 15:13
fengmk2 pushed a commit that referenced this pull request Nov 14, 2024
[skip ci]

## [3.67.1](v3.67.0...v3.67.1) (2024-11-14)

### Bug Fixes

* adapter new html format on nodejs.org/dist ([#728](#728)) ([914b59c](914b59c))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant