Merge pull request clearlydefined#538 from lumaxis/update-eslint

Improve formatting and linting setup

lumaxis authored May 6, 2024
2 parents bcb96f6 + b97c085 commit 473a560
Showing 102 changed files with 1,301 additions and 1,112 deletions.
18 changes: 0 additions & 18 deletions .eslintrc.json

This file was deleted.

3 changes: 0 additions & 3 deletions .prettierrc

This file was deleted.

7 changes: 7 additions & 0 deletions .prettierrc.json
@@ -0,0 +1,7 @@
{
  "arrowParens": "avoid",
  "printWidth": 120,
  "singleQuote": true,
  "semi": false,
  "trailingComma": "none"
}
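
With a config like this in place, Prettier picks up `.prettierrc.json` automatically — a minimal sketch of checking and applying the formatting from the repo root:

```sh
# Report files that differ from the configured style
npx prettier --check .

# Rewrite files in place
npx prettier --write .
```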
3 changes: 3 additions & 0 deletions .vscode/extensions.json
@@ -0,0 +1,3 @@
{
  "recommendations": ["esbenp.prettier-vscode", "dbaeumer.vscode-eslint"]
}
4 changes: 1 addition & 3 deletions .vscode/settings.json
@@ -1,10 +1,8 @@
// Place your settings in this file to overwrite default and user settings.
{
  "jshint.options": {
    "esnext": true
  },
  "editor.folding": false,
  "editor.tabSize": 2,
  "editor.defaultFormatter": "esbenp.prettier-vscode",
  "editor.detectIndentation": false,
  "editor.formatOnSave": false,
  "editor.formatOnType": true,
20 changes: 10 additions & 10 deletions CODE_OF_CONDUCT.md
@@ -8,19 +8,19 @@ In the interest of fostering an open and welcoming environment, we as contributo

Examples of behavior that contributes to creating a positive environment include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting
- The use of sexualized language or imagery and unwelcome sexual attention or advances
- Trolling, insulting/derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or electronic address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a professional setting

## Our Responsibilities

2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -5,7 +5,7 @@ The ClearlyDefined project welcomes your suggestions and contributions! Before o

## Contribution Licensing

Most of our code is distributed under the terms of the [MIT license](LICENSE), and when you contribute code that you wrote to our repositories,
Most of our code is distributed under the terms of the [MIT license](LICENSE), and when you contribute code that you wrote to our repositories,
you agree that you are contributing under those same terms. In addition, by submitting your contributions you are indicating that
you have the right to submit those contributions under those terms.

13 changes: 9 additions & 4 deletions README.md
@@ -81,16 +81,20 @@ Process the source, if any:
The crawler's output is stored for use by the rest of the ClearlyDefined infrastructure -- it is not intended to be used directly by humans. Note that each tool's output is stored separately and the results of processing the component and the component source are also separated.

### <a id="more-on-type"></a>More on `type`

The `type` in the request object typically corresponds to an internal processor in CD.
1. `component` is the most generic type. Internally, it is converted to a `package` or `source` request by the component processor.
2. A `package` request is processed by the package processor and is further converted to a request with a specific type (`crate`, `deb`, `gem`, `go`, `maven`, `npm`, `nuget`, `composer`, `pod`, `pypi`). For a `package`-typed request, if the specific binary package type is already known, that type (e.g. `npm`) can be used instead of `package` in the harvest request, skipping the conversion step. For example,

1. `component` is the most generic type. Internally, it is converted to a `package` or `source` request by the component processor.
2. A `package` request is processed by the package processor and is further converted to a request with a specific type (`crate`, `deb`, `gem`, `go`, `maven`, `npm`, `nuget`, `composer`, `pod`, `pypi`). For a `package`-typed request, if the specific binary package type is already known, that type (e.g. `npm`) can be used instead of `package` in the harvest request, skipping the conversion step. For example,

```json
{
  "type": "npm",
  "url": "cd:/npm/npmjs/-/redie/0.3.0"
}
```
3. `source` requests are processed by the source processor, which subsequently dispatches a `clearlydefined`-typed request for the supported source types and other requests (one for each scanning tool). These are the more advanced scenarios where the request type and the coordinate type differ.

3. `source` requests are processed by the source processor, which subsequently dispatches a `clearlydefined`-typed request for the supported source types and other requests (one for each scanning tool). These are the more advanced scenarios where the request type and the coordinate type differ.
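
For illustration, a `source` request follows the same shape as the `npm` example above — the coordinates here are hypothetical:

```json
{
  "type": "source",
  "url": "cd:/git/github/someorg/somerepo/0f27a8f6e3a4e0a1c0d9b2e7c5a1d3f4e5b6c7d8"
}
```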

# Configuration

@@ -238,7 +242,8 @@ Make sure you started the container with the 5000 port forwarded for this to wor
-X POST \
http://crawler:5000/requests

On windows:
On Windows:

curl -d "{\"type\":\"npm\", \"url\":\"cd:/npm/npmjs/-/redie/0.3.0\"}" -H "Content-Type: application/json" -H "X-token: secret" -X POST http://localhost:5000/requests

Expose dashboard port:
2 changes: 1 addition & 1 deletion dev-scripts/README.md
@@ -10,4 +10,4 @@ run.foo -- Runs the docker container, killing a previous run if it exists. Hosts

### Extra:

debug.foo -- Does everything run does, but also pauses execution until a debugger is attached. Attach using vscode's profile.
debug.foo -- Does everything run does, but also pauses execution until a debugger is attached. Attach using vscode's profile.
66 changes: 35 additions & 31 deletions docs/rampup.md
@@ -5,58 +5,61 @@ These are suggested steps / tips to get familiar with the codebase:
- Two branches: master/prod correspond to dev/prod

0. Clone the repo
0. Run `npm install`
0. `npm test` to run tests
0. Try `npm audit fix` for a simple contribution
0. Open a PR to master
- AzDo will run clearlydefined.crawler pipeline: npm install / npm test
- After merge, crawler-pipeline will run, builds and pushes to ACR
- Release step: deploys to cdcrawler-dev app service, restarts (dev crawler is still app service)
0. After successful dev deploy, can merge and push to prod branch
- Prod build pipeline will build and push to Docker Hub, no actual deployment.
1. Run `npm install`
2. `npm test` to run tests
3. Try `npm audit fix` for a simple contribution
4. Open a PR to master
- AzDo will run clearlydefined.crawler pipeline: npm install / npm test
- After merge, crawler-pipeline will run, builds and pushes to ACR
- Release step: deploys to cdcrawler-dev app service, restarts (dev crawler is still app service)
5. After successful dev deploy, can merge and push to prod branch
- Prod build pipeline will build and push to Docker Hub, no actual deployment.

## Dockerfile
- Based on Node
- Installs ScanCode and licensee (installs Ruby for licensee)
- Sets all env vars
- Runs `npm install` in production mode
- Then starts


- Based on Node
- Installs ScanCode and licensee (installs Ruby for licensee)
- Sets all env vars
- Runs `npm install` in production mode
- Then starts

## Deployment

- Image is pushed to: https://hub.docker.com/r/clearlydefined/crawler
- Webhooks in Docker Hub for donated crawler resources signal them to re-pull the crawler Docker image
- There are also donated crawler resources that don't have a webhook. These poll, monitor, or pull the image regularly.
- In effect: once the crawler image is pushed, it is deployed “eventually consistently,” not all at once. Some versions of the old crawler and new crawler will be running at the same time.

[Tools repo: run.sh](https://github.com/clearlydefined/tools/blob/master/run.sh)

- Can be used for VM-based crawlers
- Cron job that checks for a new crawler Docker image; if there is a new image: restart crawlers
- Hardcoded number of Docker containers, based on vCPU count and experimentation
- Where do secrets come from? Not sure, need to investigate
- Cron job that checks for a new crawler Docker image; if there is a new image: restart crawlers
- Hardcoded number of Docker containers, based on vCPU count and experimentation
- Where do secrets come from? Not sure, need to investigate

## Local dev

- If you want to run locally, you’ll need to install scancode/licensee on your local machine with paths/etc. It is easier to run the Docker image.
- There is a Linux Dockerfile to build a container; that is the target environment
- Look at quick start in [README](/README.md#quick-start)
- Template.env.json has minimal settings: file storage provider, memory incoming queue
- “Queueing work with crawler”: instructions once crawler is running
- Could bring up service and crawler, and send harvest to service
- Easier to work with just crawler, example post message in readme
- Could bring up service and crawler, and send harvest to service
- Easier to work with just crawler, example post message in readme
- See “Build and run docker image locally” in readme, need config file
- Run docker build command
- To get dev config: go to portal: cdcrawler-dev, Settings->Configuration
- Uses docker’s “env-file”, key/value environment vars, different than env.json
- From dev, change *crawler/harvest azblob_container_name, queue prefix, harvests, queue name, to be your own personal names
- From dev, change \*crawler/harvest azblob_container_name, queue prefix, harvests, queue name, to be your own personal names
- Crawler_service_auth_token: the token needed for the harvest queue curl command
- When you use the curl command directly on the crawler, it puts a message on its own queue. You could just copy an existing harvest message from the storage queue and put it on your own named harvest queue
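
As a rough sketch of that local flow (the image tag and env-file name below are placeholders, not the pipeline's real names):

```sh
# Build the crawler image from the repo's Dockerfile
docker build -t cdcrawler-local .

# Run it with your own env-file of key/value settings; 5000 is the request port mentioned above
docker run --env-file dev.env -p 5000:5000 cdcrawler-local
```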

## Code

- Background:
- ghcrawler was used to crawl github and store data
- CD crawler pulled ghcrawler in as a dependency, was then forked/modified in an upstream branch
- Now just a directory: ghcrawler/ with unused upstream code removed, refactored
- ghcrawler was used to crawl github and store data
- CD crawler pulled ghcrawler in as a dependency, was then forked/modified in an upstream branch
- Now just a directory: ghcrawler/ with unused upstream code removed, refactored
- Important directories: providers/, providers/fetch, providers/process
- Map.js: maps package types to code
- First queues a “component” type, then queues “package” and/or “source” types
@@ -69,15 +72,16 @@ This project uses two tools to monitor (and fix) vulnerabilities in this project

### Dependabot

* [Dependabot](https://docs.github.com/en/free-pro-team@latest/github/managing-security-vulnerabilities/about-dependabot-security-updates) is a GitHub Security Feature. It tracks vulnerabilities in several languages including JavaScript.
* When Dependabot detects any vulnerabilities in the [GitHub Advisory Database](https://docs.github.com/en/free-pro-team@latest/github/managing-security-vulnerabilities/browsing-security-vulnerabilities-in-the-github-advisory-database), it sends a notification and may also open a pull request to fix the vulnerability.
* Only project maintainers can see Dependabot alerts
- [Dependabot](https://docs.github.com/en/free-pro-team@latest/github/managing-security-vulnerabilities/about-dependabot-security-updates) is a GitHub Security Feature. It tracks vulnerabilities in several languages including JavaScript.
- When Dependabot detects any vulnerabilities in the [GitHub Advisory Database](https://docs.github.com/en/free-pro-team@latest/github/managing-security-vulnerabilities/browsing-security-vulnerabilities-in-the-github-advisory-database), it sends a notification and may also open a pull request to fix the vulnerability.
- Only project maintainers can see Dependabot alerts

### Snyk
* [Snyk Open Source](https://solutions.snyk.io/snyk-academy/open-source) is similar to Dependabot, though not GitHub specific. It also tracks vulnerabilities in dependencies.
* When Snyk detects a vulnerability in the [Snyk Intel Vulnerability Database](https://snyk.io/product/vulnerability-database/), it also opens a pull request with a fix for the vulnerability.
* Everyone can see pull requests opened by Snyk, but only members of the Clearly Defined organization on Snyk can see details of the vulnerability.
* If you do not have access to the Clearly Defined Snyk organization, reach out to @nellshamrell

- [Snyk Open Source](https://solutions.snyk.io/snyk-academy/open-source) is similar to Dependabot, though not GitHub specific. It also tracks vulnerabilities in dependencies.
- When Snyk detects a vulnerability in the [Snyk Intel Vulnerability Database](https://snyk.io/product/vulnerability-database/), it also opens a pull request with a fix for the vulnerability.
- Everyone can see pull requests opened by Snyk, but only members of the Clearly Defined organization on Snyk can see details of the vulnerability.
- If you do not have access to the Clearly Defined Snyk organization, reach out to @nellshamrell

### Why both?

22 changes: 22 additions & 0 deletions eslint.config.js
@@ -0,0 +1,22 @@
const js = require('@eslint/js')
const globals = require('globals')
const eslintConfigPrettier = require('eslint-config-prettier')

module.exports = [
  js.configs.recommended,
  {
    languageOptions: {
      globals: {
        ...globals.node,
        ...globals.mocha
      },
      parserOptions: {
        sourceType: 'module'
      }
    },
    rules: {
      'no-console': 'off'
    }
  },
  eslintConfigPrettier
]
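
Assuming an ESLint version that reads flat config files (behind a flag in v8.21+, the default in v9), no extra arguments are needed — `eslint.config.js` is discovered automatically:

```sh
npx eslint .
```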
10 changes: 6 additions & 4 deletions ghcrawler/bin/www.js
@@ -100,14 +100,16 @@ function run(service, logger) {
* Event listener for HTTP server 'close' event.
*/
function onClose() {
service.stop()
.then(() => {
service.stop().then(
() => {
console.log('Server closed.')
process.exit(0)
}, error => {
},
error => {
console.error(`Closing server: ${error}`)
process.exit(1)
})
}
)
}

/**
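
A side note on the reformatted call above: the two-argument `.then(onFulfilled, onRejected)` form is not equivalent to `.then(onFulfilled).catch(onRejected)` — the rejection handler does not see errors thrown by its sibling fulfillment handler. A minimal sketch:

```js
// The second .then argument only handles rejections of the original promise;
// an error thrown inside the first callback is only caught by a later .catch.
Promise.resolve()
  .then(
    () => {
      throw new Error('boom')
    },
    () => console.error('never reached')
  )
  .catch(error => console.error('caught later:', error.message)) // caught later: boom
```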
2 changes: 1 addition & 1 deletion ghcrawler/crawlerFactory.js
@@ -203,7 +203,7 @@ class CrawlerFactory {
}

static createNolock() {
return { lock: () => null, unlock: () => { } }
return { lock: () => null, unlock: () => {} }
}

static createQueues(options, provider = options.provider) {
7 changes: 4 additions & 3 deletions ghcrawler/lib/crawler.js
@@ -280,7 +280,8 @@ class Crawler {
result => {
completedPromises++
debug(
`_completeRequest(${loopName}:${request.toUniqueString()}): completed ${completedPromises} of ${trackedPromises.length
`_completeRequest(${loopName}:${request.toUniqueString()}): completed ${completedPromises} of ${
trackedPromises.length
} promises (${failedPromises} failed)`
)
return result
Expand All @@ -289,7 +290,8 @@ class Crawler {
completedPromises++
failedPromises++
debug(
`_completeRequest(${loopName}:${request.toUniqueString()}): completed ${completedPromises} of ${trackedPromises.length
`_completeRequest(${loopName}:${request.toUniqueString()}): completed ${completedPromises} of ${
trackedPromises.length
} promises (${failedPromises} failed)`
)
throw error
@@ -512,7 +514,6 @@
request.outcome = request.outcome || 'Traversed'
}
return request

}

async _logStartEnd(name, request, work) {
3 changes: 1 addition & 2 deletions ghcrawler/lib/crawlerService.js
@@ -61,8 +61,7 @@ }
}

stop() {
return this.ensureLoops(0)
.then(() => this.crawler.done())
return this.ensureLoops(0).then(() => this.crawler.done())
}

queues() {
5 changes: 1 addition & 4 deletions ghcrawler/lib/traversalPolicy.js
@@ -87,10 +87,7 @@ }
}

static _hasExpired(processedAt, expiration = 0, unit = 'hours') {
return (
!processedAt ||
DateTime.now().diff(DateTime.fromISO(processedAt), unit)[unit] > expiration
)
return !processedAt || DateTime.now().diff(DateTime.fromISO(processedAt), unit)[unit] > expiration
}
/**
* A policy spec has the following form: <policyName>[:<[scenario/]mapName[@path]]. That means a spec can be just
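
The simplified one-liner reads: a record has expired when it has no `processedAt` timestamp, or when more than `expiration` units have elapsed since it. A small sketch of the same check with hypothetical values, assuming luxon as used in the method:

```js
const { DateTime } = require('luxon')

// A record processed 30 hours ago, with a 24-hour expiration
const processedAt = DateTime.now().minus({ hours: 30 }).toISO()
const expiration = 24
const unit = 'hours'

// Mirrors the method body: diff(...) returns a luxon Duration indexed by unit
const expired = !processedAt || DateTime.now().diff(DateTime.fromISO(processedAt), unit)[unit] > expiration
console.log(expired) // true
```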
2 changes: 1 addition & 1 deletion ghcrawler/middleware/sendHelper.js
@@ -5,7 +5,7 @@
const htmlencode = require('htmlencode').htmlEncode

function create() {
return function(request, response, next) {
return function (request, response, next) {
response.helpers = response.helpers || {}
response.helpers.send = {
context: {
11 changes: 5 additions & 6 deletions ghcrawler/providers/queuing/attenuatedQueue.js
@@ -13,12 +13,11 @@ class AttenuatedQueue extends NestedQueue {
}

done(request) {
return super.done(request)
.then(() => {
const key = this._getCacheKey(request)
const deleted = memoryCache.del(key)
if (deleted) this.logger.verbose(`Deleted ${key}`)
})
return super.done(request).then(() => {
const key = this._getCacheKey(request)
const deleted = memoryCache.del(key)
if (deleted) this.logger.verbose(`Deleted ${key}`)
})
}

push(requests) {
2 changes: 1 addition & 1 deletion ghcrawler/providers/queuing/queueSet.js
@@ -65,7 +65,7 @@ }
}

async _pop(queue, request = null) {
const result = request || await queue.pop()
const result = request || (await queue.pop())
if (result && !result._originQueue) {
result._originQueue = queue
}
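
The added parentheses make the intent of the short-circuit explicit: the queue is only popped when no `request` was handed in. A minimal sketch with hypothetical stand-ins:

```js
const queue = { pop: async () => 'popped-request' }

async function pop(request = null) {
  // || short-circuits, so queue.pop() only runs when request is falsy
  return request || (await queue.pop())
}

pop('existing-request').then(console.log) // existing-request
pop().then(console.log) // popped-request
```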