Backend Scraper #87
Can you elaborate on how you were planning to trigger this job? I had envisioned this working without any kind of web server to trigger jobs, for example by using a cron job (or rather the Heroku equivalent) to queue a job every N minutes.
A Netlify hook should call this endpoint. Even a cron job (which would kick off periodic batch re-scraping of all organisations' data, for example) will need to call some simple web endpoint that just puts a job into a queue.
I don't see how it could be more direct. Push jobs directly to PostgreSQL via a query? graphile/worker supports this, but pg-boss doesn't (at least, it doesn't document it), so it's a slippery slope. But I also don't see this as a problem.
Even with a direct SQL query to create a job, you will eventually need some basic auth/protection, I suppose, so you can't really avoid having a thin web layer.
Cron batch processing in Heroku can also be done by starting a dyno that does the batch processing and then exits. But this won't work for first-time scraping of newly added orgs: there has to be a `worker` listening in the background. And since we need this type of worker anyway, it can handle batch scraping, too.
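To make the "thin web layer" idea concrete, here is a rough sketch of what such an endpoint could boil down to: check a shared secret, then put a job into the queue. Everything named here is an assumption for illustration, not code from this PR: the `x-hook-secret` header, the `scrape-org` job name, the `handleEnqueue` function, and the `queue.send(name, data)` shape (a stub stands in for the real queue).

```javascript
const crypto = require('crypto');

// Constant-time comparison; hashing first gives both buffers the same length,
// which crypto.timingSafeEqual requires.
function secretsMatch(given, expected) {
  const a = crypto.createHash('sha256').update(String(given)).digest();
  const b = crypto.createHash('sha256').update(String(expected)).digest();
  return crypto.timingSafeEqual(a, b);
}

// Framework-agnostic handler (hypothetical): returns { status, body } so it can
// sit behind whatever web server the app ends up using.
async function handleEnqueue(request, queue, expectedSecret) {
  const given = request.headers['x-hook-secret'] || '';
  // Refuse everything if no secret is configured, and reject wrong secrets.
  if (!expectedSecret || !secretsMatch(given, expectedSecret)) {
    return { status: 401, body: 'unauthorized' };
  }
  const orgId = request.body && request.body.orgId;
  if (typeof orgId !== 'string' || orgId === '') {
    return { status: 400, body: 'orgId is required' };
  }
  // The endpoint only enqueues; the background worker does the scraping.
  const jobId = await queue.send('scrape-org', { orgId });
  return { status: 202, body: String(jobId) };
}

// Usage with a stub queue that records what was sent.
const sent = [];
const stubQueue = { send: async (name, data) => sent.push({ name, data }) };
handleEnqueue(
  { headers: { 'x-hook-secret': 's3cret' }, body: { orgId: 'org-a' } },
  stubQueue,
  's3cret'
).then((res) => console.log(res.status)); // 202
```

Keeping the handler free of any scraping logic means the same worker can serve both hook-triggered and batch jobs.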
The way we usually do this is to run a script within the project that has the job of simply queueing the jobs you want to run. To elaborate, the script would work like this:
$ npm run queue-jobs
The script runs in a couple of seconds and is only responsible for adding the right jobs to the queue. The worker does the rest. This avoids needing the web interface, keeps the logic of which orgs to scrape internal to the app, and removes the dependency on Zapier.
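As a rough illustration of that script, something along these lines could sit behind `npm run queue-jobs`. This is only a sketch under assumptions: the `scrape-org` job name, the org ids, and the `queueJobs` helper are made up, and an in-memory queue stands in for the real one (pg-boss could fill that role, but the wiring is not shown here).

```javascript
// In-memory stand-in for the job queue so the sketch runs without PostgreSQL.
// Assumption: the real queue exposes an async send(name, data) as well.
class InMemoryQueue {
  constructor() {
    this.jobs = [];
  }
  async send(name, data) {
    this.jobs.push({ name, data });
    return this.jobs.length; // fake job id
  }
}

// Hypothetical job name; the real one is not given in this thread.
const SCRAPE_JOB = 'scrape-org';

// Enqueue one scrape job per org and return the job ids.
// The script never scrapes anything itself; the worker picks the jobs up.
async function queueJobs(queue, orgIds) {
  const ids = [];
  for (const orgId of orgIds) {
    ids.push(await queue.send(SCRAPE_JOB, { orgId }));
  }
  return ids;
}

// Entry point: decide which orgs to scrape (placeholders here), queue, exit.
async function main() {
  const queue = new InMemoryQueue();
  const ids = await queueJobs(queue, ['org-a', 'org-b']);
  console.log(`queued ${ids.length} jobs`);
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

A scheduler (for example Heroku's) would then just run the npm script every N minutes; no web endpoint is involved.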
Any reason we can't just use `NODE_ENV` to find out if we're running in production?
I guess it requires more setup by developers locally (`export NODE_ENV=dev`) and on Heroku (`heroku config:set NODE_ENV="production"`), for instance when a developer wants to test the deployment in their own free Heroku account (as I do right now, and other developers may want to do this, too). The above line doesn't require anything like that.
@leventov Heroku sets that variable by default, and I would assume we never set it locally. It's standard practice to use `NODE_ENV` in Node.js apps.
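For reference, the check could be as small as the sketch below. The helper name `isProduction` is hypothetical; the only thing it relies on is `NODE_ENV` being `production` on Heroku and unset (or something else) locally.

```javascript
// Minimal sketch: treat the app as "in production" only when NODE_ENV says so.
// Taking `env` as a parameter keeps the check easy to test.
function isProduction(env = process.env) {
  return env.NODE_ENV === 'production';
}

console.log(isProduction({ NODE_ENV: 'production' })); // true
console.log(isProduction({}));                         // false (variable unset, e.g. locally)
```

With this, no local setup is needed: an unset variable simply means "not production".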