Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Post parsing/sync microservice #1828

Merged
merged 19 commits into from
Apr 9, 2021
Merged

Post parsing/sync microservice #1828

merged 19 commits into from
Apr 9, 2021

Conversation

c3ho
Copy link
Contributor

@c3ho c3ho commented Feb 25, 2021

Issue This PR Addresses

Fixes #1736

Type of Change

  • Bugfix: Change which fixes an issue
  • New Feature: Change which adds functionality
  • Documentation Update: Change which improves documentation
  • UI: Change which improves UI

Description

Still a WIP. Creates a post parsing microservice from our current code. There's a ton of files and I just want to get this up here for now. I opted to use the JS version of bull-queue, whether that is the correct choice or not I have no idea, but it shouldn't be too hard to make the switch to TS version in a future PR.

I started a small change in processAllFeeds() within the index.js file which will make a GET all user feeds request (that are not flagged) to the User ms.. I'll slowly improve this to do it in a pagination way.

I checked the Post api, there's no PUT routes yet to save all the Post data as per the most recent discussion.

As satellite is waiting for Redis PR to be put in, this container will crash since I can't call Redis() yet. I'll also remove the test route I set up at parserURL/

Lists of tasks for this PR. I'll mark them off (maybe with a commit #) to track progress.

  • Actually test this
  • Write tests
  • Create readme
  • Create Dockerfile
  • Update Feeds to store less info in Redis (need to doublecheck what we need and don't need)

Checklist

  • Quality: This PR builds and passes our npm test and works locally
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Screenshots: This PR includes screenshots or GIFs of the changes made or an explanation of why it does not (if applicable)
  • Documentation: This PR includes updated/added documentation to user exposed functionality or configuration variables are added/changed or an explanation of why it does not(if applicable)

Copy link
Member

@manekenpix manekenpix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need to rebase. docker-compose-api-production.yml doesn't exist anymore. Now it's production.yml.
Also, make sure you use the env files for port number or anything the microservice needs.

@HyperTHD HyperTHD mentioned this pull request Feb 26, 2021
8 tasks
@humphd
Copy link
Contributor

humphd commented Feb 28, 2021

We need to consider the role of this service within the larger system we're building. We have:

This service should essentially manage the scheduling and processing of Feeds from to Posts. A bunch of the code is already written, but it needs to be evolved a bit. I think we need

  • A Job Queue on top of Redis (we have this)
  • A Job Worker process for extracting parsing jobs from the Job Queue, doing them, and sending results to the Posts service (we have most of this, but it needs to be taught to talk HTTP)
  • A Job Scheduler, something that figures how when to go get Feeds and put them into the Job Queue to be processed again. I'm not 100% clear on the best way to do this. It could be time based, could be event based, ...

We used to use the Wiki as a source of truth, and that's being replaced by the User service.

@c3ho I see that you're using your favourite "Draft" style PR for this, but I think I'm going to need this to be done more collaboratively with the authors of the other services so that we can figure out our boundaries. Probably this one is easiest to write last (after Users and Posts are merged). That should happen for the 1.8 release in a few weeks.

Until then, maybe you can help review those other services with us as they come in, so we are all on the same page.

@rjayroso rjayroso added this to the 1.9 Release milestone Mar 9, 2021
@rjayroso
Copy link
Contributor

rjayroso commented Mar 9, 2021

Based on our triage today regarding the user service and post-service it would make sense that we could land this for 1.9

@HyperTHD
Copy link
Contributor

Posts service has landed, so you can continue working on this service, and move out any Feed related stuff to your service, @c3ho

@c3ho
Copy link
Contributor Author

c3ho commented Mar 16, 2021

@humphd @HyperTHD I don't think I should be managing any Post info here right?

How I think the service should work:

  1. Send a request to User ms to get all feed urls.
  2. Create queue and add feed urls to job list to process feed urls
  3. Process feed urls and send the processed Post info to the Post ms to index in ElasticSearch/Redis
  4. Cache processed feeds in Redis
  5. Rinse and repeat perpetually

In this case, I shouldn't care about how Post is implemented aside from sending parsed blog posts to the Post ms to store/index right? So in this case I am guessing I make a change to articlesToPosts function in processor.js to make
a POST request to the Post ms instead of what it is doing currently?

@humphd
Copy link
Contributor

humphd commented Mar 18, 2021

@c3ho, I'm still not 100% sure of the proper design, but here are my thoughts:

The major thing this service does is maintain a job queue, and co-ordinate the creation of jobs. How we do that is a bit of an open question. We could poll, like we do now. We could use repeatable jobs (but we'd still have to maintain sync with the users/feeds in Firebase). Somehow we have to get the feeds into this queue and process them, and repeat that. So the easiest first method would be to get all the feeds from Firebase, yes. It's probably not smart to "download all" like we do with the wiki, so maybe we iterate through the pages and add the jobs. There might be a more "Firebase" way to do this, not sure.

Next we process the feeds like we do now, essentially, and when we're done, I think we put them directly into Redis/Elastic from this service, vs. involving the Posts service (it's read-only).

We need to keep Redis (cache) and Elastic (search) in sync with what's in Firebase. Our feeds won't live in Redis anymore, we'll just cache posts there (with feed details that a post needs). One way to do this would be to add one more key to Redis to track the current ordered set we're using to index posts: we can be showing the old set in the timeline while we build the new one, then delete the old one and continue the process.

We can iterate on this once the service exists. If it mostly works like we have now, with User service vs. wiki feed list, it will be good.

@c3ho
Copy link
Contributor Author

c3ho commented Mar 18, 2021

@humphd Thanks for the clarification. You're right, let's get the service working first then we'll reiterate / refactor.

@HyperTHD
Copy link
Contributor

@c3ho Are you still interested in working on this PR? We're gonna need this in for 2.0 so someone else can take this issue if you're busy with something else

@c3ho
Copy link
Contributor Author

c3ho commented Mar 23, 2021

I’ll have a reviewable PR for Friday

################################################################################

# Parser Service Port (default is 8888)
PARSER_PORT=8888
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a lot of things running on 8***, let's pick something else, 10000?

config/env.development Show resolved Hide resolved
PARSER_PORT=8888

# Parser Service URL
PARSER_URL=https://localhost/v1/parser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://api.telescope.cdot.systems/v1/parser

PARSER_PORT=8888

# Parser Service URL
PARSER_URL=https://localhost/v1/parser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://dev.api.telescope.cdot.systems/v1/parser

- traefik
- redis
- users
- posts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you'll need posts.

@@ -0,0 +1,10 @@
const path = require('path');
const result = require('dotenv').config({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has been done in the base config (cc @HyperTHD).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this file is not needed anymore and can be removed

},
"homepage": "https://github.com/Seneca-CDOT/telescope#readme",
"dependencies": {
"@senecacdot/satellite": "^1.8.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"^1.x"

@@ -0,0 +1,111 @@
const { Satellite } = require('@senecacdot/satellite');
const { router } = require('bull-board');
// May need in next revision
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was supposed to have line 4 commented out as I wasn't sure if we'll need other routes aside from the bull-board

* @param {Object} feedData - feed data parsed from the wiki feed list.
* Returns Promise<Feed>, with the most appropriate Feed Object to use.
*/
async function updateFeed(feedData) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're doing too much in this file. You should split things up, so that this file just sets up the microservice with Satellite, and all this extra functionality can come from other modules.

const router = Router();

router.get('/', (req, res) => {
res.status(200).send('hi');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not worth doing. Should we move what you have at /queues to this instead? That is, going to http://localhost/v1/parser would show me the Bull Queues.

This also implies that we need some authentication/authorization (see Satellite, which has middleware you can use for this). We either need to require admin access, or maybe just authenticated users can see it? I'm not sure how sensitive this is.

Copy link
Contributor Author

@c3ho c3ho Mar 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can disregard the whole parser.js file and just do service.router.use('/', bullRouter) in index.js.


parser:
environment:
# For development and testing, the Auth service needs to contact the users
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the Parser service..."

src/api/parser/jest.config.js Show resolved Hide resolved

# dev mode with automatic restarts

npm run dev
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only have a start script above, better change this to start.

* Flags the feed, preventing posts from the feed to be displayed
* Returns a Promise<Boolean>
*/
flag() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stuff will probably need to be changed later to use what's in the Users data vs. what's in Redis. For now, this is fine, but FYI so we don't forget.

src/api/parser/src/lib/queue.js Show resolved Hide resolved
"name": "parser",
"version": "1.0.0",
"description": "A service for parsing posts",
"main": "index.js",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed.


const service = new Satellite();

startParserQueue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's comment this out so that we can land on prod/staging today, and not have it compete with the existing backend that is also running a Bull queue in Redis.

@humphd humphd merged commit e7df2e3 into Seneca-CDOT:master Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a "post parsing/sync" microservice
7 participants