Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

245: add image comments index and image comments filter #267

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

JesseLeung97
Copy link
Collaborator

Context

Implements tnc-ca-geo/animl-frontend#245

Add comments filter so users can filter images by the content of comments on the image. Implemented using the $search $text operator in MongoDB. Note: the $search filter needs to be part of the first $match of the pipeline.

https://www.mongodb.com/docs/manual/reference/operator/query/text/#mongodb-query-op.-text

Fixes
Noticed the LSP was showing a warning that await had no effect on MongoPaging.aggregate

Related PRs

animl-frontend

@JesseLeung97 JesseLeung97 added enhancement New feature or request In Progress PR is still in progress and should not be merged labels Sep 17, 2024
@JesseLeung97 JesseLeung97 removed the In Progress PR is still in progress and should not be merged label Sep 18, 2024
Copy link
Member

@nathanielrindlaub nathanielrindlaub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JesseLeung97, save that one comment I have in there about awaiting the Mongo aggregation, this code looks good and it's all behaving well in my tests. That said I started to dig into the docs a bit and I'm a bit concerned about adding a text search index in prod (it seems like it's really expensive and can impact write speed).

I should have done more homework before setting you off on this, but it got me curious about Atlas Search (we do use MongoDB Atlas, so that would be an option). It sounds far more robust and offers a lot more functionality than Mongo DB's $text search, but I'm also concerned and unsure of what the RAM implications are for the Lucene indexes it manages. I'm also concerned about locking us into Atlas Search as it's tied to Atlas and would prevent moving to a self-managed or local MongoDB instance down the road if that was ever desirable. It also just might be overkill for what we need.

At any rate, I think we should pause on this and meet to talk about this and some of the other tasks you're looking to take on. I'd be curious if you have an opinion on Atlas Search and whether you took a look at it while implementing the $text search. Here's one blog post that compares the difference but I'm sure there are more.

I'll reach out via email and try to get a meeting set up.

@@ -124,7 +124,7 @@ export class ImageModel {
hasNext: false,
} as AggregationOutput<ImageSchema>;
}
return await MongoPaging.aggregate(Image.collection, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JesseLeung97, can you elaborate on the warning you were seeing and where it was coming from? It looks like the MongoPaging.aggregate() function is asynchronous, and this would make sense as it's a DB query, so I do think awaiting it is important.

pipeline.push({
$match: {
$text: {
$search: `"${comments}"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell here if we need to wrap the comments in an additional set of escaped double quotes like in the example in the docs in order for it to search for the whole phrase or not.

@nathanielrindlaub
Copy link
Member

Perhaps not surprisingly, MongoDB pushes for using Atlas Search over $text and $regex in most cases unless you're self-hosting your DB. Their decision framework is helpful, although probably a bit biased.

Having read that I'm now leaning towards Atlas Search slightly because it does more performant and scalable:

When you add a database index in MongoDB, you should consider tradeoffs to write performance in cases where database write performance is important. Search Indexes don’t degrade cluster write performance.

and...

If you have lots of documents, your queries will linearly get slower. In Atlas Search, the inverted index enables fast document retrieval at very large scales.

The main downsides are (a) getting ourselves locked into Atlas and (b) not being able to create Atlas Search Indexes from code (e.g. w/ Mongoose) or infra as code. I'm not totally positive on this last point but I think you have to create indexes via the MongoDB Atlas user interface. That said, our MongoDB Atlas setup is already outside of our Serverless/Cloudformation config files so anyone trying to spin up Animl on their own would have to manually muck around with the AWS console and MongoDB Atlas app anyhow.

There also seem to be some indexing pitfalls with Atlas Search we should be aware of and try to avoid. Setting alerts would be a good idea too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants