Make all chats searchable #309

Closed
marceloschmidt opened this issue Jul 16, 2015 · 16 comments

@marceloschmidt
Member

Maybe we could have this on channels history

@engelgabriel engelgabriel added this to the v1.0 milestone Jul 16, 2015
@rodrigok rodrigok modified the milestones: v1.0, Next Aug 15, 2015
@engelgabriel engelgabriel modified the milestones: v1.0, Next Aug 19, 2015
@gsaslis

gsaslis commented Aug 21, 2015

👍
imho this is a must-have feature (in fact, I was surprised to see that the Quick Search in the right menu is actually NOT about searching in chats... : / )

don't forget this is also one of the reasons many ppl leave Slack... they go over the 10k limit, they then need to search back into their message history and they can't ... ;)

@rodrigok
Member

How can we do this with good performance?

@Sing-Li
Member

Sing-Li commented Aug 22, 2015

Some ideas. All with limitations and trade-offs.

https://www.sqlite.org/fts3.html

https://github.com/olivernn/lunr.js + webworkers

A background doc (explores problem domain): https://docs.google.com/document/d/1sAk00RsxZHFgyKomKq_n01rHbvsghS8RkQjulBZ3mpI/edit?pli=1#heading=h.p4r6t7cyneha

Our problem is compounded by:

  • need client side implementation against very limited client side storage quota
  • need some server side implementation and a way to transition between client and server impl
  • server side impl == free web storage per user; and there has to be a limit - slack's 10k is actually quite generous already
  • if we don't do an adequate job for 22+ languages, someone will complain 😏

@gsaslis

gsaslis commented Aug 22, 2015

i have to say I know very little about the domain (very interesting doc, thanks for sharing @Sing-Li! ), but i've been hearing really good things about Elasticsearch.. including the fact that it's ridiculously easy to set up and get started with..

might be worth a look?

p.s. i was always referring to server-side search

@rodrigok
Member

The problem with Elasticsearch is the application setup and data duplication. MongoDB has text search per collection; maybe we can use it and keep the setup easy.

We have plans to add support for PostgreSQL, which has text search too.

@gsaslis

gsaslis commented Aug 23, 2015

hmmm, not sure I follow you with regard to 'application setup and data duplication'... would you care to elaborate on that?

I am not sure I understand correctly what you mean about using the built-in text search features in mongo or postgres, but if you're thinking about using things like LIKE or ILIKE in postgres, for example, I really, really do not think it would be the optimal solution in the long term, as it's not going to scale very well...

@rodrigok
Member

@gsaslis Adding Elasticsearch to the application means one more program to set up and keep running, consuming memory and disk (we need to send all searchable data to Elasticsearch to be indexed, so we have data duplication).

MongoDB has a text index per collection: http://docs.mongodb.org/v3.0/core/index-text/
As for PostgreSQL, I don't know how it works, but I know it has full-text search too: http://www.postgresql.org/docs/8.3/static/textsearch.html
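
For reference, a minimal sketch of what the MongoDB text-index route might look like from application code, in TypeScript. It only builds the index spec and the `$text` query as plain objects, so nothing here talks to a database. The field names `msg` and `rid` and the helper `buildTextSearch` are assumptions for illustration, not taken from the Rocket.Chat code.

```typescript
// Sketch only: plain objects shaped like a MongoDB text-index spec and a $text query.
// "msg" (message body) and "rid" (room id) are assumed field names.

interface TextSearchRequest {
  filter: Record<string, unknown>;
  projection: Record<string, unknown>;
  sort: Record<string, unknown>;
  limit: number;
}

// Index specification one would pass to createIndex on the messages collection.
const messageTextIndex = { msg: "text" } as const;

// Build the filter, projection, and sort for a text search limited to one room.
function buildTextSearch(roomId: string, terms: string, limit = 20): TextSearchRequest {
  return {
    // $text uses the collection's text index; rid restricts hits to one room.
    filter: { rid: roomId, $text: { $search: terms } },
    // Ask MongoDB to return its relevance score and sort by it.
    projection: { score: { $meta: "textScore" } },
    sort: { score: { $meta: "textScore" } },
    limit,
  };
}

const request = buildTextSearch("GENERAL", "deploy failed");
console.log(JSON.stringify(request.filter));
```

The resulting `filter`, `projection`, and `sort` would be handed to whatever MongoDB client the application already uses; no extra service has to be installed for this route.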

With Elasticsearch we can create more powerful searches, probably global searches with a single query, but we add more complexity to the app setup and hardware requirements.

IMHO we need both options, but we should start with the simplest one for end users, keeping the ability to clone the repo and just run the application with meteor run while still offering a good search option. Then we can work on an Elasticsearch integration as well as other solutions.

@gsaslis

gsaslis commented Aug 24, 2015

@rodrigok agreed on extra maintenance work required. also agreed on data duplication. (as long as you don't replace mongodb with elasticsearch that is, of course.)

Regardless, as mentioned in the document shared by @Sing-Li, search has 3 important challenges: tokenization, selection and ranking. From what little I've read so far, neither pg nor mongo (attempt to) solve all of them, while ES does.. (main difference in particular is ranking, and ES also offers highlighting).
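
To make the three challenges concrete, here is a toy in-memory sketch in TypeScript: it tokenizes text, selects candidate messages through an inverted index, and ranks them by how often the query terms occur. It is not how MongoDB, PostgreSQL, or Elasticsearch work internally; `tokenize` and `TinyIndex` are hypothetical names, and the tokenizer is a deliberate simplification that would not hold up for many of the languages mentioned earlier in the thread.

```typescript
// Toy search: tokenization, selection via an inverted index, ranking by term frequency.

type Postings = Record<string, Record<string, number>>;

// Tokenization: lowercase, then split on whitespace and common punctuation.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[\s.,;:!?()"']+/)
    .filter((t) => t.length > 0);
}

class TinyIndex {
  // term -> (message id -> number of occurrences of the term in that message)
  private postings: Postings = Object.create(null);

  add(id: string, text: string): void {
    for (const term of tokenize(text)) {
      let docs = this.postings[term];
      if (!docs) {
        docs = Object.create(null) as Record<string, number>;
        this.postings[term] = docs;
      }
      docs[id] = (docs[id] || 0) + 1;
    }
  }

  // Selection: every message containing at least one query term.
  // Ranking: summed occurrences, highest first; ties broken by id.
  search(query: string): string[] {
    const scores: Record<string, number> = Object.create(null);
    const seen: Record<string, boolean> = Object.create(null);
    for (const term of tokenize(query)) {
      if (seen[term]) continue; // count each distinct query term once
      seen[term] = true;
      const docs = this.postings[term];
      if (!docs) continue;
      for (const id of Object.keys(docs)) {
        scores[id] = (scores[id] || 0) + docs[id];
      }
    }
    return Object.keys(scores).sort(
      (a, b) => scores[b] - scores[a] || (a < b ? -1 : a > b ? 1 : 0)
    );
  }
}

const tiny = new TinyIndex();
tiny.add("m1", "Deploy failed again, deploy rolled back");
tiny.add("m2", "Lunch?");
tiny.add("m3", "The deploy went fine");
console.log(tiny.search("deploy failed")); // m1 ranks above m3
```

Highlighting, stemming, and language-aware tokenization are exactly the parts this sketch leaves out, which is where a dedicated engine earns its keep.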

though I appreciate your point about implementing some basic search functionality first, then going into the full-blown solution as a staged approach, I have to say that it feels to me this will lead you down a path of constantly having to 'fix' the 'broken' / non-performant basic search... (i.e. how well will this scale when you have teams with tens of millions of messages? how much time will you need to put into database optimization? performance tuning is a very time consuming task..)
This would be time better spent / saved implementing the ES approach from the start.. It is built to scale and handle large volumes of data and search queries.

A final point to consider is how important the search functionality is for rocket.chat users... This is something I don't really know yet (though I personally feel it is, and Slack has it on their home page as the #3 value proposition) but depending on the answer here, the search feature might be more important than the extra setup cost for the sysadmin... ; )

@rodrigok
Member

If users don't install ES, will we lose the search functionality that we could provide using only the database already in use?

IMHO, if users want a better search they can install and integrate ES, but I don't think ES should be a requirement to run Rocket.Chat, whereas search is a requirement.

I'm missing the opinion of @marceloschmidt, @sampaiodiego and @engelgabriel in this thread.

@sampaiodiego
Member

so, I agree with @rodrigok that we should not make elasticsearch a requirement to run rocket.chat
I also agree with @gsaslis that we need to support ES from the beginning, since it is a good differentiator.

so, I think we need to design a search abstraction layer.. so the search API will be the same whether ES is used or not.. ES will be a "driver" or an option of this API.. this way we can start developing the "native" and ES support together, and support any other "search lib" in the future. =)
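
A rough sketch of what such an abstraction layer could look like, in TypeScript. Every name here (`SearchProvider`, `SearchHit`, `registerProvider`, `search`, the "native" provider) is hypothetical and made up for illustration; none of it is an existing Rocket.Chat API. It is kept synchronous and in-memory for brevity, whereas real drivers would almost certainly be asynchronous.

```typescript
// Hypothetical search abstraction: one entry point, interchangeable drivers.

interface SearchHit {
  messageId: string;
  roomId: string;
  text: string;
}

interface SearchProvider {
  name: string;
  search(roomId: string, query: string): SearchHit[];
}

const providers: Record<string, SearchProvider> = Object.create(null);

function registerProvider(provider: SearchProvider): void {
  providers[provider.name] = provider;
}

// Callers use this single function; the configured provider name picks the driver.
function search(providerName: string, roomId: string, query: string): SearchHit[] {
  const provider = providers[providerName];
  if (!provider) {
    throw new Error("Unknown search provider: " + providerName);
  }
  return provider.search(roomId, query);
}

// Stand-in for the "native" driver: substring match over an in-memory array.
// A real native driver would query the database already in use; an ES driver
// would implement the same interface by calling out to Elasticsearch.
const messages: SearchHit[] = [
  { messageId: "m1", roomId: "general", text: "Search should cover all chats" },
  { messageId: "m2", roomId: "general", text: "Lunch at noon" },
  { messageId: "m3", roomId: "dev", text: "Search index rebuilt" },
];

registerProvider({
  name: "native",
  search: (roomId, query) =>
    messages.filter(
      (m) => m.roomId === roomId && m.text.toLowerCase().indexOf(query.toLowerCase()) !== -1
    ),
});

const hits = search("native", "general", "search");
console.log(hits.map((h) => h.messageId));
```

The point of the design is that the calling code never changes when a different driver is registered; only the provider name in the configuration does.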

@gsaslis

gsaslis commented Aug 24, 2015

hmm @sampaiodiego does the abstraction approach really sound suitable for your underlying search infrastructure.. ?

I like your general approach, but don't you think that making the sysadmin (who is setting up rocket.chat for his team) have to make a decision on what search capabilities he will need - during the installation process - is a little worse than a clear instruction to install/setup one or the other solution? : )

I really don't mean to be pushing you guys to adopt ES, so please don't take this the wrong way! : ) I'm simply trying to make a point -- and this is very much related to the long-term goal/vision for rocket.chat ... I don't know that myself, so I'm looking for your guidance here on what would be expected from the users (btw, a user is not the person setting up rocket.chat, right? it's the guys and girls using it everyday) .. ; )

Thank you all for your input and consideration!!!

@Sing-Li
Member

Sing-Li commented Aug 24, 2015

@sampaiodiego 👍 provider can be mongo, sql, esearch, client-side (client side is important because my new phone has 8 cores and 4 GB of RAM servicing me only ......our server has 4 shared cores and 2 GB of RAM servicing 10,000 users.....and the trend continues)
@gsaslis Explicit provider config is not an issue when you consider actual use cases. Most dev or casual chat installs only want rough searches back to the 'memorable past' ...so esearch is overkill and costly.... this can be our default, since most installs are of this nature. But for audited enterprises, or law office installs, search through all of history, including every revision (mod/delete) of every message, should be possible ...and esearch might be the best alternative.
The abstraction must be simple yet not fall prey to 'lowest common denominator syndrome' - a challenging task. JMHO

@marceloschmidt marceloschmidt removed their assignment Aug 25, 2015
@mitar
Contributor

mitar commented Aug 28, 2015

With the new Meteor using MongoDB 2.6, you could just use MongoDB's full-text search support. That would be the easiest to implement and it would just work without any extra external services.

@engelgabriel
Member

@rodrigok implemented just what @mitar said. I believe we can give that a good push and see if it is enough. If it is not, we can add ES later. It can definitely be a feature that is added after the initial installation; there is no problem there. I've seen a lot of companies do just that. All that is needed is an initial script that reads all existing messages to kickstart the index and then follows the oplog from there.
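
The "index everything once, then follow new writes" idea can be sketched roughly as below, in TypeScript. The change feed is a plain array standing in for the MongoDB oplog, and `Message`, `MessageIndexer`, `backfill`, and `follow` are hypothetical names for illustration only; a real script would also have to handle edits, deletes, and restarts.

```typescript
// Sketch: backfill an external index once, then apply later changes in order.

interface Message {
  id: string;
  ts: number; // creation time in milliseconds
  text: string;
}

class MessageIndexer {
  private indexed: Record<string, string> = Object.create(null);
  private checkpoint = 0; // newest timestamp applied so far

  // Step 1: read all existing messages once to kickstart the index.
  backfill(existing: Message[]): void {
    for (const m of existing) {
      this.upsert(m);
    }
  }

  // Step 2: apply a batch from the change feed, skipping entries that are
  // strictly older than what the index already covered when the batch began.
  follow(changes: Message[]): number {
    const start = this.checkpoint;
    let applied = 0;
    for (const m of changes) {
      if (m.ts < start) continue;
      this.upsert(m);
      applied++;
    }
    return applied;
  }

  size(): number {
    return Object.keys(this.indexed).length;
  }

  // Keyed by id, so re-applying the same message is harmless.
  private upsert(m: Message): void {
    this.indexed[m.id] = m.text;
    if (m.ts > this.checkpoint) this.checkpoint = m.ts;
  }
}

const indexer = new MessageIndexer();
indexer.backfill([
  { id: "m1", ts: 100, text: "first" },
  { id: "m2", ts: 200, text: "second" },
]);
const appliedCount = indexer.follow([
  { id: "m1", ts: 100, text: "first" }, // replayed entry, skipped
  { id: "m3", ts: 300, text: "third" }, // new entry, applied
]);
console.log(appliedCount, indexer.size());
```

Because the upsert is idempotent, it does not matter much if the feed overlaps slightly with the backfill, which keeps the hand-over between the two steps simple.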

To test the current implementation, you just need to start typing in the search area on the right tab.

It is very basic and only searches the current channel, but I'll close this issue and open more specific ones for improvements.

@Cu57arD

Cu57arD commented Jul 31, 2017

Anybody willing to raise the bounty? I just added $200.

https://www.bountysource.com/issues/28998888-global-search-across-channels

@mfisher35

https://github.com/mfisher35/rc_search

I made this search API (it just does a text search against MongoDB and is extremely fast). The only issue is that it is truly global: users may get results back from channels they are not supposed to see.
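
One possible way to close that gap is to filter hits against the rooms the requesting user belongs to. The TypeScript sketch below is only an illustration of the idea: `Hit`, `RoomMembership`, and `filterByAccess` are hypothetical names and are not part of rc_search or Rocket.Chat.

```typescript
// Sketch: drop search hits from rooms the requesting user is not a member of.

interface Hit {
  messageId: string;
  roomId: string;
}

// userId -> ids of the rooms that user may read (assumed to be loaded elsewhere)
type RoomMembership = Record<string, string[]>;

function filterByAccess(userId: string, hits: Hit[], membership: RoomMembership): Hit[] {
  const allowed = Object.prototype.hasOwnProperty.call(membership, userId)
    ? membership[userId]
    : [];
  return hits.filter((h) => allowed.indexOf(h.roomId) !== -1);
}

const membership: RoomMembership = { alice: ["general", "dev"], bob: ["general"] };
const globalHits: Hit[] = [
  { messageId: "m1", roomId: "general" },
  { messageId: "m2", roomId: "dev" },
  { messageId: "m3", roomId: "private-hr" },
];

console.log(filterByAccess("alice", globalHits, membership).map((h) => h.messageId));
```

Filtering after the fact is the simplest fix, but putting the allowed room ids into the database query itself is probably preferable, since result limits and paging then apply only to messages the user can actually see.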
