Slow to get a result when using the Mongo Database Adapter #747

Closed
dyf6372 opened this issue May 17, 2017 · 11 comments

Comments

@dyf6372

dyf6372 commented May 17, 2017

from chatterbot import ChatBot

mybot = ChatBot(
    'Terminal',
    storage_adapter='chatterbot.storage.MongoDatabaseAdapter',
    database='db'
)

2017-05-18,00:17:08.864 INFO {input_adapter} [process_input_statement] Recieved input statement: abc
2017-05-18,00:17:08.866 INFO {input_adapter} [process_input_statement] "abc" is not a known statement
2017-05-18,00:17:16.571 INFO {best_match} [process] Using "abc" as a close match to "abcd"

It takes a long time to get the best match (about 7.7 seconds in the log above).

@vkosuri
Collaborator

vkosuri commented May 17, 2017

There is an improvement PR here: #738

@vkosuri
Collaborator

vkosuri commented May 17, 2017

Please check out the latest changes and experiment with them. Let me know your thoughts on how best we can speed this up.

@dyf6372
Author

dyf6372 commented May 17, 2017

@vkosuri PR #738 does not work for me.
I checked out the latest changes and used the Mongo Database Adapter, training on about 120000 lines.
I have tried other machines and still can't get a response within 1 s.

[I 170518 01:10:18 input_adapter:22] Recieved input statement: abc
[I 170518 01:10:18 input_adapter:30] "abc" is not a known statement
[I 170518 01:10:20 best_match:51] Using "abc" as a close match to "abcd"

@vkosuri
Collaborator

vkosuri commented May 17, 2017

Thanks for your input.

@dyf6372
Author

dyf6372 commented May 18, 2017

I debugged the program and found that most of the time is spent in the function get_response_statements in mongodb.py.

def get_response_statements(self):
    """
    Return only statements that are in response to another statement.
    A statement must exist which lists the closest matching statement in the
    in_response_to field. Otherwise, the logic adapter may find a closest
    matching statement that does not have a known response.
    """
    response_query = self.statements.distinct('in_response_to.text')

    _statement_query = {
        'text': {
            '$in': response_query
        }
    }

    _statement_query.update(self.base_query.value())

    statement_query = self.statements.find(_statement_query)

    statement_objects = []

    for statement in list(statement_query):
        statement_objects.append(self.mongo_to_object(statement))

    return statement_objects

response_query is a huge list containing 50000+ elements.

Is there a way to cache the data in memory?

@telkomops

The issue is with response_query = self.statements.distinct('in_response_to.text').
We need a way to generate the distinct text values and store them in a separate collection, if possible.
With the Ubuntu corpus, invoking .distinct fails with a 16 MB error.
Using the MongoDB aggregation framework with disk use enabled also fails, because the collection exceeds 16 MB.
Any solutions?
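
One direction that might be worth trying is sketched below. It is untested against the Ubuntu corpus. distinct returns all of its values in a single result document, which is presumably what runs into the 16 MB BSON limit. An aggregation that ends in a $out stage writes one small document per distinct text into another collection instead. The helper name build_distinct_response_pipeline and the target collection name response_texts are made up for illustration, and the $unwind stage assumes in_response_to is an array of sub-documents with a text field.

```python
def build_distinct_response_pipeline(out_collection):
    """Build an aggregation pipeline that writes each distinct
    in_response_to.text value to out_collection as its own document."""
    return [
        # One document per in_response_to entry (assumes the field is an array).
        {'$unwind': '$in_response_to'},
        # Collapse duplicates: one group per distinct response text.
        {'$group': {'_id': '$in_response_to.text'}},
        # Write the groups to a collection instead of returning them inline.
        {'$out': out_collection},
    ]


# Hypothetical usage with PyMongo (needs a running MongoDB, so it is left commented out):
# statements.aggregate(build_distinct_response_pipeline('response_texts'), allowDiskUse=True)
# is_known_response = db['response_texts'].find_one({'_id': 'abcd'}) is not None
```

With the distinct texts in their own collection, get_response_statements would no longer need to pass a 50000+ element list to $in, but it would have to be reworked to look responses up in that collection, and the collection would need rebuilding after training.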

@gunthercox
Owner

@telkomops I'm working on a solution that involves caching and additional filtering. The solution to this issue likely won't be available until the next major release of ChatterBot.

@dyf6372
Author

dyf6372 commented Jun 9, 2017

@gunthercox I have cached the result in memory, so I don't need to fetch it from MongoDB every time.
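
A minimal sketch of this kind of in-memory cache, for illustration only. The class name ResponseStatementCache and its methods are hypothetical and not part of ChatterBot. It assumes the expensive call is a zero-argument function such as the adapter's get_response_statements, and that something calls invalidate() whenever the stored statements change.

```python
import threading


class ResponseStatementCache:
    """Keep the result of an expensive loader in memory until invalidated."""

    def __init__(self, loader):
        # loader: zero-argument callable, e.g. a storage adapter's
        # get_response_statements method (assumption, see the text above).
        self._loader = loader
        self._lock = threading.Lock()
        self._statements = None

    def get(self):
        with self._lock:
            if self._statements is None:
                # First call, or first call after invalidate(): query the database.
                self._statements = list(self._loader())
            return self._statements

    def invalidate(self):
        # Call this after training or after the bot learns a new response,
        # otherwise the cache keeps serving stale statements.
        with self._lock:
            self._statements = None
```

Usage would be something like cache = ResponseStatementCache(mybot.storage.get_response_statements), followed by cache.get() wherever the statements are needed. The trade-off is memory use and staleness: all response statements stay resident, and new responses are invisible until invalidate() is called.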

@gunthercox
Owner

@dyf6372 Database-level caching is a good start. When I mentioned caching I was referring to the various search and comparison algorithms that the chat bot uses to analyse statements and select responses. These often compute values, and those computations can be time-consuming.
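
To illustrate the idea of caching comparison results, here is a rough sketch using functools.lru_cache. It is not ChatterBot's implementation: the similarity function is a stand-in built on difflib.SequenceMatcher, and the names similarity and closest_match are made up for this example.

```python
from difflib import SequenceMatcher
from functools import lru_cache


@lru_cache(maxsize=100000)
def similarity(text_a, text_b):
    # Stand-in comparison: a ratio in [0.0, 1.0], higher means more similar.
    # Results are memoized per (text_a, text_b) pair.
    return SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()


def closest_match(input_text, candidates):
    # Compare the input against every candidate and keep the most similar one.
    return max(candidates, key=lambda candidate: similarity(input_text, candidate))
```

Memoization like this only pays off when the same input is compared against the same statements more than once. The first request for a new input still performs every comparison.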

@gunthercox
Owner

Closing this as a duplicate of #697

@lock

lock bot commented Mar 10, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 10, 2019

4 participants