
Velascobot: Manual

Some notes:

  • Scriptorium version: Velasco v4.X (from the "Big Overhaul Update" on 27 Mar, 2019 until the 2nd Overhaul)
    • Recognizable because Readers are called Scribes and are stored, among other things, in a big dictionary called the Scriptorium
  • Overhaul 2 version: starting with Velasco v5.0

Updating to Overhaul 2

If you have a Velasco clone or fork from the Scriptorium version, you should follow these steps:

  1. First of all, update all your chat files to the CARD=v4 format. You can do this with a script that imports the Archivist and then loads and saves all files.
  2. Then, pull the update.
  3. To convert the files to the new unescaped UTF-16 encoding (previously escaped UTF-8 was the default), edit the get_reader(...) function in the Archivist so that it uses load_reader_old(...) instead of load_reader(...).
  4. Make a script that imports the Archivist and calls the update(...) function, which loads and saves all files (a minimal sketch follows below).
  5. Revert the get_reader(...) edit.

And voilà! You're up to date. Unless you want to switch to the mongodb branch (WIP).
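
The script from step 4 can be as small as the sketch below. The module path, constructor argument and folder name are assumptions; check your own copy of the Archivist for the exact names (step 1 follows the same pattern, using the old load/save helpers instead of update(...)).

```python
# Hypothetical helper script for step 4: import the Archivist and call
# update(...), which loads and saves every chat file. The module path and
# constructor arguments below are assumptions, not the actual signatures.
from archivist import Archivist

archivist = Archivist(chatdir="chatlogs")  # assumed constructor signature
archivist.update()                         # loads and saves all files
```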

Mechanisms

Markov chains

This bot generates messages using Markov chains with a window of 3 words. For every 3 consecutive words it reads, it stores the 3rd word as one of the words that can follow the first 2 taken together. When generating a new sentence, it always picks at random one of the stored words that follow the last 2 words of the message generated so far.
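
As a heavily simplified illustration of this, here is a minimal sketch of the same idea; the real Generator class does more, and all names here are made up:

```python
import random

def learn(vocabulary, text):
    """For every 3 consecutive words, store the 3rd one under the
    combination of the first 2."""
    words = text.split()
    for first, second, third in zip(words, words[1:], words[2:]):
        vocabulary.setdefault((first, second), []).append(third)

def generate(vocabulary, seed, max_words=30):
    """Repeatedly pick a random stored word that follows the last
    2 words generated so far."""
    sentence = list(seed)
    while len(sentence) < max_words:
        followers = vocabulary.get(tuple(sentence[-2:]))
        if not followers:
            break
        sentence.append(random.choice(followers))
    return " ".join(sentence)

vocab = {}
learn(vocab, "the cat sat on the mat and the cat slept on the sofa")
print(generate(vocab, ("the", "cat")))
```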

Storing

The actual messages aren't stored. Once a message has been processed and all of its words have been assigned to lists under their preceding 2-word combinations, the message is discarded, and only the dictionary with the lists of "following words" is stored. The words said in a chat may be visible, but past a certain point it's impossible to accurately recreate the exact messages said in a chat.
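
To make that concrete, after reading a message like "good morning everyone, good morning to you", only a dictionary along these lines remains (the exact on-disk layout is different; this is just an illustration matching the sketch above):

```python
# What survives after processing; the message text itself is discarded.
vocab = {
    ("good", "morning"): ["everyone,", "to"],
    ("morning", "everyone,"): ["good"],
    ("everyone,", "good"): ["morning"],
    ("morning", "to"): ["you"],
}
```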

Saving happens when some configuration values are changed, and whenever the bot sends a message. If the bot crashes, all the words processed from the messages received since Velascobot's last message will be lost. For high period values this could be a considerable amount; for small ones it is negligible. Still, the bot is not expected to crash often.
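
In pseudo-Python, the idea is roughly the one below; the names are hypothetical and only illustrate when saving happens relative to sending:

```python
# Hypothetical send hook: persist the Reader right after the bot speaks,
# so a crash can only lose the words read since this last message.
def speak(speaker, reader, chat):
    text = reader.generate_message()
    chat.send_message(text)
    speaker.save(reader)
```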

Speaker's Memory

The memory of a Speaker is a small cache of the C most recently modified Readers, where C is set through a flag (default: 20). A modified Reader is one whose metadata was changed through a command, or that has read a new message. When a newly modified Reader would push the cache over the memory limit, the least recently modified Reader is pushed out and saved to its file.
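
In effect this behaves like a small least-recently-modified cache. A sketch of the idea, with assumed names rather than the actual Speaker code:

```python
from collections import OrderedDict

class SpeakerMemory:
    """Keeps the C most recently modified Readers in memory."""

    def __init__(self, archivist, capacity=20):
        self.archivist = archivist
        self.capacity = capacity
        self.readers = OrderedDict()  # chat id -> Reader

    def touch(self, chat_id, reader):
        """Mark a Reader as the most recently modified one."""
        self.readers[chat_id] = reader
        self.readers.move_to_end(chat_id)
        if len(self.readers) > self.capacity:
            # Push out the least recently modified Reader and save it.
            _, evicted = self.readers.popitem(last=False)
            self.archivist.save(evicted)  # save(...) is an assumed name
```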

Reader's Short Term and Long Term Memory

When a message is read, it gets stored in a temporary cache. It is only processed into the vocabulary Generator when the Reader is asked to generate a new message, or whenever the Reader is saved to a file. This allows the bot to reply to other recent messages, and not just the last one, when the periodic message is a reply.
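
A rough sketch of that split, with assumed method names (the real Reader differs in detail):

```python
class Reader:
    """Sketch only: short term cache of raw messages, long term vocabulary."""

    def __init__(self, generator):
        self.generator = generator  # long term memory (the vocabulary)
        self.short_term = []        # messages read but not yet processed

    def read(self, text):
        self.short_term.append(text)

    def commit(self):
        """Fold the short term cache into the Generator's vocabulary."""
        for text in self.short_term:
            self.generator.add(text)  # add(...) is an assumed name
        self.short_term.clear()

    def generate_message(self):
        self.commit()  # recent messages become available right before generating
        return self.generator.generate()
```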

File hierarchy

  • Generator is the object class that holds a vocabulary dictionary and can generate new messages.
  • Metadata is the object class that holds one chat's configuration flags and other miscellaneous information.
    • Sometimes the file where the metadata is saved is called a card.
  • Reader is an object class that holds a Metadata instance and a Generator instance, and is associated with a specific chat.
  • Archivist is the object class that handles persistence: loading from and saving to files.
  • Speaker is the object class that handles all (or most of) the functions behind Velasco's commands.
    • It holds a limited set of Readers that it loads and saves through some Archivist functions (borrowed during Speaker initialization).
  • velasco.py is the main file, in charge of starting up the Telegram bot itself.
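
Put together, the relationships between these classes look roughly like this (constructor signatures are assumptions, not the actual ones):

```python
class Reader:
    def __init__(self, metadata, generator):
        self.metadata = metadata    # one chat's flags and misc information
        self.generator = generator  # that chat's vocabulary

class Speaker:
    def __init__(self, archivist):
        self.archivist = archivist  # handles loading and saving Readers
        self.readers = {}           # limited in-memory set of Readers
```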

TODO

After managing to get Velasco back to being somewhat usable, I announced in the News channel that I will focus on rewriting the code in a different language, so I will add no improvements to the Python version from that point onwards. If you're interested in picking this project up and continuing development in Python, here are a few suggestions:

  • speaker.py is too big. It would be useful to split it into 2 files: one for the surface command handling, and another for all the speech handling (the checks for the restricted and silenced flags, the period, the random chances, ...).
  • For a while now, Telegram has allowed downloading a full chat history as a compressed file. Being able to send that file to the bot, verify that it really is a Telegram chat history export, and then unpack and load it into the chat's Generator would be cool (a rough sketch follows this list).
  • The most active chats have files that are too massive to keep in the process's memory. I will probably add a local MongoDB database to solve that, but it will be a simple local one; expanding on it could be a good idea.
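
For the chat history idea, a very rough sketch could look like the following. It assumes a zipped Telegram JSON export containing a result.json file, and a hypothetical Generator.add(...) method; neither is part of the current code.

```python
import json
import zipfile

def load_export(path, generator):
    """Unpack a zipped Telegram JSON export and feed its messages
    to the chat's Generator."""
    with zipfile.ZipFile(path) as archive:
        with archive.open("result.json") as f:
            export = json.load(f)
    for message in export.get("messages", []):
        text = message.get("text")
        if isinstance(text, str) and text:
            generator.add(text)  # add(...) is an assumed method
```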