
Conserver Architecture


The conserver is a data platform designed to extract conversations from business phone systems, transform them into actionable insights, and send that data into common business tools such as spreadsheets, Salesforce and no code toolsets. An open core product, the conserver enables data engineering teams to supply a reliable source of information for AI, ML and operational software in cloud, premise and hybrid contexts. At the core of many of the business cases enabled by the conserver is the smart capture, redaction and lifecycle management of recorded customer conversations and customer journeys, an area recently accelerated by FTC and GDPR regulations and by increasing investment in AI and ML.

From a system perspective, the Conserver attaches to information systems like web chat and call center queues, and extracts information from them after conversations have ended. This information is then cleaned and transformed into actionable data. For instance, a distributed call center might extract conversations from a group of sales agents, convert them into text, then filter those conversations looking for times when customers objected to a sale. These objections are then pushed into database tables and Google Sheets as a data self-service option for any business team. The conserver supports multiple data pipelines, each one extracting data from a number of systems, performing transformations such as translations, transcriptions and redactions, and then pushing the prepared data into the applications where it will be used.
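
To make the pipeline idea concrete, a hypothetical pipeline definition for the call center example might look like the sketch below. The stage names and structure here are illustrative assumptions, not the conserver's actual configuration format.

    # A hypothetical pipeline definition (illustrative only): each pipeline
    # names a source to extract from, a chain of transformations, and the
    # destinations that receive the prepared data.
    sales_objections_pipeline = {
        "source": "call_center_queue",       # where conversations come from
        "transforms": [
            "transcribe",                    # convert recordings to text
            "filter:customer_objections",    # keep only calls with objections
        ],
        "destinations": ["postgres", "google_sheets"],
    }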

In contrast to other data platforms, the Conserver is dedicated to managing the particular complexities of real time conversational sources. For instance, the bandwidth and storage required to manage an hour long audio recording is an order of magnitude larger than for a typical business object like a PDF. However, even this is just a start. Video is a few orders of magnitude larger again, and the data created by service providers such as Zoom and Skype is orders of magnitude larger still. From a legal perspective, regulatory compliance for customer data protection carries particular requirements for recorded conversations, including support for tracking the data’s use by automations, and for tracking deletion under a “Right to be Forgotten” request.

Tech Stack

The Conserver is built on two core platforms: FastAPI, a Python API framework, and Redis, a real time database. The conserver itself is written in Python, and uses the standard vCon Python library to create and modify vCons.

Redis is responsible for storing the conversations, while FastAPI coordinates the application software that manages them. Each conversation is stored as a Redis JSON object in the standard vCon format. In practice, each vCon is stored in Redis under the UUID of the vCon, making it easy to discover and fast to process. Instead of copying the conversation as it is built and transformed, it stays stored in Redis, and only the ID of the vCon is passed around, keeping processing efficient even at very large data sizes. Redis also provides inter-task communication using a series of PUB/SUB channels, coordinating the activities of the conserver both for local software (inside the conserver itself) and for external software such as Lambdas or exports onto other systems like Apache Kafka. Third party and hardware enabled systems can likewise use Redis as a data interchange system, loading and unloading large media files in coordination with the data pipeline.
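
As a minimal sketch of this storage pattern, assuming redis-py 4.x with the RedisJSON module available on the server, storing a vCon and handing its UUID to the next stage might look like the following. The channel name and the vCon body here are placeholders; a real vCon would come from the vCon Python library.

    import uuid

    import redis

    # Assumes a Redis instance with the RedisJSON module loaded, and
    # redis-py 4.x (which exposes RedisJSON commands via r.json()).
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # A placeholder vCon body; a real one is built with the vCon library.
    vcon_uuid = str(uuid.uuid4())
    vcon = {
        "uuid": vcon_uuid,
        "parties": [],
        "dialog": [],
        "analysis": [],
        "attachments": [],
    }

    # Store the whole vCon once, keyed by convention as "vcon:<uuid>".
    r.json().set(f"vcon:{vcon_uuid}", "$", vcon)

    # Downstream tasks receive only the UUID, never a copy of the vCon.
    r.publish("ingress_vcons", vcon_uuid)  # channel name is illustrative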

FastAPI provides the application infrastructure for the conserver. The transformation steps are developed as Python modules and loaded as tasks managed by FastAPI. As each task finishes, it notifies other system elements by publishing the UUID of the vCon. Other tasks wait on these notifications, and when they receive one, they can act on that same vCon for whatever purpose they serve. In addition, FastAPI provides a REST API to the store of vCons, and a simple UI to manage the conserver.
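
The wait-notify pattern for a task might look roughly like the sketch below. In the real conserver these tasks are loaded and managed by FastAPI; here a bare subscriber loop stands in, and the channel names are illustrative assumptions.

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # Block on a PUB/SUB channel; each message carries only a vCon UUID.
    p = r.pubsub(ignore_subscribe_messages=True)
    p.subscribe("ingress_vcons")  # illustrative channel name

    for message in p.listen():
        vcon_uuid = message["data"]
        # Fetch the vCon in place from Redis rather than copying it around.
        vcon = r.json().get(f"vcon:{vcon_uuid}")
        # ... perform this task's transformation and write results back ...
        # Hand the same UUID to the next stage in the pipeline.
        r.publish("transcribed_vcons", vcon_uuid)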

Day in the Life of a vCon

To illustrate the normal operation of the conserver, let’s follow along as a conversation is extracted, transformed, and its data provided to a business team. For this example, we’ll assume the Conserver is started and configured to take conversations from a Freeswitch system, transcribe them, look for a particular subject (recalls) and send those to a Postgres table for the operations team.

  1. A customer and an agent have a conversation using Freeswitch. A Freeswitch adapter runs in the conserver, monitoring calls and requesting recordings. For context, refer to https://developer.signalwire.com/compatibility-api/xml/ to see the kinds of call events and recording options.
  2. When the call on Freeswitch ends, the adapter uses the data from the call (parties, recordings) to create a vCon. This vCon is stored in Redis as a JSON object under the UUID of the newly created vCon. By convention, the key is named with the pattern “vcon:uuid” (like vcon:7665-343535-58575-333).
  3. In addition to the standard parts of a vCon, the dialog and parties, the adapter adds a new attachment (in the attachments section of the vCon standard) that records which adapter created the vCon, along with details useful for debugging. This attachment travels inside the vCon throughout its life, unless it is explicitly stripped off later on.
  4. As a final action by the adapter, the UUID of the newly created vCon is published on a Redis PUB/SUB channel. Other processing tasks listen on these channels, waiting for work to do. When they receive the published message, the UUID of the vCon, they can start their work on that vCon. In configuration, these tasks are set up as pipelines, such that when one task completes, a single task is listening for its turn to process that vCon. (However, it is also common to have multiple tasks listening on the same channel, for parallel processing and fan-out scenarios.)
  5. A plugin called “transcription” is listening for new conversations to transcribe. Plugins, unlike adapters, expect a vCon as an input, and produce vCons as outputs. Adapters make the original vCons, and plugins amend, delete, copy and push them. This design allows pipelines of plugins to be configured, the output of one feeding the input of the next.
  6. The transcription plugin (there are currently two versions to choose from, Whisper.ai and Deepgram) takes the dialog section of the vCon (which holds the recorded voice) and transcribes it. The transcription is added to the vCon in the “analysis” section, and normally contains a complete transcription along with a confidence score and a time stamp for every word transcribed. The stored vCon is then updated in Redis with this new analysis, using the JSON module to avoid reading or copying the large data objects in the dialog.
  7. Just as the adapter did when it finished creating the vCon, the transcription plugin then publishes a message carrying the same vCon UUID, but on a different PUB/SUB channel, one configured to carry notifications from the transcription block to the filter block.
  8. A plugin called “recall finder” is listening on the transcription output channel. When it receives the notification, it loads the transcription analysis and looks for the word “recall” in the conversation. If it does not find the word, it simply exits without publishing any message for the downstream plugin, effectively ending the processing of that vCon. (A minimal sketch of such a filter plugin appears after this list.)
  9. At this point, the vCon has been created, captured, transcribed and identified as having the information we want: it’s a recall conversation. For information systems that want a native JSON representation, the vCon can now be sent for consumption. For instance, it could now be sent via a webhook (HTTP POST) to any API endpoint. In like manner, it can be stored in a Mongo database, sent to BigQuery, or stored in S3 or a local file system.
  10. If the final destination has a fixed schema, like a Postgres database, a Google Spreadsheet or a no code tool, we need to create a “projection” for this data before the “recall finder” is done. A projection is a simple key-value store of the important information, determined by the use case. For illustration, assume we are interested in sending the transcription, the identities of the agent and the customer, and when the conversation happened. This projection, which directly corresponds to a single database row with four columns (transcription, agent, customer, created at), is added to the vCon, just as the transcription analysis was. At this point, the original vCon has an attachment from the adapter, an analysis from the transcriber, and this new projection. As always, the vCon UUID is pushed into a PUB/SUB channel to the last element, where one more plugin is waiting.
  11. The final plugin is a Postgres projection (a sketch follows this list). When it runs, it looks for projections on a vCon, then takes that information and uses it to add (or upsert) a row with the projection’s contents into a configured Postgres table. From the perspective of the business users of the data, they simply see rows of transcribed conversations that match a criterion. Data projections, like adapters, handle the differences between destinations: unique data projections are required for different kinds of relational databases, no code tools, Google Sheets, etc.
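
For steps 8 and 10, a minimal sketch of the “recall finder” plugin might look like the following. The layout of the analysis and attachment entries, the projection field names, and the channel name are all illustrative assumptions; the exact vCon field conventions are not specified above.

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def recall_finder(vcon_uuid: str) -> None:
        """Filter plugin: pass a vCon downstream only if its transcript
        mentions a recall, attaching a projection for the destination."""
        key = f"vcon:{vcon_uuid}"
        vcon = r.json().get(key)

        # Assumes the transcription plugin left its text in the analysis
        # section; the entry layout here is illustrative.
        transcript = " ".join(
            a.get("body", "") for a in vcon.get("analysis", [])
            if a.get("type") == "transcript"
        )

        if "recall" not in transcript.lower():
            return  # no message published: the pipeline ends for this vCon

        # A projection: a flat key-value view corresponding to one row in
        # the destination table (column names are illustrative).
        parties = vcon.get("parties", [{}, {}])
        projection = {
            "type": "projection",
            "body": {
                "transcription": transcript,
                "agent": parties[0].get("name", ""),
                "customer": parties[1].get("name", ""),
                "created_at": vcon.get("created_at", ""),
            },
        }
        r.json().arrappend(key, "$.attachments", projection)

        # Notify the final stage, again passing only the UUID.
        r.publish("recall_vcons", vcon_uuid)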
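
And for step 11, the Postgres projection plugin could upsert the projection as a single row roughly as follows, assuming psycopg2, a hypothetical recalls table with vcon_uuid as its primary key, and the projection layout from the sketch above.

    import psycopg2
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)
    conn = psycopg2.connect("dbname=operations")  # connection string is illustrative

    def postgres_projection(vcon_uuid: str) -> None:
        """Final plugin: upsert the vCon's projection as one table row."""
        vcon = r.json().get(f"vcon:{vcon_uuid}")

        # Find the projection left by the upstream plugin (layout as
        # assumed in the recall finder sketch above).
        row = next(
            a["body"] for a in vcon.get("attachments", [])
            if a.get("type") == "projection"
        )

        with conn, conn.cursor() as cur:
            # One projection row per vCon; re-running the pipeline updates
            # rather than duplicates (assumes vcon_uuid is the primary key).
            cur.execute(
                """
                INSERT INTO recalls (vcon_uuid, transcription, agent, customer, created_at)
                VALUES (%s, %s, %s, %s, %s)
                ON CONFLICT (vcon_uuid) DO UPDATE SET
                    transcription = EXCLUDED.transcription,
                    agent = EXCLUDED.agent,
                    customer = EXCLUDED.customer,
                    created_at = EXCLUDED.created_at
                """,
                (vcon_uuid, row["transcription"], row["agent"],
                 row["customer"], row["created_at"]),
            )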