Home
CaT is a black-box content-aware tracing and analysis framework. It analyzes distributed systems in a non-intrusive way, highlighting how their components interact with each other and how data flows through the system.
Its design enables the capture of detailed information about network and disk I/O events, such as the context of each request and the data processed by the event. With this information, CaT analyzes the content of events based on their similarity, allowing the detection of data flow patterns that are not visible when inspecting only the context of events.
- Content-based tracing: A novel algorithm that captures and analyzes the context and content of applications’ I/O requests. CaT can identify duplicate data (100% similarity) as well as near-duplicate data (high similarity, e.g., > 80%) that was slightly modified while flowing through different components (e.g., messages that carry the same payload but a different metadata header).
- Black-box tracing: The CaTracer component uses two kernel-level tracing tools (Strace and eBPF) for capturing storage and network I/O requests in a non-intrusive fashion. These two technologies provide different trade-offs in terms of resource usage (e.g., CPU, RAM and disk space), accuracy (amount of collected information), and I/O performance.
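To make the content-based similarity idea from the first item above more concrete, here is a minimal sketch in Python. It is not CaT's algorithm: it uses Jaccard similarity over fixed-size byte shingles as a simple stand-in, and the function names (`shingles`, `jaccard`), the shingle size `k`, and the sample messages are all assumptions made for illustration.

```python
def shingles(data: bytes, k: int = 4) -> set:
    """Return the set of all k-byte substrings (shingles) of data."""
    if len(data) < k:
        # Inputs shorter than k are treated as a single shingle.
        return {data}
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def jaccard(a: bytes, b: bytes, k: int = 4) -> float:
    """Jaccard similarity of the shingle sets of a and b, in [0.0, 1.0]."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Hypothetical messages: same payload, different metadata header.
    payload = b"user=42;action=write;block=" + bytes(range(256))
    sent = b"hdr-A:" + payload
    received = b"hdr-B:" + payload

    print("identical:", jaccard(sent, sent))
    print("near-duplicate:", jaccard(sent, received))
    print("unrelated:", jaccard(b"aaaaaaaa", b"bbbbbbbb"))
```

Identical buffers score 1.0, while buffers that differ only in a short header score slightly below 1.0, which corresponds to the duplicate and near-duplicate cases described above. A threshold such as 0.8 could then separate related events from unrelated ones.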
CaT extends Falcon’s architecture to analyze data in transit and at rest while providing further information about the targeted system.
Namely, CaT includes the following main components:
- CaTracer is responsible for tracing events of interest (e.g., start, end, send, receive, read, write) at runtime. There are currently two different implementations of CaTracer: one based on the eBPF technology (CatBpf) and another based on the Strace tool (CatStrace).
- Falcon-Solver combines the events into a global execution trace that preserves causality. This is achieved by i) building a symbolic constraint model that encodes the happens-before relationships between events, and ii) using an SMT solver to solve the constraints and assign a logical clock to each event such that all causal dependencies are satisfied. This component is identical to the one provided by the Falcon project, except for some minor design modifications to support the CaTlog file as input and to parse the metadata of storage events.
- CaSolver applies data similarity estimation algorithms to find events that have a high probability of operating over the same data flow. CaSolver was implemented in two different languages: Go (CaSolver-Go, for use when tracing with CatBpf) and Python (CaSolver-Py, for use when tracing with CatStrace).
- Falcon-visualizer draws a space-time diagram that enables the visual analysis of the whole execution. This component is a modified version of the visualizer provided by the Falcon project.
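The logical-clock assignment described for Falcon-Solver can be illustrated with a small sketch. Note that this is a deliberate simplification: instead of a symbolic constraint model and an SMT solver, it uses a plain topological pass from Python's standard library (`graphlib`, Python 3.9+), which yields one valid assignment for a fixed set of happens-before pairs. The function name `assign_logical_clocks` and the event identifiers are hypothetical.

```python
from graphlib import TopologicalSorter


def assign_logical_clocks(happens_before):
    """Assign an integer clock to each event so that, for every
    (a, b) pair in happens_before, clock[a] < clock[b].

    Raises graphlib.CycleError if the constraints are cyclic.
    """
    # Map each event to the set of events that must precede it.
    preds = {}
    for a, b in happens_before:
        preds.setdefault(a, set())
        preds.setdefault(b, set()).add(a)

    clocks = {}
    # static_order() yields every event after all of its predecessors.
    for event in TopologicalSorter(preds).static_order():
        clocks[event] = 1 + max((clocks[p] for p in preds[event]), default=0)
    return clocks


if __name__ == "__main__":
    # Hypothetical trace: node1 sends a message that node2 receives
    # and then writes to disk.
    constraints = [
        ("node1:start", "node1:send"),   # program order on node1
        ("node1:send", "node2:recv"),    # a send precedes its receive
        ("node2:recv", "node2:write"),   # program order on node2
    ]
    for event, clock in sorted(assign_logical_clocks(constraints).items(),
                               key=lambda item: item[1]):
        print(clock, event)
```

An SMT-based formulation is more general than this pass, since it can encode constraints beyond a fixed set of ordered pairs, but the resulting property is the same: every causal dependency is reflected in the clock order.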
For additional details about CaT, please check our Middleware'21 paper.