content: update heading tags hierarchy
shah-iq committed Oct 10, 2024
1 parent 1e755ba commit c95a86e
Showing 5 changed files with 81 additions and 69 deletions.
12 changes: 6 additions & 6 deletions src/content/blog/en/power-of-sql-and-sql-views.md
@@ -15,13 +15,13 @@ tags: ["SQL","Views" , "RSSD" ]

In the world of data integration and processing, flexibility and extensibility are paramount. The ability to easily prepare, integrate, and analyze data from multiple sources is critical. `surveilr`, with its stateful, local-first, and edge-based architecture, stands out as a powerful solution for these challenges. One of the key strengths of `surveilr` is its SQL-centric nature, making it both flexible and extendable. And when it comes to maximizing this power, there’s no better tool in the SQL toolbox than SQL views.

### The Foundation: SQL in ``surveilr``
## The Foundation: SQL in ``surveilr``

At the core of ``surveilr`` is its **SQL-centric approach**. Every piece of data it processes is queryable using SQL, allowing users to manipulate and organize information in a way that best fits their workflow. This design makes it easy to set up data pipelines, perform complex transformations, and create relationships between disparate data sources. Whether you're working with clinical operations data, audit evidence collection and reporting, pharmacy records, billing information, or any other type of clinical or non-clinical data, SQL provides a robust foundation for accessing and modifying that data.

However, while basic SQL queries can deliver tremendous value, `surveilr`’s real potential can be unleashed with the use of **SQL views**.

### What are SQL Views?
## What are SQL Views?

An **SQL view** is a virtual table defined by a query. It does not store data itself but acts as a window through which you can view and interact with data stored in underlying tables. Essentially, a view abstracts away the complexity of a query, letting users interact with data as though it were a single unified table.
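
To make the "virtual table" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `visits` table and `cardiology_visits` view are hypothetical names chosen for illustration and are not part of `surveilr`'s schema. The view stores no rows of its own, so a row inserted into the base table afterwards shows up in the view immediately.

```python
import sqlite3

# In-memory database; table and view names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (patient_id TEXT, department TEXT, cost REAL);
    INSERT INTO visits VALUES ('p1', 'cardiology', 120.0);

    -- The view is only a saved query; it holds no data itself.
    CREATE VIEW cardiology_visits AS
        SELECT patient_id, cost FROM visits WHERE department = 'cardiology';
""")

before = conn.execute("SELECT COUNT(*) FROM cardiology_visits").fetchone()[0]

# Insert into the base table only; the view is never written to.
conn.execute("INSERT INTO visits VALUES ('p2', 'cardiology', 80.0)")
after = conn.execute("SELECT COUNT(*) FROM cardiology_visits").fetchone()[0]

print(before, after)
```

Because the view is re-evaluated on every query, there is nothing to refresh or keep in sync.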

@@ -31,7 +31,7 @@ Here’s why views are so powerful in the context of surveilr:
- **Data Abstraction**: Views provide a layer of abstraction, allowing you to hide certain complexities or fields from users who may not need access to all the underlying data. For example, you could create a view that shows only anonymized or deidentified data for specific use cases, ensuring HIPAA compliance without sacrificing usability.
- **Data Consistency**: By defining a view, you ensure that everyone accessing the data sees the same results based on a consistent underlying query. This reduces errors and ensures that reports or analyses built on top of those views are based on uniform data.

### Extending `surveilr`’s Power with SQL Views
## Extending `surveilr`’s Power with SQL Views

`surveilr` already excels at integrating data from multiple sources—be it clinical records, billing data, or operational logs. But the real power of `surveilr` comes when you use SQL views to extend its capabilities. Let’s explore how SQL views make `surveilr` even more powerful:

@@ -48,7 +48,7 @@ SQL views enable the creation of custom datasets that can be fed directly into B
Views provide a way to control what data different users or systems can see. By setting up views that show only the fields or records that a user needs, `surveilr` users can maintain security and regulatory compliance. For example, you could create views that display anonymized patient data for non-clinical staff while allowing full access for medical personnel.
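
As a rough sketch of such a restricted view, the example below exposes a reduced set of columns from a hypothetical `patients` table (all names here are assumptions for illustration). Note that SQLite itself has no per-user permissions, so a view only narrows what a given query exposes; who may open the file, and whether the result meets a formal deidentification standard, has to be handled separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base table containing identifying fields.
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY,
        full_name  TEXT,
        birth_date TEXT,
        diagnosis  TEXT
    );
    INSERT INTO patients VALUES (1, 'Ada Example', '1980-04-12', 'type 2 diabetes');

    -- The view omits the name and coarsens the birth date to a year.
    CREATE VIEW patients_deidentified AS
        SELECT patient_id,
               substr(birth_date, 1, 4) AS birth_year,
               diagnosis
        FROM patients;
""")

cursor = conn.execute("SELECT * FROM patients_deidentified")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```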


### Real-World Examples of Extending `surveilr` with SQL Views
## Real-World Examples of Extending `surveilr` with SQL Views

- **Customized Reporting**: Create views to aggregate data from multiple sources, generating customized reports.
- **Data Validation**: Use views to validate data against specific criteria, ensuring data quality and integrity.
@@ -59,7 +59,7 @@ Views provide a way to control what data different users or systems can see. By
- **Improved Security**: Views enable fine-grained access control, ensuring sensitive data is only accessible to authorized users.


### Real-World Example: Using SQL Views for Healthcare Data Integration
## Real-World Example: Using SQL Views for Healthcare Data Integration

Imagine a healthcare provider using ``surveilr`` to integrate data from different departments—clinical records, pharmacy, and billing. The provider wants to track patient progress and costs without exposing sensitive information unnecessarily.

@@ -72,7 +72,7 @@ Using SQL views, the provider can create:
These views can be reused across the organization, ensuring that each department gets exactly what it needs while maintaining security and consistency across all datasets.


### Conclusion
## Conclusion

`surveilr`'s SQL-centric architecture is already a game-changer for integrating and analyzing data from multiple systems. However, its potential truly shines when extended using SQL views. Views allow you to simplify complex queries, combine data in powerful ways, and enhance both security and compliance. They enable you to preprocess and transform data effortlessly, all while keeping the underlying system flexible and scalable.

26 changes: 13 additions & 13 deletions src/content/blog/en/rssd-excel-portability-sql-power.md
@@ -36,22 +36,22 @@ Surveillance State Database (RSSD)** as something similar to **Microsoft
Excel**—a tool most of us have used at some point. Here’s a breakdown of how the
RSSD works, using an Excel analogy:

### **RSSD is Like an Excel Workbook**
### RSSD is Like an Excel Workbook

Just like an Excel workbook is a **single file** that contains all of your data,
an RSSD is also a **single file** that contains all of the data surveilr is
managing. This single file can store everything from simple numbers to complex
records that need to be tracked and queried.

### **Tables are Like Excel Worksheets**
### Tables are Like Excel Worksheets

In Excel, you organize data into **worksheets**. Similarly, in the RSSD, the
data is organized into **tables**. Just as a worksheet holds rows and columns of
data, a table in the RSSD holds **rows of records and columns of fields**. For
example, you might have one table for customer data, another for transactions,
and yet another for logs.

### **SQL is Like Excel Formulas**
### SQL is Like Excel Formulas

In Excel, you use **formulas** to manipulate your data. These formulas allow you
to perform calculations, look up values, or summarize data across your
@@ -63,15 +63,15 @@ Just like in Excel, where you can create simple to complex formulas depending on
your needs, the RSSD allows you to extract insights from your data using
flexible SQL queries that work across different tables of information.
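
To ground the formula analogy, the sketch below runs the SQL counterpart of a worksheet `SUMIF` against a hypothetical `transactions` table (the table and its values are made up for illustration). It uses Python's built-in `sqlite3` module so it can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (customer TEXT, amount REAL);
    INSERT INTO transactions VALUES ('acme', 100.0), ('acme', 50.0), ('globex', 75.0);
""")

# Comparable in spirit to =SUMIF(A:A, "acme", B:B) in a worksheet.
acme_total = conn.execute(
    "SELECT SUM(amount) FROM transactions WHERE customer = ?", ("acme",)
).fetchone()[0]

# One query yields the per-customer totals a pivot table would show.
totals = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM transactions GROUP BY customer"
))

print(acme_total, totals)
```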

### **Flexibility and Power**
### Flexibility and Power

Just as Excel gives you the ability to manipulate, organize, and analyze your
data in many different ways, the RSSD allows you to do all of this too—only it
uses SQL, which is more powerful when working with large datasets. For example,
while Excel might slow down with very large workbooks, the RSSD, thanks to
SQLite, can handle **millions of records** without breaking a sweat.

### **Portable and Self-Contained**
### Portable and Self-Contained

In the same way that you can take an Excel file and send it to someone else (and
they’ll have access to all the worksheets and data), the RSSD is a
@@ -80,7 +80,7 @@ with all its tables and data, simply by copying the RSSD file to another
location. There’s no need for a complex setup or configuration—just open it and
start working with the data.
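
A minimal sketch of that portability, using a plain SQLite file as a stand-in for an RSSD (the file names and the `notes` table are hypothetical): the file is written, closed, copied, and the copy is opened like any other database. Copying is safest when no process is writing to the file, since an active writer may leave data in sidecar journal files.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "original.sqlite.db"  # hypothetical file name
    copy = Path(workdir) / "copy.sqlite.db"

    # Create a database file and close it so everything is flushed to disk.
    conn = sqlite3.connect(str(original))
    conn.execute("CREATE TABLE notes (body TEXT)")
    conn.execute("INSERT INTO notes VALUES ('hello from the original file')")
    conn.commit()
    conn.close()

    # "Moving" the database is just a file copy.
    shutil.copy(original, copy)

    # The copy opens with all tables and rows intact, no setup needed.
    copied = sqlite3.connect(str(copy))
    rows = copied.execute("SELECT body FROM notes").fetchall()
    copied.close()

print(rows)
```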

### How RSSD Works as a SQLite Database
## How RSSD Works as a SQLite Database

The **Resource Surveillance State Database (RSSD)** leverages **SQLite**, a
fully-featured relational database that is known for being:
@@ -103,7 +103,7 @@ fully-featured relational database that is known for being:
can move the entire state of your data from one environment to another by
copying a single file.

### Why SQLite?
## Why SQLite?

1. **Local-First Processing**: SQLite's small footprint and self-contained
nature make it an ideal choice for **local-first** and **edge-based** data
@@ -122,38 +122,38 @@ flexibility and performance in a portable, easy-to-manage format.

## Why RSSD makes Data Integration easier for those without IT departments

### **No Server Setup Required**
### No Server Setup Required

One of the biggest advantages of using **SQLite** for the RSSD is that there’s
no need for a dedicated database server. Everything happens locally within a
single file. This simplifies setup, reduces costs, and minimizes dependencies on
external infrastructure, which is especially beneficial for smaller
organizations that may not have large IT departments.

### **Fast and Lightweight**
### Fast and Lightweight

Because the RSSD is built on **SQLite**, it’s designed to be **fast and
lightweight**. This is critical for local-first operations, where data needs to
be processed efficiently on edge devices or local machines before being
synchronized with a central system. Despite being lightweight, the RSSD can
handle a high volume of data with **excellent performance**.

### **SQL for All Data Operations**
### SQL for All Data Operations

By standardizing all data operations with **SQL**, the RSSD makes it easy for
users who know some SQL (or are even just comfortable with Excel formulas) to
work with the data, whether or not they have an engineering background. SQL is a
widely known language that allows users to run **queries**, **generate
reports**, and **analyze data** without needing to learn a new, proprietary system.

### **Reliability and Durability**
### Reliability and Durability

The RSSD ensures data consistency through its **ACID-compliant** transactions,
meaning you can trust that your data is safe, even during system failures. Every
change made to the RSSD is guaranteed to be completed fully or not at all, so
you never end up with incomplete or corrupted data.
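
The all-or-nothing behaviour can be sketched with a plain SQLite database. The `ledger` table and its `CHECK` constraint are hypothetical, chosen only so that the second statement in the transaction fails. When it does, the first statement is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (account TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO ledger VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    # Used as a context manager, the connection commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE ledger SET balance = balance + 150 WHERE account = 'b'")
        # This would drive account 'a' negative and violates the CHECK constraint.
        conn.execute("UPDATE ledger SET balance = balance - 150 WHERE account = 'a'")
except sqlite3.IntegrityError:
    pass  # the whole transfer was undone, including the first UPDATE

balances = dict(conn.execute("SELECT account, balance FROM ledger"))
print(balances)
```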

### **Portable and Easy to Backup**
### Portable and Easy to Backup

Because the entire database is stored as a single file, **backing up** and
**restoring** data is as simple as copying the RSSD file. This simplicity makes
@@ -171,4 +171,4 @@ For non-technical users, it’s as easy to understand as working with an Excel
workbook. For technical users, the flexibility and power of SQL make it a robust
solution for handling complex data operations. The RSSD delivers the
performance, reliability, and simplicity that modern organizations need to
thrive in today’s data-centric landscape.
thrive in today’s data-centric landscape.
32 changes: 16 additions & 16 deletions src/content/blog/en/sql-based-etl-elt.md
@@ -15,13 +15,13 @@ If you're an SQL engineer trying to learn the ropes of data engineering, you mig

Our specific use case will involve aggregating patient remote monitoring data from various devices into a single unified view for Continuous Glucose Monitoring (CGM) tracings. We'll break down each step in a way that's approachable and practical, giving you the tools to work with real data while keeping the infrastructure lightweight.

### **Background on ELT and Why We Use It**
## Background on ELT and Why We Use It

The classic ETL strategy involves transforming data before loading it into your storage system, which typically requires more complex workflows, external tools, and a lot of up-front work. In contrast, ELT lets you extract the data as-is, load it into your database, and then transform it *in place*, often with SQL views, which makes it great for exploratory work or environments with less infrastructure.
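
As a small illustration of "load first, transform in place", the sketch below loads untidy values exactly as they arrive into a hypothetical `raw_readings` table and leaves the cleanup to a view. The table, view, and sample values are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: store the values as received, untidy whitespace included.
conn.execute("CREATE TABLE raw_readings (pat_id TEXT, reading TEXT)")
conn.executemany(
    "INSERT INTO raw_readings VALUES (?, ?)",
    [("p1", " 110 "), ("p2", "95")],
)

# Transform: a view renames the column and casts the text to a number.
conn.execute("""
    CREATE VIEW readings_clean AS
    SELECT pat_id AS patient_id,
           CAST(TRIM(reading) AS INTEGER) AS glucose_level
    FROM raw_readings
""")

raw = conn.execute("SELECT reading FROM raw_readings ORDER BY pat_id").fetchall()
clean = conn.execute(
    "SELECT patient_id, glucose_level FROM readings_clean ORDER BY patient_id"
).fetchall()
print(raw, clean)
```

The raw table is never modified, so a mistaken transformation can be fixed by redefining the view rather than reloading the data.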

SQLite is a great fit here because it's lightweight, widely supported, and doesn't require complex setup—perfect for small to medium datasets or rapid prototyping.

### **Setting the Scene: Ingesting the Data**
## Setting the Scene: Ingesting the Data

Imagine we have data from multiple devices—perhaps CGMs, smartwatches, and other monitoring devices—that all capture remote patient monitoring data. After ingesting these data sources, we end up with tables like `table_1`, `table_2`, `table_3`, and `table_4` in our SQLite database. Each table represents a different device and has different columns, even though they all describe patient data for similar remote monitoring purposes.

@@ -32,7 +32,7 @@ For example:
- **table_3** has columns like `pat_id`, `recorded_at`, `glucose_reading`, `sensor`
- **table_4** has columns like `identifier`, `time_taken`, `patient_ref`, `sugar_level`

### **The Challenge: Creating a Unified View**
## The Challenge: Creating a Unified View

We need to create a single unified view called `patient_rpm_mode_cgm` that gives us all CGM tracings in a common format. Since ELT focuses on transforming data in place, we will write SQL to transform and union the data from these disparate tables. Our ultimate goal is to create a view that presents common column names—let's standardize them to:

@@ -42,7 +42,7 @@ We need to create a single unified view called `patient_rpm_mode_cgm` that gives
- `device_type` (a new column that does not exist in the physical tables)
- `source` (a new column to indicate the origin table)

### **Step 1: Understanding the Source Tables**
### Step 1: Understanding the Source Tables

The first step in transforming this data is to understand how each source table maps to our target columns. To standardize the columns:

@@ -53,7 +53,7 @@ The first step in transforming this data is to understand how each source table
| table_3 | `pat_id`, `recorded_at`, `glucose_reading` | `patient_id`, `timestamp`, `glucose_level`, 'CGM' AS `device_type`, 'table_3' AS `source` |
| table_4 | `patient_ref`, `time_taken`, `sugar_level` | `patient_id`, `timestamp`, `glucose_level`, 'CGM' AS `device_type`, 'table_4' AS `source` |
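
The mapping rows for `table_3` and `table_4` can be tried out with the sketch below. The column names come from the table above; the column types and sample rows are assumptions for illustration. It also contrasts `UNION ALL`, which keeps exact duplicate rows, with `UNION`, which collapses them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_3 (pat_id TEXT, recorded_at TEXT, glucose_reading REAL, sensor TEXT);
    CREATE TABLE table_4 (identifier TEXT, time_taken TEXT, patient_ref TEXT, sugar_level REAL);

    -- Two identical readings in table_3, one reading in table_4 (made-up data).
    INSERT INTO table_3 VALUES ('p1', '2024-10-01 08:00:00', 110, 's-1');
    INSERT INTO table_3 VALUES ('p1', '2024-10-01 08:00:00', 110, 's-1');
    INSERT INTO table_4 VALUES ('r-1', '2024-10-01 09:00:00', 'p2', 95);
""")

# Each SELECT aliases device-specific columns to the shared target names.
query = """
    SELECT pat_id AS patient_id, recorded_at AS timestamp,
           glucose_reading AS glucose_level,
           'CGM' AS device_type, 'table_3' AS source
    FROM table_3
    {op}
    SELECT patient_ref, time_taken, sugar_level, 'CGM', 'table_4'
    FROM table_4
"""

cursor = conn.execute(query.format(op="UNION ALL"))
columns = [d[0] for d in cursor.description]
all_rows = cursor.fetchall()
distinct_rows = conn.execute(query.format(op="UNION")).fetchall()

print(columns, len(all_rows), len(distinct_rows))
```

In a compound query the output column names are taken from the first `SELECT`, which is why only that branch needs the aliases.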

### **Step 2: Writing the Transformation Queries**
### Step 2: Writing the Transformation Queries

We need to write queries that extract the relevant fields from each table, aliasing the columns to standardize their names, and adding new columns as needed.

@@ -83,7 +83,7 @@ FROM
table_3;
```

### **Step 3: Combining the Queries with UNION**
### Step 3: Combining the Queries with UNION

Next, we need to combine these transformed queries using `UNION ALL`. Using `UNION ALL` is appropriate here because it ensures we retain all records, even if they have duplicate values (which may be necessary for auditing or detailed analysis).

@@ -143,11 +143,11 @@ SELECT
'synthetic' AS source;
```

### **Step 4: Adding More Transformations with Views**
### Step 4: Adding More Transformations with Views

One of the key benefits of ELT using views is the ability to easily add more transformations without altering the raw data or writing complex ETL pipelines. Here are some additional common transformations that are better handled through views:

#### **1. Standardizing Data Formats**
#### 1. Standardizing Data Formats

In many cases, different tables may store data in different formats. For example, timestamps might be stored in different formats or time zones. Using a view, you can standardize these formats:

@@ -165,7 +165,7 @@ FROM

This view ensures that all timestamps are in the same format, making downstream analysis much easier.
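
A self-contained sketch of this kind of normalization is shown below. It relies on SQLite's built-in `datetime()` function, which accepts several ISO-8601 variants and converts a trailing UTC offset to UTC. The `readings` table, the `readings_std` view, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (patient_id TEXT, timestamp TEXT, glucose_level REAL);

    -- The same instant written three different ways (made-up data).
    INSERT INTO readings VALUES
        ('p1', '2024-10-01T08:00:00',       110),
        ('p2', '2024-10-01 08:00',          95),
        ('p3', '2024-10-01T10:00:00+02:00', 130);

    -- datetime() renders every accepted input as 'YYYY-MM-DD HH:MM:SS' in UTC.
    CREATE VIEW readings_std AS
        SELECT patient_id, datetime(timestamp) AS timestamp, glucose_level
        FROM readings;
""")

stamps = [
    row[0]
    for row in conn.execute("SELECT timestamp FROM readings_std ORDER BY patient_id")
]
print(stamps)
```

Inputs that `datetime()` cannot parse come back as `NULL`, which makes them easy to find in a later validation query.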

#### **2. Filtering and Cleaning Data**
#### 2. Filtering and Cleaning Data

You may want to exclude certain rows from analysis, such as rows with missing or invalid data. Views are a great way to create a “clean” dataset:

@@ -185,7 +185,7 @@ WHERE

This view filters out any rows where `glucose_level` is 0 or negative, which may represent invalid data.
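
The filter is easy to check on a handful of made-up rows. In the sketch below (the table and view names are hypothetical), the single condition `glucose_level > 0` also drops missing values, because a comparison with `NULL` is never true in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (patient_id TEXT, glucose_level REAL);
    INSERT INTO readings VALUES ('p1', 110), ('p2', 0), ('p3', -5), ('p4', NULL);

    -- Keep only rows with a positive reading; NULL > 0 is not true, so p4 goes too.
    CREATE VIEW readings_valid AS
        SELECT patient_id, glucose_level
        FROM readings
        WHERE glucose_level > 0;
""")

kept = [
    row[0]
    for row in conn.execute("SELECT patient_id FROM readings_valid ORDER BY patient_id")
]
print(kept)
```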

#### **3. Aggregating Data**
#### 3. Aggregating Data

You can also use views to create aggregate data that can be used for reporting or analysis. For example, creating a view that provides daily average glucose levels for each patient:

@@ -203,7 +203,7 @@ GROUP BY

This aggregated view makes it easy to analyze trends over time without needing to write aggregation queries repeatedly.
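
For a quick sanity check of a daily-average view, the sketch below builds one over three made-up readings. The `readings` table and `daily_avg` view are illustrative names, and `date()` is SQLite's built-in function for truncating a timestamp to its calendar day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (patient_id TEXT, timestamp TEXT, glucose_level REAL);
    INSERT INTO readings VALUES
        ('p1', '2024-10-01 08:00:00', 100),
        ('p1', '2024-10-01 20:00:00', 140),
        ('p1', '2024-10-02 08:00:00', 90);

    -- One row per patient per calendar day.
    CREATE VIEW daily_avg AS
        SELECT patient_id,
               date(timestamp) AS day,
               AVG(glucose_level) AS avg_glucose
        FROM readings
        GROUP BY patient_id, date(timestamp);
""")

daily = conn.execute(
    "SELECT patient_id, day, avg_glucose FROM daily_avg ORDER BY day"
).fetchall()
print(daily)
```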

#### **4. Creating Derived Metrics**
#### 4. Creating Derived Metrics

If you need to create new metrics based on existing columns, views are a great way to handle this. For example, you might want to create a derived metric called `glucose_category` to categorize glucose levels:

@@ -227,7 +227,7 @@ FROM

This view adds a new column that categorizes glucose levels into 'Low', 'Normal', or 'High'.
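
Here is a runnable sketch of a derived-metric view. The cut-offs used (below 70 is 'Low', up to 180 is 'Normal', anything higher is 'High') are placeholder values assumed for this example, not clinical guidance and not necessarily the thresholds used in the original query. The table and view names are likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (patient_id TEXT, glucose_level REAL);
    INSERT INTO readings VALUES ('p1', 60), ('p2', 110), ('p3', 200);

    -- Thresholds are illustrative assumptions only.
    CREATE VIEW readings_categorized AS
        SELECT patient_id,
               glucose_level,
               CASE
                   WHEN glucose_level < 70   THEN 'Low'
                   WHEN glucose_level <= 180 THEN 'Normal'
                   ELSE 'High'
               END AS glucose_category
        FROM readings;
""")

categories = dict(conn.execute(
    "SELECT patient_id, glucose_category FROM readings_categorized"
))
print(categories)
```

`CASE` branches are evaluated in order, so the second branch only sees values of 70 or more.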

### **Step 5: Validating the Unified View**
### Step 5: Validating the Unified View

After creating the view, it’s always good practice to validate the results. You can use a `SELECT` query to make sure everything looks right:

@@ -237,7 +237,7 @@ SELECT * FROM patient_rpm_mode_cgm LIMIT 10;

Review the data to ensure that the column names are standardized and that the values align as expected. Pay particular attention to the `timestamp` column to ensure formats are consistent.
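
Beyond eyeballing a few rows, both checks can be automated. The sketch below rebuilds a one-table version of `patient_rpm_mode_cgm` from `table_3` (with assumed column types and a made-up row), reads the view's column names with `PRAGMA table_info`, and counts timestamps that SQLite's `datetime()` cannot parse.

```python
import sqlite3

EXPECTED_COLUMNS = ["patient_id", "timestamp", "glucose_level", "device_type", "source"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_3 (pat_id TEXT, recorded_at TEXT, glucose_reading REAL, sensor TEXT);
    INSERT INTO table_3 VALUES ('p1', '2024-10-01 08:00:00', 110, 's-1');

    -- Reduced stand-in for the full unified view, covering one source table.
    CREATE VIEW patient_rpm_mode_cgm AS
        SELECT pat_id AS patient_id, recorded_at AS timestamp,
               glucose_reading AS glucose_level,
               'CGM' AS device_type, 'table_3' AS source
        FROM table_3;
""")

# PRAGMA table_info also works on views; index 1 of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(patient_rpm_mode_cgm)")]

# datetime() returns NULL for text it cannot interpret as a date or time.
bad_timestamps = conn.execute(
    "SELECT COUNT(*) FROM patient_rpm_mode_cgm WHERE datetime(timestamp) IS NULL"
).fetchone()[0]

print(columns == EXPECTED_COLUMNS, bad_timestamps)
```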

### **Step 6: Leveraging the View for Downstream Analysis**
### Step 6: Leveraging the View for Downstream Analysis

With the `patient_rpm_mode_cgm` view in place, downstream processes can now treat this data as a consistent and unified source. Analysts can run queries like:

@@ -253,20 +253,20 @@ GROUP BY

This allows for seamless analysis without needing to worry about device-specific table structures.

### **Why ELT with SQLite?**
## Why ELT with SQLite?

You might wonder why ELT is a good fit for this scenario. Here are some reasons:

- **Flexibility**: ELT allows you to load data as-is and apply transformations later when you have a better understanding of the data.
- **Simplicity**: SQLite is simple to set up, and using views means that transformations are written declaratively with SQL, which is easy for teams to understand and modify.
- **Lightweight**: No heavyweight ETL tools are required, which makes this approach perfect for small datasets or prototyping.

### **Conclusion**
## Conclusion

By using SQLite and SQL views, we've demonstrated a lightweight and modern approach to ETL (or, more precisely, ELT) that helps simplify the process of integrating data from multiple sources. This approach allows for greater flexibility, and by leveraging SQL views, we can keep the transformation logic declarative and transparent.

Whether you're prototyping a new data pipeline, working with smaller datasets, or need a low-maintenance integration solution, this ELT strategy with SQLite is an excellent option. We hope this guide helps you get started on your journey towards modern data engineering!

### **Next Steps**
## Next Steps

To deepen your understanding, try adding more transformations or aggregations to the `patient_rpm_mode_cgm` view. You could, for example, normalize the `timestamp` formats or add additional metadata to the view to help with analysis. Feel free to experiment and explore how SQLite's capabilities can further simplify your data engineering workflow.