Skip to content

Taint Analysis for PostreSQL

Patrick Spiegel edited this page Dec 3, 2018 · 1 revision

Node RASP includes the PostgreSQL parser library libpg_query to run taint-lexical analysis on queries right before sending them to the database. The basic idea is, that a user should only be in control of a single, lexical token. A user should not be able to add further tokens after the one given control of. Therefore, taint ranges should always be in between single, lexical tokens with one exception - comments.

The default lexer resp. the scanner of libpg_query and PostgreSQL treat all comments as whitespaces since they contain no semantical information for the database. For the use case of PostgreSQL injection detection, comment tokens are crucial. Let's take the following example:

db.query("SELECT * FROM tab WHERE foo='" + param1 + "' AND tenant=42;");

By injecting a comment -- as param1 only a single lexical token is created, but further conditions are also commented out. As a result, the semantics of our query has changed and the attack was successful. Therefore, we require the lexer to also return comment tokens.

Building a custom Scanner lib

Our approach here is to start with the scanner generation within PostgreSQL itself as libpg_query does. PostgreSQL uses the tool Flex to generate the C implementation of the scanner scan.c from a Flex file scan.f.

1. Checkout PostgreSQL Version

In the first step, we check out the version of PostgreSQL we would like the scanner to be extracted from.

git clone https://github.com/postgres/postgres.git
git checkout <tag>  // currently REL_10_1

2. Extract the Flex scanner definition

Since we later want to integrate into the current release of libpg_query only the Flex file scan.l is required. Copy it to your working directory.

cp src/backend/parser/scan.l <yourWorkingDirectory>

3. Run Patches from libpg_query

According to the build of libpg_query we first run some patches on the Flex file.

patch -p1 scan.l ~/libpg_query/patches/01_parse_replacement_char.patch

4. Adapt scanner definition

At this step the changes to the actual scanner behavior are made. In our case, the scanner should return comment tokens. This can be done by adding the following in scan.l:

{comment}   {
     SET_YYLLOC();
     return COMMENT;
}

This change can be seen in commit. A definition for the returned COMMENT token already exists.

5. Run Flex to create the scanner

flex -CF -o scan.c scan.l

6. Merge with libpg_query

Since libpg_query makes further modifications to the scanner during the build process, we only want to copy the transition table as well as the token switch from scan.c to src/postgres/src_backend_parser_scan.c. Unfortunately, this has currently be done manually.

7. Build and adapt tests

Build your libpg_query with the adapted scanner and adapt test conflicting with the new scanner output.

Benchmark

Statement used for benchmark:

const stmt = "SELECT * FROM Customers WHERE tenant=1 AND (name ~ '" +
             search + "' OR notes ~ '" +
             search + "' OR product ~ '" +
             search + "');";

Results

Node Version Description Results Deviation
v8.11.1 localhost, no taint ~310,000 ops/sec ±50%
v8.11.1-taint localhost, no taint ~310,000 ops/sec ±50%
v8.11.1-taint localhost, taint on search - no attack ~18,000 ops/sec ±4%
v8.11.1-taint localhost, taint on entire query - injection ~7,000 ops/sec ±4%
v8.11.1-taint localhost, pg_check.hasInjection(query) - no taint ~21,000,000 ops/sec ±1%
v8.11.1-taint localhost, pg_check.hasInjection(query) - taint ~21,000 ops/sec ±1%
v8.11.1-taint localhost, libpg_query.getLexicalTokens(query) ~24,000 ops/sec ±1%

Previous test were run asynchronously, so the latency and database processing wan't part of the benchmark!

Node Version Description Results Deviation
v8.11.1 remote database, no taint ~83,000 ops/sec ±43%
v8.11.1-taint remote database, no taint ~80,000 ops/sec ±39%
v8.11.1-taint remote database, taint on search - no attack ~19,000 ops/sec ±5%
v8.11.1-taint remote database, taint on search - injection ~4,500 ops/sec ±3%

Outlook

Since for the taint-lexical analysis only a small part of the parser resp. libpg_query is required, it may be reasonable to create a smaller lib only containing the scanner e.g. libpg_scanner.