Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing $qb->setIdentifier() to support literal-string #9047

Closed
wants to merge 1 commit into from

Conversation

craigfrancis
Copy link
Contributor

Not ready yet. Just trying to start a discussion on supporting Identifiers (table and field names), in a similar way to how Parameters work.

This would allow several QueryBuilder methods to use literal-string values (strings written by the developer), so anyone using static-analysis will be prevented from creating Injection Vulnerabilities (ref #8933).

The first commit just adds the public methods to set/get Identifiers. I'm not too sure where the Identifier values should be added to the SQL, maybe in Query::parse(), using $platform->quoteIdentifier()?

Just adding the public methods to set/get Identifiers, to start the discussion on Issue doctrine#8933.
@derrabus derrabus added this to the 2.10.0 milestone Sep 29, 2021
@derrabus derrabus changed the base branch from 2.9.x to 2.10.x September 29, 2021 18:19
@derrabus derrabus changed the base branch from 2.10.x to 2.9.x September 29, 2021 18:19
@greg0ire greg0ire requested a review from beberlei October 2, 2021 12:24
@greg0ire greg0ire assigned alcaeus and unassigned alcaeus Oct 2, 2021
@greg0ire greg0ire requested a review from alcaeus October 2, 2021 12:24
@derrabus derrabus changed the base branch from 2.9.x to 2.10.x October 2, 2021 14:16
@derrabus
Copy link
Member

derrabus commented Oct 2, 2021

Thank you for the PR. I have changed the target branch to 2.10 because this change does not really pass as a bugfix.

I'm afraid, I'm not really familiar with the literal-string feature. Can you give some examples, how your PR improves the DX when using the query builder?

@derrabus derrabus closed this Oct 2, 2021
@derrabus derrabus reopened this Oct 2, 2021
@greg0ire
Copy link
Member

greg0ire commented Oct 2, 2021

I'm not too sure where the Identifier values should be added to the SQL, maybe in Query::parse(), using $platform->quoteIdentifier()?

🤔 I'm confused, this is the ORM, so the identifiers in this PR should not designate SQL objects such as tables

Just trying to start a discussion on supporting Identifiers (table and field names)

"field names" as in properties of a PHP class sounds correct, but "table" does not. People writing DQL are not supposed to deal with tables directly, are they?

Also, we're discussing this internally, and there are several concerns with this:

@greg0ire greg0ire removed this from the 2.10.0 milestone Oct 3, 2021
@craigfrancis
Copy link
Contributor Author

I must confess, I'm not too familiar with Doctrine. I'm more of a code auditor, where I work with fairly typical software development teams to identify where they can make improvements. Last year I audited a couple of projects using Doctrine, and found the usual security mistakes, which resulted in SQL Injection vulnerabilities.

While I appreciate Doctrine has documented these issues, developers aren't guaranteed to read or understand (especially junior developers); and even then, mistakes happen.

@derrabus, thanks for updating the target branch. The literal-string feature is a way to check the variable contains a string that's been written by the developer (i.e. written in the source code, not using untrusted input from the user).

@greg0ire, thanks for the clarification, I have been incorrectly reading this as SQL, as it acted like SQL when I found those injection vulnerabilities.

Interestingly your example is very close to one of the vulnerabilities I found (slightly altered from the original, just to keep it simple, and avoid pointing fingers):

$result = $qb
  ->select('u')
  ->from('User', 'u')
  ->where('u.id = ' . $_GET['id']); // INSECURE

I believe this code should follow "binding parameters to your query", which uses a literal-string for where(), and a parameter for the unsafe variable:

$result = $qb
  ->select('u')
  ->from('User', 'u')
  ->where('u.id = :id')
  ->setParameter('id', $_GET['id']);

I appreciate integers cannot introduce an SQL Injection vulnerability, but following the discussion on the PHP Internals mailing list, treating some variables as safe because of their type, seemed to confuse a few people, especially when many projects let PHP be a little vague with types.

False positives are a concern, but assuming you're using setParameter() for values, and we can introduce a way for those rare cases where variable Identifiers need to be included in a safe way (i.e. this feature request), that should cover all use-cases... that said, assuming there are some really complicated systems, we could introduce a stringable value-object, maybe named unsafe_dql, so developers could bypass this check (not advisable, but at least they have been warned, and it would help auditors identify and focus on the dangerous bits).

As to "what should happen if the newly introduced syntax for identifiers is used inside a string literal?"... I'm open to suggestions on the syntax, but going with:

$result = $qb
  ->select('u')
  ->from('User', 'u')
  ->where('{my_field} = :my_value') // A nice simple `literal-string`.
  ->setIdentifier('my_field', $_GET['field']) // Applied to {my_field} using $platform->quoteIdentifier()
  ->setParameter('my_value', $_GET['id']);

I don't think this syntax will cause issues for existing code, as the brackets cause a Syntax Error.

@derrabus derrabus added this to the 2.11.0 milestone Oct 4, 2021
@derrabus derrabus changed the base branch from 2.10.x to 2.11.x October 4, 2021 17:25
@morozov
Copy link
Member

morozov commented Oct 4, 2021

The code like this, especially if embraced by the API:

->setIdentifier('my_field', $_GET['field'])

Is as fundamentally wrong as:

->where('u.id = ' . $_GET['id']);

The input must not define the application logic. The application logic should be defined by the developer.

The new API allows my_field to be anything as long as it's properly quoted from the SQL syntax perspective, which is obviously insecure. If the identifier is determined based on the input, it should be taken from a white list, not directly from the input.

@craigfrancis
Copy link
Contributor Author

@morozov, You are right, the value should be checked against an allow-list.

I only wrote ->setIdentifier('my_field', $_GET['field']) to keep the example short, and clearly show we are working with a dangerous value.

But I will note that it's not "as fundamentally wrong", because the second example provides a full Injection Vulnerability; whereas setIdentifier() limits the attacker to a valid Identifier, which is a different problem (i.e. needing to limit the accepted values even further).

And this method would rarely be used, but it should exist to ensure variable identifiers are being included and quoted correctly (at the moment it's easier to include these values incorrectly, via basic concat)... and maybe the setIdentifier() method should have a 3rd argument that accepts an array of allowed-values?

As an aside, literal-string values can be used though-out, but it gets a bit more complicated, e.g.

$order_fields = [
    'name',
    'created',
    'admin',
  ];
 
$order_id = array_search(($_GET['sort'] ?? NULL), $order_fields);
 
$sql .= ' ORDER BY ' . $order_fields[$order_id];

@morozov
Copy link
Member

morozov commented Oct 4, 2021

Filtering input that describes a data source (e.g. a table column) against a white list is a must first step. It should be done not only when building queries but in other cases as well (e.g. when fetching data from a query). The filtering logic may be more complex than in_array(), it could be based on more complex rules (e.g. ACL. With that said, the filtering capability doesn't belong to the query builder, it belongs to the application code.

Quoting identifiers is secondary. It's only needed to prevent the identifiers that contain SQL syntax elements from being interpreted as language instructions rather than values. Quoting an identifier will make it case-sensitive on Oracle and IBM DB2, so it cannot be applied just automatically until there's full support for handling identifiers as first-class citizens (see doctrine/dbal#4357, doctrine/dbal#4772).

The API being proposed is a poor solution to the secondary problem sweeping the primary one under the rug.

@alcaeus alcaeus removed their request for review October 5, 2021 06:41
@craigfrancis
Copy link
Contributor Author

@morozov I agree. Filtering which fields are allowed is important, and it is a separate issue that's not really part of QueryBuilder (and hasn't been included in my proposal)... but if it was, a basic (and optional) one would at least prompt developers as they type:

Screenshot showing phpStorm autocomplete of a method and arguments

Interestingly your suggestion of using <identifier> seems to be pretty much the same, I just went with {identifier}, used an object named "Identifier" that's basically copying "Parameter" (instead of calling it "Name"), and I've provided a single method on the QueryBuilder which I think would effectively call your buildQuotedNameIdentifier().

As to quoting identifiers, that's my main focus, because I'm trying to address Injection Vulnerabilities. And I should note, there is nothing "automatic" about this, as developers would use this new method to ensure all of these variable Identifiers are "always" being quoted, correctly (which would avoid your noted issues with the list of keywords, case-sensitivity, and special characters); it will also provide you with a way to receive Identifiers "as first-class citizens", in the same way that Parameters work.


That said, I don't mind how variable Identifiers are provided to Doctrine, I just want to stop things like this from happening:

->where($_GET['where']); // INSECURE, in this case the value was coming from JavaScript :-)

This is why certain parameters should use literal-string instead of string. For example, ->where() using @param literal-string|object|array $predicates, as these strings should be written by the developer. The difficulty comes when there is a user defined Identifier (quite rare, but does happen, e.g. a user changing how a table orders records), where the developer should either use the $order_fields example above (which gives you a literal-string from an array of allowed values), or the developer should do their own filtering, and provide that value to Doctrine to be included in the DQL/SQL safely.

Also, following on from our email discussion on the 26th April, about the is_literal() RFC, where you said that you believe "these types of checks should be done [with static analysis]"... that's what I'm working on now; with both Psalm 4.8 and PHPStan 0.12.97 now supporting literal-string.

Copy link
Contributor

There hasn't been any activity on this pull request in the past 90 days, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 7 days.
If you want to continue working on it, please leave a comment.

@github-actions github-actions bot added the Stale label Nov 29, 2024
Copy link
Contributor

github-actions bot commented Dec 6, 2024

This pull request was closed due to inactivity.

@github-actions github-actions bot closed this Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants