Skip to content
This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

Attribute shadowing with join() encourages possible data leak bugs #347

Closed
jonashaag opened this issue Dec 22, 2016 · 56 comments
Closed

Attribute shadowing with join() encourages possible data leak bugs #347

jonashaag opened this issue Dec 22, 2016 · 56 comments

Comments

@jonashaag
Copy link

Please see laravel/framework#16704, I believe the way duplicate column names (for example: users.id, groups.id when you join the both tables) are currently handled by Eloquent A) can be a surprise to many users and B) is a potential security issue.

Summary from the discussion linked above:

  • I'm concerned with common columns like id overriding each other in case you're doing a join being a potential security risk
  • Other ORMs solve this by prefixing duplicate columns with the table name
  • SQL solves this by not allowing to SELECT ambiguous column names, but Eloquent has an implicit SELECT clause that "randomly" chooses one of the columns and ignores the rest of them
  • Some suggest that this is a merely a problem of education/documentation
  • I say that it's the responsibility of Eloquent to solve such problems once and for all and that it is irresponsible to not try to solve the problem as good as possible
@fernandobandeira
Copy link

fernandobandeira commented Dec 22, 2016

I think we should throw a duplicate select column exception for any duplicated column on the result...

For the implementation I think that the simpler would be to change the current FETCH_OBJ to FETCH_NAMED, if one of the keys has an array as the value (this implies that it has more than one value for the same column name), then throw the exception, it's better than magically doing something with the column, this is something the developer should take care anyway, I guess warning with an exception should be enough...

Also we could hide this exception when not working with strict mode on sql, instead we would select the first value of the array and use it, as it should probably be what you'll expect as the value...

Update: My exception idea was rejected so I think that we won't handle this situation at all, which isn't bad since it's something the developer should be taking care after all... I guess we can leave this open anyway so ppl can comment how other ORM's are handling this and propose alternatives... Also it keeps all discussion in one place...

@jonashaag
Copy link
Author

jonashaag commented Dec 22, 2016

which isn't bad since it's something the developer should be taking care after all

That's like saying it's unnecessary to create memory-safe programming languages because developers should not make mistakes with memory management in C, so why bother about memory safety in the first place? It's the programmer's fault if half of the world in vulnerable because of some memory management mistake in OpenSSL after all!

Or it's unnecessary to have static type checking; it's the programmer's mistake if the program has type errors!

I could name hundreds of other examples where smart people have decided to change how stuff works because other people tend to make too many silly mistakes. From a pragmatic point of view, if people make silly mistakes with your framework all the time, then you have simply made wrong decisions somewhere.

@jonashaag
Copy link
Author

I would love to get some feedback on this issue.

@Garbee
Copy link

Garbee commented Jan 26, 2017

Can you please provide an actual example exploit that is possible because of this? All I'm seeing so far is the theory of a possible exploit, not a concrete example where one exists in a situation that would generally occur in an application.

@jonashaag
Copy link
Author

There is no exploit that works for all applications. My point is that the current behavior encourages code that is prone to potential data leaks and bugs. One might argue that this is strictly speaking not a security issue but I think it's very similar to SQL injections though very limited in scope.

@Garbee
Copy link

Garbee commented Jan 26, 2017

As I said, can you please provide an example exploit? As it is, I don't quite understand what the exploit allows for an attacker to do. Or even how an attacker would exploit it if you aren't taking SQL commands from them. Helping get this looked at is difficult without a concrete "X report allows for Y attack to occur."

@Garbee
Copy link

Garbee commented Jan 26, 2017

Further, I didn't ask for an example that would affect "all", of course that isn't feasible. I simply asked for a "generally applicable" example, as in one that isn't caused by a developer explicitly opening an injection attack vector by taking user input and putting it directly into a SQL query.

@jonashaag
Copy link
Author

There is no general applicable exploit. There is no exploit. This issue is about preventing BUGS that may or may not have security impact. I'd say it's very likely for bugs causes by this behavior to have security consequences (other people's data being shown to visitors).

@Garbee
Copy link

Garbee commented Jan 26, 2017

Can you provide a case in which this bug happens? I don't know how many times a code sample needs to be asked for.

@jonashaag
Copy link
Author

Please see laravel/framework#16704

Very first sentence in this issue

@Garbee
Copy link

Garbee commented Jan 27, 2017

Yes, and that code doesn't make sense anyways. Your getting a collection of multiple users from each query. Then trying to scope orders by a single user. Which means that example is already failing in multiple places. It isn't clearly demonstrating the risk you're asserting exists.

@jonashaag
Copy link
Author

I am sorry; here's an even more complete version of the code:

function getUsers() { ... }
foreach (getUsers()->get() as $someUser) {
    $someUserOrders = Order::where("user_id", $someUser->id);
    ...
}

Simpler example without a loop:

function getUserById($id) {
  return User::where("id", $id)->first();
}

A few months after coding this the project requirements change and we now have user groups as well, and we change the query to all users having a primary group :

function getUserById($id) {
  return User::where("id", $id)->join("groups", "groups.id", "=", "users.primary_group_id")->first();
}

Also we always had some code that fetches users' orders which worked like this:

Order::where("user_id", getUserById(...)->id);

The ->id property is different when you use the second getUserById method -- it's the ID of the group.

Note that this is a problem that cannot happen with raw SQL; it's a problem introduced by Eloquent, see first post in this issue thread.

@Garbee
Copy link

Garbee commented Jan 27, 2017

What about using User::with("PrimaryGroup")->find($id); instead of the manual join? Assuming "PrimaryGroup" is an association with the user model. Would that method still contain the group id instead of the user id for the call?

I feel like doing a manual join in this kind of scenario is a quirky edge-case for most Laravel users.

@jonashaag
Copy link
Author

From a quick test: No, it only happens with join().

Thing is we have quite some places where we can't use with() for performance reasons, to the point where we stopped using with() for queries. Also WHERE statements using with() don't really filter the result set, they only null the related object if none is found. For example the "get me all users that have a primary group that exists in the database" can't even be expressed with with(), if I understand things correctly.

I guess loads of people must be using join(), given the restricted usefulness of with().

@Garbee
Copy link

Garbee commented Jan 27, 2017

So, if it is only with join... Isn't this how raw SQL will work when you run the commands? It seems to me like given context, this is expected output as long as doing a manual join works with way in SQL.

@barryvdh
Copy link

For example the "get me all users that have a primary group that exists in the database" can't even be expressed with with(), if I understand things correctly.

$query->has('PrimaryGroup')

@jonashaag
Copy link
Author

jonashaag commented Jan 27, 2017

Isn't this how raw SQL will work when you run the commands?

NO. Eloquent confuses the object's IDs. Let me make a very very concrete example.

var_dump(User::where('login', 'foo@bar.com')->first()->id);
int(269)

var_dump(User::where('login', 'foo@bar.com')->join('groups', 'primary_group_id', '=', 'groups.id')->first()->id);
int(261) // <-- wrong id!

Queries made by Eloquent are correct:

MariaDB [xxx]> select ... from `users` where `login` = 'foo@bar.com' limit 1;
+-----+------------------+
| id  | primary_group_id |
+-----+------------------+
| 269 |              261 |
+-----+------------------+
1 row in set (0.00 sec)

MariaDB [xxx]> select ... from `users` inner join `groups` on `primary_group_id` = `groups`.`id` where `login` = 'foo@bar.com' limit 1;
+-----+------------------+
| id  |               id |
+-----+------------------+
| 269 |              261 |
+-----+------------------+
1 row in set (0.00 sec)

But when constructing the model objects, Eloquent chooses THE WRONG ID COLUMN. (EDIT: Or, rather, it's ambiguous, and it choses one of them.)

@jonashaag
Copy link
Author

Eloquent should be using aliases for all tables for all queries so that these ambiguities go away. The simplest way would be to something like select users.*, groups.* ... but that fails as soon as you have multiple joins with the same table. Numbered alias would work in that case: select t1.*, t2.*, t3.* from users as t1 join groups as t2 ... join groups as t3 ...

@Garbee
Copy link

Garbee commented Jan 27, 2017

So, this isn't really a security problem. It is a possible data leak issue. There isn't anything a user can really do here to force this and abuse it. It's like, if all the stars align just right and the developer implements a feature X way then the wrong data could be retrieved.

I'll try pinging Taylor to peek at this later today. But fixing this seems like it may be a slow churn and possibly a big 6.0 type fix since changing this could break existing applications quite heavily.

Thank you for the full examples, they helped a ton in understanding the exact problem!

@jonashaag jonashaag changed the title Attribute shadowing with join() is a security risk Attribute shadowing with join() encourages possible data leak bugs Jan 27, 2017
@jonashaag
Copy link
Author

🎉

Thanks for being patient enough and asking all the questions. I'll definitely have to work on my "how to file a good bug report" skills :-)

@jonashaag
Copy link
Author

Also, I might be interested in coming up with the first patch for this issue.

@Garbee
Copy link

Garbee commented Jan 27, 2017

As far as bug reporting you got one part really well which is describing the technical side. However, that can be confusing which is where also providing a direct, workable code sample comes in. Beyond that, have a section that clearly describes a recommended solution and try to provide some information on foreseen implementation problems, (BC breaks, developer confusion, etc.) so there is some meat to be discussed on that end. It also works well when you use headings to show the sections of the issue and <details> elements to hide the code stuff away (or big images if you're showing a UX problem to maintainers.) That way the primary content as far as description is up front, and the code samples/auxiliary understanding bits are hidden away until the reviewer asks to see it.

@Garbee
Copy link

Garbee commented Jan 28, 2017

Also, this does cause an active exploit against sites if they use the query builder to join based directly on content sent from a client. If a user catches on and the proper checks aren't in place for the requested data, they can cause a force dump of targeted data out of the system. This is once again, "assuming all the stars align" in probability. Therefore, not a critical priority to get addressed. Still something we should help guard against if possible.

@jonashaag
Copy link
Author

I just had a look at the Eloquent/query internals and oh boy, using unique tables aliases as I initially suggested for sure doesn't look like an easy undertaking.

IIUC Eloquent constructs model instances simply by taking SQL result row arrays and setting them as model attributes. For example if your SQL query returns an array ["a" => 1, "b" => 2] then your model will have these attributes, without even checking if these attributes are part of the model's table in the first place. (Anything else would be crazy from an implementation perspective.) But also there doesn't seem to be internal housekeeping of what select/join/... statements are part of the query other than the naïve list of clauses that are in "almost final SQL" form.

It's also horrible from a deprecation path point of view, as many have suggested in the discussion already. I guess lots of people rely on the fact that model instances may "receive" attributes from other tables, aggregates, raw queries, ..., and changing that could be painful even with a long deprecation window. Maybe something along the following?

  • Step 1: Warn if duplicate/ambiguous column was found
  • Step 2: Error if duplicate/ambiguous column was found
  • Step 3: Warn if model instance "receives" attribute from joined table in an implicit manner (i.e. without explicit select())
  • Step 4: Prefix implicitly "received" attributes from joined tables with the joined table's name; error if that table name is ambiguous

Step 1 should be reasonable, maybe even step 2. Step 3 and 4 sound a bit crazy though...

@Danonk
Copy link

Danonk commented Sep 5, 2017

How am I to walkaround this bug? I want to retrieve a model, with a condition about it joined table.
I want only those Movies (table "movies") that have a Director (table "directors") that are from USA.

$movie = Movie::query()
  ->where('name', 'Fightclub')
  ->join('directors', 'director.id', '=', 'movies.director_id')
  ->where('directors.country', 'USA')
  ->first();

Now $movie->id has an id of director! not the real id

@sisve
Copy link

sisve commented Sep 5, 2017

I want only those Movies (table "movies") that have a Director (table "directors") that are from USA.

I'm not an Eloquent user, but what you explain is whereHas, not a join. The original issue only occurs if you write queries with joins manually, instead of letting Eloquent build the queries automatically. Eloquent never issues joins, afaik.

$movie = Movie::query()
    ->with('director')
    ->where('name', '=', 'Fight Club')
    ->whereHas('director', function($query) {
        $query->where('country', '=', 'USA');
    })
    ->first();

@Danonk
Copy link

Danonk commented Sep 5, 2017

Will it also eager load director record? Because of the with('director')?

@sisve
Copy link

sisve commented Sep 5, 2017

Yes, it will eager-load all directors for any movie that has at least one director with country = 'USA'. In your case it looks like a movie only has one director. You can constraint the eager-loading further if required, like only loading directors with a certain shoe size. This is all in the documentation for Eloquent: Relationships, under Querying Relations and Eager Loading.

@taylorotwell
Copy link
Member

This is how SQL works.

@sisve
Copy link

sisve commented Mar 11, 2018

@taylorotwell

No, it isn't. This is perhaps how the Eloquent handles it, or how PDO handles it. But it's absolutely not how SQL works. This issue/idea/suggestion is about fixing/supporting it. A database query can easily produce a result with columns with identical names; SELECT id AS example, email AS example, ... and somewhere the last column returned overwrites the previous ones. This is done somewhere in Eloquent or PDO, not by the database.

Throwing an exception telling the developer that there are duplicate column names (and they are expected to be unique) is a better solution that just overwriting some values as is currently done.

@jonashaag
Copy link
Author

@taylorotwell No, this is not "how SQL works."

The closest you'd get to something a query like

User::join('groups', 'primary_group_id', '=', 'groups.id')->first()->id

is the SQL statement

SELECT id from (SELECT * FROM users JOIN groups ON users.primary_group_id = groups.id)

which fails in MySQL and PostgreSQL with an error that the id column is ambiguous (MySQL: ERROR 1060 (42S21): Duplicate column name 'id').

Besides, what kind of argument is this? You've got a well-defined, real-world problem that multiple users have experienced, with possibly catastrophic outcomes. While the root cause may not be easy to fix, there's an obvious mitigation for the problems that can arise from it: Detect duplicate columns when doing an SQL query and throw a runtime error.

None of the ORMs I have ever used have had this kind of problem. It came as a nasty surprise to me: I consider protecting your users from shooting themselves into their own feet is a large part of an ORM's responsibility. (Unless users explicitly request that are protections be lifted, i.e. going to down raw SQL.)

How many more Laravel users are affected by this issue remains unclear; I expect only a very small subset of them to find this GitHub bug report.

@eoghan27
Copy link

Totally agree.

At a minimum, put it in the docs to warn devs that this situation can happen. Thanks all.

@tdondich
Copy link

Why are we not being explicit in the table columns names when using the query builder? This would avoid this problem completely. And then on result in Eloquent, you map the column names from the table name and put them into the properties appropriately.

Specifying table names in the column selects would, ultimately, rest this case and still have SQL work the way it's designed.

@sisve
Copy link

sisve commented May 25, 2018

@tdondich Because no one knows what columns exists in the database. Most queries are a simple SELECT * FROM model which means we cannot automatically alias them. And there's no metadata available to Laravel to build a list of all columns that the model uses.

@jonashaag
Copy link
Author

FYI Ruby on Rails uses inspection to retrieve the list of columns. IIRC it does this during application boot and caches it (google the term Schema Cache).

@sisve
Copy link

sisve commented May 25, 2018

Something similar is also available in Doctrine where you declare your properties on your model, and map them accordingly to database columns. This means that Doctrine knows of every database column and can alias them accordingly.

Laravel, however, does not have this functionality.

@jonashaag
Copy link
Author

Of course it doesn't have it at this point, but that's a bad argument for not adding another feature that depends on it. The solution is simple: one could add the feature.

@sisve
Copy link

sisve commented May 25, 2018

Sure. I misinterpreted the situation. I believe the current heading of this issue was to introduce exceptions when there's duplicate column names in the result.

There are also other issues that touches the subject of introducing schemas/model metadata for Laravel.
#980 (Would a schema file make sense?)
#944 (Database migration declarative approach per table, run migrate depending on the difference)

But in the end, if we want something that large, why not switch to Doctrine which has that feature already?

@hulkur
Copy link

hulkur commented Jun 21, 2018

similar issue as laravel/framework#23548 and definitely not sql problem

to avoid duplicate column conflict relation() is defined as
$this->hasManyThrough(...)->select('related.*')

$this->relation()->toBase()->get() returns id = 1
$this->relation->first() has id = 2

There is only one related model with id of 1 as returned by base sql, but somewhere while creating a related model the id is overwritten by intermediate model id

@hulkur
Copy link

hulkur commented Jun 21, 2018

Problem starts in HasManyThrough::shouldSelect where
return array_merge($columns, [$this->getQualifiedFirstKeyName()]); adds another id column into select.
If I alias it and then use the alias in buildDictionary then it works correctly

@mfn
Copy link

mfn commented Jul 26, 2018

I've stopped counting how often I've been bitten by this.

Every join between tables of same-names columns without explicit selection of columns is prone to this problem.

This is one of the things CakePHP2 "got right" [1]:

  • all results are always prefixed by the Models name
  • it too uses (cached) DB introspection to know what columns to select

That's also why, even in only a single "Model" (if you can even call it so) result, is always "nested":

$post = $Post->find('first');
// Equals this structure
$post = [
  'Post' => [
    'id',
     …
  ]
]

Under the hood:

  • columns are always explicitly selected with the applications model prefix: Post__id, Post__title,
  • when the result is returned this is post-processed and columns are split by __ to put them in the appropriate array bucket
  • That's why a single result can return columns of different tables which automatically translate to a different model => I believe this is something too not possible with Eloquent (and also doesn't make much sense the way Models work in Laravel in general), i.e.:
    SELECT posts.id AS Post__id, posts.title AS Post__title, users.name AS AssignedUser__name, …"
    turns into
    $post = [
        'Post' => [
            'id',
        ],
        'AssignedUser' => [
            'name',
        ],
    ];

This approach however is at odds with so many things how Eloquent works so pretty much unfeasible I think.

Fun fact: I recently had a performance issues with complex results and joins and there wrote exactly such a converter:

  • I returned AnotherModel__field for 1:1 joins
  • converted this to an ad-hoc real new AnotherModel and set the field
  • plugged this ad-hoc model onto the actual Model
  • so that the final step, result serialization, had properly access through it.

It felt wrong, but I couldn't find another way except doing everything manual.

There was also mentioned FETCH_OBJ vs. FETCH_NAMED: unless I'm wrong, FETCH_OBJ is essential to how Eloquent achieves great performance to hydrate the actual Models.

No solution to the problem, just providing some perspective.

[1] "got right" as in: does not suffer from this particular problem, doens't mean it's a saint with other things

@HenkPoley
Copy link

HenkPoley commented Jul 6, 2020

I guess you can "resolve" this by adding a global query scope. Untested [Doesn't work]:

class MyModel extends Model
{
    /**
     * A side-effect that is loaded when the model is 'booted', to only query the model its own columns.
     *
     * @return void We only did things in secret
     */
    protected static function booted()
    {
        static::addGlobalScope('only_my_columns', function (Builder $builder) {
            $builder->select($this->table . '.*'); // Edit: TODO: Wait a minute, you don't have access to $this in a static function.
        });
    }
}

For sure this means you need to remember to add extra columns of other tables, if you need them in one your models.

Edit: Also it doesn't work since there is no $this in a static function. I guess you could hack around this by making a public static $REAL_TABLE_NAME in the class, and loading that into $this->table during construction. And figure out Laravel Eloquent default table name if ..::$REAL_TABLE_NAME is not set.

"Hey Siri, what's an ORM?" 😅

@jonashaag
Copy link
Author

I have written a workaround for this when I initially reported the issue that forces you to always explicitly select all the columns when doing joins. Can't find the code right now though.

Honestly, at this point I would just recommend to everyone to not use Eloquent in the first place. This issue has been largely ignored by the developers for over 4 years now, with lots of people suffering from exactly the symptoms I have proclaimed, and also the Eloquent developers themselves having to plug workarounds at other places in the code because of the issue.

Eloquent developers continue to claim that "this is how SQL works" here and in other places as well, while clearly proved wrong.

@HenkPoley
Copy link

HenkPoley commented Jul 6, 2020

Yeah, naming all required columns is a best practice in SQL (relational algebra): https://stackoverflow.com/a/3639964/273668

Sadly Eloquent didn't start that explicit way. Maybe it could be added as a kind of 'Eloquent2 is the future' to the side.

Preference to the columns of your Model its own table seems pretty sane default though 🤔.


Maybe someday someone will create one of these fancy SQL to typesafe interface/classes tooling for PHP 🤷‍♂️. So you can actually know what comes out.

@zmilan
Copy link

zmilan commented Oct 4, 2020

Hi,

I'm new to the Laravel community and didn't know about this thread so I have opened another issue where I have described something that surprised me totally and that is actually the topic of this thread.

This is actually a possible security issue and I have explained it in my comment. For a short explanation, if use users and tenant_users tables to authenticate the user and check is it belongs to the tenant, you can end up with a properly authenticated user which id is overwritten with the id from tenant_users which by accident could belong to some other record from the users table which has higher privileges. When you after that use that user model to load relations it would load them by id from that other user... (there is an example of code in issue)

@jonashaag
Copy link
Author

From #34671 @driesvints:

Feel free to attempt to PR something if you find a solution

I think this is a major step forward from @taylorotwell claiming

This is how SQL works.

:-) Now we just need someone to solve the problem on the technical level and come up with a PR. 🥳

@driesvints
Copy link
Member

I didn't say it would be easy. I was just saying that if anyone finds a genuine solution to this problem that they're free to open up a PR.

@jonashaag
Copy link
Author

That's great! I really meant it. No irony or sarcasm involved. I think it is going to be really difficult to solve, and painful in terms of backwards compatibility, but definitely worth the trouble, considering that it will prevent lots of security issues and data leaks, and save lots of debugging-hours for Laravel users.

@zmilan
Copy link

zmilan commented Oct 6, 2020

I don't think that Laravel should mandatory change how it works. The biggest issue with this is that it silently overwrite value and if you are not aware of it, it raise some issues. I believe @jonashaag aready suggested how it should be fixed:

I just had a look at the Eloquent/query internals and oh boy, using unique tables aliases as I initially suggested for sure doesn't look like an easy undertaking.

IIUC Eloquent constructs model instances simply by taking SQL result row arrays and setting them as model attributes. For example if your SQL query returns an array ["a" => 1, "b" => 2] then your model will have these attributes, without even checking if these attributes are part of the model's table in the first place. (Anything else would be crazy from an implementation perspective.) But also there doesn't seem to be internal housekeeping of what select/join/... statements are part of the query other than the naïve list of clauses that are in "almost final SQL" form.

It's also horrible from a deprecation path point of view, as many have suggested in the discussion already. I guess lots of people rely on the fact that model instances may "receive" attributes from other tables, aggregates, raw queries, ..., and changing that could be painful even with a long deprecation window. Maybe something along the following?

* Step 1: Warn if duplicate/ambiguous column was found

* Step 2: Error if duplicate/ambiguous column was found

* Step 3: Warn if model instance "receives" attribute from joined table in an implicit manner (i.e. without explicit `select()`)

* Step 4: Prefix implicitly "received" attributes from joined tables with the joined table's name; error if that table name is ambiguous

Step 1 should be reasonable, maybe even step 2. Step 3 and 4 sound a bit crazy though...

From my point of view as someone who is new to Laravel raising warn/error if duplicate/ambiguous column was found would be all what I need to know. It is my responsibility like developer to solve that issue and be more preceise. My issue was that I was assumed that it works like I have expected from my previous experience with similar tools and I wasn't warned that something is not good.

I will be happy if I can help somehow with this.

@ejunker
Copy link

ejunker commented Oct 22, 2020

@HenkPoley I took your idea and was able to create a Trait that can be used in each model or if you have a base Model then you can add it there. I haven't tested it much but it seems to work for simple cases.

I was able to get the table name by using $builder->getModel()->getTable()

trait OnlyMyColumnsTrait
{
    public static function bootOnlyMyColumnsTrait()
    {
        static::addGlobalScope('only_my_columns', function (Builder $builder) {
            $builder->select($builder->getModel()->getTable().'.*');
        });
    }
}

@zmilan
Copy link

zmilan commented Nov 12, 2020

@ejunker I like your idea. I have finally got some time to check how PDO works in this case and I was very much surprised to see that it actually work the same way as Eloquent. I'm surprised because all other SQL clients that I have used ever was handling this "correctly".
After that, I have thought more about what I feel like an issue with this and based on how PDO works I believe that the best solution should be to add default select in the case when select is not provided, but only if the query is started from the model. Something similar to what you have achieved with this trait. So if "select" part is not provided, the query builder should add something like $builder->select($builder->getModel()->getTable().'.*'), but only if it is started from the model. If it starts from DB it should be left to the developer to decide.
I hope this has sense :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests