Skip to content
This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

UploadedFile::store() uses md5_file() to generate a file name #161

Closed
jeroennoten opened this issue Aug 17, 2016 · 27 comments
Closed

UploadedFile::store() uses md5_file() to generate a file name #161

jeroennoten opened this issue Aug 17, 2016 · 27 comments

Comments

@jeroennoten
Copy link

A great new addition to Laravel 5.3 is the UploadedFile::store(), which also generates a filename. The generated filename is the MD5 hash of the contents of the file. So if the same file is uploaded twice, it will overwrite the first one (correct me if I'm wrong). This could be seen as a feature (it saves storage), but I think it is not the right thing to do (imagine modifying the file without changing the name). I would propose something like uniqid() to generate the filename.

@alexbowers
Copy link

It appears that the function is md5_file which is a php function to get an md5 hash of a file by filename, but not of the filename.

So i think this might be alright

@JosephSilber
Copy link
Member

JosephSilber commented Aug 17, 2016 via email

@jeroennoten
Copy link
Author

jeroennoten commented Aug 17, 2016

@alexbowers No... it returns the hash of the file contents. I think you're reading it wrong.

"Calculates the MD5 hash of the file specified by the filename parameter" (from http://php.net/manual/en/function.md5-file.php)

So it means that the file is specified by the filename parameter, not that it calculates the MD5 hash of the filename.

@alexbowers
Copy link

Ah, i misunderstood your question at the top.

@alexbowers
Copy link

@JosephSilber I guess that your code will have to check in the trigger function for if there is another reference to the file. If there isn't, then remove the file.

@JosephSilber
Copy link
Member

JosephSilber commented Aug 18, 2016 via email

@Skysplit
Copy link

I think using tempnam() could solve this problem

@jeroennoten
Copy link
Author

@Skysplit no that will not work on cloud storage disks like S3

@Skysplit
Copy link

Skysplit commented Aug 18, 2016

@jeroennoten Ah, you're right. I forgot that there could be different storage drivers.

What about appending Illuminate\Support\Str::random() result to file name?

@jeroennoten
Copy link
Author

That will not generate a unique name. Better use something like uniqid().

@franzliedke
Copy link

IIRC MD5 also has problems with conflicts that can be generated by appending well-crafted bytes to a known file. I guess that could be abused to overwrite an existing file with a malicious version.. M

@boynet
Copy link

boynet commented Aug 18, 2016

I have multiple projects that involve content uploads to cloud , in the end there is no way to generate unique name without actually checking for existence first(from the file system or db).

uniqid is good base but it's not 100% guarantee to uniqueness

@jeroennoten
Copy link
Author

@boynet You're completely right.

@LonnyX
Copy link

LonnyX commented Sep 21, 2016

With this way, we should never remove the files from the disks because their are linked with more than once.

GUID seems a approach, no?

@jstoone
Copy link

jstoone commented Nov 1, 2016

As @franzliedke, there can be problems with collisions, but php also has a sha1_file, which might be more suitable collision-wise.

So I propose that it's either:

  1. UUID, as implemented by @themsaid
  2. sha1_file, for less possible collisions

It could even be possible for the FilesHelper::hashName() to default to either UUID or sha1, and have an extra flag to use the other.

If interested I'd gladly create a PR to illustrate the implementation.

@jeroennoten
Copy link
Author

If the same file is uploaded twice, then also sha1_file() results in a collision (of course, that is the purpose of hashing).

So the solution should be a generated ID that is 'as unique as possible'.

@franzliedke
Copy link

@jeroennoten Yes, collisions between two files with the same contents are intended, of course. The problem with MD5 is that there are algorithms that allow you to generate files with different content that collide with an existing file.

@jeroennoten
Copy link
Author

Yeah that's right, but we want uniqueness for files with the same content as well.

@jeroennoten
Copy link
Author

This is now fixed and merged in PR 16193

@boynet
Copy link

boynet commented Nov 2, 2016

@jeroennoten it's not totally solved just using str_random(32)) still good chance of duplicate...

in conclusion for any future readers: if you need more than basic usage don't use this method

@ishukshin
Copy link

@boynet Have you calculated possibility to have the same filename using str_random(32). It is at least 62^32=2e57 if we assume it is truly random. Too small to exist in the Universe.

@sisve
Copy link

sisve commented Dec 19, 2017

@shukshin-ivan Are you arguing to use str_random, which has already been merged in laravel/framework#16193, or are you arguing the chance of any collisions?

@ishukshin
Copy link

The chance of collision in file names of two files in one folder.

@dereks
Copy link

dereks commented Sep 7, 2018

@jeroennoten @ishukshin

I believe this bug should be reopened.

To summarize the requirements:

  1. Filename must be unique (and not based on file contents, to prevent hash collisions).
  2. The filename must not already exist. If the filename already exists, then it should generate a different unique name and check it for existence again.
  3. It should be possible to generate a unique filename (that is tested to not exist), but without actually saving a file with that name. This is so we can use the function to generate a new filename, but pass it to other 3rd-party libraries that expect a filename for their own save() function (like the Laravolt Avatar library).
  4. All letters in the generated filename should be in lower-case, because some filesystems are case insensitive.
  5. Please do not use str_random(), in case the filenames have punctuation or special characters which are hard to work with at the shell or create false file extensions. Using base64 or hex is safest and most compatible.

Simply using a UUID is the best bet. It meets the requirements above, is a known unique key standard, and is supported by other database / key indexing software systems.

@jeroennoten jeroennoten reopened this Sep 8, 2018
@dereks
Copy link

dereks commented Sep 8, 2018

Here is some working example code. But I'm not sure if using the default Storage:: facade is the appropriate thing to do. What if the user wants to use a specific driver?


    public static function newFileUUID($path_prefix = null, $file_suffix = null)
    {
        $path = "";

        while (1) {
            // Break out when an non-existing filename UUID is found.
            $uuid = (string) Uuid::generate(4);
            $path = $path_prefix . $uuid . $file_suffix;

            $exists = Storage::exists($path);
            if (!$exists) {
                break;
            }
        }

        return $path;
    }

This example uses the laravel uuid package.

composer require "webpatser/laravel-uuid:^3.0"

I saw that Faker has a UUID but I wasn't sure if those were valid or just a repeating set.

@sisve
Copy link

sisve commented Sep 9, 2018

@dereks I see no problem using the Storage facade. The way you uses it will make it default to the default disk, but you could as well send in a disk name into the function. (Or perhaps even a Filesystem instance.)

$exists = Storage::disk($diskName)->exists($path);
$exists = $filesystem->exists($path);

However, when it comes to the actual application logic, then I see a race condition between calling exists and actually writing to the file. It's minimal with the use of uuid v4, but is probably more prominent with other ways to generate a filename.

I would also like to add another important thing; reproducability for testing. I presume you use ramsey/uuid which allow me to use a custom uuid factory to make sure I get reproducible values. I would like that the final implementation to have this functionality too.

@themsaid
Copy link
Member

Laravel uses Str::random(40) now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests