-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the save() method to set relative paths for resources. Fixes frictionlessdata#41. #51
Update the save() method to set relative paths for resources. Fixes frictionlessdata#41. #51
Conversation
Hi @jonrmitchell, thanks for your contribution. Use case makes sense and quick first look at the code looks good. I have some time today in my afternoon to do a proper code review and human test. Stay tuned and thx again! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jonrmitchell I added a few comments but mostly question. Happy to discuss them (no rush). I have not yet tested the code but probably what requires my better understanding of your code/logic is the path
assignment question, in the case where you have files and URIs and the copied Datapackage may get path
overwritten and loose the URIs. I may be missing something. Good work here! We will get this done =)
src/Datapackages/BaseDatapackage.php
Outdated
$base = tempnam(sys_get_temp_dir(), 'datapackage-zip-'); | ||
$files = [ | ||
'datapackage.json' => $base.'datapackage.json', | ||
]; | ||
$ri = 0; | ||
foreach ($this as $resource) { | ||
foreach ($packageCopy->resources() as $resource) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonrmitchell hi. Since this class extends /Iterator I think you can iterate directly over $packageCopy here without having to request all resources. The cursor will start at 0 and may be more optimal for large Packages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that makes sense and I'd be fine with that.
@@ -210,19 +217,24 @@ public function save($zip_filename) | |||
} | |||
} | |||
|
|||
protected function copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are here, this could need a PHP DOC saying what it does. I know it is obvious but makes future contributor's work easier. Like the fact that it creates a new Object but skips validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, docblock, of course. I can add that. I can fix that in a new commit and put it in this same pull request, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Once the pull is open you can keep pushing commits into your local/forked same branch.
} | ||
$resource->descriptor()->path = count($resourceFiles) == 1 ? $resourceFiles[0] : $resourceFiles; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may need to test this further but let me see if I understand this: path
property according to the schema
can either be a string or an array of strings.
datapackage-php/src/Validators/schemas/data-package.json
Lines 262 to 276 in 48e73fc
"path": { | |
"propertyOrder": 30, | |
"title": "Path", | |
"description": "A reference to the data for this resource, as either a path as a string, or an array of paths as strings. of valid URIs.", | |
"oneOf": [ | |
{ | |
"title": "Path", | |
"description": "A fully qualified URL, or a POSIX file path..", | |
"type": "string", | |
"examples": [ | |
"{\n \"path\": \"file.csv\"\n}\n", | |
"{\n \"path\": \"http://example.com/file.csv\"\n}\n" | |
], | |
"context": "Implementations need to negotiate the type of path provided, and dereference the data accordingly." | |
}, |
If you have only one resource file for a resource you use the the file path/string, if not you use the array, but if fileNames (save) is empty (should not happen really) but What happens if the resource has also e.g URIs
for the same resource?
let's say mytable.csv and https://myrepo.org/data/mytable.csv?
Maybe you should take the original paths from the descriptor first and only replace the original absolute paths that match the files you are iterating over but keep the URIs if any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct, I hadn't considered the case when path is a URI. I'll make sure that's covered in the tests and then work to fix the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I played around with remote URLs some and it turns out the BaseDatapackage->save() method currently writes remote resources to disk as part of saving (even before my pull request changes). As I understand it, that's not how it's supposed to work, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. You are right, but i would say (personal opinion) this may be seen as a not so good documented expected behavior/facility that can save some steps for people building Packages. Since fopen
in PHP can deal with URLs same way as with a local file you get this for free. A datastream level on ::save
you already saw this
datapackage-php/src/DataStreams/DefaultDataStream.php
Lines 46 to 51 in 48e73fc
public function save($filename) | |
{ | |
$target = fopen($filename, 'w'); | |
stream_copy_to_stream($this->fopenResource, $target); | |
fclose($target); | |
} |
Of course it would be nice if one could flag if the local path is going to be the only one to be kept or also the original remote one,
but does not mean (I guess) that you can not still for a local resource/file add an additional ->path manually to mimic that behavior.
Not sure how the Python and JS versions deal with this. I may want to check with their interfaces to ensure we do this the same way. Probably a different ISSUE/Pull.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would be useful to be able to download remote files and save them locally into the datapackage. Perhaps that'd be a good parameter/option for the Datapackage->save
method? Or on the resource object with maybe a makeLocal()
call or similar. Either way I don't think it should be the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. probably at the Datastream level we should have flag. Will dig into the other language implementations and see how they do this. Will let you know how this goes. thx
], | ||
(object) [ | ||
'name' => 'my-renamed-tabular-resource', | ||
'path' => "resource-1.csv", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where were should test for paths that are also file name and URI maybe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Thanks for taking the time to review this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to you! This is all great and the needed changes are not many. Good work
…ackage->save to not attempt to save remote resources; Added a test to verify that http resources work as expected. Made DatapackageTest to use php-generated temporary files rather than hard-coded names Added a few docblocks
@jonrmitchell i have tomorrow a bit of of a busy day but can give your updates a spin on Tuesday morning (EST time) and come back. All looking good so far! |
@DiegoPino thanks! I'm not in a big rush really, this is just a package I'd like to use on a side project (if we can get to where it meets my use cases). If it takes a little longer to get reviewed, I don't mind. |
@jonrmitchell so sorry for the delay. It has been a crazy workweek (and still not finished). The code looks good and all works. I need to refresh my memory so will give the code a second read tomorrow and will then approve and merge. If you want to talk about the optional argument for remote resources to be downloaded/not downloaded, happy to give that a look. Still, to preserve current (expected functionality) we probably will want to make the default to download, but happy to be spoken out of that. I need to check the other packages and see their interfaces, its easier to make a point if there is consistency between language. Again sorry, this will be merged by tomorrow. Thanks! |
No worries, thanks for taking a look. From looking at the python library, Also, from what I can tell, the python library won't move files into a temporary location for zipping either (unless explicitly listed as in-memory resources). I think it expects you to already have the data files with a path relative to the datapackage.json file And if this is the intent... then perhaps this entire pull request is misguided, and adding a resource with an absolute path and expecting it to "fix" the path to be relative should not work, but paths for resources need to be expected to be used and exported exactly as provided. That would explain why this functionality that I thought was so basic could be missing. Another thing I recognized from reviewing the python library is that In any case don't feel like you need to merge this tomorrow, let's talk this through. |
But then again, if resources weren't meant to be downloaded when zipped, well that behavior wasn't working before I came along either... so I don't know. |
@jonrmitchell it took me (again) some time to think about your last post and here is my take:
Time to merge.This is all good work and very thankful for your patience. Let's keep in touch to make this package better =) |
Sounds good. Thanks for taking the time to review it. |
Overview
Update the save() method to set relative paths for resources. Clones the current package so the path changes only apply to the outputted package, which is what they do in the datapackage-js library (https://github.com/frictionlessdata/datapackage-js/blob/master/src/package.js#L266) Fixes #41.
This is only the second time I've ever submitted a pull request so if there's anything I'm doing wrong please let me know.
Please preserve this line to notify @DiegoPino (lead of this repository)