-
Notifications
You must be signed in to change notification settings - Fork 500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow users to override content type for aux files #8241 #8282
Conversation
@@ -577,6 +577,7 @@ private boolean isAuxiliaryObjectCached(StorageIO storageIO, String auxiliaryTag | |||
} | |||
} | |||
|
|||
// TODO: Return ".md" for "text/markdown" as well as other extensions in MimeTypeDetectionByFileExtension.properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW: We have FileUtil.generateOriginalExtension for tabular files but it uses a hardcoded list rather than the properties file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
export FILE_ID='12345' | ||
export FORMAT_TAG='dpJson' | ||
export FORMAT_VERSION='v1' | ||
export TYPE='DP' | ||
export SERVER_URL=https://demo.dataverse.org | ||
|
||
curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@$FILENAME" -F 'origin=myApp' -F 'isPublic=true' -F "type=$TYPE" "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION" | ||
curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@$FILENAME;type=$FILETYPE" -F 'origin=myApp' -F 'isPublic=true' -F "type=$TYPE" "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very mildly rubbed the wrong way by the fact that there are now 2 instances of type=
in the command line above... Would have been better if we had named that type
parameter something more specific (auxtype
? dvtype
? - idk, something indicating that this is our, proprietary type...) to differentiate it from the standard mime type... But I'm not going to propose changing it now.
This API is for software clients only; nobody should ever be using it on the command line except an occasional developer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same thought: two cases of "type" is weird and unfortunate. Perhaps we'll do a final cleanup of this API before we move its docs from the dev guide to the API guide and right some of these wrongs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether this specific parameter is worth changing or not, I really like the fact that we have a note in the doc saying that this API is experimental, so we reserve the right to keep changing it without guaranteeing backward compatibility, etc.
@@ -97,8 +99,12 @@ public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile | |||
storageIO.saveInputStreamAsAux(di, auxExtension); | |||
auxFile.setChecksum(FileUtil.checksumDigestToString(di.getMessageDigest().digest())); | |||
|
|||
Tika tika = new Tika(); | |||
auxFile.setContentType(tika.detect(storageIO.getAuxFileAsInputStream(auxExtension))); | |||
if (mediaType != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question - is mediaType
(formDataBodyPart.getMediaType();
) definitely going to be null
here, if not supplied by the client? - As opposed to defaulting to something like application/octet-stream
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. You're right. When I leave off ;type=WHATEVER
(the old way), mediaType
is non-null and its toString
method prints application/octet-stream
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it should just be if (mediaType != null && (!mediaType.equals("application/octet-stream")))
here. We are doing something similar throughout the "normal" datafile upload process.
Otherwise everything looks good in the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't make any code changes but I tweaked the tests so I could make assertions on what happens if you leave off ;type=WHATEVER
: 787dbed
I'm happy to make changes, of course. Right now we're saying the user can change the content type to whatever we want. We could change is such that if they supply application/octet-stream
as a content type we'll run it though Tika and save whatever Tika says it is. That is, we could do something like this:
if (mediaType != null && !mediaType.toString().equals("application/octet-stream")) {
auxFile.setContentType(mediaType.toString());
} else {
Tika tika = new Tika();
auxFile.setContentType(tika.detect(storageIO.getAuxFileAsInputStream(auxExtension)));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what I was saying above, how that "if" line should look like. Nobody would ever want to supply "application/octet-stream" as the type. That type just means "type unknown" or "type not specified".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(otherwise Tika will never be called... and we want to leave this option, of not supplying the type and letting the application identify it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry, I didn't see that comment until I had written mine. I just pushed the change in 985b552. Please take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it looks good now - but I'd like @scolapasta and @qqmyers to chime in, on whether we really want to be doing this instead, and not in addition to saving the file name, as uploaded.
"application/octet-stream" is the default when the user doesn't supply a content type. So if it's this, send it through Tika. Yes, a user can supply "application/octet-stream" and this will also be sent through Tika.
Tika tika = new Tika(); | ||
auxFile.setContentType(tika.detect(storageIO.getAuxFileAsInputStream(auxExtension))); | ||
// The null check prevents an NPE but we expect mediaType to be non-null | ||
// and to default to "application/octet-stream". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I believe the null check is necessary, since the mediaType can still be null, if an input stream is used for the file body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, the code looks good, and everybody appears to be in agreement, that this is how we want to address both of the underlying issues, the mime type and the file name.
If somehow some picky user requests being able to have non-standard extensions, and/or custom download filenames for these, we'll just addres it separately.
@pdurbin Would you update this pr from develop? I'm getting a flyway mismatch error: |
What this PR does / why we need it:
Some auxiliary were being saved with the wrong content type (MIME type) but now the user can supply the content type on upload, overriding the type that would otherwise be assigned.
Which issue(s) this PR closes:
Closes #8241
Special notes for your reviewer:
Markdown files can now be given a better content type of
text/markdown
but on download, no file extension is added. This is because Tika's getDefaultMimeTypes doesn't include "text/markdown". I left a TODO about how perhaps we could look up more esoteric content types in the MimeTypeDetectionByFileExtension.properties file we maintain (.geojson was added recently, for example).Suggestions on how to test this:
Run the curl commands. You should be able to set the content type to whatever you want.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No.
Is there a release notes update needed for this change?:
Included but feel free to leave it out.
Additional documentation:
Included.