add more mime types based on use frequency #7502

Merged: 7 commits, Jan 12, 2021

atrisovic
Member

What this PR does / why we need it:

Adds new MIME types based on their frequency of use in Harvard Dataverse (HDV):

MISSING MIME TYPES                    FREQUENCY IN HDV
application/vnd.isac.fcs                          2794
chemical/x-xyz                                    2315
application/vnd.ms-excel.sheet.macroenabled.12    1261
application/postscript                             833
application/vnd.flographit                         732
application/cnt                                    476
application/x-tgif                                 192
application/vnd.ms-pki.stl                         190
text/x-pascal                                      186
text/x-vcard                                       181
text/x-fortran                                     163
application/vnd.ms-excel.sheet.macroEnabled.12      61
text/vnd.fmi.flexstor                               60
application/vnd.oasis.opendocument.chart            43
application/vnd.wolfram.mathematica.package         43
text/x-java-source                                  42
text/x-sh                                           40
application/download                                40
application/winhlp                                  36
application/msaccess                                31
application/x-research-info-systems                 31
application/vnd.tcpdump.pcap                        30
application/java-vm                                 29
application/x-compressed                            27
application/gml+xml                                 27
application/x-iwork-keynote-sffkey                  27
application/x-r-data                                26
application/photoshop                               25
application/vnd.palm                                24
application/vnd.ms-cab-compressed                   24
application/macbinary                               23
text/x-log                                          21
application/rat-file                                20
application/x-photoshop                             20
audio/x-ape                                         19
application/x-graphpad-prism-pzfx                   19
application/x-download                              18
application/vnd.oasis.opendocument.text             18
application/vnd.google-earth.kml+xml                17
text/x-objcsrc                                      17
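
A frequency list like the one above can be sketched as a simple tally of observed MIME types that have no known mapping. This is only an illustration, not the query used to produce the table: the function name `missing_mime_types` and the sample data are hypothetical. The sketch folds case before counting, since MIME type names are case-insensitive; in the table above the macro-enabled Excel type appears twice with different casing, and a case-folded tally would merge those two rows.

```python
from collections import Counter


def missing_mime_types(observed, known):
    """Count observed MIME types that have no known mapping, most frequent first.

    Both inputs are compared case-insensitively (hypothetical helper).
    """
    known_lower = {k.strip().lower() for k in known}
    counts = Counter(
        m.strip().lower()
        for m in observed
        if m.strip().lower() not in known_lower
    )
    return counts.most_common()


# Hypothetical sample: one entry per stored file.
observed = [
    "chemical/x-xyz",
    "application/vnd.ms-excel.sheet.macroEnabled.12",
    "application/vnd.ms-excel.sheet.macroenabled.12",
    "application/pdf",
    "chemical/x-xyz",
    "chemical/x-xyz",
]
known = {"application/pdf"}  # types that already have a mapping

result = missing_mime_types(observed, known)
for mime, count in result:
    print(f"{mime:<50}{count:>6}")
```

The two differently cased Excel entries end up in a single row, which is the behaviour one would probably want before deciding which types to add.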

Additional documentation:

References:
Data formats:
https://en.wikipedia.org/wiki/XYZ_file_format
https://en.wikipedia.org/wiki/Flow_Cytometry_Standard
https://www.stata.com/manuals13/g-4conceptgphfiles.pdf
https://en.wikipedia.org/wiki/STL_(file_format)
https://www.digipres.org/formats/mime-types/#text/x-comma-separated-values
Archives:
https://en.wikipedia.org/wiki/Optical_disc_image
http://fileformats.archiveteam.org/wiki/Microsoft_Excel
Code:
https://en.wikipedia.org/wiki/PostScript
The rest:
https://fileinfo.com/extension/cnt
https://en.wikipedia.org/wiki/ReStructuredText
https://en.wikipedia.org/wiki/Shapefile
https://en.wikipedia.org/wiki/Keyhole_Markup_Language

@coveralls

coveralls commented Jan 6, 2021

Coverage Status

Coverage decreased (-0.003%) to 19.478% when pulling 944776c on atrisovic:mime_types into 28406b5 on IQSS:develop.

@mheppler mheppler self-assigned this Jan 6, 2021
@mheppler mheppler removed their assignment Jan 12, 2021
Contributor

@mheppler mheppler left a comment

👍 Looks good. I added the new file types to MimeTypeDisplay.properties so that friendly formatted file types are displayed on the dataset and file pages. I'd ask @atrisovic and @landreev to eyeball this once more to confirm, even though I am approving and passing the PR to QA.
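
For readers unfamiliar with the idea, the lookup from a MIME type to a friendly display name can be sketched roughly as follows. This is a simplified illustration, not Dataverse's implementation: the helper names and the sample entries are hypothetical, the display strings are not taken from the actual MimeTypeDisplay.properties, and the parser handles only plain `key=value` lines rather than the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def display_name(mime_type, entries, fallback="Unknown"):
    """Return the friendly name for a MIME type, or the fallback label."""
    return entries.get(mime_type, fallback)


# Hypothetical entries; the real file's wording may differ.
SAMPLE = """
# MIME type = friendly name
chemical/x-xyz=XYZ Chemical File
application/postscript=PostScript
"""

names = parse_properties(SAMPLE)
print(display_name("chemical/x-xyz", names))
print(display_name("application/x-not-listed", names))
```

A type without an entry falls through to the fallback label, which is why adding entries for frequently used types changes what users see.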

@mheppler
Contributor

This PR is an extension of the Lawd's work that Leonid and I set out to deliver in #2202. It continues that effort by cutting down the "Unknown" file types while lifting up "Data" and other known types that bring value to users.

Here is a look at the current Harvard Dataverse file type standings...

[Screenshot: current Harvard Dataverse file type standings, Jan 12, 2021]
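
The effect described above, files moving out of "Unknown" once their MIME types are mapped, can be sketched with a small tally. The counts are the frequencies listed in the PR description. The group assignments are an assumption based on how the references in the description are organized ("Data formats", "Code"); the actual assignments in the merged files are not shown in this conversation, and `tally_by_group` is a hypothetical helper.

```python
from collections import Counter

# Frequencies as listed in the PR description.
FREQUENCIES = {
    "application/vnd.isac.fcs": 2794,
    "chemical/x-xyz": 2315,
    "application/postscript": 833,
}


def tally_by_group(frequencies, groups):
    """Sum file counts per group; unmapped MIME types fall under "Unknown"."""
    totals = Counter()
    for mime, count in frequencies.items():
        totals[groups.get(mime, "Unknown")] += count
    return dict(totals)


# Before: none of the three types is mapped to a group.
before = tally_by_group(FREQUENCIES, {})

# After: assumed group assignments, for illustration only.
after = tally_by_group(
    FREQUENCIES,
    {
        "application/vnd.isac.fcs": "Data",
        "chemical/x-xyz": "Data",
        "application/postscript": "Code",
    },
)

print("before:", before)
print("after: ", after)
```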

@kcondon kcondon self-assigned this Jan 12, 2021
@atrisovic
Member Author

Otherwise LGTM. 👍
Thank you!

kcondon and others added 2 commits January 12, 2021 15:11
Updated release notes to specify reindex rather than reidentify.
@mheppler
Contributor

Ana's revisions have been committed.

Tweak based on Leonid's input

5 participants