Search: search box returns 0 results when searching for existing/duplicate MD5 #3129

Closed
sbarbosadataverse opened this issue May 18, 2016 · 11 comments

@sbarbosadataverse

I was trying to locate a duplicate MD5 in a dataset with a large number of files, but the search feature returns 0 results. The MD5 I tested was tagged as a duplicate during file upload. I put the MD5 into the search box for the entire dataset and got 0 results.

@pdurbin
Member

pdurbin commented May 18, 2016

@sbarbosadataverse can you please provide the MD5 you were searching on?

@sbarbosadataverse
Author

MD5: a0d3c0496ea2fa4c1795d441c2fecc8a
Message during file upload: "A file with this MD5 already exists in the dataset."


Sonia Barbosa
Manager of Data Curation, IQSS Dataverse Network
Manager of the Murray Research Archive, IQSS
Data Science
Harvard University


@pdurbin
Member

pdurbin commented May 18, 2016

@sbarbosadataverse thanks. Here's an example of a successful search from before for an MD5: #2038 (comment)

And here's a live example: https://dataverse.harvard.edu/dataverse/harvard?q=fileMd5:48a76222cf5c06cb4f2d8f75cc0caa63

@sbarbosadataverse
Author

@pdurbin is it because I'm not using advanced search, and am just putting it into the file page search box?

@sbarbosadataverse
Author

@pdurbin were you able to replicate my issue with the MD5 I sent?

@sbarbosadataverse
Author

@pdurbin any update? Thanks

@scolapasta
Contributor

The file page search doesn't currently search everything. It runs a query against the database. What we really want is for this to go through Solr, but we'll need to resolve how we handle versions first.

So, solutions:

  1. Add Solr search here, tracked by issue #2455 (Use Solr for file listing on dataset page).
  2. Add MD5 to the limited search provided here.

(Not so great) workaround: do the search from the dataverse page.

@pdurbin
Member

pdurbin commented Oct 23, 2016

@sbarbosadataverse when I log in to production and go to https://dataverse.harvard.edu/dataverse/harvard?q=fileMd5:a0d3c0496ea2fa4c1795d441c2fecc8a, which contains the MD5 you're looking for, I can tell that the file being found is "2010 47.45 Le Blanc.pdf". Does that help? The key to this is knowing that "fileMd5" is the Solr field to use. Rather than hacking the URL, you can also just type "fileMd5:a0d3c0496ea2fa4c1795d441c2fecc8a" in the search box.
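
For anyone who wants to script this lookup, here is a minimal sketch of building the same query. It assumes only what is shown in this thread: the "fileMd5" Solr field and the `?q=` parameter on the dataverse page URL. The function names (`md5_of_bytes`, `md5_query`, `search_url`) are hypothetical helpers for illustration and are not part of Dataverse.

```python
import hashlib
from urllib.parse import urlencode

# Dataverse page URL taken from the example in this thread.
BASE_URL = "https://dataverse.harvard.edu/dataverse/harvard"


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest of some file content."""
    return hashlib.md5(data).hexdigest()


def md5_query(md5_hex: str) -> str:
    """Build the search-box query string, e.g. 'fileMd5:<32 hex chars>'."""
    md5_hex = md5_hex.strip().lower()
    if len(md5_hex) != 32 or any(c not in "0123456789abcdef" for c in md5_hex):
        raise ValueError("expected a 32-character hexadecimal MD5")
    return f"fileMd5:{md5_hex}"


def search_url(md5_hex: str, base_url: str = BASE_URL) -> str:
    """Build the dataverse page URL that searches on the fileMd5 field."""
    # safe=":" keeps the colon unescaped so the URL matches the form used above.
    return f"{base_url}?{urlencode({'q': md5_query(md5_hex)}, safe=':')}"


if __name__ == "__main__":
    print(md5_query("a0d3c0496ea2fa4c1795d441c2fecc8a"))
    print(search_url("a0d3c0496ea2fa4c1795d441c2fecc8a"))
```

The string returned by `md5_query` is what you would type into the search box on the dataverse page; `search_url` produces the equivalent link. To check a local file before uploading, read its bytes and pass them to `md5_of_bytes` first.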

@sbarbosadataverse
Author

Searching using Phil's suggestion works fine!

@pdurbin
Member

pdurbin commented Oct 25, 2016

@sbarbosadataverse great! Passing to QA at https://waffle.io/IQSS/dataverse

@kcondon
Contributor

kcondon commented Oct 27, 2016

@pdurbin @sbarbosadataverse @scolapasta It seems from this ticket that Sonia has a workaround, but is there a feature request here, or an expectation that MD5 is searchable directly from the UI?
Also, what prompted this ticket is confusion about how duplicate files with different names but the same MD5 are reported, which is already tracked in a separate ticket: #2467

I'm closing this but opening a separate ticket to allow searching on MD5 from the UI: #3436
