Filtering of Files on Dataset Page, Search using Solr on Dataset Page #5584

Closed
djbrooke opened this issue Feb 28, 2019 · 13 comments

@djbrooke
Contributor

djbrooke commented Feb 28, 2019

On the dataset page, users should be able to search for AND filter files as depicted in the screenshot. This may involve some work with Solr on the page (or some representation of the Solr index that can be easily tapped into). Assigning to @scolapasta so that he can use tech hours to generate some ideas.

screen shot 2019-02-27 at 11 45 27 am

@scolapasta
Contributor

As we only index the latest published and draft versions ("present versions"), we'd only be able to use Solr for those versions. It had previously been decided that this was ok (at least, as a first batch).

We discussed this today at tech hours: we will generate a list of ids from either Solr (for present versions) or the db (for past versions). That list can then be used the same way in both cases to get the details to display on the cards. The facets themselves would only be rendered on present versions.
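
A minimal sketch of the id-list idea above, for illustration only. It is not Dataverse code: the function name and the two lookup callables are hypothetical stand-ins for "ask Solr" and "ask the db". The point is that both sources return the same shape (a list of file ids), so the code that builds the file cards does not need to know where the ids came from.

```python
from typing import Callable, List


def file_ids_for_version(
    is_present_version: bool,
    solr_lookup: Callable[[], List[int]],
    db_lookup: Callable[[], List[int]],
) -> List[int]:
    """Return file ids from Solr for indexed ("present") versions, else from the db.

    Both lookups are hypothetical callables supplied by the caller.
    """
    return solr_lookup() if is_present_version else db_lookup()


def should_render_facets(is_present_version: bool) -> bool:
    """Facets need the Solr index, so they are only rendered on present versions."""
    return is_present_version


# Stand-in lookups with made-up ids.
solr_ids = lambda: [101, 102, 103]
db_ids = lambda: [101, 102]

present_ids = file_ids_for_version(True, solr_ids, db_ids)
past_ids = file_ids_for_version(False, solr_ids, db_ids)
```

Whatever produced the ids, the same card-rendering step can consume `present_ids` or `past_ids`.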

@djbrooke djbrooke assigned scolapasta and unassigned scolapasta Mar 27, 2019
@djbrooke djbrooke changed the title from "Filtering of Files on Dataset Page" to "Filtering of Files on Dataset Page, Search using Solr on Dataset Page" Mar 27, 2019
@djbrooke
Contributor Author

  • We should check on the implications for the Solr load as part of this. Does Solr load with the page (to get the facets), or only if the user attempts to filter/search on the page?

@mheppler
Contributor

mheppler commented Apr 1, 2019

The files table source code on the dataset pg is being revised as part of Enable the display of file hierarchy metadata on the dataset page #5572, which will impact the same HTML files this issue touches. This will need some development coordination and manual merging to resolve the expected conflicts.

@mheppler
Contributor

mheppler commented Apr 1, 2019

Added static placeholders for file table facets on dataset pg in a new branch 5584-dataset-solr-facets.

@mheppler
Contributor

mheppler commented Apr 17, 2019

Updated the latest from develop in order to get the new file hierarchy view toggle UI component (#5572) merged into this branch.

Screen Shot 2019-04-17 at 11 37 31 AM

@landreev
Contributor

landreev commented May 7, 2019

(just made the PR above; just to make it easier to look at the commits. it should stay in dev. for now!)

@djbrooke djbrooke added the Large label May 9, 2019
landreev added a commit that referenced this issue May 9, 2019
- added an indexed flag, for the published files removed from the current draft;
- backward compatibility, if talking to a solr server with an older schema;
- added check for solr being down - reverting back to searching in the db if it is. (#5584)
landreev added a commit that referenced this issue May 9, 2019
…le that's been removed from the current draft. (#5584)
@mheppler
Contributor

mheppler commented May 10, 2019

  • add render logic to hide filter facets and Sort btn when there is 1 file or fewer
  • add selected state bold styling to facet values in btn label, and dropdown menu
  • move dynamic file counter into file table header to make space for filter facets
  • move file thumbnails from table column into the file metadata table column to make room for the dynamic file counter in the file table header
  • move Upload Files and Download Rsync Script btns to left of search input to make space for Sort btn
  • change Sort btn style to match dv pg
  • wire up Sort btn
  • fix various responsive layout issues with file table (related to new display:flex; container around file metadata)
  • add word-break: break-all; inline style to checksum + UNF containers to fix responsive layout issues
  • move text to bundle

Screen Shot 2019-05-10 at 2 21 37 PM

@landreev
Contributor

@mheppler I changed your rendering rules, to show the Sort button for non-indexed versions too. But that really breaks your styling of the fragment, because without the facets, the button is now hanging in the middle of all that white space:
Screen Shot 2019-05-13 at 5 54 31 PM

I'm sure you can fix it... but I would seriously consider living without it, for non-indexed versions (since they will be rarely used; and that's how the page is looking now anyway). But up to you.

I'm still working on the back end.

@landreev
Contributor

sort button should be working now.
(for indexed and non-indexed versions both)
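
For reference, the rendering rules as they stand after this change could be summarized roughly like this. This is a sketch only: the real logic lives in the page's render attributes, and the names here (`FileTableControls`, `file_table_controls`) are made up for illustration. It assumes the "1 file or fewer" rule from the to-do list still hides both controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileTableControls:
    show_facets: bool
    show_sort: bool


def file_table_controls(file_count: int, version_is_indexed: bool) -> FileTableControls:
    """Decide which controls to render above the file table.

    - Neither control is useful with 1 file or fewer.
    - Facets come from Solr, so they need an indexed version.
    - Sort is shown for indexed and non-indexed versions alike.
    """
    has_multiple_files = file_count > 1
    return FileTableControls(
        show_facets=has_multiple_files and version_is_indexed,
        show_sort=has_multiple_files,
    )
```

For example, `file_table_controls(25, version_is_indexed=False)` gives a Sort button with no facets, which is the case that left the button hanging in white space.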

mheppler added a commit that referenced this issue May 15, 2019
@mheppler
Contributor

Fixed the Sort btn layout issue @landreev discovered in old versions with no facets. Fixed various other layout issues, including the checksums and UNFs getting cut off in small browser windows due to the new flexbox layout used for the file thumbnail and metadata in each row. (See attached.)

These fixes were added to the to-do list above which outlines all the moving pieces.

Screen Shot 2019-05-15 at 3 00 00 PM

@mheppler mheppler removed their assignment May 15, 2019
@landreev
Contributor

landreev commented May 15, 2019

Notes for the reviewer(s):

In the process of working on this we realized it was impossible to accurately search for files in draft versions using our solr index. This is because we do NOT index files that have not changed between the latest published version and the draft. (this is to avoid having duplicate search cards for these files)
So for any dataset we index a) all the files in the latest published version and b) all the files that have been added in the draft (if present), as well as the files for which any metadata have changed in the draft.
What this means is we can search files in the latest published version by supplying the filter query "datasetVersionId: N". However, if you run a search with "datasetVersionId: M", where M is the version number of the draft, you're only going to find the files added or changed since the last publication. You can use "parentId: N" where N is the id of the dataset. And then you will find all the files in the published and draft versions. But that means we are still getting the files that have been deleted from the draft.
This does not cause a problem with the list of the files shown on the dataset page. Because the search results are filtered against the files in the version. BUT it can mess up the numbers in the facets, if there are any deleted files.
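
To make the above concrete, here is a small in-memory simulation. It is not Dataverse or Solr code: the file ids, content types, and the `build_index` helper are all made up, and the "files with changed metadata" case is left out to keep it short. It shows which files end up indexed, what each kind of filter would match, and why the facet counts come out too high while the displayed list stays correct.

```python
from collections import Counter

# Hypothetical dataset: file id -> content type.
published = {1: "text/csv", 2: "text/csv", 3: "image/png"}
# In the draft, file 2 has been deleted and file 4 has been added.
draft = {1: "text/csv", 3: "image/png", 4: "application/pdf"}


def build_index(published, draft):
    """Index a) every published file and b) draft files not in the published version."""
    docs = [{"fileId": f, "inVersion": "published", "type": t} for f, t in published.items()]
    docs += [
        {"fileId": f, "inVersion": "draft", "type": t}
        for f, t in draft.items()
        if f not in published  # unchanged files are not indexed a second time
    ]
    return docs


index = build_index(published, draft)

# Filtering on the draft's version id only finds what was added since publication.
draft_version_hits = sorted(d["fileId"] for d in index if d["inVersion"] == "draft")

# Filtering on the parent dataset finds everything, including the deleted file 2.
parent_hits = sorted(d["fileId"] for d in index)

# The page filters hits against the files in the version, so the list is correct...
displayed = [f for f in parent_hits if f in draft]

# ...but facet counts are computed over all matching documents, so they are inflated.
facet_counts = Counter(d["type"] for d in index)
true_counts = Counter(draft.values())
```

Here `displayed` contains files 1, 3 and 4, but the "text/csv" facet reports 2 where the draft really has 1.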

I solved this by adding another boolean to the Solr schema - "fileDeleted". It's set to true in a Solr document for a published file that's no longer found in the draft (if one exists). This way we can find all the files in the draft by adding 2 filter queries: "parentId: N" and "fileDeleted: false".
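
A rough sketch of how the filter queries differ between the two indexed cases. The field names (`datasetVersionId`, `parentId`, `fileDeleted`) are the ones named in this comment; everything else is an assumption for illustration: the function names are hypothetical, the `*:*` default is just Solr's standard match-all query, and the actual search code may assemble its request quite differently.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode


def file_filter_queries(dataset_id: int, version_id: int, is_draft: bool) -> List[str]:
    """Build the fq values for searching files within one indexed version."""
    if is_draft:
        # Everything under the dataset, minus published files deleted from the draft.
        return [f"parentId:{dataset_id}", "fileDeleted:false"]
    # The latest published version is fully indexed under its own version id.
    return [f"datasetVersionId:{version_id}"]


def solr_params(user_query: str, filter_queries: List[str]) -> List[Tuple[str, str]]:
    """Turn a user query plus filters into repeated-key request parameters."""
    params = [("q", user_query or "*:*")]
    params += [("fq", fq) for fq in filter_queries]
    return params


draft_fqs = file_filter_queries(dataset_id=42, version_id=7, is_draft=True)
published_fqs = file_filter_queries(dataset_id=42, version_id=6, is_draft=False)
query_string = urlencode(solr_params("", draft_fqs))
```

Each filter is sent as its own `fq` parameter, so the two draft filters are applied together.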

The disadvantage of this solution is having to force the solr schema update (and the reindex). The search that serves the page is built with backward compatibility - so that it doesn't completely fail if it's talking to the solr server that doesn't have the new schema yet (in this case it reverts to searching without the new flag; potentially showing higher numbers in the facets).
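
The fallback order can be sketched like this. It is an illustration under stated assumptions, not the code in the PR: the two exception classes and the `solr_search` / `db_search` callables are hypothetical, and how the real code detects an old schema or a Solr outage is not shown here.

```python
class SolrUnavailable(Exception):
    """Hypothetical: raised when the Solr server cannot be reached."""


class UnknownField(Exception):
    """Hypothetical: raised when Solr rejects a field missing from its schema."""


def search_draft_files(dataset_id, solr_search, db_search):
    """Try the new-schema query, then the old-schema query, then the db."""
    base = [f"parentId:{dataset_id}"]
    try:
        try:
            return solr_search(base + ["fileDeleted:false"]), "solr"
        except UnknownField:
            # Older schema: search without the flag; facet counts may run high.
            return solr_search(base), "solr-old-schema"
    except SolrUnavailable:
        # Solr is down: fall back to searching in the db (no facets).
        return db_search(dataset_id), "db"


# Stand-in backends with made-up results.
def new_schema_solr(fqs):
    return [1, 3, 4]


def old_schema_solr(fqs):
    if any(fq.startswith("fileDeleted:") for fq in fqs):
        raise UnknownField("fileDeleted")
    return [1, 2, 3, 4]  # still includes the deleted file


def down_solr(fqs):
    raise SolrUnavailable()


def db(dataset_id):
    return [1, 3, 4]
```

With the old schema the search still succeeds but returns the deleted file as well, which is where the higher facet numbers come from.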

Is it worth it? - We could alternatively just say that this is a known limitation - that the facet numbers for draft versions may show higher counts, on account of the deleted files. After all, only the dataset owners will be seeing the draft versions.

(that said, we will have to force a reindex of everything sometime soon anyway - to get all the improved file types indexed!)

@landreev
Contributor

Notes for QA:

This PR has a new solr schema.
The new schema and a reindex are required for accurate facet counts. Without them the page is still going to work; the only problem is that the numbers in the facets may be inaccurate (for drafts ONLY! - and only if any previously published files have been deleted from the draft; this is explained in the long comment above).

If solr is down, the page should still be working - the search box should still be there, just without the facets.
Similarly, this is how the page should behave (search box, no facets) for any version that is neither the latest published version nor the draft.

Aside from testing the searching and the facets for basic accuracy, the page should probably be tested some more with larger numbers of files.

@landreev landreev removed their assignment May 15, 2019
landreev added a commit that referenced this issue May 16, 2019
@djbrooke
Contributor Author

djbrooke commented May 17, 2019

I'm going to do an experiment for a possible workflow change, that is, once there's a PR for something, lock the discussion in the issue and move the discussion to the PR. I think this will work better with the flow on our new project board https://github.com/orgs/IQSS/projects/2.

So, I'm going to "lock" this conversation and move the conversation to #5820. Like I said, an experiment.

@IQSS IQSS locked and limited conversation to collaborators May 17, 2019
@djbrooke djbrooke added this to the 4.15 milestone Jun 11, 2019