Dynamic Custom Homepage - ROUND TWO #5445

Closed
19 of 22 tasks
mheppler opened this issue Jan 9, 2019 · 31 comments

@mheppler
Contributor

mheppler commented Jan 9, 2019

Misc HTML + CSS + layout improvements

  • Make symbols in the big red buttons gray. See mockup.
  • Make the rules (lines) that are above "Find data across...", above “Activity” and above "Looking for other online..." bolder – if "hairline" then 1px, if 1px, then 1.5 or two (see mockup)
    ...
  • “Browse by subject” font to be the same as “Harvard Dataverse is a digital data repository…”
    ...
  • Recent datasets: change titles to “Datasets from Journal dataverses” and “Datasets from other Dataverses”
  • Three columns of datasets – one on left is journals, and two on right are latest not journals. Header runs across both “other” columns (see mockup, previous version)
  • Name of dataverse should be bold
  • Remove dash before date in display of datasets
  • "All recent journal activity" needs to go to search results showing journal dataverses and datasets, both facet boxes checked
  • "All recent researchers activity" link – Change text and behavior. Text to say: "All recent activity". Behavior to load the regular search page (same as “all subjects”)
    ...
  • Activity section: add more counts (UI), and have them be correct, responsive layout (see new mockup below, Homepage Count Updates #5447)
    ...
  • "Looking for other repositories...?" section needs line break after DASH for responsive layouts

Javascript fixes

  • Handle cases with no JS through a noscript tag
  • Remove "Other" in JS
  • “Search over ###### datasets...” placeholder count needs a comma
  • Subject counts (1146) need commas
  • Activity counts (4936934) need commas
  • Search input watermark dataset count quickly blinks "75,000" before changing to comma-less count
  • Recent dataset dates (Wed Jan 09 2019) need comma (e.g. Wed Jan 09, 2019), remove day of week, remove "0" if the day is a single digit
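
The formatting items above (commas in counts, and dates without the weekday or a leading zero on the day) can be sketched as follows. This is written in Python purely for illustration and is not the homepage's JavaScript; `format_count` and `format_dataset_date` are hypothetical helper names, and the incoming date is assumed to look exactly like `Wed Jan 09 2019`.

```python
from datetime import datetime


def format_count(n: int) -> str:
    """Insert thousands separators: 4936934 -> '4,936,934'."""
    return f"{n:,}"


def format_dataset_date(raw: str) -> str:
    """Turn 'Wed Jan 09 2019' into 'Jan 9, 2019'.

    Assumes English month/day abbreviations (the default C locale).
    """
    d = datetime.strptime(raw, "%a %b %d %Y")
    # d.day is an int, so the leading zero disappears without extra work.
    return f"{d.strftime('%b')} {d.day}, {d.year}"
```

For example, `format_count(1146)` gives `'1,146'` and `format_dataset_date("Wed Jan 09 2019")` gives `'Jan 9, 2019'`.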

Other customization fixes

  • Header: add 2px above and below Harvard logo (or make the logo slightly smaller)
  • Header & footer: change both to solid background #ececec

Homepage template fixes

  • Change page title from "Harvard Dataverse Dataverse" to "Harvard Dataverse"

Additional curation efforts

  • Journal images – how might we include images of the Journal, if no image of data to show? I (or Dwayne, or Mike) could make images for each journal if the journal does not have a representative image of the journal cover we can use

Related GitHub Issues

Updated Activity section
[Image: screenshot, 2019-01-10 2:17:25 PM]

Misc notes...

  • Retest this: The navigation is really off. It's hard to get back to the "front" page once you use it to get to production...so sometimes I get the production page, sometimes I get the new homepage
  • Confirm settings: Create dataverse link behavior: link goes to the right page but 'dataverse' is automatically added to my repository name, which is not required anymore – it can be called anything, not just "dataverse."
@scolapasta
Contributor

scolapasta commented Jan 9, 2019

Related to the "Activity download count being off" to-do list item: #4970

@matthew-a-dunlap
Contributor

matthew-a-dunlap commented Jan 9, 2019

For the "Activity" download counts problem, a short-term fix could be to just remove that section of the html until we get the metrics to line up in a future release

@scolapasta
Contributor

Should Search input watermark dataset count be 27.4k (# of datasets added?) or 81.2k (total number including harvested)? I'd vote for the latter

@djbrooke
Contributor

djbrooke commented Jan 9, 2019

There's some more feedback coming from @mercecrosas for this issue. @TaniaSchlatter will add it tomorrow morning.

@sbarbosadataverse

sbarbosadataverse commented Jan 9, 2019 via email

@TaniaSchlatter
Member

@scolapasta Search input watermark dataset count should be @ 81.2k – total number including harvested.

@mheppler
Contributor Author

Wanted to record this Stack Overflow resource for new column CSS properties used in the subject count and recent dataset sections.

@landreev
Contributor

landreev commented Feb 7, 2019

Regarding the harvested datasets:
We do NOT populate the publicationdate of harvested datasets. We only fill the creationdate - and since all the harvested datasets are published by definition, it can be assumed to also be the publicationdate.
The harvested datasets in the database that happen to have the publicationdate are the legacy ones that were migrated from DVN3.

We can discuss changing this arrangement separately. But for the purposes of this issue, we should simply go ahead and change the dataset-counting queries to work based on this definition, that all the harvested datasets should be counted as published.

So instead of doing
"SELECT ... WHERE ... dvobject.publicationdate IS NOT null"
we should be doing
"SELECT ... WHERE ... (dvobject.publicationdate IS NOT null OR dataset.harvestingclient_id IS NOT null)"

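The counting rule proposed above can be tried out against a toy database. The following is a minimal sketch using Python's built-in `sqlite3`, not the actual metrics query: the two-column tables, the join on `dvobject.id = dataset.id`, and the names `demo_connection` and `count_published` are assumptions made for illustration.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dvobject (id INTEGER PRIMARY KEY, publicationdate TEXT);
        CREATE TABLE dataset  (id INTEGER PRIMARY KEY, harvestingclient_id INTEGER);
        -- 1: local and published        2: local draft (unpublished)
        -- 3: harvested, no pub. date    4: legacy harvested, has pub. date
        INSERT INTO dvobject VALUES (1, '2019-01-09'), (2, NULL),
                                    (3, NULL), (4, '2012-05-01');
        INSERT INTO dataset  VALUES (1, NULL), (2, NULL), (3, 7), (4, 7);
        """
    )
    return conn


def count_published(conn: sqlite3.Connection) -> int:
    """Count datasets that are published locally OR harvested."""
    row = conn.execute(
        """
        SELECT count(*)
        FROM dataset
        JOIN dvobject ON dvobject.id = dataset.id
        WHERE dvobject.publicationdate IS NOT NULL
           OR dataset.harvestingclient_id IS NOT NULL
        """
    ).fetchone()
    return row[0]
```

With this sample data, `count_published(demo_connection())` returns 3; the original `publicationdate IS NOT null` condition alone would miss dataset 3 and return 2.
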
@matthew-a-dunlap
Contributor

@landreev Thanks for investigating this! I'll make the change :)

matthew-a-dunlap added a commit that referenced this issue Feb 7, 2019
We will try to add this again later. It is not actually required.
@matthew-a-dunlap
Contributor

matthew-a-dunlap commented Feb 7, 2019

I've run into more problems than I thought trying to get all the file/dataset queries to work dynamically for harvested/local. I removed the dataLocation option from all files queries (as we don't use them in the homepage anyway) and from dataset/bySubject. The harvest/local/all queryParam for the other dataset queries seems to work well.

After removing this from dataset/bySubject I realized that it was a hard requirement for homepage to get all the results. Talking with @landreev earlier, we agreed that the base query that we had used for datasets/files is a bit confusing and should be rewritten, but I had hoped to avoid doing that as part of the homepage story.

We may be able to sidestep this issue somewhat by writing a different/simpler query that gets the subject counts without caring about the timestamp, and having that return harvest/local. But it'll make the metrics api a bit more confusing and is still work.

I'm out tomorrow and will be unable to work on this. Feel free to revert my last two commits if needed to work on the bySubject query.

@matthew-a-dunlap
Contributor

matthew-a-dunlap commented Feb 8, 2019

btw, the approach I was trying was to update this section of bySubject/toMonth:

from datasetversion where datasetversion.dataset_id || ':' || datasetversion.versionnumber + (.1 * datasetversion.minorversionnumber) in

changing it to match how the basic toMonth query works now. There may be a problem with this, though, as harvested datasets may not have a datasetversion.

@landreev
Contributor

landreev commented Feb 8, 2019

I can definitely help figuring out better queries there.
Just to confirm that I'm reading this correctly - the "totals" queries are now working correctly (for local, harvested and/or both); and the bySubject query is working correctly for local datasets, but not for harvested ones - ? - I'll look into it.

And yes, it looks like the only harvested datasets that have numeric version numbers are the ones harvested from other Dataverses. The ones harvested from generic OAI archives and such don't. Whether this is a problem necessarily - we need to find out; that fragment in the query:

... ':' || datasetversion.versionnumber + (.1 * datasetversion.minorversionnumber) ...

may simply become a "0" when the version numbers are missing; and it would still uniquely identify the dataset, in combination with the dataset id.

@landreev
Contributor

landreev commented Feb 8, 2019

(and yes, the bySubjectToMonth should be the same query as bySubject - but with the time argument added...)

@landreev landreev self-assigned this Feb 8, 2019
@matthew-a-dunlap
Contributor

@landreev that's correct, the totals look to be working correctly now. Thanks for looking into this.

@landreev
Contributor

landreev commented Feb 8, 2019

so yeah, these lines:

datasetversion.dataset_id || ':' || max(datasetversion.versionnumber + (.1 * datasetversion.minorversionnumber))

or

datasetversion.dataset_id || ':' || datasetversion.versionnumber + (.1 * datasetversion.minorversionnumber)

both result in empty strings when versionnumber and/or minorversionnumber are null. so count(*) works - it just counts lines, regardless of the content. But "where ... in ..." using this expression only finds the versions with the version numbers present.

(I'm working on a simpler query)
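
The behavior described above can be reproduced with a small experiment. This is a sketch using Python's built-in `sqlite3` rather than the production database, with a hypothetical three-column `datasetversion` table. One assumption to flag: SQLite gives `||` a higher precedence than `+`, so the arithmetic is wrapped in explicit parentheses here to keep the grouping the thread intends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datasetversion (
        dataset_id INTEGER, versionnumber INTEGER, minorversionnumber INTEGER);
    -- dataset 1 has version 2.1; dataset 2 (e.g. harvested) has no version numbers
    INSERT INTO datasetversion VALUES (1, 2, 1), (2, NULL, NULL);
    """
)

# The "dataset_id:version" key from the thread, with explicit parentheses.
KEY = "dataset_id || ':' || (versionnumber + (.1 * minorversionnumber))"


def keys(conn):
    """Return the computed key for every row, ordered by dataset_id."""
    sql = f"SELECT {KEY} FROM datasetversion ORDER BY dataset_id"
    return [r[0] for r in conn.execute(sql)]


def total_rows(conn):
    """count(*) counts rows regardless of what the key evaluates to."""
    return conn.execute("SELECT count(*) FROM datasetversion").fetchone()[0]


def matched_ids(conn):
    """Rows found by a 'where ... in ...' filter built on the same key."""
    sql = f"""
        SELECT dataset_id FROM datasetversion
        WHERE {KEY} IN (SELECT {KEY} FROM datasetversion)
        ORDER BY dataset_id
    """
    return [r[0] for r in conn.execute(sql)]
```

Here `total_rows(conn)` is 2 but `matched_ids(conn)` contains only dataset 1. Strictly speaking, the key for dataset 2 is NULL rather than an empty string (PostgreSQL's `||` also returns NULL when an operand is NULL), and a NULL never satisfies `IN`, which is why the unversioned row drops out.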

landreev added a commit that referenced this issue Feb 8, 2019
@landreev
Contributor

landreev commented Feb 8, 2019

OK, I haven't really made it simpler per se; I'm still relying on the "max(datasetversion.versionnumber + (.1 * datasetversion.minorversionnumber))" gimmick in order to select the latest released version, for the local datasets (haven't been able to think of a simpler/cleaner query).
But I got it to work with harvested datasets, and I used a simpler query for those - that relies on the assumption that all the harvested datasets are published, and that there's only one version per dataset.

(I've only modified the datasets/bySubjectToMonth query; if any other similar queries in there need to be able to select either local, or harvested, or both - they need to be similarly modified)
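
The two-branch idea described above (latest released version for local datasets, the single existing version for harvested ones) might look roughly like the sketch below. It is not the committed query: it runs on Python's built-in `sqlite3`, uses a correlated subquery instead of the concatenated-key `IN` filter, and the simplified tables, the `releasetime` column, and the name `latest_versions` are assumptions made for illustration.

```python
import sqlite3

LATEST_VERSIONS_SQL = """
    -- Local datasets: keep only the latest released version of each dataset.
    SELECT dv.dataset_id, dv.id AS version_id
    FROM datasetversion dv
    JOIN dataset d ON d.id = dv.dataset_id
    WHERE d.harvestingclient_id IS NULL
      AND dv.releasetime IS NOT NULL
      AND dv.versionnumber + (.1 * dv.minorversionnumber) = (
            SELECT max(v.versionnumber + (.1 * v.minorversionnumber))
            FROM datasetversion v
            WHERE v.dataset_id = dv.dataset_id
              AND v.releasetime IS NOT NULL)
    UNION ALL
    -- Harvested datasets: assumed published, with exactly one version each.
    SELECT dv.dataset_id, dv.id
    FROM datasetversion dv
    JOIN dataset d ON d.id = dv.dataset_id
    WHERE d.harvestingclient_id IS NOT NULL
    ORDER BY 1
"""


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (id INTEGER PRIMARY KEY, harvestingclient_id INTEGER);
        CREATE TABLE datasetversion (
            id INTEGER PRIMARY KEY, dataset_id INTEGER,
            versionnumber INTEGER, minorversionnumber INTEGER, releasetime TEXT);
        -- dataset 1: local, released; 2: harvested; 3: local, draft only
        INSERT INTO dataset VALUES (1, NULL), (2, 7), (3, NULL);
        INSERT INTO datasetversion VALUES
            (10, 1, 1, 0, '2018-01-01'),
            (11, 1, 1, 1, '2018-06-01'),
            (12, 1, 2, 0, '2019-01-01'),
            (13, 1, NULL, NULL, NULL),
            (20, 2, NULL, NULL, NULL),
            (30, 3, NULL, NULL, NULL);
        """
    )
    return conn


def latest_versions(conn: sqlite3.Connection):
    """Return (dataset_id, version_id) pairs, one per counted dataset."""
    return conn.execute(LATEST_VERSIONS_SQL).fetchall()
```

With the sample rows, `latest_versions(demo_connection())` yields version 12 for local dataset 1 and version 20 for harvested dataset 2; the draft-only dataset 3 is left out.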
