Datasets not showing on google #1

Closed
chaudj opened this issue Feb 26, 2019 · 9 comments

@chaudj

chaudj commented Feb 26, 2019

Datasets are not showing up in Google search results. We are lucky they are being picked up by an external harvester that links to Dataverse; otherwise, it would be impossible to find the datasets by relying solely on Google search.

WorldFish Dataverse https://dataverse.harvard.edu/dataverse/worldfish

@pdurbin
Member

pdurbin commented Feb 26, 2019

@chaudj hi! @djbrooke indicated here that we'll be fixing our robots.txt file this week: https://groups.google.com/d/msg/dataverse-community/RZb6SsuL17E/Y_Xomd8bAQAJ

I believe this may have been done yesterday so I'll move this to QA. Thanks for letting us know!
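
For anyone who wants to check how a given robots.txt treats a crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT` are hypothetical and are not the actual file served by any Dataverse installation; `is_allowed` is just an illustrative helper name. Note that Python's parser applies the first matching rule, which may differ from how Google resolves conflicting rules, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not the real robots.txt
# of dataverse.harvard.edu or any other installation.
ROBOTS_TXT = """\
User-agent: *
Allow: /dataverse/
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines,
    # so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://dataverse.harvard.edu/dataverse/worldfish"
    print(is_allowed(ROBOTS_TXT, "Googlebot", url))
```

To test a live site instead, `RobotFileParser.set_url()` followed by `read()` fetches the file over the network.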

@pdurbin
Member

pdurbin commented Feb 26, 2019

I guess the other thing I'll add is that search engines like Google will have an easier time indexing content from a Dataverse installation if the new sitemap feature is enabled. We used #4261 to track the feature and it was added in Dataverse 4.10.

Here are the docs: http://guides.dataverse.org/en/4.11/installation/config.html#creating-a-sitemap-and-submitting-it-to-search-engines

Here's a related post I made about this yesterday: https://groups.google.com/d/msg/dataverse-community/RZb6SsuL17E/rmCPrCbmAwAJ
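
As a rough illustration of what a sitemap file contains, here is a small sketch that builds one in the sitemaps.org XML format. Dataverse produces its own sitemap as described in the docs linked above; this is not that implementation, and `build_sitemap` is a hypothetical helper used only to show the file structure.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(["https://dataverse.harvard.edu/dataverse/worldfish"]))
```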

@djbrooke djbrooke self-assigned this Feb 26, 2019
@djbrooke
Contributor

djbrooke commented Mar 1, 2019

Just an update on this: the robots.txt was modified earlier this week, so we should see results in Google Dataset Search soon. They aren't there yet, but I'll check on Monday.

@djbrooke
Contributor

djbrooke commented Mar 5, 2019

Not there yet. I talked with @landreev about making some adjustments to robots.txt.

@landreev
Collaborator

landreev commented Mar 5, 2019

I tried yesterday to explicitly tell Googlebot to come and index this specific dataverse, using their "search console" (https://search.google.com/search-console). I haven't been able to "verify ownership" of the site - I used the "HTML tag" method, and I'm pretty sure I followed their instructions correctly, but it still refused to listen to me.

I'm going to try again today, using the sitemap instead.

@landreev landreev self-assigned this Mar 6, 2019
@landreev
Collaborator

landreev commented Mar 6, 2019

We have successfully requested a recrawl and reindexing of this dataverse by Google. They don't make any specific promises about how long it will take for the updated results to start appearing in searches.
We are definitely still having problems with Google not indexing our holdings as actively as we want - still trying to figure out what's going on.

@landreev
Collaborator

landreev commented Mar 6, 2019

(But at least I am seeing the bot crawling the dataverse in the access logs now - which is a huge step forward)
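
A check like this can be sketched as a small log scan. This assumes the web server writes the common "combined" access log format, with the request line and user agent in double quotes; the actual log format and location on the server are not stated in this thread, and `count_googlebot_hits` is a hypothetical helper. The user-agent string can be spoofed, so a match is only a hint that the real crawler visited.

```python
def count_googlebot_hits(log_lines, path_prefix):
    """Count requests under path_prefix whose user agent mentions Googlebot.

    Assumes combined log format, for example:
    IP - - [time] "GET /path HTTP/1.1" status size "referer" "user-agent"
    """
    hits = 0
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 6:
            continue  # not in the expected format; skip the line
        request, user_agent = parts[1], parts[5]
        fields = request.split()
        if len(fields) < 2:
            continue  # malformed request line
        path = fields[1]
        if "Googlebot" in user_agent and path.startswith(path_prefix):
            hits += 1
    return hits


if __name__ == "__main__":
    # Made-up sample lines for illustration; not taken from any real log.
    sample = [
        '192.0.2.1 - - [06/Mar/2019:10:00:00 +0000] "GET /dataverse/worldfish HTTP/1.1" '
        '200 1234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
        '192.0.2.2 - - [06/Mar/2019:10:00:05 +0000] "GET /dataverse/worldfish HTTP/1.1" '
        '200 1234 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    print(count_googlebot_hits(sample, "/dataverse/worldfish"))
```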

@djbrooke djbrooke transferred this issue from IQSS/dataverse Mar 6, 2019
@djbrooke djbrooke assigned landreev and unassigned landreev and djbrooke Mar 6, 2019
@landreev
Collaborator

@chaudj Hello, I am now seeing your most recent datasets appear in Google search results; for example:
[Three screenshots, dated 2019-03-11]
I haven't checked them all, but if any other datasets are still not showing as expected, hopefully the index will refresh shortly, because Google's crawler has accessed and read all of them in recent days.

@chaudj
Author

chaudj commented Mar 13, 2019

Great, thanks! It seems to have been working okay since Monday!
