
Google Trends changes (error 200) #273

Closed
PMassicotte opened this issue May 29, 2018 · 19 comments
@PMassicotte
Owner

PMassicotte commented May 29, 2018

GeneralMills/pytrends#243 (comment)

Hint: https://cran.r-project.org/web/packages/curl/vignettes/intro.html#reading_cookies

@PMassicotte PMassicotte changed the title Google Trends changes (error 201) Google Trends changes (error 200) May 29, 2018
@zachokeeffe
Contributor

I got it to work. Do this:

library(curl)

h <- new_handle()
req <- curl_fetch_memory("http://apis.google.com/Cookies/OTZ", handle = h)
handle_cookies(h)  # the cookies Google set are now stored on the handle

Then, modify get_widget like so:

(widget <- curl::curl_fetch_memory(url, h))

I got status 200, and then was able to use that to extract the data.

@SaulCes

SaulCes commented May 29, 2018

@zachokeeffe would you mind explaining the get_widget part a bit further? What is supposed to go in the url variable?

@zachokeeffe
Contributor

zachokeeffe commented May 29, 2018

@SaulCes get_widget() is defined in zzz.R (https://github.com/PMassicotte/gtrendsR/blob/master/R/zzz.R). It is the first thing that requests info from Google, before grabbing the actual trend data (which requires a token). While other functions (interest_over_time(), interest_by_region()) also use curl::curl_fetch_memory(), you only need to pass the cookie handler, h, in the curl_fetch_memory() call inside get_widget(). You would change the line (74) from:

widget <- curl::curl_fetch_memory(url)

to

widget <- curl::curl_fetch_memory(url, h) # note that the second argument, h, is the cookie handler

And you only need to get the cookie once per session; after that you can reuse the same handler. So you could, e.g., have a function like "get_api_cookies()" that the user runs once, or, probably better for users, an if() statement in get_widget() that checks whether the handler is available, creates it if it is not, and then proceeds.
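That lazy-initialization approach might look something like this (a rough sketch, not the actual gtrendsR code; the helper name, package environment, and cookie URL are illustrative):

```r
library(curl)

# Package-level environment caching the cookie handle across calls
.pkg_env <- new.env(parent = emptyenv())

get_api_cookies <- function(cookie_url = "http://apis.google.com/Cookies/OTZ") {
  h <- new_handle()
  curl_fetch_memory(cookie_url, handle = h)  # Google sets cookies on the handle
  .pkg_env$cookie_handler <- h
  invisible(h)
}

get_widget <- function(url) {
  # Create the cookie handle on first use, then reuse it for the session
  if (is.null(.pkg_env$cookie_handler)) get_api_cookies()
  curl::curl_fetch_memory(url, handle = .pkg_env$cookie_handler)
}
```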

@PMassicotte
Owner Author

@zachokeeffe Mind creating a PR?

@SaulCes

SaulCes commented May 30, 2018

@zachokeeffe thanks so much - working well!

@herrmannrobert

Thanks a lot @zachokeeffe - performs like new...

@Christianmontes

Christianmontes commented May 30, 2018

So, how do we implement the change? I'm not sure how to edit that line. Is there an easy way to install from your repo?

@eddelbuettel
Collaborator

@Christianmontes Just sit back and watch #274

@bbakk14

bbakk14 commented May 30, 2018

Sorry if I shouldn't be asking here, but would anyone be able to tell me how to swap (I'm an R novice)

"http://trends.google.com/Cookies/NID"

into

curl_fetch_memory("http://apis.google.com/Cookies/OTZ", handle = h)

in my own installation of the package?

I am behind a firewall that blocks social media sites like Google Plus, which seems somehow connected to the OTZ cookie and throws an error. It appears that the NID cookie would work when running

curl_fetch_memory("http://trends.google.com/Cookies/NID", handle = h)

outside of the package.

@zachokeeffe
Contributor

zachokeeffe commented May 30, 2018

@bbakk14 Interesting. That's good to know. Try installing my fork to see if the alternative link works for you. If it does, we should probably adopt it as the default. In a new session try,

library(devtools)
install_github('zachokeeffe/gtrendsR', force = TRUE)  # this is my repository, which will overwrite @PMassicotte's
library(gtrendsR)
testgt <- gtrends('pizza', time = '2010-01-01 2010-01-15', geo = 'US', gprop = 'web', category = 0, hl = 'en-US')
head(testgt)

Does that work?

@bbakk14

bbakk14 commented May 30, 2018

@zachokeeffe It works!! Thanks for the solution and quick response.

@bbakk14

bbakk14 commented Jun 4, 2018

@zachokeeffe Unfortunately, it looks like the NID cookie has stopped working for me as well (connection timing out). It doesn't seem that this is related to Google+ at all this time though. Would you potentially have any other tips?

@zachokeeffe
Contributor

@bbakk14 Are you still having this issue? Does it occur in a fresh R environment? If so, could you please provide the exact error message?

@bbakk14

bbakk14 commented Jun 5, 2018

@zachokeeffe Thanks for taking a look. I get the following error when using gtrends() in a fresh R environment

Error in curl::curl_fetch_memory(cookie_url, handle = cookie_handler) : 
  Timeout was reached: Connection timed out after 10000 milliseconds

I also get the timeout error when calling curl_fetch_memory() separately

Error in curl_fetch_memory("http://trends.google.com/Cookies/NID", handle = h) : 
  Timeout was reached: Connection timed out after 10015 milliseconds

I don't know if it helps, but the following is produced by httr::GET()

httr::GET("http://trends.google.com/Cookies/NID", 
+                   use_proxy(url = my proxy, port = my port), verbose())
-> GET http://trends.google.com/Cookies/NID HTTP/1.1
-> Host: trends.google.com
-> User-Agent: libcurl/7.59.0 r-curl/3.2 httr/1.3.1
-> Accept-Encoding: gzip, deflate
-> Proxy-Connection: Keep-Alive
-> Accept: application/json, text/xml, application/xml, */*
-> 
<- HTTP/1.1 301 Moved Permanently
<- Location: https://trends.google.com/trends/Cookies/NID
<- Content-Type: text/html; charset=UTF-8
<- X-Content-Type-Options: nosniff
<- Date: Thu, 31 May 2018 12:39:20 GMT
<- Expires: Sat, 30 Jun 2018 12:39:20 GMT
<- Server: sffe
<- Content-Length: 241
<- X-XSS-Protection: 1; mode=block
<- Cache-Control: public, max-age=2592000
<- Proxy-Connection: Keep-Alive
<- Connection: Keep-Alive
<- Age: 465355
<- 
-> CONNECT trends.google.com:443 HTTP/1.1
-> Host: trends.google.com:443
-> User-Agent: libcurl/7.59.0 r-curl/3.2 httr/1.3.1
-> Proxy-Connection: Keep-Alive
-> 
<- HTTP/1.1 200 Connection established
<- 
-> GET /trends/Cookies/NID HTTP/1.1
-> Host: trends.google.com
-> User-Agent: libcurl/7.59.0 r-curl/3.2 httr/1.3.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> 
<- HTTP/1.1 301 Moved Permanently
<- Location: /trends/
<- Content-Type: text/html; charset=UTF-8
<- Content-Encoding: gzip
<- Date: Tue, 05 Jun 2018 21:55:15 GMT
<- Expires: Tue, 05 Jun 2018 21:55:15 GMT
<- Cache-Control: private, max-age=0
<- X-Content-Type-Options: nosniff
<- X-Frame-Options: SAMEORIGIN
<- X-XSS-Protection: 1; mode=block
<- Server: GSE
<- Alt-Svc: quic=":443"; ma=2592000; v="43,42,41,39,35"
<- Transfer-Encoding: chunked
<- 
-> GET /trends/ HTTP/1.1
-> Host: trends.google.com
-> User-Agent: libcurl/7.59.0 r-curl/3.2 httr/1.3.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> 
<- HTTP/1.1 200 OK
<- Content-Type: text/html; charset=utf-8
<- Cache-Control: no-cache, no-store, max-age=0, must-revalidate
<- Pragma: no-cache
<- Expires: Mon, 01 Jan 1990 00:00:00 GMT
<- Date: Tue, 05 Jun 2018 21:55:15 GMT
<- Content-Encoding: gzip
<- P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
<- X-Content-Type-Options: nosniff
<- X-Frame-Options: SAMEORIGIN
<- X-XSS-Protection: 1; mode=block
<- Server: GSE
<- Set-Cookie: NID=131=jJsSRYFwjGg_iCOcypLFGY2h7F_rQ3Ui49cbk8mLvywMSLkPNxchZnDtkau9Ekl2o1W-3BfRkAIUyMOCQ6OesHZm1_BdUCqy39YtvFxiBj1He0Uel3sWTkNlZOvitpl8;Domain=.google.com;Path=/;Expires=Wed, 05-Dec-2018 21:55:15 GMT;HttpOnly
<- Alt-Svc: quic=":443"; ma=2592000; v="43,42,41,39,35"
<- Transfer-Encoding: chunked
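Reading that log, one possible explanation: httr::GET() succeeds because use_proxy() routes the request through the proxy, while the plain curl handle has no proxy configured. If so, setting the proxy options on the cookie handle might help (the host and port below are placeholders, not real values):

```r
library(curl)

h <- new_handle()
# Placeholder proxy settings; substitute your actual proxy host and port
handle_setopt(h, proxy = "proxy.example.com", proxyport = 8080)
curl_fetch_memory("http://trends.google.com/Cookies/NID", handle = h)
handle_cookies(h)
```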

@zachokeeffe
Contributor

@bbakk14 Have you tried setting a higher timeout threshold? E.g., before running it, try:

options(timeout= 4000000)
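That said, options(timeout = ...) mainly governs base-R connections; with the curl package, timeouts are set on the handle itself. So something along these lines might also be worth trying (the values are arbitrary examples):

```r
library(curl)

h <- new_handle()
# Per-handle libcurl timeouts, in seconds; values are arbitrary examples
handle_setopt(h, connecttimeout = 60, timeout = 120)
```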

@bbakk14

bbakk14 commented Jun 11, 2018

@zachokeeffe Still having the same issue. I entered options(timeout= 4000000) but still hit a timeout after 10015 milliseconds. It looks like @klarioui has had the same issue as well.

@klarioui

@bbakk14 I have the same error and tried changing the timeout option, to no avail... It seems the problem is related to my server; still looking into it.

@bbakk14

bbakk14 commented Jun 19, 2018

@klarioui Same on my side; please keep us posted if you have any breakthroughs.

@EmilioDatalio

@zachokeeffe Thanks a lot for providing this fix!

9 participants