Losing results due to &start=1 #5

Open
MaelC opened this issue Jan 16, 2013 · 3 comments

@MaelC

MaelC commented Jan 16, 2013

I was trying out this tool and noticed odd behavior with searches that return only a few results.

If you submit a query to Google that has a small number of results (fewer than 10), like this: site:nasa.gov intitle:"NASA - Kennedy Space Center 2012"
https://www.google.com/search?n%20um=500&q=site%3Auta.edu%20intitle%3A%22Home%20-%20College%20%20of%20Business%22&start=1#hl=en&tbo=d&sclient=psy-ab&q=site:nasa.gov+intitle%3A%22NASA+-+Kennedy+Space+Center+2012%22&oq=site:nasa.gov+intitle%3A%22NASA+-+Kennedy+Space+Center+2012%22&gs_l=serp.3...4920.6151.7.6351.5.5.0.0.0.3.210.767.0j2j2.4.0.les%3B..0.0...1c.1.45C4MqZAjKY&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&bvm=bv.41018144,d.b2I&fp=dfe86ec64229ab6&biw=944&bih=951

and compare it to the URL GooDork builds when you submit the same query, URL-encoded:

GooDork.py site%3Anasa.gov%20intitle%3A%22NASA%20-%20Kennedy%20Space%20Center%202012%22

Gives you this URL:

https://www.google.com/search?num=500&q=site%3Anasa.gov%20intitle%3A%22NASA%20-%20Kennedy%20Space%20Center%202012%22&start=1

The results from that URL are missing the first result. I'm pretty sure that's because &start=1 goes to the second page of Google results, which drops results. I'm really curious whether that means the first page of results is consistently being dropped (I guess the test case is to run a search that returns between 11 and 20 results?). I'm still mucking around in your code, so I figured it would be best to put this here.

-Mael
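
A quick way to confirm which parameters GooDork sends is to decode the generated URL. This is a minimal sketch assuming Python 3's standard `urllib.parse`; the URL is the one reported above, and the variable names are illustrative only.

```python
from urllib.parse import urlsplit, parse_qs

# URL reported above as the one GooDork generates for the NASA query.
url = (
    "https://www.google.com/search?num=500"
    "&q=site%3Anasa.gov%20intitle%3A%22NASA%20-%20Kennedy%20Space%20Center%202012%22"
    "&start=1"
)

# Split off the query string and decode it into a dict of lists.
params = parse_qs(urlsplit(url).query)

query = params["q"][0]           # the decoded dork
start = int(params["start"][0])  # the offset sent to Google

print(query)
print(start)
```

Seeing a non-zero `start` on the very first request is the detail to look for here.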

@0xKD
Contributor

0xKD commented Jan 16, 2013

'&start=' is hardcoded when performing a search, so all you have to do is modify this line: https://github.com/k3170makan/GooDork/blob/master/netlib.py#L58 and remove '&start=' or only append it if 'start' > 1
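
The suggestion above could look roughly like the sketch below. It is not the actual code at netlib.py#L58: `build_search_url` and its parameters are hypothetical names, and it assumes Python 3's `urllib.parse.quote`. The base URL and `num=500` are taken from the URLs quoted in this thread.

```python
from urllib.parse import quote

BASE_URL = "https://www.google.com/search"  # as seen in the URLs in this thread


def build_search_url(query, num=500, start=None):
    """Hypothetical helper: build the search URL, adding '&start=' only when needed."""
    # Percent-encode everything in the query, including ':' and '"'.
    url = f"{BASE_URL}?num={num}&q={quote(query, safe='')}"
    # Only append the offset if 'start' > 1, as suggested above.
    if start is not None and start > 1:
        url += f"&start={start}"
    return url


dork = 'site:nasa.gov intitle:"NASA - Kennedy Space Center 2012"'
print(build_search_url(dork))
print(build_search_url(dork, start=10))
```

With this shape the first request carries no `start` parameter at all, so nothing is skipped by default.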

@k3170makan
Owner

Awesome, thanks for your help. I'll update this function as soon as possible.


@MaelC
Author

MaelC commented Jan 16, 2013

Sorry, I was pretty tired when I wrote this last night.

You can actually start at 0, and Google should treat that as page one. So rather than removing the parameter, start it at zero, i.e. send start-1 instead of start.
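
A minimal sketch of that alternative, assuming (as this comment suggests) that `start=0` returns the first results. The function names are hypothetical and not part of GooDork; the example uses Python 3's `urllib.parse`.

```python
from urllib.parse import urlencode, quote


def google_start(position):
    """Hypothetical helper: map a 1-based starting position to a zero-based 'start'."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    return position - 1


def build_zero_based_url(query, position=1, num=500):
    """Hypothetical helper: always send 'start', but shifted down by one."""
    params = {"num": num, "q": query, "start": google_start(position)}
    # quote_via=quote encodes spaces as %20, matching the URLs in this thread.
    return "https://www.google.com/search?" + urlencode(params, quote_via=quote)


dork = 'site:nasa.gov intitle:"NASA - Kennedy Space Center 2012"'
print(build_zero_based_url(dork))
```

Keeping the parameter and shifting it avoids a special case for the first request, at the cost of relying on the zero-based behavior described here.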
