Have you scraped phone numbers from Justdial? #1

Open
dvijparekh opened this issue Mar 8, 2018 · 29 comments

Comments

@dvijparekh

No description provided.

@Alankar0416
Owner

I was able to earlier, but it seems they have started sending SVG images instead of numbers.

@Alankar0416
Owner

@dvijparekh1995 However, we can take the class name of each digit icon and map it to a digit from there. But this will break when they change the class names again.

@vishnu1991

vishnu1991 commented Jul 12, 2018

JD uses a fixed series of class names to show the phone number.
If we can extract the class name of each span, then we can get mobile numbers easily.

The series is as below:
Number - span class="icon-XX"
1 - icon-yz
2 - icon-wx
3 - icon-vu
4 - icon-ts
5 - icon-rq
6 - icon-po
7 - icon-nm
8 - icon-lk
9 - icon-ji
0 - icon-acb
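
The series above can be turned into a lookup table. The following is a minimal sketch, assuming each digit arrives as one span whose class list contains one of these `icon-XX` names; `ICON_TO_DIGIT` and `decode_icons` are illustrative names, not part of the original script.

```python
# Class-name-to-digit table copied from the series listed in this thread.
# Assumption: a phone number arrives as an ordered list of icon class names,
# one per digit.
ICON_TO_DIGIT = {
    "icon-acb": "0",
    "icon-yz": "1",
    "icon-wx": "2",
    "icon-vu": "3",
    "icon-ts": "4",
    "icon-rq": "5",
    "icon-po": "6",
    "icon-nm": "7",
    "icon-lk": "8",
    "icon-ji": "9",
}


def decode_icons(class_names):
    """Turn an ordered list of icon class names into a digit string.

    An unknown class name raises KeyError, so a changed mapping is noticed
    instead of silently producing a wrong number.
    """
    return "".join(ICON_TO_DIGIT[name] for name in class_names)


if __name__ == "__main__":
    # Made-up sequence, not a real phone number.
    print(decode_icons(["icon-ji", "icon-lk", "icon-acb", "icon-yz"]))
```

Raising on unknown names is a deliberate choice here: as noted in this thread, the class names can change at any time, and a loud failure is easier to spot than a wrong number.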

@Alankar0416
Owner

Alankar0416 commented Jul 12, 2018

Yes, I had that in mind. But the issue is they can change the class name whenever they want and this will break then. Better to think of something concrete. The most foolproof solution is to use digit recognition on the image.

@vishnu1991

Yes, I think the same, as they will surely change it.

@krishnamalireddy

I'm not getting the phone numbers. Can you tell me how to get them?

@Alankar0416
Owner

Alankar0416 commented Aug 20, 2018

@krishnamalireddy JD is now using SVGs in place of actual numbers. That's why parsing fails. There are a couple of ways to get around this:

  • Each SVG has a unique code which can be mapped to a digit. This will fail if they change the mapping again.
  • Use digit recognition on the SVG.

Unfortunately I am not getting time to develop this. Will pick it up whenever I have some bandwidth.

@hrwndr

hrwndr commented Jan 13, 2019

@Alankar0416 Could you please demonstrate how we can decode the numbers from the SVGs in code?

@AdityaMalireddy

> @Alankar0416 Could you please demonstrate how we can decode the numbers from the SVGs in code?

A simple solution is to use .find_all instead of .string for the phone number.

You will get the seemingly random class codes of the SVG icons; convert them to digits.
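
To illustrate the idea: a tag that holds only child spans has no single text node for `.string` to return, so the icon class of each span has to be collected instead. The sketch below uses the standard-library `html.parser` rather than BeautifulSoup so that it runs without extra packages. The sample markup is hypothetical, modelled on the `mobilesv icon-XX` class names quoted in this thread, and `IconClassCollector` and `icon_classes` are illustrative names.

```python
from html.parser import HTMLParser


class IconClassCollector(HTMLParser):
    """Collect the icon-* class of every <span>, in document order."""

    def __init__(self):
        super().__init__()
        self.icon_classes = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # A class attribute without a value is reported as None.
        class_attr = dict(attrs).get("class") or ""
        for name in class_attr.split():
            if name.startswith("icon-"):
                self.icon_classes.append(name)


def icon_classes(html):
    """Return the list of icon-* class names found on spans in html."""
    collector = IconClassCollector()
    collector.feed(html)
    collector.close()
    return collector.icon_classes


if __name__ == "__main__":
    # Hypothetical fragment, not copied from a real page.
    sample = (
        '<p class="contact-info">'
        '<span class="mobilesv icon-ji"></span>'
        '<span class="mobilesv icon-lk"></span>'
        "</p>"
    )
    print(icon_classes(sample))
```

The resulting list of class names can then be converted with whatever class-to-digit table is current.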

@Alankar0416
Owner

The issue is that we can keep a map of SVG code to number, but JD can change it anytime.

@AdityaMalireddy

Yes, they can change it any time. If they change it, we have to decode it again. By the way, they haven't changed it for a long time.

@ketanshah79

Thanks @Alankar0416 for sharing the code.

Here is an array mapping I've used as a second pass on the csv file. I used the .find_all for phone number.

  • '<bound method Tag.find_all of ' => '',
  • '>' => '',
  • '<span class=""mobilesv icon-dc"">' => '',
  • '<span class=""mobilesv icon-fe"">' => '',
  • '<span class=""mobilesv icon-hg"">' => '',
  • '<span class=""mobilesv icon-ba"">' => '-',
  • '<span class=""mobilesv icon-acb"">' => '0',
  • '<span class=""mobilesv icon-yz"">' => '1',
  • '<span class=""mobilesv icon-wx"">' => '2',
  • '<span class=""mobilesv icon-vu"">' => '3',
  • '<span class=""mobilesv icon-ts"">' => '4',
  • '<span class=""mobilesv icon-rq"">' => '5',
  • '<span class=""mobilesv icon-po"">' => '6',
  • '<span class=""mobilesv icon-nm"">' => '7',
  • '<span class=""mobilesv icon-lk"">' => '8',
  • '<span class=""mobilesv icon-ji"">' => '9',
  • '<bound method Tag.find_all of ' => '',
  • '>' => '',

Attached is my php code.
clean_csv.php.txt
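
For readers without PHP, the same second pass can be sketched in Python. This is not a port of the attached script, whose contents are not shown in this thread. Instead of literal string replacements it pulls the `icon-XX` codes out of each cell with a regular expression, which avoids depending on the order in which replacements are applied and also tolerates the doubled quotes that CSV escaping produces. `ICON_TO_CHAR` repeats the mapping from the list above; `clean_phone_cell` is an illustrative name.

```python
import re

# Mapping taken from the list above. Three classes map to an empty string
# there, so they are dropped here as well.
ICON_TO_CHAR = {
    "icon-dc": "",
    "icon-fe": "",
    "icon-hg": "",
    "icon-ba": "-",
    "icon-acb": "0",
    "icon-yz": "1",
    "icon-wx": "2",
    "icon-vu": "3",
    "icon-ts": "4",
    "icon-rq": "5",
    "icon-po": "6",
    "icon-nm": "7",
    "icon-lk": "8",
    "icon-ji": "9",
}

ICON_PATTERN = re.compile(r"icon-[a-z]+")


def clean_phone_cell(cell):
    """Decode the span dump stored in one CSV cell into a phone string.

    Icon codes missing from the table become '?' so gaps stay visible.
    """
    return "".join(
        ICON_TO_CHAR.get(code, "?") for code in ICON_PATTERN.findall(cell)
    )


if __name__ == "__main__":
    # Made-up cell content in the shape described above.
    cell = (
        "<bound method Tag.find_all of "
        '<span class=""mobilesv icon-dc""></span>'
        '<span class=""mobilesv icon-ji""></span>'
        '<span class=""mobilesv icon-ba""></span>'
        '<span class=""mobilesv icon-acb""></span>>'
    )
    print(clean_phone_cell(cell))
```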

@Alankar0416
Owner

Great work @ketanshah79
Haven't tried this code. Are you able to successfully map phone numbers with this additional script? If yes, I can add this into the original script to make things easy for everyone.

@ketanshah79

ketanshah79 commented Feb 8, 2019 via email

@Dhiren-Biren

Only 10 records are being retrieved.

@mps1305

mps1305 commented Apr 20, 2019

@Alankar0416 could you please post the code along with @ketanshah79 's changes?
Need to get justdial data for a college project.
Please guys, if either of you could do it, it'll be really helpful

Thanks!

@dvijparekh
Author

dvijparekh commented Apr 20, 2019

> @Alankar0416 could you please post the code along with @ketanshah79 's changes?
> Need to get justdial data for a college project.
> Please guys, if either of you could do it, it'll be really helpful
>
> Thanks!

@mps1305 Check my forked repo. I have made changes accordingly and it's working; just change the URL to whichever one you want.

@mps1305

mps1305 commented May 6, 2019

Hey @dvijparekh, it was working until some time back, then I started getting this error. Any help in this regard would be highly appreciated!
"[WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond"

@dvijparekh
Author

It seems like Justdial is blocking the scraper. Working on it.
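
Until the cause is known, a timeout plus a few retries at least keeps one slow response from killing a whole run. The following is a generic sketch using only `urllib` from the standard library. It does not get around blocking; it only handles transient connection failures such as the WinError 10060 timeout quoted above. `fetch_with_retries` and its parameters are illustrative and not part of the original script.

```python
import time
import urllib.request


def fetch_with_retries(url, attempts=3, timeout=10.0, backoff=2.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url and return the response body as bytes.

    Retries on OSError, which covers urllib.error.URLError, socket timeouts
    and refused or dropped connections. The wait doubles after each failed
    attempt: backoff, 2 * backoff, 4 * backoff, ...
    `opener` and `sleep` are injectable so the function can be tested
    without touching the network.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(backoff * (2 ** attempt))
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A call such as `fetch_with_retries(page_url, attempts=5)` would replace a bare `urlopen(page_url)`. If every attempt still times out, the requests are most likely being refused on purpose and retrying will not help.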

@SuhailSaify

SuhailSaify commented Jul 18, 2019

Hey, I have written a script that will scrape phone numbers from any Justdial business page.
It uses the info in CSS stylesheet to create a mapping between the strings assigned to each number.
The mapping is done every time you load a page, therefore it works for every business.

Please try this:
https://github.com/SuhailSaify/Justdial-Scrapper

PS: it also scrapes other info along with Phone numbers.
(Working as of July 2019)
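
The linked repository's code is not reproduced in this thread, so the following is only a rough sketch of what rebuilding the mapping on every page load could look like. It rests on two loudly stated assumptions: that the page's stylesheet contains rules of the hypothetical shape `.icon-XX:before{content:"\NNNN"}`, and that sorting those rules by code point lines the classes up with a glyph order the caller has worked out beforehand (for example by inspecting the icon font). `build_mapping`, `RULE_PATTERN`, and `glyph_order` are illustrative names.

```python
import re

# Hypothetical rule shape: .icon-yz:before{content:"\9d002"}
RULE_PATTERN = re.compile(
    r'\.(icon-[a-z]+):before\s*\{\s*content:\s*"\\([0-9a-fA-F]+)"'
)


def build_mapping(css, glyph_order):
    """Map icon class names to characters for one page's stylesheet.

    Assumption: after sorting the rules by their content code point, the
    n-th rule corresponds to the n-th character of glyph_order.
    """
    rules = sorted(
        (int(code, 16), name) for name, code in RULE_PATTERN.findall(css)
    )
    if len(rules) != len(glyph_order):
        raise ValueError("stylesheet does not match the expected glyph count")
    return {name: char for (_, name), char in zip(rules, glyph_order)}


if __name__ == "__main__":
    # Made-up three-rule stylesheet, for illustration only.
    css = (
        r'.icon-yz:before{content:"\9d002"}'
        r'.icon-acb:before{content:"\9d001"}'
        r'.icon-wx:before{content:"\9d003"}'
    )
    print(build_mapping(css, "012"))
```

Because the table is rebuilt from each page, renamed classes do not matter as long as the two assumptions above still hold; if either one is wrong for the real stylesheet, the function needs to be adapted.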

@krishnamalireddy

I am getting a urllib open timeout error. Is this code still working for anyone?

@abhi-ux

abhi-ux commented Feb 6, 2020

Can anyone post the latest code here?

@abhi-ux

abhi-ux commented Feb 6, 2020 via email

@builditpossible-gs

I am about to solve this issue, can anyone help me with this error - https://stackoverflow.com/questions/60875316/typeerror-string-indices-must-be-integers-when-getting-class-fro-span-tag-using

@dvijparekh
Author

> I am about to solve this issue, can anyone help me with this error - https://stackoverflow.com/questions/60875316/typeerror-string-indices-must-be-integers-when-getting-class-fro-span-tag-using

Please share the Justdial URL you are trying to scrape.

@builditpossible-gs

> > I am about to solve this issue, can anyone help me with this error - https://stackoverflow.com/questions/60875316/typeerror-string-indices-must-be-integers-when-getting-class-fro-span-tag-using
>
> please share link url of just dial you are trying to scrape

Solved it brother. Thank you.

@builditpossible-gs

There is another error though
AttributeError: 'NoneType' object has no attribute 'text' on line return body.find('span', {'class':'mrehover'}).text.strip() in get_address

@dvijparekh
Author

dvijparekh commented Mar 27, 2020

> There is another error though
> AttributeError: 'NoneType' object has no attribute 'text' on line return body.find('span', {'class':'mrehover'}).text.strip() in get_address

It means no span tag with class mrehover was found, so body.find returns None, which has no text attribute.
Try the code below and let me know what you get from it:

tesVar = body.find('span', {'class':'mrehover'})
print(tesVar)
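
If the print shows None, the lookup itself can be made tolerant of the missing span. The following is a minimal sketch of a guarded `get_address`, assuming `body` is the BeautifulSoup tag for one listing as in the traceback above. Returning None for a missing address is a choice made for this sketch, not the behaviour of the original script.

```python
def get_address(body):
    """Return the address text, or None when the span is missing.

    `body` is assumed to be a BeautifulSoup Tag (or anything with a
    compatible find method) for a single listing.
    """
    span = body.find('span', {'class': 'mrehover'})
    if span is None:
        # The markup changed or this listing has no address block.
        return None
    return span.text.strip()
```

Callers then need to handle a None address, for example by writing an empty CSV field instead of crashing partway through a run.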

@alokm014

Hey, use this method https://youtu.be/EkbF5JwuHqU
