Changed code base to OOPS and made Extendable
israfil-miya committed Dec 4, 2022
1 parent 403340a commit a5bfe46
Showing 12 changed files with 1,339 additions and 14,865 deletions.
257 changes: 210 additions & 47 deletions README.md
@@ -8,78 +8,241 @@ npm i eshop-scraper

## What it does (in short)

This package that can be used for getting some important data like **price, currency, name** from various famous websites like **Amazon, Steam, Walmart etc.**
This package extracts key product data such as **price, currency, and name** from popular e-commerce websites like **Amazon, Steam, and Walmart**.

## Example code snippet
## Support

### Code:
```json
{
"node": ">=16.16.0",
"npm": ">=8.11.0"
}
```

## Getting Started

### Create an instance of the eshop-scraper class

```js
import getPrice from 'eshop-scraper'
;(async () => {
let response = await getPrice(
'https://bikroy.com/en/ad/computer-pc-for-sale-rangpur-33',
'USD',
)
console.log(response)
})()
import eshop_scraper from 'eshop-scraper'
const scraper = new eshop_scraper()
```

### Output:
### Use the `.getData()` method of the class to scrape

```console
```js
import eshop_scraper from 'eshop-scraper'
const scraper = new eshop_scraper()

// wrap the call in an async function and invoke it immediately
;(async () => {
  let res = await scraper.getData('https://www.test.com/product/355223235')

  console.log(res)
})()
```

## .getData()

This method scrapes data from a website that has an entry in `_webprops`.

### Parameter

The method takes a single parameter.<br/>
Pass the link of the item you want to scrape to the method.

```js
scraper.getData(link)
```

### Output

It returns a promise. Use _async/await_ to handle the result.<br/>

Sample output:

```js
{
price: 160.95,
price: 140.36,
currency: 'USD',
name: 'Computer Pc',
site: 'Bikroy.com',
link: 'https://bikroy.com/en/ad/computer-pc-for-sale-rangpur-33'
name: 'Test Item',
site: 'Test',
link: 'https://www.test.com/product/355223235'
}
```

## More info
## Config

The module takes **2 parameters**.
First parameter is the absolute **URI** of an item and the second parameter is the **Currency Code** in string (it's optional).
Pass a config object to the class constructor to customize the scraper. This is optional, because the scraper already ships with common defaults that work out of the box.

## Features
### Insert new entries

1. Can get Price of an item
2. Can get the Currency Code of an item
3. Can automatically detect the website from just a link (only for supported websites)
4. Can convert price to almost any country's local currency, only by the Currency Code
You can add new website entries to the scraper, then scrape items from those websites just like the default ones.

## Supported websites
```js
import eshop_scraper from 'eshop-scraper'

// create a map with new entries
const propsList = new Map([
[
'test.com', // website's domain or subdomain
{
site: 'Test', // website's name
selector: {
price: ['span[itemprop="price"]'], // item's price HTML selector
name: ['h1[itemprop="name"]'], // item's name HTML selector
},
},
],
// follow the same structure and add many more sites, inside the map
])

const config = {
webprops: propsList, // pass a map with new entries in webprops
}

Sorry to say that it doesn't support many websites. But it does support most of the popular e-shops also it's extendable so, you can add many more websites very easily (check `website_props.js` and `main.test.js` files). If any of the bottom websites doesn't work then please open an issue on github. Contributions are welcomed!
const scraper = new eshop_scraper(config)
```
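How the scraper matches a link to an entry is not described here. As a rough mental model only, the lookup could resemble the sketch below. `findEntry` is a hypothetical helper (not part of eshop-scraper), and it assumes the map keys are bare hostnames such as `test.com`.

```js
// Hypothetical helper, not part of eshop-scraper: shows one way a link
// could be matched against the keys of a webprops-style Map.
function findEntry(link, props) {
  // Assumption: keys are hostnames without a leading "www."
  const hostname = new URL(link).hostname.replace(/^www\./, '')
  return props.get(hostname) // undefined when the site has no entry
}

const propsList = new Map([
  [
    'test.com',
    {
      site: 'Test',
      selector: {
        price: ['span[itemprop="price"]'],
        name: ['h1[itemprop="name"]'],
      },
    },
  ],
])

const entry = findEntry('https://www.test.com/product/355223235', propsList)
console.log(entry ? entry.site : 'unsupported website')
```

Under this assumption, a subdomain such as `store.test.com` would need its own key in the map.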

### List:
### Replace or exclude extra things

1. Othoba (othoba.com)
2. Steam (store.steampowered.com)
3. Amazon (amazon.com, amazon.in)
4. Walmart (walmart.com)
5. Crutchfield (crutchfield.com)
6. Playstation (store.playstation.com)
7. Priceminister (fr.shopping.rakuten.com)
8. Ebay (ebay.com)
9. Ebags (ebags.com)
10. Bikroy (bikroy.com)
11. Flipkart (flipkart.com)
12. Etsy (etsy.com)
Strip or replace extra text so the scraper can parse the price. The scraper expects the price selector to yield a string like "\$50.30", "USD 40", or "30 \$".

## Tests
```js
import eshop_scraper from 'eshop-scraper'

```console
npm run test
const obj = {
'price is:': '', // pass empty string to exclude
now: '',
usd: '$', // replace one string with another
}

const config = {
replaceobj: obj, // pass an object in replaceobj
}

const scraper = new eshop_scraper(config)
```
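To make the effect of such an object concrete, here is a minimal sketch of how the replacements might be applied to the raw text of a price element. `applyReplacements` is a hypothetical helper, and the lowercasing step is an assumption based on the lowercase keys shown above; the scraper's real implementation may differ.

```js
// Hypothetical helper, not part of eshop-scraper.
function applyReplacements(raw, replaceObj) {
  // Assumption: matching happens on lowercased text
  let text = raw.toLowerCase()
  for (const [from, to] of Object.entries(replaceObj)) {
    text = text.split(from).join(to) // replace every literal occurrence
  }
  return text.trim()
}

const obj = {
  'price is:': '', // empty string removes the text
  now: '',
  usd: '$', // replaces one string with another
}

console.log(applyReplacements('Price is: now USD 40', obj)) // "$ 40"
```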

OR
### Insert new currencies

```console
npm run test:watch
Some websites show prices in Bitcoin or another currency the scraper doesn't know. Map such currencies to a currency code so they are reported properly; otherwise the `currency` field in the output will be `undefined`.

```js
import eshop_scraper from 'eshop-scraper'

// create a map with new currencies
const currencyList = new Map([
['$', 'USD'], // each entry is a [string, currency code] pair
[['euro', '€'], 'EUR'], // to map multiple strings to one currency, put the strings inside an array
// follow the same structure and add many more currencies, inside the map
])

const config = {
currencymap: currencyList, // pass a map with new currency entries in currencymap
}

const scraper = new eshop_scraper(config)
```
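As an illustration of what such a map makes possible, the sketch below resolves a currency code from a raw price string. `detectCurrency` is a hypothetical helper written for this example, the `BTC` entry is made up, and case-insensitive substring matching is an assumption rather than the scraper's documented behavior.

```js
// Hypothetical helper, not part of eshop-scraper.
function detectCurrency(raw, currencyMap) {
  const text = raw.toLowerCase()
  for (const [keys, code] of currencyMap) {
    // A key is either a single string or an array of strings
    const list = Array.isArray(keys) ? keys : [keys]
    if (list.some((key) => text.includes(key.toLowerCase()))) return code
  }
  return undefined // no mapping found
}

const currencyList = new Map([
  ['$', 'USD'],
  [['euro', '€'], 'EUR'],
  [['btc', '₿'], 'BTC'], // illustrative entry for an unmapped currency
])

console.log(detectCurrency('0.0012 BTC', currencyList)) // "BTC"
console.log(detectCurrency('¥500', currencyList)) // undefined
```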

### Insert new set of headers

To make the scraper's requests look like they come from a real browser and to reduce the chance of the website blocking your IP, set realistic headers.

```js
import eshop_scraper from 'eshop-scraper'

const newheaders = [
{
Accept:
'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'User-Agent':
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36',
},
{
Accept:
'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'User-Agent':
'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0',
},
// add many more headers inside the array
]

const config = {
headersarr: newheaders, // pass an array with new sets of headers in headersarr
}

const scraper = new eshop_scraper(config)
```
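The README does not say how the scraper chooses among the header sets. One common approach, shown here purely as an assumption, is to pick one set at random for each request. `pickHeaders` is a hypothetical helper and the header values are shortened for readability.

```js
// Hypothetical helper, not part of eshop-scraper: picks one header set at random.
function pickHeaders(headersArr) {
  const index = Math.floor(Math.random() * headersArr.length)
  return headersArr[index]
}

const newheaders = [
  { Accept: 'text/html', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) Chrome/92.0.4515.107' },
  { Accept: 'text/html', 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Firefox/90.0' },
]

console.log(pickHeaders(newheaders)['User-Agent'])
```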

### Set timeout

If the website doesn't respond within the configured amount of time, the request is cancelled.

```js
import eshop_scraper from 'eshop-scraper'

const config = {
timeout: 10, // pass an integer in timeout (in seconds)
}

const scraper = new eshop_scraper(config)
```
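For readers unfamiliar with request timeouts, the sketch below shows the general idea using a plain `Promise.race`. `withTimeout` is a hypothetical helper and is not how eshop-scraper is known to implement its timeout. Note that it only stops waiting; a real HTTP client would abort the request itself, for example through its own timeout option.

```js
// Hypothetical helper: rejects if `promise` has not settled within `seconds`.
function withTimeout(promise, seconds) {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Timed out after ${seconds}s`)),
      seconds * 1000, // the config value is in seconds, timers use milliseconds
    )
  })
  // Whichever settles first wins; always clear the timer afterwards
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Stand-in for a slow website: resolves after 200 ms
const slowRequest = new Promise((resolve) => setTimeout(resolve, 200, 'response'))

withTimeout(slowRequest, 0.05)
  .then((value) => console.log(value))
  .catch((err) => console.log(err.message))
```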

---
## Check default values

Use these only to inspect the default values; overwriting them directly is not recommended.

```js
import eshop_scraper from 'eshop-scraper'
const scraper = new eshop_scraper()

// wrap the checks in an async function and invoke it immediately
;(async () => {
  let defProps = scraper._webprops // default supported websites
  let defReplaceStrings = scraper._replaceobj // default replaced strings
  let defHeaders = scraper._headers // default set of headers
  let defTimeout = scraper._timeoutAmount // default timeout amount
  let defCurrencyMap = scraper._currencymap // default currency map

  console.log(defProps)
  console.log(defReplaceStrings)
  console.log(defHeaders)
  console.log(defTimeout)
  console.log(defCurrencyMap)
})()
```

## Supported websites

It supports **12** websites by default and more can be added very easily.

### Websites list

1. Steam (store.steampowered.com)
2. Amazon (amazon.com, amazon.in)
3. Walmart (walmart.com)
4. Crutchfield (crutchfield.com)
5. Playstation (store.playstation.com, gear.playstation.com, direct.playstation.com)
6. Rakuten (fr.shopping.rakuten.com)
7. Ebay (ebay.com)
8. Ebags (ebags.com)
9. Bikroy (bikroy.com)
10. Flipkart (flipkart.com)
11. Etsy (etsy.com)
12. Avito (avito.ru)

## Note

Some websites may produce unexpected results, because not all websites can be scraped the same way. This scraper is made for static websites; dynamic / single-page websites won't work with it. Support for those is planned for a future version.
<br/>

Some websites may yield a price like "2345" instead of "23.45". This happens when the website initially renders the price without a dot (or with a comma) and only later rewrites it with a dot using JavaScript. The scraper strips commas and can't execute JavaScript, so it reads the price as "2345".
<br/>

Some websites show the price in a local language. The scraper only understands prices written in English, so for such websites it returns `NaN` in the price output and `undefined` in the currency output.
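The two price quirks above can be reproduced with a small sketch. `parsePrice` is a hypothetical helper that mimics the described behavior (commas stripped, only ASCII digits understood); it is an assumption for illustration, not the scraper's actual parser.

```js
// Hypothetical helper, not eshop-scraper's actual parser.
function parsePrice(raw) {
  const withoutCommas = raw.replace(/,/g, '') // commas are dropped, as the note describes
  const match = withoutCommas.match(/\d+(\.\d+)?/) // first run of ASCII digits, optional decimals
  return match ? parseFloat(match[0]) : NaN
}

console.log(parsePrice('$1,234.50')) // 1234.5
console.log(parsePrice('23,45 €')) // 2345 (the comma was a decimal separator)
console.log(parsePrice('৳ ২৩৪৫')) // NaN (non-English digits are not recognized)
```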

## Contribute

Contribute to the project by opening a pull request on GitHub. Contributions are welcome!

**<p align="center">Proudly Made In Bangladesh 🇧🇩</p>**