Use a 4G modem and different user agents to get fake certified podcast downloads.
The process runs a loop following this scenario:
- Get information about several episodes from the input podcast feeds. A random number of items is picked. The more recent an episode is, the more likely it is to be chosen. The older the publication dates are, the fewer items are picked.
- Select user agents from the WhatIsMyBrowser API, according to the expected frequencies of app usage (configured with the `UA_PROP` environment variable).
- Request the picked episodes with the selected user agents. Request only part of the file (`MIN_NB_BYTES` environment variable), then abort.
- Reboot the modem through the device API to get a new IP address.
- Wait until the internet connection is back up, then wait an additional amount of time configured with the `WAIT` environment variable.
- Device with Node.js installed
- Huawei 4G modem connected to the device, with a valid SIM card inside. A generous mobile plan is needed (cheap ones exist in France). Tested with a HUAWEI LTE USB Stick E3372 on the Free Mobile network.
- WhatIsMyBrowser API Key (pro plan)
A valid `.env` file is required:
cp .env_example .env
Array of podcast feed URLs to request.
Maximum number of items the process should download during a single iteration.
This is the IP address of the Huawei modem's API. It seems to be `192.168.8.1` in many cases.
API key from WhatIsMyBrowser, used to access specific User-Agent strings. Once logged in, get it from the WhatIsMyBrowser website.
The minimum amount of episode data to download for each request. Set it to 1500000 bytes, for example, as the IAB asks to ignore downloads with less than 60 seconds transferred.
Inline JSON array providing User-Agent frequencies according to the time of the day or week (given by the `oh` value, which follows the opening_hours specification).
The one provided in `.env_example` is valid.
Each entry of the `frequencies` key is a pair: the first element provides the WhatIsMyBrowser database search parameters, and the second element provides the desired probability.
[
  {
    // Here "24/7" means "all the time". "oh" could be "10:00-13:00" to express from 10am to 1pm, for example
    "oh": "24/7",
    "frequencies": [
      // target 48% of requests with an Apple Podcast User-Agent
      [
        {
          "software_name": "Apple Podcast App"
        },
        48
      ],
      // target 13% of requests with a Spotify on mobile User-Agent
      [
        {
          "software_name": "Spotify",
          "hardware_type": "mobile"
        },
        13
      ],
      // target 11% of requests with a Spotify on computer User-Agent
      [
        {
          "software_name": "Spotify",
          "hardware_type": "computer"
        },
        11
      ],
      // target 11% of requests with a browser User-Agent
      [
        {
          "software_type": "browser"
        },
        11
      ],
      // target 9% of requests with a User-Agent from any media player (ffmpeg, Roku, ...)
      [
        {
          "software_type": "application",
          "software_type_specific": "media-player"
        },
        9
      ],
      // and so on, for CastBox, iTunes, Alexa, Google Assistant
      [{"software_name": "CastBox"}, 4],
      [{"software_name": "iTunes"}, 2],
      [{"software_name": "Alexa Media Player"}, 1],
      [{"software_name": "Google Assistant"}, 1]
    ]
  }
]
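The weights in the example above add up to 100, but nothing in this document says that they must. The sketch below shows one way to turn the weights of a single entry into probabilities that sum to 1. `normalizeFrequencies` is a hypothetical helper, not part of the project, and the sketch assumes the variable is read with `JSON.parse`, which rejects the `//` comments used above for illustration.

```javascript
// Hypothetical helper (not part of the project): convert the
// [searchParams, weight] pairs of one UA_PROP entry into probabilities.
function normalizeFrequencies(frequencies) {
  const total = frequencies.reduce((sum, [, weight]) => sum + weight, 0);
  if (total <= 0) {
    throw new Error('frequencies must contain at least one positive weight');
  }
  return frequencies.map(([search, weight]) => ({
    search,
    probability: weight / total,
  }));
}

// Assumption: the value is plain JSON, without the explanatory comments.
const uaProp = JSON.parse(
  '[{"oh":"24/7","frequencies":[' +
    '[{"software_name":"Apple Podcast App"},48],' +
    '[{"software_name":"Spotify","hardware_type":"mobile"},13]' +
    ']}]'
);

const normalized = normalizeFrequencies(uaProp[0].frequencies);
console.log(normalized);
```

Normalizing keeps the configuration usable even when the weights are edited by hand and no longer total exactly 100.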
Inline JSON array providing the number of seconds between two executions of the process, according to the time of the day or week (given by the `oh` value, which follows the opening_hours specification).
The one provided in `.env_example` is valid.
The `seconds` key provides the desired number of seconds.
Example with WAIT=[{"oh":"Mo-Fr 10:00-13:00", "seconds": "60"}, {"oh":"24/7", "seconds": "200"}]
[
  // If the current time is on a working weekday between 10am and 1pm, wait 60 seconds (1 minute)
  {
    "oh": "Mo-Fr 10:00-13:00",
    "seconds": "60"
  },
  // for other time slots, default = 200 seconds (24/7 = "all the time")
  {
    "oh": "24/7",
    "seconds": "200"
  }
]
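As a rough sketch of how such a configuration might be read, the snippet below returns the delay of the first rule whose `oh` value matches the current time. The first-match order is an assumption suggested by the example, where the `24/7` rule comes last as a default. `pickWaitSeconds` and the `matches` callback are hypothetical names. The callback stands in for a real opening_hours evaluator, which this sketch does not implement.

```javascript
// Hypothetical helper (not part of the project): return the delay of the
// first rule whose "oh" expression matches the given date.
// `matches(oh, date)` is supplied by the caller and stands in for a real
// opening_hours evaluator.
function pickWaitSeconds(rules, date, matches) {
  const rule = rules.find((r) => matches(r.oh, date));
  if (!rule) {
    throw new Error('no WAIT rule matches; consider adding a "24/7" fallback');
  }
  const seconds = Number(rule.seconds); // the example stores seconds as strings
  if (!Number.isFinite(seconds) || seconds < 0) {
    throw new Error(`invalid seconds value: ${rule.seconds}`);
  }
  return seconds;
}

const rules = JSON.parse(
  '[{"oh":"Mo-Fr 10:00-13:00","seconds":"60"},{"oh":"24/7","seconds":"200"}]'
);

// Toy matcher for the demo only: treats "24/7" as the sole matching rule.
const onlyAlwaysOpen = (oh) => oh === '24/7';
const waitSeconds = pickWaitSeconds(rules, new Date(), onlyAlwaysOpen);
console.log(waitSeconds);
```

Converting with `Number` and validating the result catches a malformed `seconds` value at startup rather than in the middle of a run.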
If a valid download needs to be recorded in a Google BigQuery table, set the dataset ID.
If a valid download needs to be recorded in a Google BigQuery table, set the table ID.
fieldName | type | mode |
---|---|---|
requestDate | TIMESTAMP | REQUIRED |
podcastTitle | STRING | NULLABLE |
episodeTitle | STRING | NULLABLE |
episodeUrl | STRING | NULLABLE |
episodeDate | TIMESTAMP | NULLABLE |
IP | STRING | NULLABLE |
UA | STRING | NULLABLE |
Recommended: partitioned by day on `requestDate`, clustered by `IP`, `UA`.
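The schema and the recommended partitioning and clustering can be written down as a plain options object, as sketched below. The field list is copied from the table above. The shape of `tableOptions` follows BigQuery table metadata (`timePartitioning`, `clustering`). Passing it to the `createTable` method of the `@google-cloud/bigquery` client is an assumption to check against that library's documentation, since this document does not say how the table is created.

```javascript
// Field list copied from the table above.
const schema = [
  { name: 'requestDate', type: 'TIMESTAMP', mode: 'REQUIRED' },
  { name: 'podcastTitle', type: 'STRING', mode: 'NULLABLE' },
  { name: 'episodeTitle', type: 'STRING', mode: 'NULLABLE' },
  { name: 'episodeUrl', type: 'STRING', mode: 'NULLABLE' },
  { name: 'episodeDate', type: 'TIMESTAMP', mode: 'NULLABLE' },
  { name: 'IP', type: 'STRING', mode: 'NULLABLE' },
  { name: 'UA', type: 'STRING', mode: 'NULLABLE' },
];

// Recommended layout: daily partitions on requestDate, clustered by IP and UA.
const tableOptions = {
  schema,
  timePartitioning: { type: 'DAY', field: 'requestDate' },
  clustering: { fields: ['IP', 'UA'] },
};

console.log(JSON.stringify(tableOptions, null, 2));
```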
If a valid download needs to be recorded in a Google BigQuery table, set the GCP project ID.
If this environment variable is set, the process requests this URL and passes the download records through the `downloads` GET parameter.
// in GET query string parameters
downloads : [
  {
    requestDate: Date,
    podcastTitle: string,
    episodeTitle: string,
    episodeUrl: string,
    episodeDate: string,
    IP: string,
    UA: string,
  }
]
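The exact encoding of the `downloads` parameter is not specified here. The sketch below assumes the array is serialized with `JSON.stringify` and URL-encoded into the query string. `buildReportUrl`, the `https://example.com/report` endpoint, and all record values are placeholders for illustration.

```javascript
// Hypothetical helper (not part of the project): attach the records to the
// URL as a JSON-encoded "downloads" query parameter.
function buildReportUrl(baseUrl, downloads) {
  const url = new URL(baseUrl);
  // Assumption: the receiver expects a JSON string; Date values become ISO strings.
  url.searchParams.set('downloads', JSON.stringify(downloads));
  return url.toString();
}

// Placeholder record following the shape described above.
const sampleDownloads = [
  {
    requestDate: new Date('2024-01-01T10:00:00Z'),
    podcastTitle: 'Example podcast',
    episodeTitle: 'Example episode',
    episodeUrl: 'https://example.com/episode.mp3',
    episodeDate: '2023-12-31T08:00:00Z',
    IP: '203.0.113.7',
    UA: 'ExampleAgent/1.0',
  },
];

const reportUrl = buildReportUrl('https://example.com/report', sampleDownloads);
console.log(reportUrl);
```

On the receiving side, the value can be read back with `new URL(reportUrl).searchParams.get('downloads')` followed by `JSON.parse`. Query strings have practical length limits, so long batches may need to be split.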
Verbose mode, to see each executed step.
npm run start
- Another strategy to select user agents, one that does not depend on WhatIsMyBrowser and is "smarter"
- Provide values for Deezer and other podcast apps.
- User agent cleaning: replace language information, ...