refactor(captcha): use screenshot instead of passing image url
klords committed May 10, 2021
1 parent c9049a3 commit 41a80d9
Showing 26 changed files with 553 additions and 192 deletions.
51 changes: 41 additions & 10 deletions docs/reference/captcha.md
@@ -3,11 +3,11 @@
A mechanism has been implemented to allow users to interactively handle captcha challenges without being directly connected to their streetmerchant instance. This works by sending the captcha challenge to the user directly via their preferred messaging service and waiting for their response, which is then input to the captcha page, allowing streetmerchant to proceed with its processing.

???+ attention
This implementation has only been tested/used on Amazon (North America). Please submit an issue if you're facing captcha on other stores so we can get it integrated.
This implementation has only been tested/used on Amazon sites. Please submit an issue if you're facing captcha on other stores so we can get it integrated.

## How to use

To use this feature, you will have to set up a bot user on your desired messaging service (see the [FAQ](#how-do-I-obtain-a-token-for) for more help). Once that's complete and you have the token for the bot, simply configure the variables shown below and then start your streetmerchant instance.
To use this feature, you will have to set up a bot user on your desired messaging service (see the [FAQ](#how-do-i-obtain-a-token-for) for more help). Once that's complete and you have the token for the bot, simply configure the variables shown below and then start your streetmerchant instance.

???+ attention
When a DM is received, you must reply to the message directly. In Slack, this is done by clicking the "reply to thread" button on the bot's DM, and sending a response in the thread panel that appears. In Discord, you simply click "Reply" on the bot's DM and type your response in the input field (you will see "Replying to [bot user name]" above the input field).
@@ -21,24 +21,35 @@ To use this feature, you will have to set up a bot user on your desired messagin
You can test your notification configuration by running `npm run test:captcha`.

???+ info
The test command will allow the user up to 30 seconds to enter a response before timing out. This is not directly configurable.
The test command will use the values from the dotenv configuration file, including timeout and poll interval.

## Configuration variables

| Environment variable | Description |
|---|---|
| `CAPTCHA_HANDLER_CAPTURE_TYPE` | Global override of [Capture type](#capture-types) to use for the captcha handler. Default: `link` (if not set in store) |
| `CAPTCHA_HANDLER_POLL_INTERVAL` | Interval (in seconds) at which streetmerchant will check if the user has responded. Default: `5` |
| `CAPTCHA_HANDLER_RESPONSE_TIMEOUT` | Timeout (in seconds) duration, after which streetmerchant will assume the user is unavailable and continue to the next page. Default: `300` (5 minutes) |
| `CAPTCHA_HANDLER_SERVICE` | [Supported messaging service](#supported-messaging-services) to use for the captcha handler |
| `CAPTCHA_HANDLER_TOKEN` | Token to identify the bot user of the selected messaging service. See the [FAQ](#how-do-I-obtain-a-token-for) for information on where to obtain this. |
| `CAPTCHA_HANDLER_USER_ID` | ID representing _your account_ in the selected messaging service. The account specified here will receive the bot's DMs. See the [FAQ](#how-do-I-obtain-my-user-ID-for) for information on where to obtain this. |
| `CAPTCHA_HANDLER_USER_ID` | ID representing _your account_ in the selected messaging service. The account specified here will receive the bot's DMs. See the [FAQ](#how-do-i-obtain-my-user-id-for) for information on where to obtain this. |

???+ info
The poll interval is 5 seconds so that the bot doesn't get rate-limited trying to check for responses (plus let's be honest, it's only 5 seconds at most).

???+ info
While you can obviously adjust the response timeout to your liking, setting it to a high value is better. If you set it too low, you likely won't have time to respond before the bot moves on, and you will also get bombarded with DM notifications. If your bot runs into captcha pages without solving them, it will start to get flagged more frequently and eventually only get captcha pages. It's better to set a high timeout and solve it once, even if it stops the processing for a few minutes, rather than have to deal with multiple captchas anyway, but that's your call to make.
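
As a rough illustration of how the poll interval and response timeout interact, here is a minimal TypeScript sketch. The names `pollForResponse`, `ResponseCheck`, and the injectable `wait` parameter are hypothetical and exist only for this example; they are not streetmerchant's actual implementation. Returning an empty string on timeout is also an assumption.

```typescript
// Hypothetical helper for illustration only; not streetmerchant's real API.
type ResponseCheck = () => Promise<string | undefined>;

// Default wait implementation based on setTimeout.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

/**
 * Calls `check` once every `pollIntervalSec` seconds until it yields a
 * response or `timeoutSec` seconds have elapsed.
 * Resolves to an empty string on timeout (an assumption of this sketch).
 */
async function pollForResponse(
  check: ResponseCheck,
  pollIntervalSec = 5,
  timeoutSec = 300,
  wait: (ms: number) => Promise<void> = sleep
): Promise<string> {
  let elapsedSec = 0;
  while (elapsedSec < timeoutSec) {
    await wait(pollIntervalSec * 1000);
    elapsedSec += pollIntervalSec;
    const response = await check();
    if (response !== undefined) {
      return response;
    }
  }
  return '';
}
```

With the documented defaults (5 second interval, 300 second timeout), this sketch would check for a reply at most 60 times before giving up.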

## Capture types

| Type | Description |
|---|---|
| `image` | Captures a screenshot of the defined challenge element. This screenshot is temporarily stored in the streetmerchant directory while the interactive handler does its work, after which the bot will attempt to clean the file up. |
| `link` | Extracts the URL from the `src` property of the defined challenge element, which is then sent to the user. Most modern chat applications will attempt to unfurl this URL automatically and display the image, so it should be mostly the same experience as using `image`. |

???+ info
For the dotenv file, this is a global override and will most likely not need to be set, as this will be set per-store by other maintainers. That said, if you do need to set it, see the [FAQ](#which-capture-type-should-i-use) for guidance on which type to use.
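
The precedence described above (global override first, then the store's own setting, then `link`) could be sketched as follows. `resolveCaptureType` and `CaptureType` are hypothetical names used for illustration, and treating unrecognized override values as "not set" is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch of capture type precedence; not the actual streetmerchant code.
type CaptureType = 'image' | 'link';

/**
 * Picks the capture type to use for a store.
 * 1. A valid global override (CAPTCHA_HANDLER_CAPTURE_TYPE) wins.
 * 2. Otherwise the store's own capture type is used.
 * 3. Otherwise the documented default, `link`, applies.
 */
function resolveCaptureType(
  globalOverride: string | undefined,
  storeCaptureType?: CaptureType
): CaptureType {
  // Assumption: empty or unrecognized override values are ignored.
  if (globalOverride === 'image' || globalOverride === 'link') {
    return globalOverride;
  }
  return storeCaptureType ?? 'link';
}
```

For example, `resolveCaptureType(undefined, 'image')` would select `image` because the store defines it and no override is set.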

## Supported messaging services

| Service | Environment variable |
@@ -68,28 +79,48 @@ You have to enable Developer Mode in the Advanced settings. Once that's enabled,
Create an app [here](https://api.slack.com/apps) and copy the token you get once the setup is complete. Put the token in the dotenv file.

???+ info
The app will need `chat:write`, `im:history`, `im:write`, and `reactions:write` permissions.
The app will need `chat:write`, `im:history`, `im:write`, `files:write`, and `reactions:write` permissions.

#### Discord

Create an app [here](https://discord.com/developers/applications) and copy the token, client ID, and permissions integer (I used `518208`). Then use the url [here](https://discord.com/developers/docs/topics/oauth2#bot-authorization-flow-url-example), replacing the `client_id` and `permissions` values with your own to add the bot to your server. Paste the token into your dotenv file.

### The bot didn't send a message when I got a captcha page.
### The bot didn't send a message when it detected a captcha page.

That isn't a question. This is an FAQ.

### The bot didn't send a message when I got a captcha page?
### The bot didn't send a message when it detected a captcha page?

Much better. This could either be a configuration error in streetmerchant (not completed, wrong values, etc) or the bot user isn't configured correctly in your messaging service. Double-check the configuration variables you've entered and use `npm run test:captcha` to help find out the root cause.

### Why are the bot images coming through broken?

This can depend on the capture type you are using as well as some other settings. If you are running in low bandwidth mode, disable it to ensure captcha images load. You can also try changing the capture type to `link` or `image` to see if a different setting works. Otherwise, file an issue.

### Which capture type should I use?

tl;dr - Neither approach offers a totally perfect solution. The `image` type is generally more robust, but can falter to image upload limits. The `link` type is higher quality, but is easier for stores to lock down.

The choice between `image` and `link` capture types should mostly be unnecessary, but there are times where one will be required over the other.

First, if a store gets wise to the fact that their captcha images are being accessed outside their store/captcha pages, they may either block access or embed the images directly in the page, in which case `link` will not work as there will be no usable URL. Additionally a store may implement captcha in a way that a single URL is not sufficient, in which case `link` will also not be useful.

Second, messaging services do not offer unlimited upload and storage of files. If you find that you've hit a quota, you may want to set the dotenv to `link`, as long as the first point above is not working against you. If so, you can explore using another messaging service for the remainder of the quota period or find another way to increase your quota. If either of those options are non-starters, you will have to pursue one of the [old-school workarounds in the troubleshooting guide](../help/troubleshoot#captcha-issues).

Finally, there are some usability caveats with either case that may just not be to your liking. The `image` type can sometimes be offset from the actual content (though this has only been observed in testing, not in "actual" usage). The `link` type will not succumb to this as you'll be linking to the original file. However, the `link` type will not automatically be unfurled all the time, which means extra clicks will be necessary to solve the captcha.

### The bot doesn't do anything when I respond to the message and eventually times out. What's happening?

When streetmerchant sends a message via Slack/Discord, it keeps a reference to that message and listens only for direct replies to it until either a response is obtained or the timeout threshold is reached. This allows the interactive captcha process to be used with multiple concurrent streetmerchant instances. Please review the warning in the [How to use](#how-to-use) section, which discusses specifically how to respond to the bot for a successful interaction.
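
The "direct replies only" rule can be pictured with a small sketch. The `ChatMessage` shape and `findDirectReply` function are hypothetical and service-agnostic; the real Slack and Discord message objects differ, so treat this purely as an illustration of the matching logic.

```typescript
// Hypothetical, service-agnostic message shape for illustration only.
interface ChatMessage {
  id: string;
  authorId: string;
  replyToId?: string; // id of the message this one replies to, if any
  text: string;
}

/**
 * Returns the first message from `userId` that directly replies to the
 * message the bot sent. Messages typed into the DM without replying are
 * ignored, which is why each instance only sees answers to its own prompt.
 */
function findDirectReply(
  messages: ChatMessage[],
  sentMessageId: string,
  userId: string
): ChatMessage | undefined {
  return messages.find(
    m => m.replyToId === sentMessageId && m.authorId === userId
  );
}
```

Because matching is keyed on the id of the sent message, two concurrent instances that each send their own captcha prompt would not pick up each other's answers.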

### Why isn't captcha being detected on some of the stores I'm monitoring?

Not sure, but we'll want to get that fixed! Submit an issue and we can look into it.
Captcha is detected by looking for elements on the page that someone has defined in the streetmerchant code. These elements can change over time, or something else could be going on. Either way, submit an issue and we can look into it.

### Does the interactive captcha handler process work on every store?

Not yet. It's only implemented for a subset of stores. If you're facing captchas (detected or not) that aren't being handled, submit an issue and we can work on integrating it.

### Will this work on every store's captcha system?
### Does this work against (insert captcha implementation here)?

Not likely. There are a plethora of captcha implementations that retailers can utilize to protect their sites. As of this writing, the interactive captcha handler has only been tested/used for Amazon (North America), and this is where the vast majority of captcha complaints come from. Any other store can implement a different captcha approach, and even Amazon can change their captcha at any time. All that said, if you're running into captcha issues with a store, submit an issue so we can work on a solution and getting it integrated, if possible.
Not likely (yet). There are a plethora of captcha implementations that retailers can utilize to protect their sites. Any store can pick from existing captcha solutions, make their own, and obviously change it at any time.
1 change: 1 addition & 0 deletions dotenv-example
@@ -13,6 +13,7 @@ APNS_PRODUCTION=
APNS_TEAMID=
AUTO_ADD_TO_CART=
BROWSER_TRUSTED=
CAPTCHA_HANDLER_CAPTURE_TYPE=
CAPTCHA_HANDLER_POLL_INTERVAL=
CAPTCHA_HANDLER_RESPONSE_TIMEOUT=
CAPTCHA_HANDLER_SERVICE=
2 changes: 1 addition & 1 deletion src/config.ts
@@ -2,7 +2,6 @@ import {existsSync, readFileSync} from 'fs';
import {banner} from './banner';
import dotenv from 'dotenv';
import path from 'path';
import * as console from 'console';

if (process.env.npm_config_conf) {
if (
@@ -202,6 +201,7 @@ const browser = {
};

const captchaHandler = {
captureType: envOrString(process.env.CAPTCHA_HANDLER_CAPTURE_TYPE),
pollInterval: envOrNumber(process.env.CAPTCHA_HANDLER_POLL_INTERVAL, 5),
responseTimeout: envOrNumber(
process.env.CAPTCHA_HANDLER_RESPONSE_TIMEOUT,
91 changes: 49 additions & 42 deletions src/index.ts
@@ -28,48 +28,7 @@ async function restartMain() {
* Starts the bot.
*/
async function main() {
const args: string[] = [];

// Skip Chromium Linux Sandbox
// https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#setting-up-chrome-linux-sandbox
if (config.browser.isTrusted) {
args.push('--no-sandbox');
args.push('--disable-setuid-sandbox');
}

// https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#tips
// https://stackoverflow.com/questions/48230901/docker-alpine-with-node-js-and-chromium-headless-puppeter-failed-to-launch-c
if (config.docker) {
args.push('--disable-dev-shm-usage');
args.push('--no-sandbox');
args.push('--disable-setuid-sandbox');
args.push('--headless');
args.push('--disable-gpu');
config.browser.open = false;
}

// Add the address of the proxy server if defined
if (config.proxy.address) {
args.push(
`--proxy-server=${config.proxy.protocol}://${config.proxy.address}:${config.proxy.port}`
);
}

if (args.length > 0) {
logger.info('ℹ puppeteer config: ', args);
}

await stop();
browser = await launch({
args,
defaultViewport: {
height: config.page.height,
width: config.page.width,
},
headless: config.browser.isHeadless,
});

config.browser.userAgent = await browser.userAgent();
browser = await launchBrowser();

for (const store of storeList.values()) {
logger.debug('store links', {meta: {links: store.links}});
@@ -115,6 +74,54 @@ async function loopMain() {
}
}

export async function launchBrowser(): Promise<Browser> {
console.warn('launch browser called');
const args: string[] = [];

// Skip Chromium Linux Sandbox
// https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#setting-up-chrome-linux-sandbox
if (config.browser.isTrusted) {
args.push('--no-sandbox');
args.push('--disable-setuid-sandbox');
}

// https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#tips
// https://stackoverflow.com/questions/48230901/docker-alpine-with-node-js-and-chromium-headless-puppeter-failed-to-launch-c
if (config.docker) {
args.push('--disable-dev-shm-usage');
args.push('--no-sandbox');
args.push('--disable-setuid-sandbox');
args.push('--headless');
args.push('--disable-gpu');
config.browser.open = false;
}

// Add the address of the proxy server if defined
if (config.proxy.address) {
args.push(
`--proxy-server=${config.proxy.protocol}://${config.proxy.address}:${config.proxy.port}`
);
}

if (args.length > 0) {
logger.info('ℹ puppeteer config: ', args);
}

await stop();
const browser = await launch({
args,
defaultViewport: {
height: config.page.height,
width: config.page.width,
},
headless: config.browser.isHeadless,
});

config.browser.userAgent = await browser.userAgent();

return browser;
}

void loopMain();

process.on('SIGINT', stopAndExit);
21 changes: 16 additions & 5 deletions src/messaging/captcha.ts
@@ -1,18 +1,29 @@
import {config} from '../config';
import {getDiscordCaptchaInputAsync} from './discord';
import {getSlackCaptchaInputAsync} from './slack';
import {sendDMAndGetResponseAsync as getWithDiscord} from './discord';
import {sendDMAndGetResponseAsync as getWithSlack} from './slack';
import {DMPayload} from '.';

export type CaptchaPayload = DMPayload; // for now this is a 1:1 alias

const {service} = config.captchaHandler;

/**
* Picks the service that will handle the user interaction
* based on configuration and sends the payload to that service
*
* @param payload the content to send to user
* @param timeout timeout for response, in seconds
* @returns response from user
*/
export async function getCaptchaInputAsync(
payload: string,
payload: CaptchaPayload,
timeout?: number
): Promise<string> {
switch (service) {
case 'discord':
return await getDiscordCaptchaInputAsync(payload, timeout);
return await getWithDiscord(payload, timeout);
case 'slack':
return await getSlackCaptchaInputAsync(payload, timeout);
return await getWithSlack(payload, timeout);
default:
return '';
}
49 changes: 28 additions & 21 deletions src/messaging/discord.ts
@@ -2,13 +2,11 @@ import {Link, Store} from '../store/model';
import Discord from 'discord.js';
import {config} from '../config';
import {logger} from '../logger';
import {DMPayload} from '.';

const {notifyGroup, webhooks, notifyGroupSeries} = config.notifications.discord;
const {pollInterval, responseTimeout, token, userId} = config.captchaHandler;

let clientInstance: Discord.Client | undefined;
let dmChannelInstance: Discord.DMChannel | undefined;

function getIdAndToken(webhook: string) {
const match = /.*\/webhooks\/(\d+)\/(.+)/.exec(webhook);

@@ -97,22 +95,37 @@ export function sendDiscordMessage(link: Link, store: Store) {
}

export async function sendDMAsync(
payload: string
payload: DMPayload
): Promise<Discord.Message | undefined> {
if (userId && token) {
logger.debug('↗ sending discord DM');
let client = undefined;
let dmChannel = undefined;
try {
const client = await getDiscordClientAsync();
const dmChannel = await getDMChannelAsync(client);
client = await getDiscordClientAsync();
dmChannel = await getDMChannelAsync(client);
if (!dmChannel) {
logger.error('unable to get discord DM channel');
return;
}
const result = await dmChannel.send(payload);
let message: string | {} = payload;
if (payload.type === 'image') {
message = {
files: [
{
attachment: payload.content,
name: payload.content,
},
],
};
}
const result = await dmChannel.send(message);
logger.info('✔ discord DM sent');
return result;
} catch (error: unknown) {
logger.error("✖ couldn't send discord DM", error);
} finally {
client?.destroy();
}
} else {
logger.warn("✖ couldn't send discord DM, missing configuration");
@@ -137,6 +150,7 @@ export async function getDMResponseAsync(
let response = '';
const intervalId = setInterval(async () => {
const finish = (result: string) => {
client?.destroy();
clearInterval(intervalId);
resolve(result);
};
@@ -156,7 +170,7 @@ export async function getDMResponseAsync(
}
} else {
response = lastUserMessage.cleanContent;
lastUserMessage.react('✅');
await lastUserMessage.react('✅');
logger.info(`✔ got captcha response: ${response}`);
return finish(response);
}
@@ -168,37 +182,30 @@ export async function getDMResponseAsync(
});
}

export async function getDiscordCaptchaInputAsync(
payload: string,
export async function sendDMAndGetResponseAsync(
payload: DMPayload,
timeout?: number
): Promise<string> {
const message = await sendDMAsync(payload);
const response = await getDMResponseAsync(
message,
timeout || responseTimeout
);
closeClient();
return response;
}

function closeClient() {
if (clientInstance) {
clientInstance.destroy();
clientInstance = undefined;
dmChannelInstance = undefined;
}
}

async function getDiscordClientAsync() {
if (!clientInstance && token) {
let clientInstance = undefined;
if (token) {
clientInstance = new Discord.Client();
await clientInstance.login(token);
}
return clientInstance;
}

async function getDMChannelAsync(client?: Discord.Client) {
if (!dmChannelInstance && userId && client) {
let dmChannelInstance = undefined;
if (userId && client) {
const user = await new Discord.User(client, {
id: userId,
}).fetch();