128 KB AWS IoT message broker limit with .evaluate() result #114

vladholubiev · 2017-07-31T19:37:29Z

I'm using Chromeless to scrape HTML from websites and discovered that the function never returns a value if the text is too big. I deployed my own serverless project as provided in the repo. By trial and error I found that it times out if the returned HTML string is larger than 131060 bytes.

const chromeless = new Chromeless({ remote: true })

const text = await chromeless
  .goto('https://www.graph.cool')
  .evaluate(() => 'a'.repeat(131061)) // times out, but 131060 works

console.log(text)

await chromeless.end()

Looking up this 'magical' number, it seems to make some sense:

Is there any internal CDP limitation for 128KiB?
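
The reported threshold sits 12 bytes below 128 KiB (131072 bytes). One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the result is wrapped in a small JSON envelope before it is sent. The sketch below uses a hypothetical `{"value": ...}` envelope and the hypothetical helpers `envelopeSize` and `fitsInOneMessage` to show how the arithmetic would work out:

```javascript
// 128 KiB expressed in bytes.
const LIMIT_BYTES = 128 * 1024 // 131072

// Assumption: the result is serialized as {"value": ...} before publishing.
// That envelope adds 12 bytes around a plain ASCII string.
function envelopeSize(value) {
  return Buffer.byteLength(JSON.stringify({ value }), 'utf8')
}

// True if the serialized result would fit into a single message.
function fitsInOneMessage(value, limit = LIMIT_BYTES) {
  return envelopeSize(value) <= limit
}

console.log(envelopeSize('a'.repeat(131060)))     // 131072
console.log(fitsInOneMessage('a'.repeat(131060))) // true
console.log(fitsInOneMessage('a'.repeat(131061))) // false
```

A check like this could be run before returning from `.evaluate()` to fail fast instead of waiting for a timeout.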

adieuadieu · 2017-07-31T19:46:50Z

Hi @vladgolubev. Hm.. I suspect you've run into the 128 KB AWS IoT message broker limit. Not sure about the best solution, but we'll need to figure something out as I can imagine 128 KB won't be enough in many situations..

joelgriffith · 2017-07-31T19:48:16Z

Is it possible to gzip content?
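
As a rough illustration of the gzip idea, here is a minimal sketch using Node's built-in `zlib` module. It is not how Chromeless handles payloads; it only shows that a repetitive string like the one in the report compresses well and round-trips through base64. Compression lowers the chance of hitting the limit but does not remove it, since poorly compressible content can still exceed 128 KB.

```javascript
const zlib = require('zlib')

// The same payload that timed out in the original report.
const html = 'a'.repeat(131061)

// Compress, then base64-encode so the bytes can travel in a text message.
const compressed = zlib.gzipSync(Buffer.from(html, 'utf8'))
const encoded = compressed.toString('base64')

// The receiving side reverses both steps.
const restored = zlib.gunzipSync(Buffer.from(encoded, 'base64')).toString('utf8')

console.log('original bytes:', Buffer.byteLength(html, 'utf8'))
console.log('encoded bytes: ', encoded.length)
console.log('round trip ok: ', restored === html)
```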

vladholubiev · 2017-07-31T19:49:07Z

@adieuadieu maybe similar solution as for pdfs/screenshots?

Implement .html() method(#74) which will upload ${cuid()}.html file to S3 bucket?

adieuadieu · 2017-07-31T19:53:56Z

I'm thinking something along the lines of breaking up the payload into multiple message chunks that get passed around by the MQTT broker, perhaps gzipping them on top of that. We would like to support Azure and GCP in the future, too, so we also need to take their equivalent messaging products and their limits into consideration.
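
A minimal sketch of what chunking plus gzip could look like, assuming a hypothetical message shape of `{ id, index, total, data }`. The function names `toChunks` and `fromChunks` are illustrative and not part of Chromeless, and publishing over MQTT is left out entirely:

```javascript
const zlib = require('zlib')

// Split a string into gzip-compressed, base64-encoded chunks.
// chunkBytes is the raw (pre-base64) size of each chunk. Base64 adds
// roughly 33%, so it must stay well below the broker's per-message limit.
function toChunks(id, text, chunkBytes = 64 * 1024) {
  const compressed = zlib.gzipSync(Buffer.from(text, 'utf8'))
  const total = Math.max(1, Math.ceil(compressed.length / chunkBytes))
  const chunks = []
  for (let index = 0; index < total; index++) {
    const slice = compressed.subarray(index * chunkBytes, (index + 1) * chunkBytes)
    chunks.push({ id, index, total, data: slice.toString('base64') })
  }
  return chunks
}

// Reassemble chunks that may have arrived in any order.
function fromChunks(chunks) {
  const total = chunks[0].total
  if (chunks.length !== total) {
    throw new Error(`expected ${total} chunks, got ${chunks.length}`)
  }
  const ordered = [...chunks].sort((a, b) => a.index - b.index)
  const compressed = Buffer.concat(ordered.map(c => Buffer.from(c.data, 'base64')))
  return zlib.gunzipSync(compressed).toString('utf8')
}
```

Carrying `index` and `total` in every message lets the receiver detect missing chunks and tolerate out-of-order delivery, which matters for brokers that do not guarantee ordering.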

adieuadieu · 2017-07-31T20:01:07Z

@vladgolubev we don't have to worry about the response payload limit (or any APIG limits) since we never respond with anything Chrome-related from the Lambda function's callback(). Currently, everything is communicated between Chromeless and the Proxy (running on Lambda) over MQTT (AWS IoT).

vladholubiev · 2017-07-31T20:01:12Z

@adieuadieu Could the 6 MB response payload limit for Lambda or the 10 MB limit for API Gateway become an issue later, even after splitting? Or does chromeless not interact with Lambda directly?

vladholubiev · 2017-07-31T20:09:24Z

Thanks, now I got it!

I wanted to leave this here as a reference for how AWS packaged a solution to a similar problem: https://aws.amazon.com/about-aws/whats-new/2015/10/now-send-payloads-up-to-2gb-with-amazon-sqs/

But now I see splitting messages is a more generic solution.

Because it may work for html now, but then the same problem will pop up when someone wants to return a large array of URLs or whatever from .evaluate()

labithiotis · 2017-08-08T11:24:18Z

@vladgolubev Hi, I am having issues with .html() and the size limit mentioned above.
You mentioned that saving .html output to S3 was implemented (${cuid()}.html), but I'm not seeing those files in the S3 bucket. I do see the .png files, though.

vladholubiev · 2017-08-08T12:26:23Z

@labithiotis sorry if it was misleading. I only suggested that solution. This size issue is still being resolved by @adieuadieu

labithiotis · 2017-08-08T12:40:25Z

@vladgolubev Great to know, but is there anything I could do now to resolve this? Either increase limits or save html?

joelgriffith · 2017-08-08T18:41:05Z

I think saving the html file is the best solution for the time being. @adieuadieu and @schickling what do you think? .html can return a large payload depending on the page

schickling · 2017-08-08T18:49:58Z

Another option would be to implement message chunking for the websocket connection.

Alternatively, we should make it easier to work with S3 while at the same time decoupling it from APIs like .screenshot etc. WDYT?

joelgriffith · 2017-08-08T18:53:29Z

I think there's a longer-term task to make chunking happen, but it seems like it is still a ways off. I can also see the case where folks want to persist more than just HTML in S3 (e.g. dumps of local storage or other serializable values).

Maybe the solution is in doing both to a degree:

Support chunking for larger messages in WS.
Support or refine APIs for persisting to S3 (e.g. have more descriptive APIs such as saveScreenshot and saveHtml).
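
One way to picture the "persist to S3" side of this is a sketch that returns small results inline and stores large ones elsewhere, handing back only a reference. Everything here is hypothetical: `resultOrReference` is not a Chromeless API, `upload` stands in for whatever S3 client call would be used, and `crypto.randomUUID()` is swapped in for the `cuid()` mentioned earlier in the thread.

```javascript
const crypto = require('crypto')

// Return the value inline if it fits in one message; otherwise hand it to
// an injected uploader and return only the URL the uploader reports.
// `upload(key, body)` is a hypothetical async function supplied by the caller.
async function resultOrReference(value, upload, limitBytes = 128 * 1024) {
  const body = JSON.stringify({ value })
  if (Buffer.byteLength(body, 'utf8') <= limitBytes) {
    return { value }
  }
  const key = `${crypto.randomUUID()}.json`
  const url = await upload(key, body)
  return { url }
}
```

Injecting the uploader keeps the size decision separate from any particular storage provider, which fits the stated goal of supporting other clouds later.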

labithiotis · 2017-08-09T18:46:26Z

I adjusted the code to filter through/search over the page dom in evaluate and avoid passing back huge payloads.
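
A sketch of that workaround, assuming the goal is a list of link URLs: do the extraction inside the page and return only the small result. The helper name `collectLinks` and its `selector` and `limit` parameters are illustrative, not taken from the commenter's code.

```javascript
// Runs inside the page via .evaluate(): returns only link URLs instead of
// the full HTML, so the response stays small.
function collectLinks(selector, limit) {
  return Array.from(document.querySelectorAll(selector))
    .slice(0, limit)
    .map(a => a.href)
}

// Usage with Chromeless (not executed here):
// const links = await chromeless
//   .goto('https://www.graph.cool')
//   .evaluate(collectLinks, 'a[href]', 100)
```

The `limit` argument puts a hard cap on the number of returned items, though very long URLs could still add up, so a byte-size check on the result remains worthwhile.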

YazzyYaz · 2017-08-28T20:47:01Z

@joelgriffith @labithiotis I have added an htmlUrl() endpoint on this fork: https://github.com/YazzyYaz/chromeless. It works locally on my computer, returning a file on my desktop with the HTML. However, when I try to test it on AWS Lambda, it doesn't recognize the endpoint after I deploy it. I even configured the package.json to point to the locally modified chromeless and it didn't help. Any ideas on what I'm doing wrong?

EDIT: I was doing something stupid, it works on AWS Lambda now :)

YazzyYaz · 2017-08-31T12:47:13Z

@adieuadieu @joelgriffith PR for this issue: #274

adieuadieu added bug enhancement Proxy labels Jul 31, 2017

adieuadieu changed the title ~~.evaluate() times out when returning >=131061 bytes, <=131060 works~~ 128 KB AWS IoT message broker limit with .evaluate() result Jul 31, 2017

adieuadieu self-assigned this Jul 31, 2017

adieuadieu mentioned this issue Aug 2, 2017

Add .html method #74

Closed

YazzyYaz mentioned this issue Aug 31, 2017

Html Saved on S3 With URL Returned Endpoint #274

Merged

adieuadieu added this to the Improve Proxy Service's Reliability milestone Jan 4, 2018
