October 6, 2022 07:21 am GMT

Puppeteer on AWS Lambda

Wondering how you can get Puppeteer to work properly on AWS Lambda?

You're in the right place! In this post, we'll cover the main challenges you can run into while trying to do that. But first, let's start by introducing both Puppeteer and AWS Lambda.

What is Puppeteer?

Simply put, Puppeteer is software for controlling a (headless) browser. It's an open-source Node.js library developed and supported by Google's developer tools team. It lets you simulate user interaction with a browser through a simple API, which is very helpful for tasks like automated testing or web scraping.

A picture's worth a thousand words, so how much is a GIF worth? With the little bit of code shown in the GIF below, I can log in to a Google account. You simply need to click, enter text, paginate, and scrape all the publicly available data you need.

[GIF: Puppeteer on AWS Lambda tutorial]
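The click, type, paginate, and scrape steps map onto a handful of Puppeteer `Page` methods. The sketch below is not the code from the GIF. The login URL and every CSS selector are hypothetical placeholders, and `loginAndScrape` is an illustrative name. The function takes an already-open `page`, so the same code can run locally or on Lambda.

```javascript
// A minimal sketch of the click / type / paginate / scrape flow.
// ASSUMPTION: the URL and all selectors below are hypothetical placeholders.
async function loginAndScrape(page, { email, password }) {
  await page.goto('https://example.com/login'); // hypothetical login page
  await page.type('#email', email);             // enter text into the email field
  await page.type('#password', password);
  await Promise.all([
    page.waitForNavigation(),                   // wait for the page loaded after login
    page.click('button[type="submit"]'),        // submit the login form
  ]);

  const results = [];
  for (;;) {
    // Scrape the text of every matching item on the current page
    const items = await page.$$eval('.item', (els) => els.map((el) => el.textContent.trim()));
    results.push(...items);

    // Paginate: stop once there is no "next" link
    const next = await page.$('a.next');
    if (!next) break;
    await Promise.all([page.waitForNavigation(), next.click()]);
  }
  return results;
}
```

Passing the `page` in, instead of launching a browser inside the function, keeps the browser setup separate. That matters on Lambda, where the launch options differ from a local run.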

What is AWS Lambda?

AWS Lambda is Amazon's serverless compute service, which Amazon describes as a way to "run code without thinking about servers or clusters." You simply create a function on Lambda and then execute it. It's that easy.

Simply put, you can do everything on AWS Lambda. Okay, "everything" is a strong word, but almost everything. For example, you can use AWS Lambda functions to scrape thousands of public web pages every night and then insert the collected data into a database.

Getting started with AWS Lambda is simple and inexpensive. You only pay for what you use, and there is also a generous free tier.
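"Pay for what you use" means Lambda bills compute in GB-seconds (allocated memory multiplied by run time) plus a per-request charge. The sketch below estimates that cost. The rates are parameters rather than hard-coded numbers because they vary by region and architecture. The function ignores the free tier and per-invocation rounding, and `estimateCost` is a hypothetical helper name.

```javascript
// Rough Lambda cost estimate: GB-seconds of compute plus per-request charges.
// ASSUMPTION: rates are supplied by the caller from the current AWS pricing page;
// the free tier and per-invocation rounding are ignored.
function estimateCost({ invocations, avgDurationMs, memoryMb, pricePerGbSecond, pricePerMillionRequests }) {
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMb / 1024);
  const computeCost = gbSeconds * pricePerGbSecond;
  const requestCost = (invocations / 1e6) * pricePerMillionRequests;
  return { gbSeconds, cost: computeCost + requestCost };
}
```

For example, 1,000 runs of 10 seconds at 512 MB come to 1000 × 10 × 0.5 = 5,000 GB-seconds. The memory setting scales the bill linearly, so it is worth revisiting once you know how much your Puppeteer code actually uses.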

Problem #1: Puppeteer is too big to push to Lambda

AWS Lambda has a 50 MB limit on a zip file you upload directly. Because it downloads Chromium during installation, the Puppeteer package is significantly larger than that. However, this 50 MB limit doesn't apply when you load the function code from S3! See the AWS Lambda quotas documentation.

[Image: AWS Lambda quotas can be tight for Puppeteer]
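The size limits in this section (50 MB for a zip uploaded directly, 250 MB unzipped) can be turned into a quick pre-deploy check. This sketch assumes both limits are binary megabytes (MiB). `chooseUploadMethod` is a hypothetical helper, and the byte counts would come from your own build output.

```javascript
// Size limits as described in this section, interpreted as MiB (assumption).
const MB = 1024 * 1024;
const DIRECT_UPLOAD_LIMIT = 50 * MB; // zipped package uploaded directly
const UNZIPPED_LIMIT = 250 * MB;     // unzipped package, applies even when loading from S3

// Decide how to ship the package, or fail early if no upload method can work.
function chooseUploadMethod(zippedBytes, unzippedBytes) {
  if (unzippedBytes > UNZIPPED_LIMIT) {
    throw new Error('Unzipped package exceeds 250 MB; uploading via S3 will not help, so trim dependencies');
  }
  return zippedBytes > DIRECT_UPLOAD_LIMIT ? 's3' : 'direct';
}
```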

The 50 MB zipped limit can be bypassed by uploading from an S3 bucket, although the 250 MB unzipped limit still applies. So we create a bucket in S3, use npm scripts to zip the function and upload it to S3, and then update our Lambda code from that bucket. The scripts section of package.json looks something like this:

"scripts": {
  "zip": "npm run build && 7z a -r function.zip ./dist/* node_modules/",
  "sendToLambda": "npm run zip && aws s3 cp function.zip s3://chrome-aws && rm function.zip && aws lambda update-function-code --function-name puppeteer-examples --s3-bucket chrome-aws --s3-key function.zip"
}

Problem #2: Puppeteer on AWS Lambda doesn't work

By default, Linux environments (including the one AWS Lambda runs on) don't include the system libraries that the Chromium build downloaded by Puppeteer needs in order to run.

Fortunately, a build of Chromium for AWS Lambda already exists: the chrome-aws-lambda package on npm. You will need to install it, together with puppeteer-core, in the function you are sending to Lambda.

You won't need the regular puppeteer package. In fact, including it only counts against your 250 MB limit.

npm i --save chrome-aws-lambda puppeteer-core
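Because a leftover `puppeteer` dependency quietly eats into that 250 MB, a small check before zipping can catch it. This is a hypothetical helper (`checkLambdaDeps` is not part of any library) that inspects a parsed package.json object. It also looks at devDependencies, since a plain `npm install` puts them into node_modules, and the zip script packages that folder as-is.

```javascript
// Hypothetical pre-deploy check on a parsed package.json object.
// Returns a list of problems; an empty list means the dependencies look right.
function checkLambdaDeps(pkg) {
  const deps = pkg.dependencies || {};
  const devDeps = pkg.devDependencies || {};
  const problems = [];
  // The full puppeteer package downloads its own Chromium, which bloats the bundle
  if ('puppeteer' in deps || 'puppeteer' in devDeps) {
    problems.push('remove "puppeteer": it counts against the 250 MB unzipped limit');
  }
  // The Lambda-friendly pair must be runtime dependencies
  for (const name of ['chrome-aws-lambda', 'puppeteer-core']) {
    if (!(name in deps)) problems.push(`add "${name}" to dependencies`);
  }
  return problems;
}
```

For example, `checkLambdaDeps(JSON.parse(fs.readFileSync('package.json', 'utf8')))` (with `fs` from Node's standard library) should return an empty array when the dependencies match the setup described here.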

Then, when you set Puppeteer up to launch a browser, the code will look like this:

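const chromium = require('chrome-aws-lambda');

exports.handler = async () => {
  // chrome-aws-lambda exposes a puppeteer-core instance plus Lambda-friendly launch settings
  const browser = await chromium.puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath, // a promise resolving to the Chromium binary path
    headless: chromium.headless,
  });

  // ... your Puppeteer code goes here ...

  await browser.close();
};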

Final note

Puppeteer requires more memory than a regular script, so keep an eye on your function's maximum memory usage. When using Puppeteer, we recommend allocating at least 512 MB to your AWS Lambda function. Also, don't forget to run await browser.close() at the end of your script. Otherwise, your function may keep running until it times out, for no reason other than that the browser is still alive and waiting for commands.
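Calling `browser.close()` only at the end of the script misses the case where your code throws halfway through. One way to cover that is a `try`/`finally` wrapper. In this sketch, `withBrowser` is a hypothetical helper name, and `launch` is any function that returns a Puppeteer browser.

```javascript
// Runs `task` with a freshly launched browser and always closes it afterwards.
// ASSUMPTION: `launch` returns a Puppeteer Browser, e.g. () => chromium.puppeteer.launch({ ... }).
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    // Closing in `finally` also covers the error path, so an exception
    // cannot leave Chromium alive and the function running until timeout.
    await browser.close();
  }
}

// Usage inside a handler (sketch):
// exports.handler = () =>
//   withBrowser(
//     async () => chromium.puppeteer.launch({ /* options as above */ }),
//     async (browser) => {
//       const page = await browser.newPage();
//       await page.goto('https://example.com'); // hypothetical URL
//       return page.title();
//     },
//   );
```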


Original Link: https://dev.to/oxylabs-io/puppeteer-on-aws-lambda-3ike
