Puppeteer is a powerful tool capable of simulating human interactions with web pages, enabling various use cases such as webpage screenshots, PDF generation, automated testing, uptime monitoring, web scraping, and content tracking.
There are many scenarios where deploying Puppeteer in the cloud makes sense. For example:
- Triggering automated tests via APIs in a CI/CD pipeline.
- Using cron jobs to periodically check website availability.
- Running large-scale, distributed web scrapers.
The pay-as-you-go and scalable nature of serverless computing makes it an excellent choice for browser automation tasks. However, most platforms, like DigitalOcean, only provide virtual machines, forcing you to pay for idle time (which would waste a lot of money!). Only a few platforms currently support running Puppeteer in a serverless manner: Leapcell, AWS Lambda, and Cloudflare Browser Rendering.
This article explores these platforms: how to use them to accomplish a typical Puppeteer task, and their pros and cons.
The Task
Let’s take a common Puppeteer use case as our example: capturing a screenshot of a web page.
The task involves these steps:
- Visiting a specified URL.
- Taking a screenshot of the page.
- Returning the image.
Leapcell
Code Example:
const puppeteer = require('puppeteer');
const { Hono } = require('hono');
const { serve } = require('@hono/node-server');

// Launch a browser, load the page, and return a PNG screenshot.
const screenshot = async (url) => {
  const browser = await puppeteer.launch({ args: ['--single-process'] });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.screenshot();
  } finally {
    // Close the browser even if navigation or the screenshot fails.
    await browser.close();
  }
};

const app = new Hono();

app.get('/', async (c) => {
  const url = c.req.query('url');
  if (url) {
    const img = await screenshot(url);
    return c.body(img, { headers: { 'Content-Type': 'image/png' } });
  } else {
    return c.text('Please add an ?url=https://example.com/ parameter');
  }
});

const port = 8080;
serve({ fetch: app.fetch, port }).on('listening', () => {
  console.log(`Server is running on port ${port}`);
});
Leapcell is a versatile platform that allows you to deploy any application in a serverless manner. However, because it’s not designed exclusively for HTTP requests, its setup can be slightly more involved – you’ll need to manually create an HTTP request handler.
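Because you write the HTTP handler yourself, you also decide what input it accepts. The handler above passes the url query parameter straight to page.goto, so it may be worth validating it first. The following is a minimal sketch and not part of the original example: the helper name parseTargetUrl is hypothetical, and accepting only http and https is an assumption about what the service should render.

```javascript
// Hypothetical helper: returns a normalized http(s) URL string,
// or null when the input cannot be used.
function parseTargetUrl(raw) {
  if (typeof raw !== 'string' || raw.trim() === '') return null;
  let parsed;
  try {
    parsed = new URL(raw.trim());
  } catch {
    return null; // not an absolute URL
  }
  // Assumption: only web pages should be rendered, so reject file:, data:, etc.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null;
  return parsed.toString();
}
```

In the handler, you could call parseTargetUrl(c.req.query('url')) and return an error response when it yields null, instead of launching a browser for input that cannot be loaded.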
Local Development
Debugging is straightforward: run it like any other Node.js application with node index.js, and it’s done!
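Once the server is running locally, you can request a screenshot from it. The sketch below assumes the server from the example above is listening on port 8080 and that you are on Node.js 18 or later (for the global fetch); the function names buildScreenshotUrl and saveScreenshot and the output file name are hypothetical.

```javascript
const fs = require('node:fs/promises');

// Builds the request URL for the local server.
// searchParams.set percent-encodes the target, so its own query string survives intact.
function buildScreenshotUrl(target, base = 'http://localhost:8080/') {
  const endpoint = new URL(base);
  endpoint.searchParams.set('url', target);
  return endpoint.toString();
}

// Fetches a screenshot from the running server and writes it to disk.
async function saveScreenshot(target, outFile = 'screenshot.png') {
  const res = await fetch(buildScreenshotUrl(target));
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  await fs.writeFile(outFile, Buffer.from(await res.arrayBuffer()));
  return outFile;
}
```

With the server started, calling saveScreenshot('https://example.com/') should leave the image in screenshot.png in the current directory.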
Deployment
To deploy, specify the build command, run command, and service port.
Once the deployment is complete, your application is ready to use online.
Summary
✅ Pros:
- Consistent local and cloud environments, making debugging easier.
- Supports the official Puppeteer library.
❌ Cons:
- Slightly more complex setup: you must write your own HTTP handler.
AWS Lambda
Code Example:
const chromium = require('chrome-aws-lambda');

exports.handler = async (event) => {
  let browser = null;
  try {
    // chrome-aws-lambda supplies a Lambda-compatible Chromium and its launch settings.
    browser = await chromium.puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless,
    });
    const page = await browser.newPage();
    await page.goto(event.url);
    // page.screenshot() produces a PNG by default, so the response is labeled as PNG.
    const screenshot = await page.screenshot();
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'image/png' },
      body: screenshot.toString('base64'),
      isBase64Encoded: true,
    };
  } catch (error) {
    return {
      statusCode: 500,
      body: 'Failed to capture screenshot.',
    };
  } finally {
    // Always release the browser, even when the screenshot fails.
    if (browser !== null) {
      await browser.close();
    }
  }
};
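The handler above reads event.url, which fits a direct invocation with a JSON payload such as {"url": "https://example.com/"}. If the function sits behind an API Gateway proxy integration instead, query parameters arrive in event.queryStringParameters. A minimal sketch that accepts both shapes follows; the helper name getTargetUrl is hypothetical and not part of the original example.

```javascript
// Hypothetical helper: finds the target URL in either event shape.
function getTargetUrl(event) {
  // Direct invocation: the payload itself carries the URL.
  if (event && typeof event.url === 'string') return event.url;
  // API Gateway proxy event: query parameters live in queryStringParameters,
  // which can be null or absent when the request has no query string.
  const query = event && event.queryStringParameters;
  if (query && typeof query.url === 'string') return query.url;
  return null;
}
```

The handler could then call page.goto(getTargetUrl(event)) and return a 400 response when the helper yields null.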
AWS Lambda requires the use of puppeteer-core paired with a third-party Chromium library, such as alixaxel/chrome-aws-lambda. This is necessary because AWS imposes a 250MB limit on the unzipped size of a Lambda deployment package. The default Chromium bundled with Puppeteer (~170MB on macOS, ~282MB on Linux, ~280MB on Windows) exceeds this limit on Linux, the operating system Lambda runs on, so a slimmed-down Chromium is needed.
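To make the size constraint concrete, the arithmetic can be sketched as below. The sizes are the approximate figures quoted in this article, not measurements, and the function name headroomMb is hypothetical.

```javascript
const LAMBDA_LIMIT_MB = 250; // limit quoted in this article

// Approximate bundled Chromium sizes quoted in this article.
const chromiumSizeMb = { macOS: 170, Linux: 282, Windows: 280 };

// Positive result: space left for your code and dependencies.
// Negative result: the browser alone is already over the limit.
function headroomMb(sizeMb, limitMb = LAMBDA_LIMIT_MB) {
  return limitMb - sizeMb;
}

for (const [platform, size] of Object.entries(chromiumSizeMb)) {
  console.log(`${platform}: ${headroomMb(size)} MB`);
}
// prints:
// macOS: 80 MB
// Linux: -32 MB
// Windows: -30 MB
```

Lambda functions run on Linux, so the Linux figure is the relevant one: the bundled browser is about 32MB over the limit before any application code is added.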
Local Development
Local debugging requires complex configuration because the local and Lambda runtime environments differ, as the alixaxel/chrome-aws-lambda guide shows.
Deployment
To deploy, you need to upload your node_modules as a ZIP file. Depending on your use case, you might also need to configure Lambda Layers. The main business logic can be written directly in the AWS console and saved to run.
Summary
✅ Pros:
❌ Cons:
- Relies on third-party Chromium libraries, which may introduce risks.
- Complex local debugging.
- Tedious deployment process requiring ZIP uploads and potentially Lambda Layers.
Cloudflare Browser Rendering
Code Example:
import puppeteer from '@cloudflare/puppeteer';

export default {
  async fetch(request, env) {
    const { searchParams } = new URL(request.url);
    let url = searchParams.get('url');
    if (url) {
      url = new URL(url).toString(); // normalize
      const browser = await puppeteer.launch(env.MYBROWSER);
      const page = await browser.newPage();
      await page.goto(url);
      const img = await page.screenshot();
      await browser.close();
      return new Response(img, {
        headers: {
          'content-type': 'image/png',
        },
      });
    } else {
      return new Response('Please add an ?url=https://example.com/ parameter');
    }
  },
};
Cloudflare Browser Rendering is a relatively new serverless Puppeteer solution. Similar to AWS Lambda, it does not support the official Puppeteer library. Instead, it uses a Puppeteer version provided by Cloudflare.
While Cloudflare’s library is more secure than any third-party options, its slow update cycle can be frustrating – it hasn’t been updated in over five months!
Additionally, Cloudflare Browser Rendering has several limitations:
- Only available to Worker Pro users.
- Each Cloudflare account can only create a maximum of 2 browsers per minute, with no more than 2 browsers running concurrently.
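One way to stay under a concurrency cap like this is to queue work inside your own code. The sketch below is a generic in-memory limiter, not a Cloudflare API; the name createLimiter is hypothetical. It only limits tasks started within a single running instance of your code, so treat it as an illustration of the idea rather than a guarantee against an account-wide limit.

```javascript
// Creates a limiter that runs at most `max` async tasks at once.
// Extra tasks wait in a FIFO queue until a running task finishes.
function createLimiter(max) {
  let active = 0;
  const waiting = [];

  return async function run(task) {
    if (active < max) {
      active += 1; // a free slot is available, take it
    } else {
      // Wait until a finishing task hands its slot over.
      await new Promise((resolve) => waiting.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) {
        next(); // pass the slot directly to the next waiter
      } else {
        active -= 1; // nobody is waiting, free the slot
      }
    }
  };
}
```

For example, with const limit = createLimiter(2), wrapping each screenshot job as limit(() => job()) keeps at most two of them running at the same time.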
Local Development
Local debugging requires complex configurations.
Deployment
To deploy, write your function online and save it to run.
Summary
✅ Pros:
❌ Cons:
- Relies on Cloudflare’s Puppeteer library, which has a slow update cycle.
- Complex local debugging.
- Restricted access due to paywalls and other limitations.
Conclusion
This article has compared the three main serverless Puppeteer deployment platforms: Leapcell, AWS Lambda, and Cloudflare Browser Rendering. Each platform has its pros and cons.
Which one do you prefer? Are there other serverless Puppeteer deployment solutions you know of? Share your thoughts in the comments!
Based on the comparison above, Leapcell is a good choice if you’re planning to deploy your Puppeteer project online.
For a deployment guide, visit our documentation.