How to Deploy Puppeteer in the Cloud: Solutions Compared


Puppeteer is a powerful tool capable of simulating human interactions with web pages, enabling various use cases such as webpage screenshots, PDF generation, automated testing, uptime monitoring, web scraping, and content tracking.

There are many scenarios where deploying Puppeteer in the cloud makes sense. For example:

  • Triggering automated tests via APIs in a CI/CD pipeline.
  • Using cron jobs to periodically check website availability.
  • Running large-scale, distributed web scrapers.

The pay-as-you-go pricing and scalability of serverless computing make it an excellent fit for browser automation tasks. However, many platforms, such as DigitalOcean, mainly offer virtual machines, so you keep paying for idle time even when no browser is running. Only a few platforms currently support running Puppeteer in a serverless manner: Leapcell, AWS Lambda, and Cloudflare Browser Rendering.

This article explores these platforms: how to use them to accomplish a typical Puppeteer task, and their pros and cons.



The Task

Let’s take a common Puppeteer use case as our example: capturing a screenshot of a web page.

The task involves these steps:

  1. Visiting a specified URL.
  2. Taking a screenshot of the page.
  3. Returning the image.



Leapcell

Code Example:

const puppeteer = require('puppeteer');
const { Hono } = require('hono');
const { serve } = require('@hono/node-server');

const screenshot = async (url) => {
  const browser = await puppeteer.launch({ args: ['--single-process'] });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.screenshot();
  } finally {
    // Always close the browser, even if navigation or the screenshot fails.
    await browser.close();
  }
};

const app = new Hono();

app.get('/', async (c) => {
  const url = c.req.query('url');

  if (url) {
    const img = await screenshot(url);
    return c.body(img, { headers: { 'Content-Type': 'image/png' } });
  } else {
    return c.text('Please add an ?url=https://example.com/ parameter');
  }
});

const port = 8080;
serve({ fetch: app.fetch, port }).on('listening', () => {
  console.log(`Server is running on port ${port}`);
});
Leapcell is a versatile platform that allows you to deploy any application in a serverless manner. However, because it’s not designed exclusively for HTTP requests, its setup can be slightly more involved – you’ll need to manually create an HTTP request handler.
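
Because you write the HTTP handler yourself, you also decide how the incoming url parameter is checked before it reaches the browser. The following is a minimal sketch of one way to do that, not part of the original example: parseTargetUrl is a hypothetical helper name, and restricting requests to http and https is an assumption about what the service should accept.

```javascript
// Hypothetical helper (not part of Hono or Puppeteer): returns a normalized
// http(s) URL string, or null when the input cannot be used.
function parseTargetUrl(raw) {
  if (typeof raw !== 'string' || raw.length === 0) return null;

  let parsed;
  try {
    parsed = new URL(raw); // throws on malformed or relative input
  } catch {
    return null;
  }

  // Assumption: only web pages should be rendered, so schemes such as file: are rejected.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null;

  return parsed.toString();
}
```

In the handler above, the result could replace the plain `if (url)` check, so that an unusable value gets the same help text as a missing one.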



Local Development

Debugging is straightforward. As with any other Node.js application, run node index.js and you are done.
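
Once the server is running locally, a quick way to check it is to request a screenshot and write it to disk. This is a rough sketch under a few assumptions: Node.js 18 or later (for the built-in fetch), the server listening on port 8080 as in the example, and buildRequestUrl, saveScreenshot, and example.png being illustrative names only.

```javascript
const fs = require('node:fs/promises');

// Build the request URL for the screenshot service, encoding the target as ?url=...
function buildRequestUrl(serviceBase, targetUrl) {
  const requestUrl = new URL(serviceBase);
  requestUrl.searchParams.set('url', targetUrl);
  return requestUrl.toString();
}

// Fetch a screenshot from the running service and save it to outFile.
async function saveScreenshot(serviceBase, targetUrl, outFile) {
  const res = await fetch(buildRequestUrl(serviceBase, targetUrl));
  if (!res.ok) {
    throw new Error(`Unexpected status ${res.status}`);
  }
  await fs.writeFile(outFile, Buffer.from(await res.arrayBuffer()));
}

// Usage, with `node index.js` running in another terminal:
// saveScreenshot('http://localhost:8080/', 'https://example.com/', 'example.png');
```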



Deployment

To deploy, specify the build command, run command, and service port.

Once the deployment is complete, your application is ready to use online.



Summary

✅ Pros:

  • Consistent local and cloud environments, making debugging easier.
  • Supports the official Puppeteer library.

❌ Cons:

  • Slightly more complex setup: you must write your own HTTP handler.



AWS Lambda

Code Example:

const chromium = require('chrome-aws-lambda');

exports.handler = async (event) => {
  let browser = null;

  try {
    browser = await chromium.puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless,
    });

    const page = await browser.newPage();
    await page.goto(event.url);

    const screenshot = await page.screenshot();

    return {
      statusCode: 200,
      // page.screenshot() produces a PNG by default, so declare it as such.
      headers: { 'Content-Type': 'image/png' },
      body: screenshot.toString('base64'),
      isBase64Encoded: true,
    };
  } catch (error) {
    return {
      statusCode: 500,
      body: 'Failed to capture screenshot.',
    };
  } finally {
    if (browser !== null) {
      await browser.close();
    }
  }
};
AWS Lambda requires puppeteer-core paired with a third-party Chromium build, such as alixaxel/chrome-aws-lambda. This is necessary because AWS limits the unzipped deployment package of a Lambda function to 250MB. The default Chromium bundled with Puppeteer uses up most of this budget or exceeds it outright (~170MB on macOS, ~282MB on Linux, ~280MB on Windows), so a slimmed-down Chromium is needed.
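
To get a feel for how close a project is to that limit before uploading, you can sum the file sizes of the folder you plan to package. The sketch below is only an approximation: directorySize and fitsLimit are hypothetical helper names, and treating 250MB as 250 * 1024 * 1024 bytes is an assumption about how the limit is counted.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Assumption: the 250MB limit is interpreted as 250 * 1024 * 1024 bytes.
const LIMIT_BYTES = 250 * 1024 * 1024;

// Recursively sum the sizes of regular files under dir (symlinks are skipped).
function directorySize(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += directorySize(fullPath);
    } else if (entry.isFile()) {
      total += fs.statSync(fullPath).size;
    }
  }
  return total;
}

// True when the directory's total size is within the given limit.
function fitsLimit(dir, limit = LIMIT_BYTES) {
  return directorySize(dir) <= limit;
}

// Usage: console.log(fitsLimit('./node_modules'));
```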



Local Development

Local debugging requires complex configuration because the local and Lambda runtime environments differ, as the alixaxel/chrome-aws-lambda guide shows.



Deployment

To deploy, you need to upload your node_modules as a ZIP file. Depending on your use case, you might also need to configure Lambda Layers. The main business logic can be written directly in the AWS console. Once saved, it is ready to run.
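
How the target URL reaches the function depends on how the function is invoked. The handler above reads event.url, which fits a direct invocation with a custom payload. If the function sits behind an HTTP front end such as API Gateway, query parameters arrive in event.queryStringParameters instead. A small sketch that accepts both shapes follows. getTargetUrl is a hypothetical helper name, and which event shape you actually receive is an assumption to verify against your own setup.

```javascript
// Hypothetical helper: find the target URL in either a direct-invocation
// payload ({ url }) or an HTTP-style event ({ queryStringParameters: { url } }).
function getTargetUrl(event) {
  if (!event) return undefined;

  // Direct invocation with a custom payload, as in the handler above.
  if (typeof event.url === 'string') return event.url;

  // HTTP-style event: query parameters are grouped under queryStringParameters,
  // which can be null or missing when the request has no query string.
  const query = event.queryStringParameters;
  return query && typeof query.url === 'string' ? query.url : undefined;
}
```

Inside the handler, `page.goto(getTargetUrl(event))` could then replace `page.goto(event.url)`, ideally after checking that a URL was found.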



Summary

✅ Pros:

❌ Cons:

  • Relies on third-party Chromium libraries, which may introduce risks.
  • Complex local debugging.
  • Tedious deployment process requiring ZIP uploads and potentially Lambda Layers.



Cloudflare Browser Rendering

Code Example:

import puppeteer from '@cloudflare/puppeteer';

export default {
  async fetch(request, env) {
    const { searchParams } = new URL(request.url);
    let url = searchParams.get('url');
    if (url) {
      url = new URL(url).toString(); // normalize

      const browser = await puppeteer.launch(env.MYBROWSER);
      const page = await browser.newPage();
      await page.goto(url);
      const img = await page.screenshot();
      await browser.close();

      return new Response(img, {
        headers: {
          'content-type': 'image/png',
        },
      });
    } else {
      return new Response('Please add an ?url=https://example.com/ parameter');
    }
  },
};
Cloudflare Browser Rendering is a relatively new serverless Puppeteer solution. Similar to AWS Lambda, it does not support the official Puppeteer library. Instead, it uses a Puppeteer version provided by Cloudflare.

While Cloudflare’s library is likely more trustworthy than third-party options, its slow update cycle can be frustrating: at the time of writing, it had not been updated in over five months.

Additionally, Cloudflare Browser Rendering has several limitations:

  • Only available to users on a paid Workers plan.
  • Each Cloudflare account can only create a maximum of 2 browsers per minute, with no more than 2 browsers running concurrently.
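
With limits this tight, a launch can fail simply because another request already holds a browser. One way to soften that is to retry the launch after a short pause. The sketch below assumes that the launch call rejects when a limit is hit, which you should confirm for your account. launchWithRetry and its options are hypothetical names, and retrying within a single request does nothing to raise the account-wide limits.

```javascript
// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call launch() up to `attempts` times, pausing
// `delayMs` between failed attempts, and rethrow the last error if all fail.
async function launchWithRetry(launch, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await launch();
    } catch (err) {
      lastError = err;
      // Wait before the next try, but not after the final failure.
      if (attempt < attempts - 1) await sleep(delayMs);
    }
  }
  throw lastError;
}

// Usage inside the Worker's fetch handler:
// const browser = await launchWithRetry(() => puppeteer.launch(env.MYBROWSER));
```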



Local Development

Local debugging requires complex configurations.



Deployment

To deploy, write your function online and save it to run.



Summary

✅ Pros:

❌ Cons:

  • Relies on Cloudflare’s Puppeteer library, which has a slow update cycle.
  • Complex local debugging.
  • Restricted access due to paywalls and other limitations.



Conclusion

This article has compared the three main serverless Puppeteer deployment platforms: Leapcell, AWS Lambda, and Cloudflare Browser Rendering. Each platform has its pros and cons.

Which one do you prefer? Are there other serverless Puppeteer deployment solutions you know of? Share your thoughts in the comments!


If you’re planning to deploy your Puppeteer project online, Leapcell is a good choice based on the comparison above.

For a deployment guide, visit our documentation.

By stp2y
