
August 01, 2024

Using Puppeteer in Node.js: Automating Browser Tasks


Puppeteer is a Node.js library developed by Google that provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol. It is widely used for web scraping, automated testing, creating screenshots, generating PDFs, and more. In this blog post, we'll look at what Puppeteer is, how to set it up in your Node.js environment, and some practical use cases with code examples.

 

What is Puppeteer?

Puppeteer is essentially a tool that allows you to programmatically control a web browser. By using Puppeteer, you can automate tasks like:

  • Navigating to web pages
  • Filling out and submitting forms
  • Capturing screenshots and PDFs
  • Scraping data from websites
  • Running automated tests

Puppeteer runs the browser in headless mode by default, that is, without a graphical user interface. This makes it suitable for servers and CI environments where no display is available, and it can be switched off when you want to watch a script run.
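While developing a script, it often helps to see what the browser is doing. The following is a minimal sketch, not a fixed recipe: buildLaunchOptions and its debug flag are hypothetical names introduced here for illustration, while headless and slowMo are options that puppeteer.launch() accepts. It can be tried once Puppeteer is installed as described in the next section.

```javascript
// Hypothetical helper: builds the options object passed to puppeteer.launch().
// `debug` is our own flag for this sketch, not a Puppeteer option.
function buildLaunchOptions({ debug = false } = {}) {
  return {
    // headless: false opens a visible browser window
    headless: !debug,
    // slowMo delays each Puppeteer operation by the given number of
    // milliseconds, which makes the steps easier to follow by eye
    slowMo: debug ? 100 : 0,
  };
}

// Usage (inside an async function, with Puppeteer installed):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(buildLaunchOptions({ debug: true }));
```

Keeping the options in one small function means the same script can run headless in automation and visibly on a developer machine.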

 

Setting Up Puppeteer

Before you can start using Puppeteer, you need to have Node.js installed on your system. You can download and install Node.js from nodejs.org.

Once Node.js is installed, you can set up a new project and install Puppeteer via npm:

mkdir puppeteer-example
cd puppeteer-example
npm init -y
npm install puppeteer

 

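This will create a new directory for your project, initialize a package.json file, and install Puppeteer. By default, installing Puppeteer also downloads a compatible build of Chrome, so the first install can take a while.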

 

Basic Usage

Let's start with a simple example of launching a browser, navigating to a website, and taking a screenshot.

Example: Taking a Screenshot

Create a new file called screenshot.js and add the following code:

const puppeteer = require('puppeteer');

(async () => {
  // Launch the browser
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Navigate to the URL
  await page.goto('https://example.com');

  // Take a screenshot
  await page.screenshot({ path: 'example.png' });

  // Close the browser
  await browser.close();
})();

 

Run the script using Node.js:

 

node screenshot.js

 

This script will open a headless browser, navigate to https://example.com, take a screenshot, and save it as example.png in your project directory.
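By default, page.screenshot() captures only the visible viewport. As a hedged sketch of one way to capture the entire page, the helper below sets a fixed viewport and passes fullPage: true. The name captureFullPage, the 1280x800 viewport, and the output file name in the usage comment are assumptions made for this example; the helper expects a page object created with browser.newPage().

```javascript
// Hypothetical helper: captures the whole scrollable page, not only the
// visible viewport. `page` is a Puppeteer Page from browser.newPage().
async function captureFullPage(page, url, path) {
  // Fix the window size so the layout is the same on every run
  await page.setViewport({ width: 1280, height: 800 });

  // Wait until network activity has mostly settled before capturing
  await page.goto(url, { waitUntil: 'networkidle2' });

  // fullPage: true extends the capture to the full scroll height
  await page.screenshot({ path, fullPage: true });
  return path;
}

// Usage inside the async function from screenshot.js:
//   await captureFullPage(page, 'https://example.com', 'example-full.png');
```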

 

Advanced Usage

Puppeteer can do much more than taking screenshots. Let's explore some advanced use cases.
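PDF generation, mentioned in the introduction, follows the same pattern as a screenshot. The sketch below is one possible shape, assuming a hypothetical savePdf helper; page.pdf() and its path, format, and printBackground options are part of Puppeteer's API, while the A4 format is simply a choice made for this example.

```javascript
// Hypothetical helper: saves the page at `url` as an A4 PDF file.
// `page` is a Puppeteer Page from browser.newPage().
async function savePdf(page, url, path) {
  // Wait until network activity has mostly settled so late content is included
  await page.goto(url, { waitUntil: 'networkidle2' });

  // printBackground: true keeps background colors and images,
  // which are left out of the PDF by default
  await page.pdf({ path, format: 'A4', printBackground: true });
  return path;
}

// Usage inside an async function:
//   await savePdf(page, 'https://example.com', 'example.pdf');
```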

 

Example: Scraping Data

Suppose we want to scrape the titles of the latest articles from a news website. Here's how you can do it:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://news.ycombinator.com/');

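  // Scrape the titles of the articles.
  // At the time of writing, Hacker News wraps each story link in an element
  // with the "titleline" class; the older "storylink" class is no longer
  // present, so '.storylink' would return an empty array.
  const titles = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.titleline > a')).map(element => element.textContent);
  });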

  console.log(titles);

  await browser.close();
})();

 

This script navigates to Hacker News, scrapes the titles of the articles, and logs them to the console.
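Titles alone are rarely enough; usually you also want the link each title points to. The sketch below shows one way to collect both, assuming a hypothetical scrapeLinks helper. It uses page.waitForSelector() and page.$$eval(), both part of Puppeteer's API; the selector in the usage comment is an assumption about the site's current markup and may need adjusting.

```javascript
// Hypothetical helper: returns [{ title, url }, ...] for every link that
// matches `selector`. `page` is a Puppeteer Page that has already navigated.
async function scrapeLinks(page, selector) {
  // Wait until at least one matching element exists in the DOM
  await page.waitForSelector(selector);

  // $$eval runs the callback inside the page, on all matching elements,
  // and sends the serializable result back to Node.js
  return page.$$eval(selector, (anchors) =>
    anchors.map((a) => ({ title: a.textContent.trim(), url: a.href }))
  );
}

// Usage inside an async function, after page.goto(...):
//   const stories = await scrapeLinks(page, '.titleline > a');
//   console.log(stories);
```

Waiting for the selector first makes the failure mode explicit: if the markup changes, the script times out with an error instead of silently returning an empty list.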

 

Example: Filling Forms and Submitting

Puppeteer can also be used to automate form submission. Here's an example of how to fill out and submit a form:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/login');

  // Fill out the form
  await page.type('#username', 'myusername');
  await page.type('#password', 'mypassword');

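  // Submit the form and wait for the resulting navigation.
  // Start waiting before the click so that a fast navigation is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click('#login-button'),
  ]);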

  // Take a screenshot of the logged-in page
  await page.screenshot({ path: 'logged-in.png' });

  await browser.close();
})();

 

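This script navigates to a login page, fills in the username and password fields, submits the form, waits for the navigation to complete, and takes a screenshot of the logged-in page. The URL and the #username, #password, and #login-button selectors are placeholders; replace them with the values for the site you are automating.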
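The scripts above call browser.close() only when every step succeeds. If a selector is missing or a navigation times out, the error skips that call and the browser process can be left running. One possible way to guard against this, sketched here with a hypothetical withPage helper, is to wrap the work in try/finally. The helper takes the puppeteer module as an argument, which is a design choice for this example rather than anything Puppeteer requires.

```javascript
// Hypothetical helper: launches a browser, runs `task(page)`, and always
// closes the browser afterwards. `puppeteer` is the module from
// require('puppeteer'), passed in as an argument.
async function withPage(puppeteer, task) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Return whatever the task produces (titles, a file path, ...)
    return await task(page);
  } finally {
    // Runs on success and on failure, so no browser process is left behind
    await browser.close();
  }
}

// Usage:
//   const puppeteer = require('puppeteer');
//   withPage(puppeteer, async (page) => {
//     await page.goto('https://example.com');
//     return page.title();
//   }).then(console.log, console.error);
```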

 

Running Automated Tests

Puppeteer is also an excellent tool for running automated tests. For instance, you can use Puppeteer in combination with a testing framework like Jest to perform end-to-end testing.

 

First, install Jest:

npm install --save-dev jest

 

Then, create a test file app.test.js:

const puppeteer = require('puppeteer');

describe('Google', () => {
  let browser;
  let page;

  beforeAll(async () => {
    browser = await puppeteer.launch();
    page = await browser.newPage();
    await page.goto('https://google.com');
  });

  afterAll(async () => {
    await browser.close();
  });

  it('should have "Google" as the page title', async () => {
    // page.goto() in beforeAll has already waited for the page to load
    const title = await page.title();
    expect(title).toBe('Google');
  });
});

 

In your package.json, replace the placeholder test script that npm init generated with one that runs Jest:

"scripts": {
  "test": "jest"
}

 

Run the tests:

 

npm test

 

This setup will run a simple test to check if the title of the Google homepage is "Google".

 

Conclusion

Puppeteer is a versatile and powerful library for browser automation. Whether you need to take screenshots, scrape data, automate form submissions, or run automated tests, Puppeteer has you covered. By integrating Puppeteer into your Node.js projects, you can significantly enhance your ability to interact with web pages programmatically.

 

 
