January 8, 2023 08:10 pm GMT

Chrome Powered Web Scraping with Puppeteer: Boosting Speed and Efficiency

Chrome Automation with Puppeteer: Scrape the Web with Style

Puppeteer is a Node.js library that provides a high-level API for controlling headless Chrome or Chromium over the DevTools Protocol. It is a powerful tool for web scraping because it lets you scrape websites that rely on JavaScript rendering, cookies, and other dynamic behavior that a traditional HTML scraper cannot handle.

To use Puppeteer for web scraping, you will need to install it using npm (the Node Package Manager). Once installed, you can use Puppeteer in your Node.js script to programmatically control a headless Chrome browser and perform web scraping tasks.
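For example, installation is a single npm command (this also downloads a compatible Chromium build by default):

```shell
# Add Puppeteer to the current Node.js project
npm install puppeteer
```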

Here is a simple example of how to use Puppeteer to scrape a webpage:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://radiojavan.com');

  // Extract data from the page
  const data = await page.evaluate(() => {
    const name = document.querySelector('h1').textContent;
    const price = document.querySelector('.price').textContent;
    return { name, price };
  });

  console.log(data);
  await browser.close();
})();

In this example, Puppeteer opens a new page in a headless Chrome browser, navigates to the specified URL, and then extracts data from the page using DOM query methods like querySelector inside page.evaluate. The extracted data is returned as an object and logged to the console.


Puppeteer also provides many other useful features for web scraping, such as the ability to handle cookies, manipulate the DOM, and simulate user events like clicks and form submissions. With these capabilities, Puppeteer can be used to scrape virtually any modern website.
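As a sketch of those interaction features, a login-before-scraping flow might look like this; the URL, selectors, and cookie values below are illustrative placeholders, not taken from a real site:

```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Reuse a saved session cookie (name and value are hypothetical)
  await page.setCookie({ name: 'session', value: 'abc123', domain: 'example.com' });

  await page.goto('https://example.com/login');

  // Simulate typing into a login form (selectors are hypothetical)
  await page.type('#username', 'demo');
  await page.type('#password', 'secret');

  // Submit the form and wait for the resulting navigation to finish
  await Promise.all([
    page.waitForNavigation(),
    page.click('button[type="submit"]'),
  ]);

  await browser.close();
})();
```

Wrapping the click and waitForNavigation in Promise.all avoids a race where navigation completes before the wait is registered.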

Here's a real-life example that extracts artist names and song titles from a playlist page and writes them, one per line, to a text file:

const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  // Browser config
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--start-maximized'],
    defaultViewport: null,
  });
  const page = await browser.newPage();

  await page.goto('https://www.radiojavan.com/');
  await page.waitForSelector('.grid');
  await page.click('#featuredPlaylists > div.grid > a:nth-child(3) > img');
  // Wait for the playlist's track list to render before scraping it
  await page.waitForSelector('.song');

  const info = await page.evaluate(() => {
    // Song title array
    const songs = Array.from(document.querySelectorAll('.song'), el => el.textContent.trim());
    // Artist name array
    const artists = Array.from(document.querySelectorAll('.artist'), el => el.textContent.trim());
    return songs.concat(artists);
  });

  // Write the combined array to a text file, one entry per line
  fs.writeFileSync('info.txt', info.join('\r\n'));

  await browser.close();
  console.log('Success');
})();

Original Link: https://dev.to/mzanbagh/chrome-r-powered-web-scraping-with-puppeteer-boosting-speed-and-efficiency-32bf

