August 21, 2020 02:50 am GMT

Scrape server-side rendered HTML content with JavaScript

Scraping can be used to collect and analyse data from sources that don't have APIs.

In this tutorial we'll use JavaScript to scrape content from a website that's rendered server-side.

You'll need to have Node.js and npm installed if you haven't already.

Let's start by creating a project folder and initialising it with a package.json file:

mkdir scraper
cd scraper
npm init -y

We'll be using two packages to build our scraper script.

  • axios: Promise-based HTTP client for the browser and Node.js.
  • cheerio: Implementation of jQuery designed for the server (makes it easy to work with the DOM).

Install the packages by running the following command:

npm install axios cheerio --save

Next, create a file called scrape.js and include the packages we just installed:

const axios = require("axios");
const cheerio = require("cheerio");

In this example I'll be using https://lobste.rs/ as the data source to be scraped.

Inspecting the page source shows that the site name in the header has a cur_url class, so let's see if we can scrape its text.


Add the following to scrape.js to fetch the HTML and log the title text if successful:

axios("https://lobste.rs/")
  .then((response) => {
    const html = response.data;
    const $ = cheerio.load(html);
    const title = $(".cur_url").text();
    console.log(title);
  })
  .catch(console.error);

Run the script with the following command and you should see Lobsters logged in the terminal:

node scrape.js

If everything's working, we can proceed to scrape some actual content from the website.

Let's get the titles, domains and points for each of the stories on the homepage by updating scrape.js:

axios("https://lobste.rs/")
  .then((response) => {
    const html = response.data;
    const $ = cheerio.load(html);
    const storyItem = $(".story");
    const stories = [];
    storyItem.each(function () {
      const title = $(this).find(".u-url").text();
      const domain = $(this).find(".domain").text();
      const points = $(this).find(".score").text();
      stories.push({
        title,
        domain,
        points,
      });
    });
    console.log(stories);
  })
  .catch(console.error);

This code loops through each of the stories, grabs the data, and then stores it in an array called stories.
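Every value collected this way is a string, since .text() always returns text, and it may carry surrounding whitespace from the markup. If the data is going to be analysed, a small clean-up step can help. The following is a minimal sketch, not part of the original script: cleanStory and the rawStories sample are hypothetical names and data used only for illustration.

```javascript
// Normalise one scraped story: trim the strings and turn points into a number.
function cleanStory(story) {
  const points = Number.parseInt(story.points, 10);
  return {
    title: story.title.trim(),
    domain: story.domain.trim(),
    // Fall back to null when the score text is not a number.
    points: Number.isNaN(points) ? null : points,
  };
}

// Hypothetical sample in the shape the scraper pushes into `stories`.
const rawStories = [
  { title: "  An example story ", domain: "example.com\n", points: " 12 " },
  { title: "Another story", domain: "example.org", points: "~" },
];

const cleaned = rawStories.map(cleanStory);
console.log(cleaned);
```

In scrape.js the same function could be applied with stories.map(cleanStory) before logging the array.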

If you've worked with jQuery then the selectors will be familiar; if not, the jQuery documentation on selectors is a good place to learn about them.

Now re-run node scrape.js and you should see the data for each of the stories logged in the terminal.
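Logging to the terminal is fine for a quick check, but if the data is to be analysed later it may be more useful to write it to a file. The sketch below uses only Node's built-in fs, os and path modules. The saveStories helper, the sample data and the output location are all assumptions made for illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write the scraped stories to disk as pretty-printed JSON and return the path.
function saveStories(stories, filePath) {
  fs.writeFileSync(filePath, JSON.stringify(stories, null, 2), "utf8");
  return filePath;
}

// Hypothetical sample data and output location, for illustration only.
const sampleStories = [
  { title: "An example story", domain: "example.com", points: "12" },
];
const outputPath = path.join(os.tmpdir(), "stories.json");

saveStories(sampleStories, outputPath);
console.log(`Saved ${sampleStories.length} stories to ${outputPath}`);
```

In scrape.js, a call such as saveStories(stories, "stories.json") could sit alongside or replace the console.log(stories) line.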


Original Link: https://dev.to/michaelburrows/scrape-sever-side-rendered-html-content-with-javascript-24bi
