cheerio

Cheerio is a library for parsing and manipulating HTML with a syntax similar to jQuery, which makes it convenient for web scraping. Once a page's HTML has been retrieved (for example, asynchronously with an HTTP client), Cheerio lets you extract the information you need using selectors.

Features

  1. Familiar syntax: Cheerio implements a subset of core jQuery. It strips the DOM inconsistencies and browser quirks out of the jQuery library, leaving a clean and intuitive API.

  2. High-speed processing: Cheerio works with a very simple, consistent DOM model. As a result, parsing, manipulation, and rendering are very efficient.

  3. Highly flexible: Cheerio is built on the parse5 parser and can optionally use @FB55's lenient htmlparser2. This allows Cheerio to parse almost any HTML or XML document.


Installation

First, install cheerio with npm so that it is available in your project.


npm install cheerio

Usage

To use Cheerio, load the module, pass it an HTML string, and query the result with jQuery-style selectors.

  • Name: Cheerio Usage
    Type: Usage
    Description: After loading the HTML of a web page with Cheerio, you can select and process the elements you need in much the same way as with jQuery.

Cautions

  • Name: String Encoding
    Type: Cautions
    Description: If a web page is decoded with the wrong character encoding, the text may come out garbled. If needed, use an encoding library to convert the strings appropriately.

  • Name: Dynamic Page Handling
    Type: Cautions
    Description: Cheerio is suitable for scraping static HTML pages. It does not execute JavaScript, so to retrieve the content of pages that are rendered dynamically you'll need a separate tool, such as a headless browser.

  • Name: Compliance with Laws and Service Agreements
    Type: Cautions
    Description: When performing web scraping, comply with the applicable laws and with the service agreements of the websites involved, and handle confidential information and copyrighted content appropriately.

Usage

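const cheerio = require('cheerio');

// Parse an HTML string; $ then works like jQuery's $ on that document.
const htmlString = '<h1>Hello, Cheerio!</h1>';
const $ = cheerio.load(htmlString);

// Select elements with a CSS selector.
const heading = $('h1');

// Read the text content of the selection: 'Hello, Cheerio!'
const text = heading.text();

// Read an attribute. This is undefined here because the sample has no <input>.
const attributeValue = $('input').attr('type');

// Append a new element to <body> (cheerio.load wraps the fragment in <html>, <head>, and <body>).
$('body').append('<p>New paragraph</p>');

// Remove every <p> element, including the one just appended.
$('p').remove();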

Example

The following code handles API requests on the server. It uses Cheerio, through lib/lostark/scrapper, to scrape a web page and return character information for the game "Lost Ark" to users.

Producing the character search results that users need takes at least 700 lines of Cheerio code, and retrieving every available field takes even more.

The code may contain parts that could be removed or written in a better way, so feel free to focus only on how Cheerio is used.

Additional Explanation

  • Name: HTML Loading and Cheerio Object Generation
    Type: Example
    Description:
    • The getCharacterInfo function generates the URL of the web page based on the input character name.
    • It uses the getHTML function to load the web page. The axios.get function is responsible for retrieving the HTML code of the web page.
    • The obtained HTML code is passed to the cheerio.load function to generate a Cheerio object. This object allows selecting and manipulating elements within the web page.
  • Name: Selection of Specific Elements and Information Extraction
    Type: Example
    Description:
    • Using the Cheerio object and a selector, it selects the HTML elements that contain the necessary information.
    • It extracts the content and attributes of the selected elements and saves them in a plain object.
  • Name: Modification and Addition of Element Content
    Type: Example
    Description:
    • It finds the elements that need to be modified or added.
    • It then modifies the content of the found elements or adds new elements.

searchCharacter.js

import { searchCharacter } from '@/lib/lostark/scrapper'

export default async function handler(req, res) {
    try {
        const { name } = req.query
        const result = await searchCharacter(name)
        return res.status(200).json({ data: result })
    } catch (error) {
        console.error(error)
        return res
            .status(500)
            .json({ message: 'Error retrieving character information' })
    }
}

scrapper.js

const axios = require('axios')
const cheerio = require('cheerio')

const NAME = 'name'
// Placeholder for the base URL of the character page; replace it with the real value.
const URL_HEADER = 'NEXT_LOSTARK_WEB_URL'

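// Fetch the page. axios resolves with a response object whose data property holds the HTML.
const getHTML = async (url) => {
    try {
        return await axios.get(url)
    } catch (error) {
        console.error(error)
        // Rethrow so the caller (the API handler) can respond with an error
        // instead of receiving undefined.
        throw error
    }
}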
const getCharacterInfo = (name) => {
    const URL_PARAMS = `${name}`
    return URL_HEADER + encodeURI(URL_PARAMS)
}
const searchCharacter = async (name) => {
    const URL = getCharacterInfo(name)
    const html = await getHTML(URL)

    // The LostArkResult below is only an example of the result shape.
    const LostArkResult = {
        Character: '',
        profileEngraveSet: [],
    }
    const $ = cheerio.load(html.data, { xmlMode: false })

    // Collect the title and description of each engraving entry.
    const profileEngraveSet = []
    $(
        '.profile-ability-engrave > .swiper-container > .swiper-wrapper > .swiper-slide',
    ).each((i, li) => {
        $(li)
            .children()
            .each((j, child) => {
                const title = $(child).find('span').text()
                const desc = $(child).find('div').find('p').text()

                profileEngraveSet.push({ title: title, content: desc })
            })
    })
    LostArkResult.profileEngraveSet = profileEngraveSet

    // Character is only an example.
    const Character = $(
        '#lui-tab1-1 > div > div.collection-graph > div > h4 > span',
    )
    // Assumption: the result stores the text of the selected element.
    LostArkResult.Character = Character.text()

    return LostArkResult
}
module.exports = {
    NAME,
    searchCharacter,
}