April 28, 2023 03:01 am GMT

The new version of x-crawl v7 has been released!

x-crawl

x-crawl is a flexible, multipurpose crawler library for Node.js. Its flexible usage and wide range of features help you crawl pages, API interfaces, and files quickly, safely, and stably.

If you like x-crawl, you can support it by giving the x-crawl repository a star. Thank you for your support!

GitHub: https://github.com/coder-hxl/x-crawl

Breaking Changes

  • Fingerprint upgrade:
    • The fingerprint option of the advanced configuration is renamed to fingerprints. It is now an array of DetailTargetFingerprintCommon objects, which makes customization easier. Internally, the objects in the array are randomly assigned to the targets.
    • crawlPage fingerprint options: the maximum width and maximum height in the fingerprint configuration are now optional in both the advanced configuration and the detailed target configuration.
  • Proxy upgrade: the proxy option used when creating a crawler instance, in the advanced configuration, and in the detailed target configuration is now an object with three properties:
    • urls: one or more proxy URLs. By default, the first one is used first.
    • switchByHttpStatus: the non-compliant response status codes that should trigger a proxy switch.
    • switchByErrorCount: the number of errors (such as timeouts) after which the proxy is switched.
    • The proxy rotation feature must be used together with error retries.
  • Return value type adjustment: CrawlCommonRes, CrawlPageSingleRes, CrawlDataSingleRes and CrawlFileSingleRes are renamed to CrawlCommonResult, CrawlPageSingleResult, CrawlDataSingleResult and CrawlFileSingleResult respectively.
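
The proxy switching rules above can be modeled with a small self-contained sketch. This is not x-crawl's internal code: the ProxyRotator class, the Attempt type, the default threshold of 1, the choice to stay on the last proxy once the list is exhausted, and the placeholder proxy URLs are all assumptions made for illustration. Only the three property names (urls, switchByHttpStatus, switchByErrorCount) come from the release notes.

```typescript
// Hypothetical model of the proxy object described in the release notes.
interface ProxyConfig {
  urls: string[];
  switchByHttpStatus?: number[];
  switchByErrorCount?: number;
}

// Outcome of one request attempt: an HTTP status, or an error such as a timeout.
type Attempt = { kind: "status"; status: number } | { kind: "error" };

class ProxyRotator {
  private config: ProxyConfig;
  private index = 0;
  private errorCount = 0;

  constructor(config: ProxyConfig) {
    if (config.urls.length === 0) {
      throw new Error("at least one proxy URL is required");
    }
    this.config = config;
  }

  // The proxy URL currently in use; the first URL is used first.
  get current(): string {
    return this.config.urls[this.index];
  }

  // Records one attempt and returns true if the proxy was switched.
  record(attempt: Attempt): boolean {
    if (attempt.kind === "status") {
      const statuses = this.config.switchByHttpStatus ?? [];
      return statuses.indexOf(attempt.status) !== -1 ? this.switchProxy() : false;
    }
    this.errorCount += 1;
    // Assumption: switch after a single error when no threshold is given.
    const threshold = this.config.switchByErrorCount ?? 1;
    return this.errorCount >= threshold ? this.switchProxy() : false;
  }

  // Assumption: once the last proxy is reached, stay on it.
  private switchProxy(): boolean {
    if (this.index >= this.config.urls.length - 1) return false;
    this.index += 1;
    this.errorCount = 0;
    return true;
  }
}

const rotator = new ProxyRotator({
  urls: ["http://proxy-a.example:8080", "http://proxy-b.example:8080"],
  switchByHttpStatus: [401, 403],
  switchByErrorCount: 2,
});
rotator.record({ kind: "error" }); // first timeout: stays on the first proxy
rotator.record({ kind: "error" }); // second timeout: switches to the next proxy
console.log(rotator.current); // prints "http://proxy-b.example:8080"
```

Each call to record stands for one attempt of a retried request, which is why rotation only makes sense together with error retries: without retries there is no later attempt that could use the next proxy.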

Features

  • Setting an option to null cancels the value inherited from the upper-level unified configuration.
  • The userAgent option in DetailTargetFingerprintCommon now uses object notation and allows customizing the maximum and minimum values of the major version, minor version, and revision number. Each crawl target gets a new userAgent.
  • A new proxyDetails property in the crawl results records the proxy status.
  • The mobile option of the fingerprint configuration accepts a new 'random' value, which lets the library randomize it internally.
  • Terminal prompts are simplified and their colors adjusted.
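
The null behavior in the first item can be illustrated with a minimal sketch of layered option resolution. The resolveOption helper and the timeout values are hypothetical and only show the idea: undefined means "not set at this level, inherit from the level above", while null means "explicitly cancel whatever the upper level set". How x-crawl implements this internally is not stated in the release notes.

```typescript
// Hypothetical helper: resolves one option across configuration levels,
// ordered from the most specific level to the most general one.
function resolveOption<T>(...levels: Array<T | null | undefined>): T | undefined {
  for (const value of levels) {
    if (value === null) return undefined; // null cancels the inherited setting
    if (value !== undefined) return value; // first explicit value wins
  }
  return undefined; // nothing set at any level
}

const instanceTimeout = 10000; // upper-level unified setting (example value)

console.log(resolveOption<number>(undefined, instanceTimeout)); // prints 10000
console.log(resolveOption<number>(5000, instanceTimeout)); // prints 5000
console.log(resolveOption<number>(null, instanceTimeout)); // prints undefined
```

Distinguishing null from undefined is what makes cancellation possible: simply omitting the option cannot express "do not inherit", because omission already means "inherit".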

Bug fixes

  • Fixed: multiple levels of non-existent folders could not be created on Linux systems.

Original Link: https://dev.to/coderhxl/the-new-version-of-x-crawl-v7-has-been-released-1053
