May 30, 2022 09:20 pm GMT

Crawling Thousands of URLs? Import sys.

This post is mainly here to act as a mental note for me in the future! However, you might find it helpful.

I'm coding a bot that loops through a CSV file of about 15,000 URLs; each URL is added to a Set once it is successfully scraped.

But when the Firefox or Chrome driver couldn't load a website, my bot would restart the scraper function, which would then check whether each URL was already in the Set before continuing.

Because each restart called the scraper from inside itself, this would eventually throw an exception along the lines of 'maximum recursion depth exceeded'.
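A minimal sketch of that restart pattern (not the original bot's code, which isn't shown in the post): every failed page load re-enters the function recursively, so a long run of failures piles up stack frames until CPython's recursion limit, 1,000 by default, is exceeded.

```python
# Sketch of the recursive-restart pattern described above. The real bot
# restarts on WebDriver failures; here a counter stands in for a run of
# consecutive failed page loads.
def restart_scraper(remaining_failures):
    if remaining_failures == 0:
        return "done"
    # Simulate "driver couldn't load the site, restart the function":
    # each retry is another recursive call, i.e. another stack frame.
    return restart_scraper(remaining_failures - 1)

try:
    restart_scraper(50_000)  # far deeper than the default limit of 1,000
    hit_limit = False
except RecursionError:
    hit_limit = True

print("hit the recursion limit:", hit_limit)
```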

If you get this error when running your Python code try this:

import sys
sys.setrecursionlimit(40000)

Courtesy of coderjack, this raises Python's recursion limit (the default is 1,000) and allows the code to keep running.

Be careful with the number you set this to, especially if you are on an old machine: the limit only governs Python's own bookkeeping, not the actual stack, so setting it too high can crash the interpreter instead of raising a tidy exception. The spinning circle may pay a visit.
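The fix in context, as a small runnable sketch: raise the limit before the deeply recursive call. The depth of 5,000 here is an illustrative number, comfortably past the default limit but small enough not to exhaust real stack space.

```python
import sys

# Raise the interpreter's recursion-depth limit before recursing deeply.
# Note: this only lifts CPython's bookkeeping check; the underlying C
# stack is still finite, so an overly large value risks a hard crash.
sys.setrecursionlimit(40000)

def count_down(n):
    # Recurses once per step; would raise RecursionError at the
    # default limit of 1,000.
    return "done" if n == 0 else count_down(n - 1)

result = count_down(5_000)
print(result)
```

If the recursion is just a retry loop in disguise, as in the scraper above, an ordinary `while` loop avoids the limit entirely and is usually the sturdier fix.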


Original Link: https://dev.to/olney1/crawling-thousands-of-urls-import-sys-4fk7

