January 23, 2022 05:56 pm GMT

How to create an Instagram Scraping Bot with Python

THIS ARTICLE WAS ORIGINALLY POSTED ON THE SOCIALSCRAPE BLOG

How to scrape Instagram
Instagram is one of the world's largest social media platforms, developed for users to share moments, market new products, and ultimately create their aesthetic reality. With almost 900 million user profiles, it also happens to be a great place to scrape data, and find new trends in any category you can think of!

In this article, you'll learn how to create an Instagram scraping bot in Python using InstaPy. You'll also learn how you can scrape Instagram hassle-free with SocialScrape. SocialScrape is an inexpensive content scraping API, with a super friendly UI, that allows you to scrape Instagram in just one step! You even get your first 100 calls free!

Installation & Requirements

  • Have Python2.7 or Python3 installed on your local machine

  • pip or pip3 installed on your local machine

  • Have Firefox browser installed on your local machine

  • Some knowledge of Python/Data manipulation

Why scrape Instagram?
Unfortunately, Instagram is one platform that can be difficult to scrape. For starters, it's not guaranteed that an account will hold an abundance of data, and even the ones that are full of data don't have much to choose from. Instagram also uses powerful tools to prevent botting and catches on easily to "over-active" profiles. But that's not to say it's impossible to do! So, the question remains: why would you want to scrape Instagram?

By scraping user profiles, developers can find endless data trends. Due to its ever-increasing user profile count, Instagram is estimated to be home to over a billion posts. With that much digital media, captions, hashtags, etc., developers and data scientists can learn an immense amount about a population, whether or not each individual account is data-abundant.

Are you, or possibly someone you're writing this script for, an IG influencer? Or perhaps you're interested in growing a follower count and need to understand the basics of "How to be an influencer". By creating a script that looks for users with 10,000+ followers using the hashtag #beauty, you could find what trends exist amongst other IG beauty influencers through their media and captions. You could also find out what the #beauty audience responds most positively to by examining the like count on each post. Once you have all this information, you can even begin thinking about creating bots that interact with real-time users based on the obtained data. The opportunities are truly infinite.
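
As a rough illustration of that idea, the sketch below filters a handful of made-up post records down to accounts with 10,000+ followers that used #beauty, then counts the hashtags that appear alongside it and picks the most-liked post. The record fields (user, followers, likes, hashtags) and every value in them are hypothetical; this is not the shape of data InstaPy returns.

```python
from collections import Counter

# Hypothetical post records; in practice you would build these from scraped data.
posts = [
    {"user": "a", "followers": 12000, "likes": 900, "hashtags": ["beauty", "skincare"]},
    {"user": "b", "followers": 8000, "likes": 400, "hashtags": ["beauty", "makeup"]},
    {"user": "c", "followers": 25000, "likes": 1500, "hashtags": ["beauty", "makeup", "skincare"]},
    {"user": "d", "followers": 40000, "likes": 3000, "hashtags": ["fashion"]},
]


def influencer_posts(records, tag, min_followers=10000):
    """Keep posts that use `tag` and come from accounts with enough followers."""
    return [r for r in records if tag in r["hashtags"] and r["followers"] >= min_followers]


def co_occurring_tags(records, tag):
    """Count the other hashtags that appear alongside `tag`."""
    counts = Counter()
    for r in records:
        counts.update(h for h in r["hashtags"] if h != tag)
    return counts


beauty = influencer_posts(posts, "beauty")
tag_counts = co_occurring_tags(beauty, "beauty")
top_post = max(beauty, key=lambda r: r["likes"])

print(tag_counts.most_common())
print(top_post["user"])
```

With real data, the same two helpers could be pointed at any hashtag and follower threshold you care about.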

Now that you have some ideas about what you want to do with your data, let's start scraping!

How to create an Instagram Scraping Bot
This bot will be created using Python. If you are looking to scrape Instagram in other languages such as Node, Ruby, PHP, or Shell, try heading over to SocialScrape!

For this example, you'll be creating a script that finds the most up-to-date data trends for the #beauty and #fashion hashtags/audience. Feel free to code along yourself, or follow along with this GitHub repository.

  1. Open up your favorite code editor and navigate to the directory where you'd like to create your scraping bot, or create a new ig_scraping directory. Inside your ig_scraping directory, create an ig_scraping_bot.py file. This is where you'll write your script.

To create this script, you'll first need to import InstaPy into the file. You can install InstaPy by running:

pip install instapy

If you are using pip3, you may have to run pip3 install instapy instead.

  2. Import the InstaPy library into the file.

  3. Instantiate InstaPy by passing in an account username and password. You can assign the instance to a variable named session.

It is not recommended to perform scraping on an important/personal account!

  4. Log in to the designated scraping IG account using InstaPy's login() method.

  5. Use the set_relationship_bounds method to define the attributes a user must have for their posts to be scraped and, in this case, also liked.

  6. If the user passes the set_relationship_bounds checks, the post will be liked and returned in the response.

  7. Now that you have some response data from posts you've liked, you can inspect the output and build scripts around the trend data you're looking for.
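
For the instantiation step above, one option is to keep the scraping account's credentials out of the script itself. The sketch below reads them from environment variables; the helper name load_credentials and the variable names IG_USERNAME and IG_PASSWORD are assumptions made for this example, not part of InstaPy.

```python
import os


def load_credentials(env=os.environ):
    """Read the scraping account's credentials from environment variables.

    IG_USERNAME and IG_PASSWORD are hypothetical names chosen for this example.
    """
    username = env.get("IG_USERNAME")
    password = env.get("IG_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set IG_USERNAME and IG_PASSWORD before running the bot")
    return username, password
```

The returned pair can then be passed to InstaPy(username=..., password=...) instead of hard-coded strings.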

Here is an example of what your scraping bot may look like:

from instapy import InstaPy

# Turn on a VPN or use the requests package to create (rotating) proxies.

# Pass in a USERNAME, a PASSWORD, and the TWO HASHTAGS you would like to scrape from.
def run_ig_scraping_session(username, password, hashtag1, hashtag2):
    # 1. session ->
    # Create a session variable from the InstaPy initialization.
    session = InstaPy(username=username, password=password)

    # 2. session.login ->
    # Log in to IG with the username and password given above.
    session.login()

    # 3. session.set_relationship_bounds ->
    # (from the InstaPy docs:)
    # THIS IS USED TO CHECK THE NUMBER OF
    # FOLLOWERS AND/OR FOLLOWING A USER HAS AND IF
    # THESE NUMBERS EITHER EXCEED THE NUMBER SET OR
    # DOES NOT PASS THE NUMBER SET OR IF THEIR RATIO
    # DOES NOT REACH DESIRED POTENCY RATIO THEN NO
    # FURTHER INTERACTION HAPPENS
    # Arguments:
    #   potency_ratio: desired followers-to-following ratio (a positive value
    #       targets users with more followers than following)
    #   delimit_by_numbers: activates or deactivates the max & min values
    #   max_followers: maximum follower count a user can have
    #   max_following: maximum following count a user can have
    #   min_followers: minimum follower count a user can have
    #   min_following: minimum following count a user can have
    #   min_posts: minimum number of posts a user can have
    #   max_posts: maximum number of posts a user can have
    session.set_relationship_bounds(
        enabled=True,
        potency_ratio=1.34,
        delimit_by_numbers=True,
        max_followers=20000,
        max_following=4000,
        min_followers=4000,
        min_following=800,
        min_posts=20,
        max_posts=1000,
    )

    # 4. session.like_by_tags ->
    # Establish a liked_photos variable to hold the liked-pictures data
    # created by the session.like_by_tags method. A post is only liked if its
    # author falls within the set_relationship_bounds set above.
    # Arguments:
    #   amount: number of photos to like for each hashtag passed
    #   daysold: how many days old the post is allowed to be
    #   max_pic: how far back on the user's profile the picture is allowed
    #       to be found (i.e. the 20th most recent pic)
    liked_photos = session.like_by_tags([hashtag1, hashtag2], amount=10)

    # 5. Return the liked photos that the method above finds based on the arguments passed.
    return "CONTENT: ", liked_photos

# 6. Now that you have the returned data, you can see what trends are found in the media posts
# (such as trends found in captions, content type, etc.).

# Replace the two placeholders with the scraping account's username and password.
run_ig_scraping_session("<your_username>", "<your_password>", "Fashion", "Beauty")
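
To make the relationship bounds more concrete, here is a simplified, standalone model of the kind of check they describe. It is only a sketch and not InstaPy's actual implementation: the function name within_bounds, the order of the checks, and the reason strings are assumptions. The default limits mirror the values passed to set_relationship_bounds in the script, the first two sample accounts use the follower and following counts that appear in the log output shown in this article, and the post count of 100 is made up because the log does not show it.

```python
def within_bounds(followers, following, posts,
                  potency_ratio=1.34,
                  max_followers=20000, min_followers=4000,
                  max_following=4000, min_following=800,
                  min_posts=20, max_posts=1000):
    """Return (ok, reason) for a simplified version of the relationship checks."""
    if followers > max_followers:
        return False, "followers count exceeds maximum limit"
    if followers < min_followers:
        return False, "followers count is less than minimum limit"
    if following > max_following:
        return False, "following count exceeds maximum limit"
    if following < min_following:
        return False, "following count is less than minimum limit"
    if not min_posts <= posts <= max_posts:
        return False, "posts count is out of range"
    # Positive potency ratio: followers / following must reach the threshold.
    ratio = float(followers) / following if following else float("inf")
    if ratio < potency_ratio:
        return False, "relationship ratio is below potency ratio"
    return True, "ok"


print(within_bounds(26790, 849, 100))   # too many followers
print(within_bounds(5481, 125, 100))    # follows too few accounts
print(within_bounds(9000, 1000, 100))   # hypothetical account inside every limit
```

Tightening or loosening these limits is the main lever for deciding which accounts the bot interacts with.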

Once you've run your script, your response should look something like the output below. Keep in mind that the responses can get lengthy, so make sure to take advantage of the amount argument passed in above:

InstaPy Version: 0.6.15
 ._.  ._.  ._.  ._.  ._.  ._.  ._.  ._.  ._.
Workspace in use: "/Users/user1/InstaPy"
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
INFO [2022-01-17 15:46:47] [<Your Username>]  Session started!
oooooooooooooooooooooooooooooooooooooooooooooooooooooo
INFO [2022-01-17 15:46:47] [<Your Username>]  -- Connection Checklist [1/2] (Internet Connection Status)
INFO [2022-01-17 15:46:48] [<Your Username>]  - Internet Connection Status: ok
INFO [2022-01-17 15:46:48] [<Your Username>]  - Current IP is "71.127.218.114" and it's from "United States/US"
INFO [2022-01-17 15:46:48] [<Your Username>]  -- Connection Checklist [2/2] (Hide Selenium Extension)
INFO [2022-01-17 15:46:48] [<Your Username>]  - window.navigator.webdriver response: True
WARNING [2022-01-17 15:46:48] [<Your Username>]  - Hide Selenium Extension: error
INFO [2022-01-17 15:46:55] [<Your Username>]  - Cookie file for user '<Your Username>' loaded...
................................................................
INFO [2022-01-17 15:47:21] [<Your Username>]  Logged in successfully!
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
INFO [2022-01-17 15:47:21] [<Your Username>]  Saving account progress...
INFO [2022-01-17 15:47:25] [<Your Username>]  Tag [1/2]
INFO [2022-01-17 15:47:25] [<Your Username>]  --> b'fashion'
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webelement.py:359: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
  warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
INFO [2022-01-17 15:47:36] [<Your Username>]  desired amount: 11  |  top posts [disabled]: 9  |  possible posts: 1043222855
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webelement.py:426: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
  warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
INFO [2022-01-17 15:47:38] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:38] [<Your Username>]  Found media type: Photo
INFO [2022-01-17 15:47:38] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:38] [<Your Username>]  Found media type: Photo
INFO [2022-01-17 15:47:38] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:38] [<Your Username>]  Found media type: Photo
INFO [2022-01-17 15:47:38] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:38] [<Your Username>]  Found media type: Carousel - Video - IGTV
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webelement.py:393: UserWarning: find_element_by_* commands are deprecated. Please use find_element() instead
  warnings.warn("find_element_by_* commands are deprecated. Please use find_element() instead")
INFO [2022-01-17 15:47:38] [<Your Username>]  Post category: Carousel
INFO [2022-01-17 15:47:38] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:38] [<Your Username>]  Found media type: Carousel - Video - IGTV
INFO [2022-01-17 15:47:38] [<Your Username>]  Post category: Carousel
INFO [2022-01-17 15:47:38] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:38] [<Your Username>]  Found media type: Photo
INFO [2022-01-17 15:47:38] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:38] [<Your Username>]  Found media type: Carousel - Video - IGTV
INFO [2022-01-17 15:47:38] [<Your Username>]  Post category: Carousel
INFO [2022-01-17 15:47:38] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:38] [<Your Username>]  Found media type: Photo
INFO [2022-01-17 15:47:38] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:38] [<Your Username>]  Found media type: Photo
INFO [2022-01-17 15:47:38] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:38] [<Your Username>]  Found media type: Carousel - Video - IGTV
INFO [2022-01-17 15:47:38] [<Your Username>]  Post category: Carousel
INFO [2022-01-17 15:47:38] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:39] [<Your Username>]  Found media type: Carousel - Video - IGTV
INFO [2022-01-17 15:47:39] [<Your Username>]  Post category: Carousel
INFO [2022-01-17 15:47:39] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:39] [<Your Username>]  Found media type: Photo
INFO [2022-01-17 15:47:39] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:39] [<Your Username>]  Found media type: Carousel - Video - IGTV
INFO [2022-01-17 15:47:39] [<Your Username>]  Post category: Carousel
INFO [2022-01-17 15:47:39] [<Your Username>]  Verifying media type: ('Photo', 'Carousel', 'Video', 'IGTV', 'Clip')
INFO [2022-01-17 15:47:39] [<Your Username>]  Found media type: Carousel - Video - IGTV
INFO [2022-01-17 15:47:39] [<Your Username>]  Post category: Video
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [1/https://www.instagram.com/p/CY2Dd3HJ5zz/]
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [2/https://www.instagram.com/p/CY19bPxr2k4/]
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [3/https://www.instagram.com/p/CY2A408lT3n/]
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [4/https://www.instagram.com/p/CY2CVJSNlwk/]
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [5/https://www.instagram.com/p/CY183WwosJb/]
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [6/https://www.instagram.com/p/CY1_gwktViY/]
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [7/https://www.instagram.com/p/CY2CGKLOdUs/]
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [8/https://www.instagram.com/p/CY2B5_ArZnZ/]
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [9/https://www.instagram.com/p/CY2BPldKhZL/]
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [10/https://www.instagram.com/p/CY2Fp5SqaGB/]
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [11/https://www.instagram.com/p/CY2FqIUM1SD/]
INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [12/https://www.instagram.com/p/CY2Fpo6q6KF/]
INFO [2022-01-17 15:47:44] [<Your Username>]  Like# [1/11]
INFO [2022-01-17 15:47:44] [<Your Username>]  https://www.instagram.com/p/CY2Fp5SqaGB/
INFO [2022-01-17 15:47:47] [<Your Username>]  Image from: b'estela_fashionmodel'
INFO [2022-01-17 15:47:47] [<Your Username>]  Image link: b'https://www.instagram.com/p/CY2Fp5SqaGB/'
INFO [2022-01-17 15:47:47] [<Your Username>]  Description: b'Thank you!\xf0\x9f\x98\x8d @littleattitudeofficial \xf0\x9f\x98\x8d

How beautiful this sports set, its fantastic, cozy and above all cotton!

You can find this and many other clothes only at @littleattitudeofficial !\xf0\x9f\xa4\xa9

#gift #collab #amazing #beautiful #model #modellife #modeling #fashion #fashionstyle #fashionista #fashionblogger #fashionable #love #life #live #blogger @estela_fashionmodel \xf0\x9f\xa4\x8d@littleattitudeofficial \xf0\x9f\x92\x9a'
INFO [2022-01-17 15:47:51] [<Your Username>] User: 'estela_fashionmodel' |> followers: 26790 |> following: 849 |> relationship ratio: 31.55
INFO [2022-01-17 15:47:51] [<Your Username>] User 'estela_fashionmodel's followers count exceeds maximum limit ~skipping user
INFO [2022-01-17 15:47:51] [<Your Username>] Like# [2/11]
INFO [2022-01-17 15:47:51] [<Your Username>] https://www.instagram.com/p/CY2FqIUM1SD/
INFO [2022-01-17 15:47:55] [<Your Username>] Image from: b'sagavai_black'
INFO [2022-01-17 15:47:55] [<Your Username>] Image link: b'https://www.instagram.com/p/CY2FqIUM1SD/'
INFO [2022-01-17 15:47:55] [<Your Username>] Description: b'\xd0\x9f\xd1\x83\xd1\x81\xd1\x82\xd1\x8c \xd0\xbe\xd1\x81\xd1\x82\xd0\xb0\xd0\xbd\xd0\xb5\xd1\x82\xd1\x81\xd1\x8f \xd1\x82\xd1\x83\xd1\x82!!
me #love #instadaily #selfie #photooftheday #fun #followme #smile #summer #swag #instalike #igers #tbt #picoftheday #follow4follow #fashion #like4like #follow #instagood #amazing #cute #friends #bestoftheday #happy #instatag #l4l #beautiful #likeforlike#sagavai_blak #'
INFO [2022-01-17 15:47:59] [<Your Username>] User: 'sagavai_black' |> followers: 5481 |> following: 125 |> relationship ratio: 43.84
INFO [2022-01-17 15:47:59] [<Your Username>] User 'sagavai_black's following count is less than minimum limit ~skipping user
INFO [2022-01-17 15:48:10] [<Your Username>] https://www.instagram.com/p/CY2FqDWhwkI/
INFO [2022-01-17 15:48:18] [<Your Username>] Image from: b'mr_amaan967'

While the response snippet is quite long, let's break down what data is received back.

  • InstaPy confirms a successful login attempt to Instagram with the given credentials.
  • 11 posts are retrieved based on the hashtags passed through the like_by_tags method. You may ask why 11, when we specified an amount of 10. Well, that's because InstaPy likes the first post from the Explore page.
  • The links to those 11 posts.
  • A description of each post including the caption, hashtags used, tagged_users, location, and more depending on what the user chooses to expose.
  • You may also notice that some scraped posts respond with "Unavailable Page". This can be due to several reasons, but it may simply mean the user's account is private.
  • For this example, I did not include all 11 posts in the response, but rather only the first few to avoid writing an entire response epic.

Congrats on making a Python Instagram scraping bot!
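
If you save that output to a file, a few regular expressions are enough to pull the interesting pieces back out. The sketch below is one possible approach, not something InstaPy provides: the patterns and the parse_log helper are assumptions based only on the line formats visible in the sample output, so they may need adjusting for other InstaPy versions.

```python
import re

# A few lines copied from the sample output above.
log_lines = [
    "INFO [2022-01-17 15:47:39] [<Your Username>]  Links retrieved:: [1/https://www.instagram.com/p/CY2Dd3HJ5zz/]",
    "INFO [2022-01-17 15:47:47] [<Your Username>]  Image from: b'estela_fashionmodel'",
    "INFO [2022-01-17 15:47:51] [<Your Username>] User: 'estela_fashionmodel' |> followers: 26790 |> following: 849 |> relationship ratio: 31.55",
    "INFO [2022-01-17 15:47:59] [<Your Username>] User: 'sagavai_black' |> followers: 5481 |> following: 125 |> relationship ratio: 43.84",
]

# Matches the "Links retrieved:: [N/<post url>]" lines and captures the URL.
LINK_RE = re.compile(r"Links retrieved:: \[\d+/(https://www\.instagram\.com/p/[^/\]]+/)\]")
# Matches the "User: '<name>' |> followers: N |> following: N" lines.
USER_RE = re.compile(r"User: '([^']+)' \|> followers: (\d+) \|> following: (\d+)")


def parse_log(lines):
    """Collect post links and per-user follower/following counts from log lines."""
    links, users = [], {}
    for line in lines:
        link = LINK_RE.search(line)
        if link:
            links.append(link.group(1))
        user = USER_RE.search(line)
        if user:
            name, followers, following = user.groups()
            users[name] = {"followers": int(followers), "following": int(following)}
    return links, users


links, users = parse_log(log_lines)
print(links)
print(users)
```

From there, the links and user counts can be fed into whatever trend analysis you have in mind.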

Scrape Instagram with SocialScrape
Remember how at the beginning of this article, we mentioned that there is in fact, an incredibly easy way to scrape Instagram? Well, let's loop back around to that again, and talk about SocialScrape!

Although you may have just made an awesome IG scraping bot of your own, chances are you are going to have to make a lot of repairs along the way. Instagram is a stickler about scraping and does whatever it can to prevent it. If you're not cautious, the account could be temporarily blocked from performing certain actions, or you could even be banned forever! This is why it is discouraged to perform scraping on an important/personal account! But SocialScrape fixes all of that for you.

  • SocialScrape allows you to paste in a user's "@" handle, and with just one execution call, all the data is ready for you to use. That's amazing!
  • If that's not enough, SocialScrape also handles the burden of rotating proxies for you.
  • SocialScrape includes numerous API endpoints that allow you to get all the information you need from any user.

And as mentioned before, the first 100 calls are free! Check out the SocialScrape getting started docs to learn more.
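
For comparison, if you manage proxies yourself, rotation can be as simple as cycling through a list in round-robin order. The sketch below is a minimal illustration under stated assumptions: the proxy addresses are placeholders from a documentation-only IP range, the proxy_rotation helper is hypothetical, and no health checks or retries are included.

```python
from itertools import cycle

# Placeholder addresses from a documentation-only IP range; replace with real proxies.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def proxy_rotation(proxy_urls):
    """Yield proxy mappings in round-robin order, forever."""
    for url in cycle(proxy_urls):
        # Same proxy for both schemes; this is the mapping shape the requests package accepts.
        yield {"http": url, "https": url}


rotation = proxy_rotation(PROXIES)
first_four = [next(rotation)["http"] for _ in range(4)]
print(first_four)
```

Each call to next(rotation) returns the next mapping, which can be passed as the proxies argument of a requests call such as requests.get(url, proxies=next(rotation)).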

So, what are you waiting for? Instagram data is waiting to be scraped.


Original Link: https://dev.to/socialscrape/how-to-create-an-instagram-scraping-bot-with-python-45c8
