November 13, 2022 12:35 am GMT

Reddit Monitoring with Python

Originally published at https://segue.co/blog/reddit-monitoring-python

All source code can be found here:
https://github.com/theleonwei/reddit_bot

Introduction:
Reddit is the second-most popular website in the United States, with more than 300 million unique visitors per month.

It's also one of the most trafficked sites on the internet and has become an important part of online marketing strategy for brands across industries.

This article will show you how to programmatically set up a keyword monitor service on Reddit using Python and the PRAW library.

You will also learn how to build this web service with Django, run it on your local machine to monitor Reddit, and save the leads into a database automatically.

All of the code and step-by-step instructions are based on the assumption that you are on a Mac.

Table of contents:

  • Setting up the Django Project with the CookieCutter template
  • Register a Reddit application and install the PRAW library
  • Keyword monitoring with regular expression
  • Persisting data to a Postgres database
  • Leads report view
  • Schedule a cron job to check Reddit periodically

Setting up the Django Project with the CookieCutter template
Cookiecutter Django is a framework for jumpstarting production-ready Django projects quickly.

To learn more, you can visit the official repository on GitHub:

https://github.com/cookiecutter/cookiecutter-django

Step 1: install cookiecutter.
Open up your favorite terminal app (mine is iTerm2), and install the latest cookiecutter.

pip install "cookiecutter>=1.7.0"

Step 2: start a Django project.

cookiecutter https://github.com/cookiecutter/cookiecutter-django

Follow the instructions, answer the questions, and set up the project; here are my choices:

project_name [My Awesome Project]: Reddit Bot
project_slug [reddit_bot]:
description [Behold My Awesome Project!]: My awesome Reddit Bot Project
author_name [Daniel Roy Greenfeld]: Leon W
domain_name [example.com]:
email [[email protected]]:
version [0.1.0]:
Select open_source_license:
1 - MIT
2 - BSD
3 - GPLv3
4 - Apache Software License 2.0
5 - Not open source
Choose from 1, 2, 3, 4, 5 [1]: 5
timezone [UTC]: US/Pacific
windows [n]:
use_pycharm [n]: y
use_docker [n]:
Select postgresql_version:
1 - 14
2 - 13
3 - 12
4 - 11
5 - 10
Choose from 1, 2, 3, 4, 5 [1]:
Select cloud_provider:
1 - AWS
2 - GCP
3 - None
Choose from 1, 2, 3 [1]:
Select mail_service:
1 - Mailgun
2 - Amazon SES
3 - Mailjet
4 - Mandrill
5 - Postmark
6 - Sendgrid
7 - SendinBlue
8 - SparkPost
9 - Other SMTP
Choose from 1, 2, 3, 4, 5, 6, 7, 8, 9 [1]:
use_async [n]: n
use_drf [n]: n
Select frontend_pipeline:
1 - None
2 - Django Compressor
3 - Gulp
Choose from 1, 2, 3 [1]:
use_celery [n]: n
use_mailhog [n]: n
use_sentry [n]: n
use_whitenoise [n]: n
use_heroku [n]: y
Select ci_tool:
1 - None
2 - Travis
3 - Gitlab
4 - Github
Choose from 1, 2, 3, 4 [1]:
keep_local_envs_in_vcs [y]: n
debug [n]: n
 [SUCCESS]: Project initialized, keep up the good work!

Step 3: Install dependencies

cd reddit_bot; ls

Those are the files and directories that we have got so far.

Procfile   locale                                 reddit_bot        setup.cfg
README.md  manage.py                              requirements      utility
config     merge_production_dotenvs_in_dotenv.py  requirements.txt
docs       pytest.ini                             runtime.txt

Step 3.1: Create a virtual environment.

reddit_bot  python3 -m venv ./venv

After, you should see a newly created venv folder.

Procfile   locale                                 reddit_bot        setup.cfg
README.md  manage.py                              requirements      utility
config     merge_production_dotenvs_in_dotenv.py  requirements.txt  venv
docs       pytest.ini                             runtime.txt

Step 3.2: Activate the virtual environment and install all dependencies.

source venv/bin/activate

Notice there is a (venv) prompt, which means you have successfully activated the virtual environment.

Note: if you have not installed a Postgres database on your Mac, you must install it first.

Check this article on Postgres installation on Mac for more details.

Install the dependencies from the requirements/local.txt file (it differs slightly from the production requirements, as it adds tools for debugging and testing).

(venv) reddit_bot  pip install -r requirements/local.txt

Step 3.3: Create a local database reddit_bot

(venv) reddit_bot  createdb reddit_bot

After that, start the Django web server for testing.

(venv) reddit_bot  python manage.py runserver

You will probably see something like the following.

Watching for file changes with StatReloader
INFO 2022-09-17 11:28:04,131 autoreload 17789 4335895936 Watching for file changes with StatReloader
Performing system checks...
System check identified no issues (0 silenced).
You have 28 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): account, admin, auth, contenttypes, sessions, sites, socialaccount, users.
Run 'python manage.py migrate' to apply them.
September 17, 2022 - 11:28:04
Django version 3.2.15, using settings 'config.settings.local'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
[17/Sep/2022 11:28:16] "GET / HTTP/1.1" 200 13541
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/css/toolbar.css HTTP/1.1" 200 11815
[17/Sep/2022 11:28:16] "GET /static/css/project.css HTTP/1.1" 200 228
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/css/print.css HTTP/1.1" 200 43
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/js/toolbar.js HTTP/1.1" 200 12528
[17/Sep/2022 11:28:16] "GET /static/js/project.js HTTP/1.1" 200 45
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/js/utils.js HTTP/1.1" 200 4479
[17/Sep/2022 11:28:16] "GET /static/images/favicons/favicon.ico HTTP/1.1" 200 8348

Since the database is brand new, we need to initialize it with some built-in Django tables.

(venv) reddit_bot  python manage.py migrate

Once the migration is completed, restart the server.

(venv) reddit_bot  python manage.py runserver

Open your favorite browser (I am using the latest Chrome) and visit localhost:8000 to make sure the website loads successfully.

Congratulations on finishing setting up your local Django server; next, let's set up our Reddit account and install the Reddit API library: PRAW.

Register a Reddit application and install the PRAW library
To register an application, you need a Reddit account. If you don't have one, visit https://reddit.com, create one, then come back.

Next, register an application of the appropriate type on Reddit by visiting https://www.reddit.com/prefs/apps/

Note: I sometimes run into page redirect issues when visiting the above page; if that happens to you, try the following instead:

https://old.reddit.com/prefs/apps/

Scroll down and click the "create another app..." button.

[Screenshot: registering the application on Reddit]

For the name of your app, anything should be fine.

On the app type: since we will be building something running in the backend, choose script.

And for redirect uri, enter http://localhost:8000

Then click the create app button to finish this step.

[Screenshot: the app's client id and client secret]

There are two tokens that we will need for our service to run:

  1. client id: the one beneath personal use script

  2. client secret: the one to the right of secret

Finally, we need to install the PRAW library and try to connect with Reddit using the secret keys from the last step.

(venv) reddit_bot  pip install praw

Next, we append PRAW to the dependency list (otherwise, the service won't work when we deploy to production):

(venv) reddit_bot  pip freeze | grep praw >> requirements/base.txt

It's a common best practice not to save sensitive information such as your app secret tokens in the git repository, so let's create a new file .env at the root of your project so we can access the app secret on the local machine.

(venv) reddit_bot  vim .env

Replace the secret and the client_id with yours from the last step.

The exact user agent value is not critical for this tutorial. To find your browser's user agent, search Google for "find my user agent" and paste the result into the .env file.

Next, make sure you add .env into the list of files that will not be checked into the git repository by editing the .gitignore file.

(venv) reddit_bot  vim .gitignore

And add .env to the file; after that, run the following in your terminal to load the environment variables.

(venv) reddit_bot  source .env

We also need to load the variables into Django's environment variables.

Open up the config/settings/local.py file (I am using Pycharm) and add the following:

# Reddit settings:
REDDIT_SECRET = env('SECRET')
REDDIT_CLIENT_ID = env('CLIENT_ID')
REDDIT_USERAGENT = env('REDDIT_USERAGENT')
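Before opening the console, it can help to confirm that the three variables are actually visible to Python. The sketch below is a hypothetical helper (the name missing_reddit_vars is not part of the project); it only assumes the variable names used in the settings snippet. Keep in mind that `source .env` passes values on to child processes such as `python manage.py` only when the lines in .env start with `export`.

```python
import os

# Variable names taken from the settings snippet; adjust if yours differ.
REQUIRED_VARS = ("SECRET", "CLIENT_ID", "REDDIT_USERAGENT")


def missing_reddit_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_reddit_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All Reddit environment variables are set.")
```

Running it in the same terminal session where you ran `source .env` shows whether Django will be able to read the values.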

[Screenshot: loading the Reddit app secret in the Django settings]

Now test your Reddit connection. In the terminal, launch the Django console:

(venv) reddit_bot  python manage.py shell
In [1]: from django.conf import settings
In [2]: import praw
In [3]: reddit = praw.Reddit(
   ...:     client_id=settings.REDDIT_CLIENT_ID,
   ...:     client_secret=settings.REDDIT_SECRET,
   ...:     user_agent=settings.REDDIT_USERAGENT,
   ...: )

If you don't receive any error message, it means you have successfully created a Reddit instance through PRAW.

Next, let's run a simple task to check the connection.

In [4]: for submission in reddit.subreddit("marketing").hot(limit=10):
   ...:     print(submission.title)
   ...:
New Job Listings
Sorry if this isn't allowed, but I recently created a subreddit focused on the business side of art, and would love for people to go there and share their knowledge and experiences.
I read privacy and policies of Tiktok, IG and Other Platforms. Heres what I learned about Social Media Platforms!
Has anyone actually worked with an impressive agency?
How to bring first customers to shop?
Facebook ad numbers don't ad up
Beginning my career in marketing, looking to go in to an Agency
I hate digital marketing - help me find a new role
Where to start for ecom store?
Making a website

If you see something similar to the above, congratulations: you've successfully connected with Reddit's official API and retrieved the top 10 hot posts from r/marketing.

Keyword monitoring with regular expression
Assuming you run a Facebook ad agency and your target customers are new to Facebook ads, wouldn't it be nice if you could respond to someone who has questions about Facebook ads on Reddit?

Chiming in and joining a conversation on Reddit will help:

  1. Establish your reputation as a Facebook ads expert;

  2. Spread the word about your service to the world's largest online community and drive quality traffic to your website;

  3. If you get enough votes, your response may become a backlink to improve your SEO.

For this article, we will show you how to find any posts whose title contains the keyword 'facebook'.

Of course, you can continue optimizing this matching rule and develop your own solution.

For a more advanced matching algorithm, feel free to check out Segue's Reddit lead generation engine based on a state-of-the-art NLP semantics search.
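As one possible refinement of the matching rule, the sketch below matches several keywords at once and only on whole words, so a title such as "Facebooking" would not count. The helper names build_matcher and matching_titles are hypothetical and use only Python's standard re module; treat this as a starting point, not the approach used by any particular service.

```python
import re


def build_matcher(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word."""
    escaped = (re.escape(word) for word in keywords)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def matching_titles(titles, keywords):
    """Return the titles that mention at least one keyword."""
    pattern = build_matcher(keywords)
    return [title for title in titles if pattern.search(title)]


if __name__ == "__main__":
    sample_titles = [
        "Facebook ad numbers don't ad up",
        "How to bring first customers to shop?",
        "Measuring the impact of Facebook",
    ]
    for title in matching_titles(sample_titles, ["facebook", "fb ads"]):
        print(title)
```

Escaping each keyword with re.escape keeps characters such as "+" or "." in a keyword from being read as regex syntax.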

Again, we first open up a Django console.

(venv) reddit_bot  python manage.py shell

Inside of the Django console:

import praw
from django.conf import settings
import re  # new, the python regular expression library

keyword = "facebook"

reddit = praw.Reddit(
    client_id=settings.REDDIT_CLIENT_ID,
    client_secret=settings.REDDIT_SECRET,
    user_agent=settings.REDDIT_USERAGENT,
)

# we are searching the 100 hottest posts on the marketing subreddit
for submission in reddit.subreddit("marketing").hot(limit=100):
    # new, notice we are ignoring the case sensitivity
    if re.search(keyword, submission.title, re.IGNORECASE):
        print(submission.title)

Your results may differ from mine (as we ran this search on September 18th, 2022).

In [2]: for submission in reddit.subreddit("marketing").hot(limit=100):
   ...:     if re.search(keyword, submission.title, re.IGNORECASE): # new, notice we are ignoring the case sensitivity
   ...:         print(submission.title)
   ...:
Facebook ad numbers don't ad up
Facebook & Instagram Ads campaign Setup
Measuring the impact of Facebook
How many interests is too many Interest - Facebook Ads

We've found four discussions about Facebook without too much work. How awesome is it!

Let's save those results and other metadata such as URLs, post date, and content into a database so we don't lose them.

Persisting data to a Postgres database
Let's first create a new app

# In the root dir of your project
(venv) reddit_bot  django-admin startapp reddit

Note: we also need to move the newly created app to the reddit_bot subdirectory. This step is important due to the way cookie-cutter structured our project.

mv reddit ./reddit_bot

Next, let's update the apps.py config file.

The default name is "reddit"; we need to update it to "reddit_bot.reddit" since we have moved the app from the root directory to the subdirectory.

And don't forget to include this app in the base.py config file.
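The two edits described above might look roughly like this. It is a sketch under assumptions: the class name RedditConfig and the default_auto_field line are what startapp typically generates on Django 3.2, and the LOCAL_APPS list is assumed to follow the usual Cookiecutter Django layout in config/settings/base.py. Check both against your generated files, and note that the two parts belong in separate files.

```python
# --- reddit_bot/reddit/apps.py ---
from django.apps import AppConfig


class RedditConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    # Was "reddit" before the app was moved into the reddit_bot package.
    name = "reddit_bot.reddit"


# --- config/settings/base.py (only the relevant list) ---
LOCAL_APPS = [
    "reddit_bot.users",   # generated by the template
    "reddit_bot.reddit",  # the new app
]
```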

Now let's open up the models.py file and add our first class.

class Lead(models.Model):
    post_id = models.CharField(max_length=10)  # Original post id
    title = models.TextField()
    content = models.TextField()
    posted_at = models.DateTimeField()
    url = models.URLField(max_length=500)

In the command line, let's install the app and model and make the migration.

(venv) reddit_bot  python manage.py makemigrations
Migrations for 'reddit':
  reddit_bot/reddit/migrations/0001_initial.py
    - Create model Lead
(venv) reddit_bot  python manage.py migrate
Operations to perform:
  Apply all migrations: account, admin, auth, contenttypes, reddit, sessions, sites, socialaccount, users
Running migrations:
  Applying reddit.0001_initial... OK

Next, let's create a command line script to execute the keyword matching and save the results into the Lead model.

For this script, let's call it lead_finder.py and put it under reddit/management/commands folder.

First we need to create the two folders:

# Move to the reddit app directory
(venv) reddit_bot  cd reddit_bot/reddit
(venv) reddit  mkdir management
(venv) reddit  mkdir management/commands

# Inside reddit_bot/reddit_bot/reddit/management/commands/lead_finder.py file
import datetime as DT
import re

import praw
from django.conf import settings
from django.core.management.base import BaseCommand
from django.utils import timezone
from django.utils.timezone import make_aware

from reddit_bot.reddit.models import Lead

KEYWORD = "facebook"
SUBREDDIT = 'marketing'

reddit = praw.Reddit(
    client_id=settings.REDDIT_CLIENT_ID,
    client_secret=settings.REDDIT_SECRET,
    user_agent=settings.REDDIT_USERAGENT,
)


def convert_to_ts(unix_time):
    # Convert Reddit's Unix timestamp into a timezone-aware datetime
    try:
        ts = make_aware(DT.datetime.fromtimestamp(unix_time))
        return ts
    except Exception:
        print(f"Converting utc failed for {unix_time}")
        return None


def populate_lead(keyword, subreddit):
    for submission in reddit.subreddit(subreddit).hot(limit=100):
        if re.search(keyword, submission.title, re.IGNORECASE):
            # Only save posts that are not in the table yet
            if not Lead.objects.filter(post_id=submission.id).exists():
                Lead.objects.create(post_id=submission.id,
                                    title=submission.title,
                                    url=submission.permalink,
                                    content=submission.selftext,
                                    posted_at=convert_to_ts(submission.created_utc))


class Command(BaseCommand):
    help = 'Populating leads'

    def handle(self, *args, **kwargs):
        try:
            current_time = timezone.now()
            self.stdout.write(f'Populating leads at {current_time}')
            populate_lead(KEYWORD, SUBREDDIT)
        except Exception as e:
            current_time = timezone.now().strftime('%X')
            self.stdout.write(self.style.ERROR(f'Populating leads failed at {current_time} because {e}'))
            return  # do not report success after a failure
        current_time = timezone.now()
        self.stdout.write(self.style.SUCCESS(f'Successfully populated new leads at {current_time}'))

Some explanation:

This script has three parts. The convert_to_ts function converts a Unix timestamp into a timezone-aware datetime. Reddit stores the time a post was created as a number of seconds since the Unix epoch.
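To see the conversion in isolation, here is a minimal standard-library sketch. It is a variant, not the script's own code: instead of converting to local time and then calling Django's make_aware, it passes the timezone explicitly, so the result does not depend on the machine's local timezone. The function name to_utc_datetime and the sample timestamp are illustrative assumptions.

```python
from datetime import datetime, timezone


def to_utc_datetime(unix_time):
    """Convert seconds since the Unix epoch into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(unix_time, tz=timezone.utc)


if __name__ == "__main__":
    # 1663449264 is an illustrative value, not one copied from Reddit's API.
    print(to_utc_datetime(1663449264).isoformat())
```

Django's DateTimeField accepts an aware datetime like this one directly when USE_TZ is enabled.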

The populate_lead function uses the same matching logic as our last section and saves a lead only if no row with the same post_id exists yet. Note that the Lead model above does not declare post_id as a primary key or a unique field, so this check in the script is the only thing preventing duplicates.

Lastly, we created a Command class so that we can execute the populate_lead in a command line. There are other ways to execute a script on the command line, but this way is more of a Django style, in my opinion.

Finally, we can try to execute the script and populate some leads.

(venv) reddit_bot  python manage.py lead_finder
Populating leads at 2022-09-18 17:50:35.045981+00:00
Successfully populated new leads at 2022-09-18 17:50:36.558052+00:00

Let's open up the Django console to verify the results are saved successfully.

from reddit_bot.reddit.models import Lead
for lead in Lead.objects.all():
    print(f'''title: {lead.title}
posted_at:{lead.posted_at}
url: {lead.url}
''')

title: Facebook ad numbers don't ad up
posted_at:2022-09-17 21:14:24+00:00
url: /r/marketing/comments/xgxv9c/facebook_ad_numbers_dont_ad_up/

title: Facebook & Instagram Ads campaign Setup
posted_at:2022-09-17 08:01:38+00:00
url: /r/marketing/comments/xgghl9/facebook_instagram_ads_campaign_setup/

title: Measuring the impact of Facebook
posted_at:2022-09-16 20:49:14+00:00
url: /r/marketing/comments/xg2kbi/measuring_the_impact_of_facebook/

title: How many interests is too many Interest - Facebook Ads
posted_at:2022-09-16 01:32:28+00:00
url: /r/marketing/comments/xfdwsg/how_many_interests_is_too_many_interest_facebook/

Here we go. All four leads persisted successfully!
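Notice that the url values printed above are relative permalinks rather than full links. If you ever need a complete URL outside a template (for an email alert, say), a small helper along these lines would do. The name absolute_post_url and the REDDIT_BASE_URL constant are assumptions for illustration, built on the standard library's urljoin.

```python
from urllib.parse import urljoin

# Assumed base host; change it if you prefer www.reddit.com or old.reddit.com.
REDDIT_BASE_URL = "https://reddit.com"


def absolute_post_url(permalink):
    """Turn a relative Reddit permalink into a full URL."""
    return urljoin(REDDIT_BASE_URL, permalink)


if __name__ == "__main__":
    print(absolute_post_url("/r/marketing/comments/xgxv9c/facebook_ad_numbers_dont_ad_up/"))
```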

Leads report view
It's cool we can see the data in the console, but it will be easier if we can view the leads in a table from a browser.

Inside the views.py file, let's create a ListView.

# inside reddit_bot/reddit_bot/reddit/views.py
from django.views.generic import ListView

from .models import Lead


# Create your views here.
class LeadView(ListView):
    model = Lead
    template_name = 'lead_list.html'


lead_view = LeadView.as_view()

We also need to create an HTML file 'lead_list.html' inside of a new directory called templates under the reddit app.

{% extends 'base.html' %}
{% block content %}
    <table class="table table-striped">
        <thead>
            <tr>
                <th scope="col">ID</th>
                <th scope="col">Title</th>
                <th scope="col">Posted At</th>
                <th scope="col">Content</th>
            </tr>
        </thead>
        <tbody>
        {% for lead in object_list %}
            <tr>
                <th scope="row">{{ lead.post_id }}</th>
                <td><a href="https://reddit.com{{ lead.url }}"> {{ lead.title }}</a></td>
                <td>{{ lead.posted_at }}</td>
                <td>{{ lead.content }}</td>
            </tr>
        {% endfor %}
        </tbody>
    </table>
{% endblock %}

Next, we need to add a URL path to access this view.

Create a new file: urls.py under the reddit app.

# inside reddit_bot/reddit_bot/reddit/urls.py
from django.urls import path

from reddit_bot.reddit.views import lead_view

app_name = "reddit"
urlpatterns = [
    path("leads/", view=lead_view, name="leads"),
]

Finally, we must include the reddit app's URLs file on the project level.
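At the project level, the include might look like the following. This is a sketch under assumptions: the project URL file is taken to be config/urls.py, as in the usual Cookiecutter Django layout, and the "reddit/" prefix is chosen so that the view ends up at /reddit/leads/. Add the path line to your existing urlpatterns and leave the generated routes in place.

```python
# config/urls.py (project-level URL configuration, assumed location)
from django.urls import include, path

urlpatterns = [
    # ... keep the routes generated by the template here ...
    # Assumed prefix "reddit/"; the app's own "leads/" route is appended to it.
    path("reddit/", include("reddit_bot.reddit.urls", namespace="reddit")),
]
```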

Final step: check the page, open your browser, and visit: http://localhost:8000/reddit/leads/

And if you click on the title, you will be redirected to the Reddit post page, where you can engage with your target customers. How cool is that!

Schedule a cron job to check Reddit periodically
We are almost done. If you like to automate your tasks as much as I do, let's schedule the job to run automatically.

And that's super easy.

In the terminal, type crontab -e and press Enter.

Add the following line (you will need to edit the path of the reddit_bot Django project)

1 * * * * cd ~/reddit_bot; source venv/bin/activate; source .env; python manage.py lead_finder >/tmp/stdout.log 2>/tmp/stderr.log

It will run every hour at one minute past the hour. For example, if it is now 11:35 am, the job will next run at 12:01 pm, then at 1:01 pm, and so on.
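If the timing is unclear, the rule behind a "1 * * * *" schedule can be sketched in a few lines of Python. The function next_hourly_run is a hypothetical helper that mimics only this one pattern (a fixed minute of every hour), not cron syntax in general.

```python
from datetime import datetime, timedelta


def next_hourly_run(now, minute=1):
    """Return the next time strictly after `now` that falls on `minute` past the hour."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This hour's slot has already passed, so move to the next hour.
        candidate += timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    print(next_hourly_run(datetime(2022, 9, 18, 11, 35)))
```

Changing the minute argument corresponds to changing the first field of the crontab line.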

Of course, you can change the schedule to whatever works best for you. The following site can help you customize your cron job.

https://crontab.guru/

Conclusion
We've gone through how to set up your own Reddit keyword monitoring with Python.

If you'd like to use our productized Reddit monitoring service, feel free to check out segue.co


Original Link: https://dev.to/leonwei/reddit-monitoring-with-python-34pb
