Reddit Monitoring with Python
Originally published at https://segue.co/blog/reddit-monitoring-python
All source code can be found here:
https://github.com/theleonwei/reddit_bot
Introduction:
Reddit is the second-most popular website in the United States, with more than 300 million unique visitors per month.
It's also one of the most trafficked sites on the internet and has become an important part of online marketing strategy for brands across industries.
This article will show you how to programmatically set up a keyword monitor service on Reddit using Python and the PRAW library.
You will also learn how to set up this web service using Django, run it on your local machine to monitor Reddit, and save the leads into a database automatically.
All of the code and step-by-step instructions are based on the assumption that you are on a Mac.
Table of contents:
- Setting up the Django Project with the CookieCutter template
- Register a Reddit application and install the PRAW library
- Keyword monitoring with regular expression
- Persisting data to a Postgres database
- Leads report view
- Schedule a cron job to check Reddit periodically
Setting up the Django Project with the CookieCutter template
Cookiecutter Django is a framework for jumpstarting production-ready Django projects quickly.
To learn more, you can visit the official repository on GitHub:
https://github.com/cookiecutter/cookiecutter-django
Step 1: install cookiecutter.
Open up your favorite terminal app (mine is iTerm2), and install the latest cookiecutter.
pip install "cookiecutter>=1.7.0"
Step 2: start a Django project.
cookiecutter https://github.com/cookiecutter/cookiecutter-django
Follow the instructions, answer the questions, and set up the project; here are my choices:
project_name [My Awesome Project]: Reddit Bot
project_slug [reddit_bot]:
description [Behold My Awesome Project!]: My awesome Reddit Bot Project
author_name [Daniel Roy Greenfeld]: Leon W
domain_name [example.com]:
email [[email protected]]:
version [0.1.0]:
Select open_source_license:
1 - MIT
2 - BSD
3 - GPLv3
4 - Apache Software License 2.0
5 - Not open source
Choose from 1, 2, 3, 4, 5 [1]: 5
timezone [UTC]: US/Pacific
windows [n]:
use_pycharm [n]: y
use_docker [n]:
Select postgresql_version:
1 - 14
2 - 13
3 - 12
4 - 11
5 - 10
Choose from 1, 2, 3, 4, 5 [1]:
Select cloud_provider:
1 - AWS
2 - GCP
3 - None
Choose from 1, 2, 3 [1]:
Select mail_service:
1 - Mailgun
2 - Amazon SES
3 - Mailjet
4 - Mandrill
5 - Postmark
6 - Sendgrid
7 - SendinBlue
8 - SparkPost
9 - Other SMTP
Choose from 1, 2, 3, 4, 5, 6, 7, 8, 9 [1]:
use_async [n]: n
use_drf [n]: n
Select frontend_pipeline:
1 - None
2 - Django Compressor
3 - Gulp
Choose from 1, 2, 3 [1]:
use_celery [n]: n
use_mailhog [n]: n
use_sentry [n]: n
use_whitenoise [n]: n
use_heroku [n]: y
Select ci_tool:
1 - None
2 - Travis
3 - Gitlab
4 - Github
Choose from 1, 2, 3, 4 [1]:
keep_local_envs_in_vcs [y]: n
debug [n]: n
 [SUCCESS]: Project initialized, keep up the good work!
Step 3: Install dependencies
cd reddit_bot; ls
Those are the files and directories that we have got so far.
Procfile    locale                                   reddit_bot        setup.cfg
README.md   manage.py                                requirements      utility
config      merge_production_dotenvs_in_dotenv.py    requirements.txt
docs        pytest.ini                               runtime.txt
Step 3.1: Create a virtual environment.
reddit_bot python3 -m venv ./venv
After, you should see a newly created venv folder.
Procfile    locale                                   reddit_bot        setup.cfg
README.md   manage.py                                requirements      utility
config      merge_production_dotenvs_in_dotenv.py    requirements.txt  venv
docs        pytest.ini                               runtime.txt
Step 3.2: Activate the virtual environment and install all dependencies.
source venv/bin/activate
Notice there is a (venv) prompt, which means you have successfully activated the virtual environment.
Note: if you have not installed a Postgres database on your Mac, you must install it first.
Check this article on Postgres installation on Mac for more details.
Install the dependencies from the local.txt file (it differs slightly from the production requirements because it adds tools for debugging and testing).
(venv) reddit_bot pip install -r requirements/local.txt
Step 3.3: Create a local database reddit_bot
(venv) reddit_bot createdb reddit_bot
After that, start the Django web server for testing.
(venv) reddit_bot python manage.py runserver
You will probably see something like the following.
Watching for file changes with StatReloader
INFO 2022-09-17 11:28:04,131 autoreload 17789 4335895936 Watching for file changes with StatReloader
Performing system checks...
System check identified no issues (0 silenced).
You have 28 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): account, admin, auth, contenttypes, sessions, sites, socialaccount, users.
Run 'python manage.py migrate' to apply them.
September 17, 2022 - 11:28:04
Django version 3.2.15, using settings 'config.settings.local'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
[17/Sep/2022 11:28:16] "GET / HTTP/1.1" 200 13541
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/css/toolbar.css HTTP/1.1" 200 11815
[17/Sep/2022 11:28:16] "GET /static/css/project.css HTTP/1.1" 200 228
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/css/print.css HTTP/1.1" 200 43
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/js/toolbar.js HTTP/1.1" 200 12528
[17/Sep/2022 11:28:16] "GET /static/js/project.js HTTP/1.1" 200 45
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/js/utils.js HTTP/1.1" 200 4479
[17/Sep/2022 11:28:16] "GET /static/images/favicons/favicon.ico HTTP/1.1" 200 8348
Since the database is brand new, we need to initialize it with some built-in Django tables.
(venv) reddit_bot python manage.py migrate
Once the migration is completed, restart the server.
(venv) reddit_bot python manage.py runserver
Open your favorite browser (I am using the latest Chrome), visit localhost:8000, and make sure the website loads successfully.
Congratulations on finishing setting up your local Django server; next, let's set up our Reddit account and install the Reddit API library: PRAW.
Register a Reddit application and install the PRAW library
We assume you already have a Reddit account. If not, simply visit https://reddit.com and create one, then come back.
To use the API, you must first register an application of the appropriate type on Reddit.
Visit https://www.reddit.com/prefs/apps/
Note: I sometimes run into page redirect issues when visiting the above page; if that happens to you, try visiting the following instead:
https://old.reddit.com/prefs/apps/
Scroll down and click the create another app... button
[Screenshot: registering a Reddit application]
For the name of your app, anything should be fine
On the app type: since we will be building something running in the backend, choose script.
And for redirect uri, enter http://localhost:8000
Then click the create app button to finish this step.
[Screenshot: the Reddit app's client id and client secret]
There are two tokens that we will need for our service to run:
client id: the one beneath personal use script
client secret: the one to the right of secret
Finally, we need to install the PRAW library and try to connect with Reddit using the secret keys from the last step.
(venv) reddit_bot pip install praw
Next, we append PRAW to the dependency list (otherwise, the service won't work when we deploy to production):
(venv) reddit_bot pip freeze | grep praw >> requirements/base.txt
It's a common best practice not to save sensitive information such as your app secret tokens in the git repository, so let's create a new file .env at the root of your project so we can access the app secret on the local machine.
(venv) reddit_bot vim .env
Replace the secret and the client_id with yours from the last step.
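The user agent is less critical than the two tokens. To find out what your browser's user agent is, simply go to Google, type "find my user agent", and copy and paste yours into the .env file. (Reddit's API guidelines recommend a unique, descriptive user agent string, so you may prefer to write your own.)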
Next, make sure you add .env into the list of files that will not be checked into the git repository by editing the .gitignore file.
(venv) reddit_bot vim .gitignore
Add .env to that file. After that, run the following to load the environment variables.
(venv) reddit_bot source .env
We also need to load the variables into Django's environment variables.
Open up the config/settings/local.py file (I am using Pycharm) and add the following:
# Reddit settings:
REDDIT_SECRET=env('SECRET')
REDDIT_CLIENT_ID=env('CLIENT_ID')
REDDIT_USERAGENT=env('REDDIT_USERAGENT')
[Screenshot: loading the Reddit app secret in the Django settings]
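Before launching the Django console, it can help to confirm that the three variables are actually visible to Python. The sketch below is only an illustration: the helper name missing_reddit_settings is hypothetical, and the variable names are taken from the env(...) calls in the settings snippet above. Keep in mind that source .env only passes variables on to child processes such as Python when the file exports them; whether your .env does so is an assumption you should check.

```python
import os

# Variable names taken from the env(...) calls in config/settings/local.py.
REQUIRED_VARS = ("SECRET", "CLIENT_ID", "REDDIT_USERAGENT")


def missing_reddit_settings(environ=os.environ):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_reddit_settings()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All Reddit settings are present.")
```

If the script reports missing variables right after you ran source .env, the most likely cause is that the variables were set in the shell but not exported.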
Now test your Reddit connection. In the terminal, launch the Django console:
(venv) reddit_bot python manage.py shell
In [1]: from django.conf import settings
In [2]: import praw
In [3]: reddit = praw.Reddit(
   ...:     client_id=settings.REDDIT_CLIENT_ID,
   ...:     client_secret=settings.REDDIT_SECRET,
   ...:     user_agent=settings.REDDIT_USERAGENT,
   ...: )
If you don't receive any error message, it means you have successfully created a Reddit instance through PRAW.
Next, let's run a simple task to check the connection.
In [4]: for submission in reddit.subreddit("marketing").hot(limit=10):
   ...:     print(submission.title)
   ...:
New Job Listings
Sorry if this isn't allowed, but I recently created a subreddit focused on the business side of art, and would love for people to go there and share their knowledge and experiences.
I read privacy and policies of Tiktok, IG and Other Platforms. Heres what I learned about Social Media Platforms!
Has anyone actually worked with an impressive agency?
How to bring first customers to shop?
Facebook ad numbers don't ad up
Beginning my career in marketing, looking to go in to an Agency
I hate digital marketing - help me find a new role
Where to start for ecom store?
Making a website
If you see something similar to the above, congratulations: you've successfully connected to Reddit's official API and retrieved the top 10 hot posts from r/marketing!
Keyword monitoring with regular expression
Suppose you run a Facebook ad agency and your target customers are new to Facebook ads. Wouldn't it be nice if you could respond to someone who has questions about Facebook ads on Reddit?
Chiming in and joining a conversation on Reddit will help:
Establish your reputation as a Facebook ads expert;
Spread the word about your service to the world's largest online community and drive quality traffic to your website;
If you get enough votes, your response may become a backlink to improve your SEO
For this article, we will show you how to find any posts whose title contains the phrase 'facebook'.
Of course, you can continue optimizing this matching rule and develop your own solution.
For a more advanced matching algorithm, feel free to check out Segue's Reddit lead generation engine based on a state-of-the-art NLP semantics search.
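As one possible way to tighten the matching rule, here is a minimal sketch that uses only the standard library. It is not part of the original project: the helper name build_matcher and the second keyword "fb ads" are assumptions made for illustration. It matches several keywords at once, ignores case, and only accepts whole words or phrases.

```python
import re


def build_matcher(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word or phrase."""
    escaped = (re.escape(keyword) for keyword in keywords)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


# "fb ads" is an illustrative extra keyword, not one used in the article.
matcher = build_matcher(["facebook", "fb ads"])

titles = [
    "Facebook ad numbers don't ad up",
    "Measuring the impact of FACEBOOK",
    "Making a website",
]
matches = [title for title in titles if matcher.search(title)]
print(matches)
```

Escaping each keyword with re.escape keeps characters such as "+" or "." from being interpreted as regular expression syntax, and the word boundaries avoid hits inside longer, unrelated words.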
Again, we first open up a Django console.
(venv) reddit_bot python manage.py shell
Inside of the Django console:
import praw
from django.conf import settings
import re  # new, the python regular expression library

keyword = "facebook"
reddit = praw.Reddit(
    client_id=settings.REDDIT_CLIENT_ID,
    client_secret=settings.REDDIT_SECRET,
    user_agent=settings.REDDIT_USERAGENT,
)
for submission in reddit.subreddit("marketing").hot(limit=100):  # we are searching 100 hottest posts on the marketing subreddit
    if re.search(keyword, submission.title, re.IGNORECASE):  # new, notice we are ignoring the case sensitivity
        print(submission.title)
Your results may differ from mine (as we ran this search on September 18th, 2022).
In [2]: for submission in reddit.subreddit("marketing").hot(limit=100):
   ...:     if re.search(keyword, submission.title, re.IGNORECASE):  # new, notice we are ignoring the case sensitivity
   ...:         print(submission.title)
   ...:
Facebook ad numbers don't ad up
Facebook & Instagram Ads campaign Setup
Measuring the impact of Facebook
How many interests is too many Interest - Facebook Ads
We've found four discussions about Facebook without too much work. How awesome is that!
Let's save those results and other metadata such as URLs, post date, and content into a database so we don't lose them.
Persisting data to a Postgres database.
Let's first create a new app
# In the root dir of your project
(venv) reddit_bot django-admin startapp reddit
Note: we also need to move the newly created app to the reddit_bot subdirectory. This step is important due to the way cookie-cutter structured our project.
mv reddit ./reddit_bot
Next, let's update the apps.py config file.
The default name is "reddit"; we need to update it to "reddit_bot.reddit" since we have moved the app from the root directory to the subdirectory.
And don't forget to include this app in the base.py config file.
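The two configuration changes might look roughly like the fragment below. Treat it as a sketch under assumptions: the class name RedditConfig is what startapp typically generates, and the LOCAL_APPS list (with reddit_bot.users already in it) is assumed from the cookiecutter-django layout rather than shown in this article. The two parts belong in two different files, as the comments indicate.

```python
# reddit_bot/reddit_bot/reddit/apps.py
from django.apps import AppConfig


class RedditConfig(AppConfig):
    # Full dotted path, because the app now lives inside the reddit_bot package.
    name = "reddit_bot.reddit"


# config/settings/base.py (fragment): register the app with the project.
# Assumption: the template keeps project apps in a LOCAL_APPS list.
LOCAL_APPS = [
    "reddit_bot.users",
    "reddit_bot.reddit",  # the new app
]
```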
Now let's open up the models.py file and add our first class.
class Lead(models.Model):
    post_id = models.CharField(max_length=10)  # Original post id
    title = models.TextField()
    content = models.TextField()
    posted_at = models.DateTimeField()
    url = models.URLField(max_length=500)
In the command line, let's install the app and model and make the migration.
(venv) reddit_bot python manage.py makemigrations
Migrations for 'reddit':
  reddit_bot/reddit/migrations/0001_initial.py
    - Create model Lead
(venv) reddit_bot python manage.py migrate
Operations to perform:
  Apply all migrations: account, admin, auth, contenttypes, reddit, sessions, sites, socialaccount, users
Running migrations:
  Applying reddit.0001_initial... OK
Next, let's create a command line script to execute the keyword matching and save the results into the Lead model.
Let's call this script lead_finder.py and put it in the reddit/management/commands folder.
First we need to create the two folders:
# Move to the reddit app directory
(venv) reddit_bot cd reddit_bot/reddit
(venv) reddit mkdir management
(venv) reddit mkdir management/commands

# Inside reddit_bot/reddit_bot/reddit/management/commands/lead_finder.py file
import datetime as DT
import re

import praw
from django.conf import settings
from django.core.management.base import BaseCommand
from django.utils import timezone

from reddit_bot.reddit.models import Lead

KEYWORD = "facebook"
SUBREDDIT = 'marketing'

reddit = praw.Reddit(
    client_id=settings.REDDIT_CLIENT_ID,
    client_secret=settings.REDDIT_SECRET,
    user_agent=settings.REDDIT_USERAGENT,
)


def convert_to_ts(unix_time):
    # created_utc is seconds since the Unix epoch in UTC, so build an aware UTC
    # datetime directly instead of relying on the machine's local timezone.
    try:
        return DT.datetime.fromtimestamp(unix_time, tz=DT.timezone.utc)
    except (TypeError, ValueError, OverflowError, OSError):
        print(f"Converting utc failed for {unix_time}")
        return None


def populate_lead(keyword, subreddit):
    for submission in reddit.subreddit(subreddit).hot(limit=100):
        if re.search(keyword, submission.title, re.IGNORECASE):
            # Only save posts we have not stored before.
            if not Lead.objects.filter(post_id=submission.id).exists():
                Lead.objects.create(post_id=submission.id,
                                    title=submission.title,
                                    url=submission.permalink,
                                    content=submission.selftext,
                                    posted_at=convert_to_ts(submission.created_utc))


class Command(BaseCommand):
    help = 'Populating leads'

    def handle(self, *args, **kwargs):
        try:
            current_time = timezone.now()
            self.stdout.write(f'Populating leads at {current_time}')
            populate_lead(KEYWORD, SUBREDDIT)
        except Exception as e:
            current_time = timezone.now().strftime('%X')
            self.stdout.write(self.style.ERROR(f'Populating leads failed at {current_time} because {e}'))
            return  # do not report success after a failure
        current_time = timezone.now()
        self.stdout.write(self.style.SUCCESS(f'Successfully populated new leads at {current_time}'))
Some explanation:
This script has three parts. The convert_to_ts function converts a Unix timestamp into a Python datetime; Reddit stores the time a post was created as a number of seconds since the Unix epoch (the created_utc attribute).
The populate_lead function uses the same logic as in our last section and saves a new lead only if no row with that post_id exists yet. Note that the Lead model above declares post_id as a plain CharField, not as a primary key or unique field, so this check in the script is what prevents duplicates.
Lastly, we created a Command class so that we can execute populate_lead from the command line. There are other ways to execute a script on the command line, but this way is more in the Django style, in my opinion.
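To make the timestamp conversion described above concrete, here is a small standalone sketch using only the standard library. The function name unix_to_utc is hypothetical, and the example value is a posting time that appears in this article's sample output; the sketch round-trips that time through its Unix representation.

```python
import datetime as dt


def unix_to_utc(unix_time):
    """Convert seconds since the Unix epoch into a timezone-aware UTC datetime."""
    return dt.datetime.fromtimestamp(unix_time, tz=dt.timezone.utc)


# Round trip: datetime -> Unix seconds -> datetime.
posted = dt.datetime(2022, 9, 17, 21, 14, 24, tzinfo=dt.timezone.utc)
unix_time = posted.timestamp()
print(unix_time, "->", unix_to_utc(unix_time).isoformat())
```

Passing tz explicitly keeps the result independent of the timezone of the machine that runs the script, which matters once the job runs somewhere other than your laptop.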
Finally, we can try to execute the script and populate some leads.
(venv) reddit_bot python manage.py lead_finder
Populating leads at 2022-09-18 17:50:35.045981+00:00
Successfully populated new leads at 2022-09-18 17:50:36.558052+00:00
Let's open up the Django console to verify the results are saved successfully.
from reddit_bot.reddit.models import Lead
for lead in Lead.objects.all():
    print(f'''title: {lead.title}
posted_at:{lead.posted_at}
url: {lead.url}
''')

title: Facebook ad numbers don't ad up
posted_at:2022-09-17 21:14:24+00:00
url: /r/marketing/comments/xgxv9c/facebook_ad_numbers_dont_ad_up/

title: Facebook & Instagram Ads campaign Setup
posted_at:2022-09-17 08:01:38+00:00
url: /r/marketing/comments/xgghl9/facebook_instagram_ads_campaign_setup/

title: Measuring the impact of Facebook
posted_at:2022-09-16 20:49:14+00:00
url: /r/marketing/comments/xg2kbi/measuring_the_impact_of_facebook/

title: How many interests is too many Interest - Facebook Ads
posted_at:2022-09-16 01:32:28+00:00
url: /r/marketing/comments/xfdwsg/how_many_interests_is_too_many_interest_facebook/
Here we go. All four leads persisted successfully!
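Notice that the stored url values are site-relative permalinks rather than full URLs. If you need absolute links outside of a template, one option is a small helper like the sketch below; the name absolute_url is hypothetical, and the base host is simply the one the report template in the next section links to.

```python
from urllib.parse import urljoin

REDDIT_BASE = "https://reddit.com"  # same host the report template links to


def absolute_url(permalink):
    """Turn a site-relative Reddit permalink into an absolute URL."""
    return urljoin(REDDIT_BASE, permalink)


print(absolute_url("/r/marketing/comments/xgxv9c/facebook_ad_numbers_dont_ad_up/"))
```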
Leads report view
It's cool we can see the data in the console, but it will be easier if we can view the leads in a table from a browser.
Inside the views.py file, let's create a ListView.
# inside reddit_bot/reddit_bot/reddit/views.py
from django.views.generic import ListView

from .models import Lead


# Create your views here.
class LeadView(ListView):
    model = Lead
    template_name = 'lead_list.html'


lead_view = LeadView.as_view()
We also need to create an HTML file, 'lead_list.html', inside a new directory called templates under the reddit app.
{% extends 'base.html' %}
{% block content %}
<table class="table table-striped">
  <thead>
    <tr>
      <th scope="col">ID</th>
      <th scope="col">Title</th>
      <th scope="col">Posted At</th>
      <th scope="col">Content</th>
    </tr>
  </thead>
  <tbody>
    {% for lead in object_list %}
    <tr>
      <th scope="row">{{ lead.post_id }}</th>
      <td><a href="https://reddit.com{{ lead.url }}"> {{ lead.title }}</a></td>
      <td>{{ lead.posted_at }}</td>
      <td>{{ lead.content }}</td>
    </tr>
    {% endfor %}
  </tbody>
</table>
{% endblock %}
Next, we need to add a URL path to access this view.
Create a new file: urls.py under the reddit app.
inside reddit_bot/reddit_bot/reddit/urls.py
from django.urls import path

from reddit_bot.reddit.views import lead_view

app_name = "reddit"
urlpatterns = [
    path("leads/", view=lead_view, name="leads"),
]
Finally, we must include the reddit app's URLs file on the project level.
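The project-level include is not shown in this article, so the fragment below is a sketch under assumptions: it supposes the project URL configuration lives in config/urls.py (the usual cookiecutter-django location) and uses the "reddit/" prefix implied by the report URL that follows. It is meant to be merged into the existing urlpatterns list, not to replace it.

```python
# config/urls.py (fragment, assumed location of the project-level URL config)
from django.urls import include, path

urlpatterns = [
    # ... existing project routes stay as they are ...
    # Mount the app's routes under /reddit/, so the report lives at /reddit/leads/.
    path("reddit/", include("reddit_bot.reddit.urls", namespace="reddit")),
]
```

The namespace argument works here because the app's urls.py sets app_name = "reddit".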
Final step: check the page, open your browser, and visit: http://localhost:8000/reddit/leads/
And if you click on the title, you will be redirected to the Reddit post page, where you can engage with your target customers. How cool is that!
Schedule a cron job to check Reddit periodically.
We are almost done. If you like to automate your tasks as much as I do, let's schedule the job to run automatically.
That's easy to do.
In the terminal, type crontab -e and press Enter.
Add the following line (you will need to edit the path of the reddit_bot Django project)
1 * * * * cd ~/reddit_bot; source venv/bin/activate; source .env; python manage.py lead_finder >/tmp/stdout.log 2>/tmp/stderr.log
It runs at one minute past every hour: for example, if it is now 11:35 am, the job will next run at 12:01 pm, then at 1:01 pm, and so on.
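If the schedule arithmetic is unclear, the following sketch reproduces it with the standard library. It does not talk to cron at all; the function next_hourly_run is a hypothetical helper that mimics a schedule whose minute field is 1 and whose other fields are wildcards.

```python
from datetime import datetime, timedelta


def next_hourly_run(now, minute=1):
    """Return the next time strictly after `now` at which an hourly job at `minute` fires."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(hours=1)
    return candidate


now = datetime(2022, 9, 18, 11, 35)  # example: 11:35 am
first = next_hourly_run(now)
second = next_hourly_run(first)
print(first.strftime("%H:%M"), second.strftime("%H:%M"))
```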
Of course, you can change the schedule to whatever works best for you by editing the five leading fields of the cron line (minute, hour, day of month, month, day of week).
Conclusion
We've gone through how to set up your own Reddit keyword monitoring with Python.
If you'd like to use our productized Reddit monitoring service, feel free to check out segue.co
Original Link: https://dev.to/leonwei/reddit-monitoring-with-python-34pb