October 30, 2020 12:46 pm GMT

My most ambitious dev project - Automated newsfeed in 3 days

I spend hours browsing through news & articles to find some quality content around remote work. Mostly, I end up knowing nothing new. If I am lucky, I stumble upon a great piece or a "breaking news" item.

So I built something simple but crazy interesting - an automated feed that gives you the best bite-sized remote work content.

I have launched it on Product Hunt today and would really love your support there :)

In this article, I will delve into the specifics of how I built the newsfeed. For more details on the problem & solution, check out this Twitter thread:

Architecture

[Image: architecture diagram]

Content Fetch & Update

This component primarily interfaces with the Twitter API. The content fetcher is periodically invoked by a set of CRON jobs to fetch & update content.

For fetching new content,

1) Twitter Search API (advanced search)

  • List of terms like 'remote work', 'work from home' etc.
  • Twitter offers its own ranking of top tweets configured by using result_type: "popular".

2) Twitter Timeline API (user timeline)

  • List of user handles pre-populated in DB.
  • Store the last tweet fetched for every user handle & fetch new tweets only after that.

For updating content, loop through all tweets stored in the DB over the last 4 days and re-fetch each one from the Twitter API to refresh its metadata (number of likes, retweets etc.).
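The fetch step above can be sketched as a pair of request builders. This is a minimal sketch, not the project's actual code: it assumes the standard v1.1 search and user-timeline endpoints (the article does not name the API version), and the helper names `search_params`, `timeline_params` and `newest_id` are hypothetical. No network call is made; the functions only build the query parameters and track the last tweet seen per handle.

```python
from typing import Dict, Iterable, Optional

# Example search terms taken from the article.
SEARCH_TERMS = ["remote work", "work from home"]


def search_params(term: str) -> Dict[str, str]:
    """Query parameters for one search request, asking for Twitter's 'popular' ranking."""
    return {"q": term, "result_type": "popular"}


def timeline_params(handle: str, last_seen_id: Optional[int]) -> Dict[str, str]:
    """Query parameters for one user-timeline request.

    since_id limits the response to tweets newer than the one stored in the DB.
    """
    params = {"screen_name": handle}
    if last_seen_id is not None:
        params["since_id"] = str(last_seen_id)
    return params


def newest_id(fetched_ids: Iterable[int], previous: Optional[int]) -> Optional[int]:
    """Tweet id to store for the handle after a fetch (ids grow over time)."""
    candidates = list(fetched_ids)
    if previous is not None:
        candidates.append(previous)
    return max(candidates) if candidates else None
```

Storing only the newest id per handle keeps the DB lookup cheap and means an empty response leaves the stored marker unchanged.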

Content Processing & Storage

1) Automated categorisation

  • Top tweets shown on the feed need to be categorised so that the feed reads smoothly.

  • The first level of processing on a tweet is to categorise it based on:

    • Keywords in the tweet content (e.g. if it has 'remote job', it is probably a job tweet).
    • External URLs (possibly a blog/ articles shared).
    • Tweets from a list of handles (e.g. if it is from NY Post, it is a news item).
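A rule-based categoriser along these lines might look like the sketch below. The rule tables, the category labels and the order in which the three signals are checked are assumptions made for illustration; the article only gives 'remote job' and NY Post as examples.

```python
from typing import Iterable

# Hypothetical rule tables, seeded with the two examples from the article.
JOB_KEYWORDS = ("remote job",)
NEWS_HANDLES = {"nypost"}


def categorise(text: str, handle: str, urls: Iterable[str]) -> str:
    """Return a category label, checking handle, then keywords, then URLs."""
    if handle.lower() in NEWS_HANDLES:
        return "news"
    lowered = text.lower()
    if any(keyword in lowered for keyword in JOB_KEYWORDS):
        return "job"
    if any(urls):
        # At least one external URL: probably a shared blog post or article.
        return "article"
    # Assumed fallback when no rule matches.
    return "opinion"
```

For example, `categorise("Hiring for a remote job", "someone", [])` returns `"job"`, while a tweet with no matching rule falls through to `"opinion"`.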

2) Calculate score before saving to DB

  • Every tweet stored in the DB has a score attached to it.
  • Before every save, the score is refreshed.
  • The score is computed from a combination of likes, replies & retweets for the tweet.
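One way to picture the score refresh is a weighted sum recomputed just before each save. The weights below are hypothetical (the article does not say how the three counts are combined), and `Tweet` and `prepare_for_save` are illustrative names rather than the project's own.

```python
from dataclasses import dataclass

# Hypothetical weights; the real combination is not given in the article.
LIKE_WEIGHT = 1.0
REPLY_WEIGHT = 2.0
RETWEET_WEIGHT = 3.0


@dataclass
class Tweet:
    likes: int
    replies: int
    retweets: int
    score: float = 0.0


def compute_score(tweet: Tweet) -> float:
    """Weighted sum of the three engagement counts."""
    return (
        LIKE_WEIGHT * tweet.likes
        + REPLY_WEIGHT * tweet.replies
        + RETWEET_WEIGHT * tweet.retweets
    )


def prepare_for_save(tweet: Tweet) -> Tweet:
    """Refresh the score right before the tweet is written to the DB."""
    tweet.score = compute_score(tweet)
    return tweet
```

Recomputing on every save means the periodic metadata update automatically re-ranks older tweets without a separate scoring job.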

Cron Jobs

The content fetch modules are triggered at a regular frequency through a set of cron jobs. The jobs are independent & spaced out so that Twitter's rate limits are not breached now or in the future.

The cron jobs & their frequencies are:

  • Timeline API - 20 min of every hour
  • Search API - 40 min of every hour
  • Update meta (# of likes, retweets etc.) for each tweet - 60 min of every hour
  • Deleting past tweets - weekly
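The schedule can be read in more than one way; the sketch below assumes each hourly job fires at a fixed minute offset (minute 20, minute 40, and "60 min" taken as the top of the hour), which matches the idea of spacing the jobs out. The job names and the weekly cleanup slot are hypothetical.

```python
from datetime import datetime
from typing import List

# Assumed reading: minute-of-hour offsets for the three hourly jobs.
HOURLY_JOBS = {"timeline": 20, "search": 40, "update_meta": 0}

# Hypothetical slot for the weekly cleanup: Monday at 00:30.
WEEKLY_CLEANUP = (0, 0, 30)  # (weekday, hour, minute), Monday == 0


def jobs_due(now: datetime) -> List[str]:
    """Names of the jobs that should start in the given minute."""
    due = [name for name, minute in HOURLY_JOBS.items() if now.minute == minute]
    if (now.weekday(), now.hour, now.minute) == WEEKLY_CLEANUP:
        due.append("delete_old")
    return due
```

With offsets 20 minutes apart, no two Twitter-facing jobs start in the same minute, so their API calls do not pile up in the same rate-limit window.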

Display

The display module interfaces with the DB but has no say in deciding what will be shown in the feed. The top 10 tweets by score are fetched directly from the DB and displayed.
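As a rough illustration, the selection step and the day-then-category grouping used by the feed could look like this. It is an in-memory sketch under assumptions: in the real system the sorting would presumably happen in the DB query, and the `Tweet` fields are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple


class Tweet(NamedTuple):
    id: int
    day: date
    category: str
    score: float


def top_tweets(tweets: List[Tweet], limit: int = 10) -> List[Tweet]:
    """Highest-scoring tweets first, capped at `limit`."""
    return sorted(tweets, key=lambda t: t.score, reverse=True)[:limit]


def group_for_feed(tweets: List[Tweet]) -> Dict[date, Dict[str, List[Tweet]]]:
    """Nest tweets as day -> category -> tweets, keeping the incoming order."""
    grouped: Dict[date, Dict[str, List[Tweet]]] = defaultdict(lambda: defaultdict(list))
    for tweet in tweets:
        grouped[tweet.day][tweet.category].append(tweet)
    # Convert to plain dicts so callers do not create keys by accident.
    return {day: dict(by_category) for day, by_category in grouped.items()}
```

Because the grouping preserves the incoming order, tweets inside each category stay ranked by score when `group_for_feed(top_tweets(...))` is used.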

Tweets are displayed in the feed grouped first by day and then by category (e.g. opinions, articles, news etc.). To display tweets, I naturally wanted to use Twitter's own embed code, but it resulted in a high page load time.

Problem: The Twitter embed script is super heavy, and the page takes a good 10-15 s to load if there are more than 7-8 tweets.

Solution:

  • Had to write custom CSS replicating the tweet UI
  • Embed code is fetched from the Twitter API, but I explicitly removed the script & unneeded metadata.
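Removing the script from an embed snippet can be sketched with a small regular expression. This is an illustration only: the function name is hypothetical, the sample markup in the usage note is a simplified stand-in for a real embed, and a regex is adequate here only because the snippet has a known, simple shape. Twitter's oEmbed endpoint also documents an `omit_script` parameter that may achieve the same result at the source.

```python
import re

# Matches a whole <script ...>...</script> element, across newlines.
SCRIPT_TAG = re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)


def strip_scripts(embed_html: str) -> str:
    """Remove every <script> element from an embed snippet and trim whitespace."""
    return SCRIPT_TAG.sub("", embed_html).strip()
```

Passing in a `<blockquote class="twitter-tweet">` followed by the `widgets.js` script tag leaves only the blockquote, which the custom CSS can then style.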

There's also a subscribe box (for daily newsletter) at the top and a countdown (written in vanilla JS) that shows when the next update is coming.

Integration with Community

Aim:

  • Make it super easy for users to start or join a discussion on the nugget they like
  • Browse through the top posts from Remote Clan

Solution:

  • Sticky right sidebar with top posts from Remote Clan (mention twitter hack)
  • Start a discussion - 1-click button to create an automated post on Remote Clan.
  • Join discussion - If there's already a post, show the linked post instead.
  • Join the community & # of members online CTAs

Admin panel

The automated feed is the default, but a manual override is needed to clean & rearrange content.

When you are in a hurry,

  • Queueing system to mark tweets invisible from feed. When you mark a tweet invisible, a fresh candidate shows up.

When there's plenty of time for curation,

  • Manually look through all tweets and override the algorithm. Explicitly force a tweet to show up in the feed.
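Both admin modes can be modelled as two flags on each tweet that the feed query respects. The sketch below is hypothetical: the field names `hidden` and `forced`, and the choice to let `hidden` win over `forced`, are assumptions rather than details from the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    id: int
    score: float
    hidden: bool = False  # set by the admin to take a tweet off the feed
    forced: bool = False  # set by the admin to force a tweet onto the feed


def build_feed(candidates: List[Candidate], size: int = 10) -> List[Candidate]:
    """Pick the feed: drop hidden tweets, put forced ones first, then rank by score."""
    visible = [c for c in candidates if not c.hidden]
    ranked = sorted(visible, key=lambda c: (c.forced, c.score), reverse=True)
    return ranked[:size]
```

Hiding a tweet shrinks the visible pool by one, so the next-best candidate moves into the feed automatically, which gives the "fresh candidate shows up" behaviour without extra bookkeeping.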

Future updates

  • Infinite scroll
  • Improve accuracy of curation algorithm
  • Include other sources of content (HN, Reddit etc.)

If you liked what you read, I would love your support on Product Hunt :)


Original Link: https://dev.to/hrishikesh1990/my-most-ambitious-dev-project-automated-newsfeed-in-3-days-2e62
