An Interest In:
Web News this Week
- April 25, 2024
- April 24, 2024
- April 23, 2024
- April 22, 2024
- April 21, 2024
- April 20, 2024
- April 19, 2024
Diving into Dev's Relevancy Feed Builder
SQL Composition by Configuration
In we added an alternate mechanism for building the relevancy feed for DEV and other instances of the Forem code base. After a bit of configuration, that became the default means of producing the relevancy feed.
Between then and now, weve been experimenting with the levers of the relevancy feed to try to bring a more relevant feed experience.
On , we merged a significant refinement to how we build the relevancy feed. By design, this new mechanism produces the same query as the prior mechanism.
This new approach, allows us to more easily experiment with the relevancy feed; and capture, record, analyze, and annotate the results of those experiments.
And heres the best part: because this is open source, you can see what experiments were running and how we configure those levers.
Data Modeling Forems Feed Builder
Below is a rough sketch of the ERD I would use were we putting all of this in the database.
PlantUML Textual Version of ERD
@startuml!theme amigaentity experiment {* id--* label* start date* end date}entity variant_experiment {* experiment_id--* variant_id* probability of using this variant}entity variant {* id--* label* order_by_lever_id}entity variant_relevancy_lever {* variant_id* relevancy_lever_id--* config (e.g. relevancy score range)}entity order_by_lever {* id--* order_by_fragment}entity relevancy_lever {* id--* label* user_required* select_fragment* joins_fragments* group_by_fragment}experiment |||{ variant_experimentvariant ||o{ variant_experimentrelevancy_lever ||o{ variant_relevancy_levervariant ||o{ variant_relevancy_leverorder_by_lever ||o{ variant@enduml
We are not putting this in our database for a few reasons:
- The database is more opaque.
- The database requires more consideration regarding synchronization across Forem installations.
Instead, were putting the above information in our application. This allows us to test each experiment and variant.
It also means that how we build the relevancy feed is transparent. Building from the above diagram we have 6 conceptual entities:
experiments
experiment_variants
variants
order_by_levers
variant_relevancy_levers
relevancy_levers
The experiments
and experiment_variants
are defined in the config/field_test.yml
. The variants
are defined in config/feed-variants
; each named variant will have a corresponding JSON file in the config/feed-variants
directory. Each variant in the config/field-variants
directory defines the variant_relevancy_levers
and order_by_lever
used; see config/feed-variants/original.json
for an example. The relevancy_levers
and order_by_levers
are defined in the Articles::Feeds
module.
Assembling Forems Relevancy Feed
Below is a conceptual sequence diagram of how we build the relevancy feed:
PlantUML Textual Version of Conceptual Feed Query Builder Sequence
@startuml!theme amigaparticipant GET relevancy feed as GetRequestparticipant AbExperiment.get as AbExperimentparticipant Articles::Feeds::
VariantQuery.build_for as VariantQueryparticipant Articles::Feeds::
VariantAssembler.call as Assemblerparticipant /config/feed-variants/*.json as VariantConfigparticipant Articles::Feeds
.lever_catalog as LeverCatalogGetRequest > AbExperiment : with :userGetRequest < AbExperiment : :variantGetRequest > VariantQuery : with :user, :variantVariantQuery > Assembler : with :variantAssembler > VariantConfig : with :variantAssembler < VariantConfig : :variant_configAssembler > LeverCatalog : with :variant_configAssembler < LeverCatalog : :query_configVariantQuery < Assembler : :query_configVariantQuery > Article : with :user, :query_configGetRequest < Article : Article::ActiveRecord::Relation@enduml
When you come to the DEV homepage, we retrieve the your randomly assigned variant. With that variant, we assemble the query configuration and then perform the query to return the articles you see in your relevancy feed.
Show Me the SQL Already
Below is the evaluated SQL of the 2022-04-15 incumbent feed variant. And for those curious, the challenger will join the fray on April 25th; once we merge Create new feed-variant 20220422. You can also checkout the inline documentation on configuring the VariantQuery.
And now a wall of SQL
PosgreSQL Select Statement
SELECT "articles"."path", "articles"."title", "articles"."id", "articles"."published", "articles"."comments_count", "articles"."public_reactions_count", "articles"."cached_tag_list", "articles"."main_image", "articles"."main_image_background_hex_color", "articles"."updated_at", "articles"."slug", "articles"."video", "articles"."user_id", "articles"."organization_id", "articles"."video_source_url", "articles"."video_code", "articles"."video_thumbnail_url", "articles"."video_closed_caption_track_url", "articles"."experience_level_rating", "articles"."experience_level_rating_distribution", "articles"."cached_user", "articles"."cached_organization", "articles"."published_at", "articles"."crossposted_at", "articles"."description", "articles"."reading_time", "articles"."video_duration_in_seconds", "articles"."last_comment_at"FROM "articles" INNER JOIN ( SELECT articles.id, ( ( CASE ( current_date - articles.published_at :: date ) WHEN 0 THEN 1.0 WHEN 1 THEN 0.99 WHEN 2 THEN 0.985 WHEN 3 THEN 0.98 WHEN 4 THEN 0.975 WHEN 5 THEN 0.97 WHEN 6 THEN 0.965 WHEN 7 THEN 0.96 WHEN 8 THEN 0.955 WHEN 9 THEN 0.95 WHEN 10 THEN 0.945 WHEN 11 THEN 0.94 WHEN 12 THEN 0.935 WHEN 13 THEN 0.93 WHEN 14 THEN 0.925 ELSE 0.9 END ) * ( CASE COUNT(comments_by_followed.id) WHEN 0 THEN 0.95 WHEN 1 THEN 0.98 WHEN 2 THEN 0.99 ELSE 0.93 END ) * ( CASE articles.comments_count WHEN 0 THEN 0.8 WHEN 1 THEN 0.82 WHEN 2 THEN 0.84 WHEN 3 THEN 0.86 WHEN 4 THEN 0.88 WHEN 5 THEN 0.9 WHEN 6 THEN 0.92 WHEN 7 THEN 0.94 WHEN 8 THEN 0.96 WHEN 9 THEN 0.98 ELSE 1.0 END ) * ( CASE ( CASE articles.featured WHEN true THEN 1 ELSE 0 END ) WHEN 1 THEN 1.0 ELSE 0.85 END ) * ( CASE COUNT(followed_user.follower_id) WHEN 0 THEN 0.8 WHEN 1 THEN 1.0 ELSE 1.0 END ) * ( CASE COUNT(followed_org.follower_id) WHEN 0 THEN 0.95 WHEN 1 THEN 1.0 ELSE 1.0 END ) * ( CASE ( current_date - MAX(comments.created_at):: date ) WHEN 0 THEN 1.0 WHEN 1 THEN 0.9988 ELSE 0.988 END ) * ( CASE LEAST( 10.0, SUM(followed_tags.points) ):: integer WHEN 0 THEN 0.7 WHEN 1 THEN 0.7303 WHEN 2 THEN 0.7606 WHEN 3 THEN 0.7909 WHEN 4 THEN 0.8212 WHEN 5 THEN 0.8515 WHEN 6 THEN 0.8818 WHEN 7 THEN 0.9121 WHEN 8 THEN 0.9424 WHEN 9 THEN 0.9727 ELSE 1.0 END ) * ( CASE ( CASE WHEN articles.privileged_users_reaction_points_sum < -10 THEN -1 WHEN articles.privileged_users_reaction_points_sum > 10 THEN 1 ELSE 0 END ) WHEN -1 THEN 0.2 WHEN 1 THEN 1.0 ELSE 0.95 END ) * ( CASE articles.public_reactions_count WHEN 0 THEN 0.9988 WHEN 1 THEN 0.9988 WHEN 2 THEN 0.9988 WHEN 3 THEN 0.9988 ELSE 1.0 END ) ) as relevancy_score FROM articles LEFT OUTER JOIN user_blocks ON user_blocks.blocked_id = articles.user_id AND user_blocks.blocked_id IS NULL AND user_blocks.blocker_id = : user_id LEFT OUTER JOIN follows AS followed_user ON articles.user_id = followed_user.followable_id AND followed_user.followable_type = 'User' AND followed_user.follower_id = : user_id AND followed_user.follower_type = 'User' LEFT OUTER JOIN comments AS comments_by_followed ON comments_by_followed.commentable_id = articles.id AND comments_by_followed.commentable_type = 'Article' AND followed_user.followable_id = comments_by_followed.user_id AND followed_user.followable_type = 'User' AND comments_by_followed.deleted = false AND comments_by_followed.created_at > '2022-04-08 12:22:04.501182' LEFT OUTER JOIN follows AS followed_org ON articles.organization_id = followed_org.followable_id AND followed_org.followable_type = 'Organization' AND followed_org.follower_id = : user_id AND followed_org.follower_type = 'User' LEFT OUTER JOIN comments ON comments.commentable_id = articles.id AND comments.commentable_type = 'Article' AND comments.deleted = false AND comments.created_at > '2022-04-08 12:22:04.501182' LEFT OUTER JOIN taggings ON taggings.taggable_id = articles.id AND taggable_type = 'Article' INNER JOIN tags ON taggings.tag_id = tags.id LEFT OUTER JOIN follows AS followed_tags ON tags.id = followed_tags.followable_id AND followed_tags.followable_type = 'ActsAsTaggableOn::Tag' AND followed_tags.follower_type = 'User' AND followed_tags.follower_id = : user_id AND followed_tags.explicit_points >= 0 WHERE articles.published = true AND articles.published_at > '2022-04-08 12:22:04.501182' AND articles.published_at < '2022-04-23 12:22:04.501592' GROUP BY articles.id, articles.published_at, articles.comments_count, articles.featured, articles.privileged_users_reaction_points_sum, articles.public_reactions_count ORDER BY relevancy_score DESC, articles.published_at DESC LIMIT 50 ) AS article_relevancies ON articles.id = article_relevancies.idORDER BY article_relevancies.relevancy_score DESC, articles.published_at DESC
I generated the above query by adding the following to our Rspec tests of each of the variant configs suite:
File.open(Rails.root.join("tmp/#{variant}.sql").to_s, "w+") do |file| file.puts query_call.to_sqlend
I also removed the test specific user id, replacing it with :user_id
, and tidied up the SQL.
Original Link: https://dev.to/devteam/diving-into-devs-relevancy-feed-builder-30m6
Dev To
An online community for sharing and discovering great ideas, having debates, and making friendsMore About this Source Visit Dev To