Interactive dashboard for newspaper article rankings

2021-04-19 6 minutes

Series - News Article Ranking

Warning

The dashboard only works on desktop-sized screens, not on your phone. Sorry. If it looks off even on desktop, try zooming your web browser until the dashboard gets a three-column layout.

Some years ago¹ I worked on the problem of ranking news article teasers on a newspaper’s digital front page. There was no shortage of ideas for how to rank articles; we played around with machine learning approaches like collaborative filtering and other ways to personalize content. In the end, however, we went for a weighted sum of simpler rankers (explained below). Even for a simple approach like this, it is difficult to imagine how changing a single ranker’s parameter or weight will affect the order of articles on the front page. To help stakeholders get some understanding and intuition for such a system, it is hard to beat a good interactive visualization! Inspired by a colleague making something similar in React, I wanted to try out a JavaScript-based framework and had heard good things about Svelte. Here it is (code available here):

Brief explainer

The equation for an article’s ranker Score in the dashboard is: $\text{Score} = \text{weight}_{\text{TimeDecay}} \cdot \text{TimeDecay} + \text{weight}_{\text{CtrScore}} \cdot \text{CtrScore}$

The TimeDecay (TD) ranker is a function of an article’s time since published; the lifetime² and news_value³ editorial parameters; and the starting_value and half_life decay parameters. CtrScore⁴ (CS) is a function of article teaser clicks and article teaser impressions (more on the CtrScore methodology in the other article in this series).
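To make the equation concrete, here is a minimal Python sketch of the weighted sum. The exact TimeDecay and CtrScore formulas are not spelled out in this post, so the ones below are assumptions for illustration only: TimeDecay is modelled as exponential half-life decay, with `lifetime` stretching the half-life and `news_value` scaling the result, and CtrScore is modelled as plain clicks divided by impressions. The `Article` class and all default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    hours_since_published: float
    lifetime: float    # editorial parameter; ASSUMED to stretch the half-life
    news_value: float  # editorial parameter; ASSUMED to scale the decayed value
    clicks: int
    impressions: int


def time_decay(article, starting_value=1.0, half_life=6.0):
    """ASSUMED form: exponential decay that halves every effective half-life."""
    effective_half_life = half_life * article.lifetime
    decay = 0.5 ** (article.hours_since_published / effective_half_life)
    return starting_value * article.news_value * decay


def ctr_score(article):
    """ASSUMED form: raw click-through rate (0 when there are no impressions)."""
    if article.impressions == 0:
        return 0.0
    return article.clicks / article.impressions


def score(article, weight_time_decay=1.0, weight_ctr_score=1.0):
    """Weighted sum of the two rankers, as in the Score equation."""
    return (weight_time_decay * time_decay(article)
            + weight_ctr_score * ctr_score(article))


# Hypothetical article: six hours old, 10 clicks on 100 teaser impressions.
example = Article("Example story", 6.0, 1.0, 1.0, clicks=10, impressions=100)
example_score = score(example)
```

With these assumed formulas, an article that is exactly one half-life old contributes half of its `starting_value` from TimeDecay, and the weights decide how much that matters relative to its click-through rate.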

If you’re thinking this is simple, you are correct⁵. The point here was not to come up with a sophisticated ranker algorithm, but rather to showcase a dashboard that could help stakeholders:

understand why an article’s current rank is what it is, and
get the complete picture of the front page; an individual article’s ranker Score is of little use unless you see it compared with other “competing” articles’ ranker Scores.
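The second point can be illustrated with a small Python sketch: the same articles are ranked under two different weight settings, and the order flips. The article titles and their TimeDecay and CtrScore values are made up for illustration, and `rank` is a hypothetical helper, not code from the dashboard.

```python
def rank(articles, weight_time_decay, weight_ctr_score):
    """Return article titles sorted by descending weighted-sum Score."""
    scored = [
        (weight_time_decay * td + weight_ctr_score * cs, title)
        for title, td, cs in articles
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Hypothetical (title, TimeDecay, CtrScore) triples.
ARTICLES = [
    ("Breaking: election result", 0.90, 0.02),
    ("Celebrity interview", 0.40, 0.12),
    ("Evergreen: tax guide", 0.60, 0.04),
]

# Equal weights: the fresh, editorially important story leads.
equal_weights = rank(ARTICLES, weight_time_decay=1.0, weight_ctr_score=1.0)

# Ten times more weight on CtrScore: the most-clicked teaser takes over.
click_heavy = rank(ARTICLES, weight_time_decay=1.0, weight_ctr_score=10.0)
```

Looking at a single article's Score in isolation would not reveal this reordering, which is why the dashboard shows all competing articles side by side.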

Allowing stakeholders to get some intuition about how such a “combinatorially complex”⁶ product works is of utmost importance for it to be successful, in my opinion. This holds in particular for a newspaper front page, where the editorial considerations are at least as important as the algorithms working behind the scenes. When the news desk and editorial departments get intuition about how the ranking works, it also opens the door for collaborative development of new rankers and optimization of ranker parameters.

Would I use Svelte again?

In 2024⁷, the first tool I would reach for when creating simple, interactive visualizations is Streamlit. Streamlit is extremely productive for making decent-enough interactive visualizations with a Python backend. But for this productivity you’re giving up some flexibility and control; if you want full control over every detail in a dashboard running in the browser, you should consider a front-end framework like React, Vue or Svelte. I went with Svelte mainly because I had heard good things about it, and I immediately liked its syntax and gentle learning curve compared to, for example, React. I also appreciated the “lightning fast” reactivity⁸ and how easy it is to make informative animations (like the flip animation between the cards in the right-most column, which makes it clearer exactly when the order of articles changes).

Interestingly, there is now some community backlash against the complexity involved in developing web apps with single-page front-end frameworks like React, Vue and Svelte. See for instance the discussions here and here related to htmx. And, I must say, I have felt this complexity pain myself. Writing a Svelte application running purely in the front-end is pretty straightforward, but when I want to integrate it with a backend, it is unclear to me what the best options are. I am fond of the idea of using tools like htmx, Streamlit or Shiny, where you are mainly working on the backend/server-side and then “enriching” the front-end with whatever data you’ve got. If you need some new data from somewhere, you can simply write a few lines of (backend) code to fetch it from a bucket or a database, and then providing it to the dashboard is easy peasy. In another Svelte proof-of-concept app I worked on, we used PostgREST as the “backend”: the Svelte app made user-authorized fetch requests to the PostgREST API endpoint to get the user’s latest data. That user could only access her own data, as we used Postgres’ row-level security policies. Even though this was neat, it took more time to build the backend this way than with e.g. Python’s FastAPI or Go’s chi, and it does not provide the same level of flexibility, since you’re limited to Postgres SQL and Haskell⁹ for providing the functionality you need.
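For readers unfamiliar with PostgREST, the request shape looks roughly like the sketch below. It is written in Python rather than as a browser `fetch` call, and it only builds the request without sending it. The host, the `measurements` table, the `created_at` column and the token are all hypothetical; the `Authorization: Bearer` header and the `order` and `limit` query parameters follow PostgREST's documented conventions. Which rows come back is decided by the row-level security policies in Postgres, not by this client code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_latest_rows_request(base_url, table, jwt, limit=10):
    """Build (but do not send) a GET request for a user's newest rows."""
    # PostgREST exposes each table as an endpoint; ordering and limiting
    # are expressed as query parameters. `created_at` is a hypothetical column.
    query = urlencode({"order": "created_at.desc", "limit": limit})
    return Request(
        f"{base_url}/{table}?{query}",
        headers={
            # The JWT identifies the user; Postgres row-level security
            # policies then restrict the result to that user's rows.
            "Authorization": f"Bearer {jwt}",
            "Accept": "application/json",
        },
    )


# Hypothetical host, table and token, for illustration only.
request = build_latest_rows_request(
    "https://api.example.com", "measurements", jwt="token123"
)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because the endpoint is made up.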

My dream combination for advanced dashboards would be something like Svelte for the front-end and chi for the backend in an integrated, easy-to-develop-and-deploy package¹⁰. For my next greenfield web app project, I’d probably try htmx with something like templ, as discussed in this video, which looks relatively easy and allows me to work in my preferred backend language, Go.

This post is written in January 2024, but since the dashboard was made in April 2021 I am using that as the post publish date. ↩︎
A subjective metric (set by the news desk and journalist) for how long an article is considered to be relevant, from a few hours to time-independent “evergreens”. ↩︎
A subjective metric (set by the news desk and journalist) for how important the news article is considered to be. ↩︎
CTR = Click-Through Rate. ↩︎
A simple baseline ranker like this has turned out – after extensive A/B testing on user engagement metrics – to be pretty difficult to beat, even for more complex, personalized algorithms. ↩︎
Even though each ranker is simple in its own right, the parameter space here is large, with $\prod_{i=1}^{n} p_i$ possible parameter combinations (for $n$ parameters with the $i$-th parameter having a parameter space of size $p_i$). ↩︎
This post is written in January 2024, but since the dashboard was made in April 2021 I am using that as the post publish date. ↩︎
Remember, all data “lives” in the front-end in this case, even the articles’ metadata; there are no backend requests happening. This is cheating, as for most dashboards in production you’d need to fetch up-to-date data somehow. Fetching it at page load could give the same feel as this dashboard, at the expense of increased load time. ↩︎
It is kind of cool that PostgREST is written in Haskell, but I also found it limiting. Adding or changing functionality in PostgREST is pretty hard unless you’re well-versed in Haskell, which is not that common. Haskell also creates a higher threshold for community contributions. This is why I would pick pREST (very similar to PostgREST but written in Go) the next time (if there is a next time for such a tool 🤷). ↩︎
Update: a couple of weeks after writing this post, I discovered a package called golte, which looks pretty close to my “dream combination” for advanced dashboards. Curious to see if this package gets any traction. ↩︎