When I built redditreads.com a few years ago, my data processing scripts broke almost immediately thanks to changing third-party APIs and my own poorly written code. It’s taken until now for me to revisit this project, but redditreads.com is back and better than ever before! The new site makes it much easier to add new books each month, and the improved dataset is much easier for me to query for interesting blog-worthy morsels.
While working on the tech has been fun, I’m most excited about rolling up my sleeves and looking through the dataset. There’s so much in there that isn’t surfaced when grouping by year and subreddit, and I’d like to post those investigations on this blog.
But first I’m going to start with monthly updates: new books and subreddits. I think that’s an attainable goal for now.
The old site had a strange build system: I rendered all the HTML for every possible subreddit/time-period combination in advance. This made hosting a breeze: it was just a bunch of HTML files. No database, no backend.
When I pulled in the data up to 2021, it was clear this was a bad system: Reddit has gotten much larger even since 2018, and the builds were taking forever. That made it hard to iterate on the design, and uploading the built site took a significant amount of time.
So the new site is a more traditional setup: comment-level data in a SQLite database and a small Flask app, fronted by nginx. Because the site is read-only and stateless, SQL queries and generated HTML are both aggressively cached. The cache is cleared when new data is added, or when the site is redeployed.
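To make the caching idea concrete, here is a minimal sketch of the pattern, not the site's actual code. The `mentions` table, the `top_books` and `add_mentions` functions, and the sample rows are all made up for illustration, and `functools.lru_cache` stands in for whatever cache the real app uses. It only uses the standard library, so it runs without Flask.

```python
import sqlite3
from functools import lru_cache

# Hypothetical schema: one row per comment that mentions a book.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (subreddit TEXT, year INTEGER, book TEXT)")
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?, ?)",
    [
        ("books", 2021, "Dune"),
        ("books", 2021, "Dune"),
        ("books", 2021, "Piranesi"),
        ("scifi", 2020, "Dune"),
    ],
)
conn.commit()


@lru_cache(maxsize=1024)
def top_books(subreddit, year, limit=10):
    """Return (book, mention_count) pairs, most-mentioned first."""
    rows = conn.execute(
        "SELECT book, COUNT(*) AS n FROM mentions "
        "WHERE subreddit = ? AND year = ? "
        "GROUP BY book ORDER BY n DESC, book LIMIT ?",
        (subreddit, year, limit),
    ).fetchall()
    # Return a tuple so callers cannot mutate the cached result.
    return tuple(rows)


def add_mentions(rows):
    """Insert new data, then drop every cached query result."""
    conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)", rows)
    conn.commit()
    top_books.cache_clear()
```

Since the data only changes when a new batch is loaded, clearing the whole cache at that point is simpler than tracking which cached results are stale. A redeploy restarts the process, which empties an in-process cache like this one on its own.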
The frontend is built with TailwindCSS.