RSS Rebuild
If you want to test out the current implementation, you can check it out here.
Preamble Ramble
This one has been a long time in the making. Many, many moons ago, when I was first starting to create "proper" coding projects, I built an RSS reader. I had been making do with several disparate methods of content subscription for quite some time, and with Google Reader on the way out I was looking to move to a fresh all-in-one solution. In my research I stumbled upon Inoreader, which was brand new to the scene, and for some weeks I made do with it before becoming increasingly irritated by its advertising and lack of speed.
In parallel I had been building increasingly complex (to me at the time) projects, so I decided "how hard could it be?" and dove into building an RSS reader of my own. It is worth noting that my experience up until this point had been almost entirely self-contained terminal programs. I had created websites and dabbled with CSS while in school, and had (I thought) a general grasp of how the parts of a web-based application fit together, and so I researched and I coded and I put together a Python, JS, and PHP monstrosity that was functional enough that I have been using it to this day.
Over time the flaws and sharp edges have become second nature and been smoothed over. For example, there is one particular podcast that cannot be marked as read by individual episode due to an encoding issue (easily fixable, but I've lived with it long enough that it's kind of endearing), though it can be marked as read as part of a group. Every few months the whole thing will fall over and require a restart (I still don't know why). Finally, the UI is functional(ish) but not pretty, and the mobile version was bolted on as an afterthought, as it was always intended for PC only.
All this being said, it has seen daily use with over 120 feeds subscribed to, and I've repeatedly rationalised a rebuild as unnecessary and akin to killing off a favoured pet for a new puppy. All things must come to an end however, and with a burgeoning interest in Rust I've decided now is the time. The plan (which may not survive) is to write a blisteringly fast back-end in Rust with a proper API, before moving on to build a web-based front end, putting to use all the CSS and design I've learnt over the years. Then finally, if that all goes well, I may dip my toes back into Kotlin and build a proper application for mobile (but don't hold your breath).
Get On With It Already
Jumping straight in, let's get an environment sorted for Rust:
cargo new rss && cd rss
We're going to use https://github.com/rust-syndication/rss at least for now, so let's see if we can get a basic example off the ground in src/main.rs. Pasting the code from GitHub we get:
use std::error::Error;
use rss::Channel;

async fn example_feed() -> Result<Channel, Box<dyn Error>> {
    let content = reqwest::get("http://example.com/feed.xml")
        .await?
        .bytes()
        .await?;
    let channel = Channel::read_from(&content[..])?;
    Ok(channel)
}
So to get this cooking we're going to need a few things. To start with, we're going to need to add rss to our dependencies in Cargo.toml, along with reqwest. Now, as this example uses an asynchronous function, I'm going to have to brush up on exactly how that works in Rust, as up until now my asynchronous coding has been in other languages. It seems that Tokio is one of the more popular runtimes, so we're going to go ahead and add that too.
[package]
name = "rss"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
rss = "2.0"
reqwest = "0.11.9"
tokio = { version = "1", features = ["full"] }
Secondly, we're going to need to flesh out the bare-bones example to get it to run. We're also going to replace the feed with the one from 99 Percent Invisible, which I'm choosing because, from what I remember, it is one of the largest and (hopefully) well formatted.
Let's go ahead and add a main function with a runtime and call our example_feed function:
use std::error::Error;
use rss::Channel;

async fn example_feed() -> Result<Channel, Box<dyn Error>> {
    let content = reqwest::get("https://feeds.99percentinvisible.org/99percentinvisible")
        .await?
        .bytes()
        .await?;
    let channel = Channel::read_from(&content[..])?;
    Ok(channel)
}

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let channel = rt.block_on(example_feed()).unwrap();
    println!("{}", channel.title());
}
Let's cargo run.
It was at this point that I realised I did not have libssl-dev installed, but that was quickly fixed, and running again I was able to get my code to compile and run after waiting for all the crates to build. Finally we get the expected output "99% Invisible", showing that the RSS feed has been read and parsed correctly.
Let's extend upon this by checking all of our RSS feeds. To do this I've added them all to a text file placed at rss/RSS_feeds.txt, one feed per line. I've then edited the code to add some print statements for any feeds that are not working, and updated the test function to accept a feed as input.
use rss::Channel;
use std::error::Error;

async fn example_feed(feed: &str) -> Result<Channel, Box<dyn Error>> {
    let content = reqwest::get(feed).await?.bytes().await?;
    let channel = Channel::read_from(&content[..])?;
    Ok(channel)
}

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let feeds = include_str!("../RSS_feeds.txt").lines();
    for feed in feeds {
        let channel = rt.block_on(example_feed(feed));
        match channel {
            Ok(channel) => println!("{}", channel.title()),
            Err(error) => println!("{} - {}", feed, error),
        }
    }
}
Running this we can see quite a lot of errors, a lot more than I'd hoped. Checking more closely, they all appear to be of the form "the input did not begin with an rss tag", and taking a look at the feeds themselves, they are all Atom feeds. This was not an issue for the previous iteration of this RSS reader, as the parser I was using handled Atom feeds, but checking the documentation for the rss crate it seems I'm stuck with RSS feeds only.
As a side effect of looking through the feeds I also noticed that I had dumped the entire table of feeds, many of which were marked as inactive, either because whatever they were tracking finished or because I got bored of it. I went back through and weeded out all the feeds that were retired, and retired a few more which were no longer in use, bringing the total down to under 100. Then, having updated my list of RSS feeds, I went in search of something to read Atom feeds and found feed_rs.
Documentation for feed_rs was... lacking, but eventually I got something approximating my previous code.
use std::error::Error;
use feed_rs::parser;

async fn example_feed(feed: &str) -> Result<feed_rs::model::Feed, Box<dyn Error>> {
    let content = reqwest::get(feed)
        .await?
        .bytes()
        .await?;
    let channel = parser::parse(&content[..])?;
    Ok(channel)
}

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let feeds = include_str!("../RSS_feeds.txt").lines();
    for feed in feeds {
        let channel = rt.block_on(example_feed(feed));
        match channel {
            Ok(channel) => println!("{}", channel.title.unwrap().content),
            Err(error) => println!("{} - {}", feed, error),
        }
    }
}
Running this code cleared up all of the errors I had previously been experiencing. Well, all except one: "http://www.cgpgrey.com/blog?format=rss - unable to parse feed: no root element". Checking this feed, one thing stood out when compared to the other feeds, which is that there was a comment near the top of the file.
If I save the RSS feed locally, however, I'm able to parse it without any issues, so it doesn't seem like the comment is the problem. I go back to my Rust code and add in dbg!(&content); and it's here I hit upon the issue: the response is a 404 and what looks like some mangled CSS. This could be something to do with Squarespace, as my current machine is making calls over a VPN, but a quick test with curl and wget proves this is not the case. It seems specific to something reqwest is doing, but before I look into an alternative I want to double check it's not an issue with cookies/headers or similar.
use feed_rs::parser;
use std::error::Error;
use reqwest::header::USER_AGENT;

async fn example_feed(feed: &str) -> Result<feed_rs::model::Feed, Box<dyn Error>> {
    let client = reqwest::Client::new();
    let content = client
        .get(feed)
        .header(USER_AGENT, "My Stinky Butt")
        .send()
        .await?
        .bytes()
        .await?;
    let channel = parser::parse(&content[..])?;
    Ok(channel)
}

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let feeds = include_str!("../RSS_feeds.txt").lines();
    for feed in feeds {
        let channel = rt.block_on(example_feed(feed));
        match channel {
            Ok(channel) => println!("{}", channel.title.unwrap().content),
            Err(error) => println!("{} - {}", feed, error),
        }
    }
}
And like magic, no further issues. Lack of a "proper" fingerprint is often an issue when web scraping: websites will block specific user agents and all sorts of other things. This is the first case I've come across, though, where sending no user agent at all fails but curl's default user agent succeeds. In some ways it makes sense, but the fact that RSS feeds are literally designed for machines rather than humans makes me believe this is an oversight on the part of (I assume) Squarespace. In any case, as long as there is some user agent it will apparently work. I could copy one from Firefox, use a random string, or take inspiration from Brooklyn Nine-Nine.
Running some quick speed tests, there is not a substantial difference in terms of speed (though admittedly this is in part because the feeds are queried one after another rather than concurrently; a rough sketch of how that could change is below). Documentation and examples are much harder to come by for feed_rs and it feels a little clunkier to use, but not to an extent that it's a deal breaker. Now I just need to make sure that all of the information I retrieved in my old code can still be retrieved, which means it's time to do a little grave digging and look at the old SQL tables.
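As an aside before moving on: here is a minimal sketch of what concurrent fetching could look like, reusing the example_feed function above and assuming the futures crate were added to Cargo.toml for join_all (I'm not wiring this in just yet):
use futures::future::join_all;

// Rough sketch: build one future per feed and poll them all together, so a slow
// feed no longer holds up the rest. Nothing runs until the whole batch is awaited.
fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let feeds: Vec<&str> = include_str!("../RSS_feeds.txt").lines().collect();
    let fetches = feeds.iter().map(|&feed| example_feed(feed));
    // join_all drives every request concurrently on the runtime.
    let results = rt.block_on(join_all(fetches));
    for (feed, result) in feeds.iter().zip(results) {
        match result {
            Ok(channel) => println!("{}", channel.title.unwrap().content),
            Err(error) => println!("{} - {}", feed, error),
        }
    }
}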
Unearthing The Past
Opening up my old Python code I can see I originally created two tables, one for all of the feeds being read and one for all the articles:
db.execute('CREATE TABLE IF NOT EXISTS feeds (id INTEGER PRIMARY KEY, title TEXT, category TEXT, link TEXT, valid INTEGER, updated TEXT)')
db.execute('CREATE TABLE IF NOT EXISTS articles (id INTEGER PRIMARY KEY, uid TEXT, title TEXT, link TEXT, image TEXT, feed_id INTEGER, read INTEGER, added TEXT, publish_date TEXT, updated TEXT, FOREIGN KEY (feed_id) REFERENCES feeds (id))')
For the feeds it all seems fairly self explanatory. Each row has an id as primary key, the title of the RSS feed, what category I've sorted it under, the feed link, whether the feed is valid, and when it was updated. I was originally using an SQLite database as it was simple and did the job, which is why the updated column is specified as text and valid is an integer rather than a boolean. This time around I may upgrade from SQLite to something with a bit more oomph, but equally I may decide not to - the current database did the job and is delightfully simple to work with. Other comments on this - it seems like category was a free-form text field, rather than being linked to another table with predefined categories. I'm probably going to modify this so that categories are added separately and then assigned from a drop down when adding a new feed (sketched below). This stops me from making a spelling mistake and having fun new misspelt categories cropping up, but it's not a big change. Finally for this table, while I plan to keep the valid column so as to be able to decommission feeds without deleting their data, I do recall the ability to update and modify feeds being limited last time around, so I'm making a mental note to flesh that out later.
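Going back to the category idea for a moment, here is a rough sketch of what that separate table might look like, assuming I stick with SQLite and the rusqlite crate (the table and column names here are purely illustrative):
use rusqlite::Connection;

// Hypothetical schema sketch: categories live in their own table and feeds
// reference them by id, so a typo can no longer invent a brand new category.
fn create_tables(conn: &Connection) -> rusqlite::Result<()> {
    conn.execute(
        "CREATE TABLE IF NOT EXISTS categories (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE
        )",
        [],
    )?;
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feeds (
            id INTEGER PRIMARY KEY,
            title TEXT,
            category_id INTEGER,
            link TEXT,
            valid INTEGER,
            updated TEXT,
            FOREIGN KEY (category_id) REFERENCES categories (id)
        )",
        [],
    )?;
    Ok(())
}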
Now moving on to the second table, for articles, we have rather more columns. Skipping over the standard integer primary key for each row, we have a feed_id foreign key linking back to the feeds table, after which all the columns are details on the article itself. Title is obviously the title of the item, while link should be a link to the item on its website (though 99 Percent Invisible has a nasty habit of linking directly to their home page rather than the podcast in question).
Image is a link to an image for the item, if one was supplied. While I did consider downloading these images and keeping local copies, I seem to recall it was not worth the hassle, as this was intended for use on uncapped internet and nobody was using insanely large images.
Read is simply a boolean masquerading as an integer telling me if I've marked the article as read or not.
Moving on to uid, this is the guid from a given item in the feed. No, I have no idea why I called it uid and not guid, but hey. This should be a unique value for each item in a feed, and often ends up using the same value as link, because you shouldn't have different items with the same link (unless you're 99PI, as mentioned earlier).
Next we have added and publish_date. Added is when the article was added to the database by the RSS reader, while publish_date corresponds to when the article was published, derived from the pubDate tag in a feed. I seem to recall debating over whether the articles should be sorted by publish_date or added, eventually settling for added. This was because it would stop articles popping in behind existing articles on refresh if for some reason two articles had been released at the same time but feeds were read several minutes apart. Something else to consider looking at this time around.
Finally we have the updated column, and I'll be honest, I have no idea why it exists. The original reader never updated items from a feed; if an item was published with wrong information it stayed that way unless it was republished with a new guid. Checking the DB, every row has a NULL value here, as expected. For the moment I'm going to remove it from the table, as I think if I do implement updating I will have one row per update, with all rows for an item sharing the same guid but only one marked as valid. I'll need to do some testing on how often updates actually happen and which fields I care about. I know, for example, that YouTube feeds update based on view count and similar, which is utterly worthless to me.
So with all that done, let's see if we can ferret out the relevant values using feed_rs:
- guid
- title
- link
- image
- publish_date
Looking through our parsed feed it becomes clear that some of these are going to be simpler than others. For example, guid is required and is simply returned by item.id, whereas on the other end of the scale image(s) are found under item.media, don't have to exist, can exist in multiple resolutions, and have all sorts of other gotchas. To get a general feel for the results I went ahead and hacked together some dirty code to grab the first item from every RSS feed and let me have a look to see if there are any outliers. I think in general image is going to be the only major issue, as I'll likely need some logic to decide which image to take (though I could just default to the first) and to make sure other media such as mp3 clips are not accidentally grabbed. That's a future problem though; for now I just want to poke around:
use feed_rs::parser;
use std::error::Error;
use reqwest::header::USER_AGENT;
use std::io::{stdin, Read};

async fn example_feed(feed: &str) -> Result<feed_rs::model::Feed, Box<dyn Error>> {
    let client = reqwest::Client::new();
    let content = client
        .get(feed)
        .header(USER_AGENT, "My Stinky Butt")
        .send()
        .await?
        .bytes()
        .await?;
    let channel = parser::parse(&content[..])?;
    Ok(channel)
}

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let feeds = include_str!("../RSS_feeds.txt").lines();
    for feed in feeds {
        let channel = rt.block_on(example_feed(feed));
        match channel {
            Ok(channel) => {
                println!("{}", channel.title.unwrap().content);
                for item in channel.entries {
                    dbg!(item.id);
                    dbg!(item.title);
                    dbg!(item.links);
                    dbg!(item.media);
                    dbg!(item.published);
                    break;
                }
            }
            Err(error) => println!("{} - {}", feed, error),
        }
        stdin().read(&mut [0]).unwrap();
    }
}
Having cycled through my feeds I've noticed a few things.
Atom feeds from blogspot.com were the only ones to have multiple links rather than just one. Looking more deeply, these feeds seem like a hot mess: there are links back to the Atom feed itself, multiple links to the article, some duplicates, and others that link directly to IDs on the page. For the moment I'm adding these feeds to the naughty bin, and I'll come back to them at the end.
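When I do, one possible approach (a sketch only, assuming feed_rs exposes the Atom rel attribute on its Link type) would be to prefer the "alternate" link, which should point at the article itself, and fall back to the first link otherwise:
// Hypothetical helper: pick the most article-like link from an entry's links.
fn pick_link(links: &[feed_rs::model::Link]) -> Option<&str> {
    links
        .iter()
        .find(|link| link.rel.as_deref() == Some("alternate"))
        .or_else(|| links.first())
        .map(|link| link.href.as_str())
}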
Moving on, all the other problems have to do with images/media. There are several categories:
- No image anywhere
- Image embedded into the description (generally comics)
- Media contains an mp3 rather than an image, or in one case both (generally podcasts)
- Same image multiple resolutions
- Multiple different images
- Video and thumbnail (mostly youtube)
For the moment I'm going to simply take a thumbnail if it exists, and otherwise assume the first media element is an image. If there's not a valid media element then I'll default to a local image. Once everything else is sorted I'll come back to this as well, as parsing all the special cases into images is going to be a major pain.
use feed_rs::parser;
use reqwest::header::USER_AGENT;
use std::error::Error;

async fn example_feed(feed: &str) -> Result<feed_rs::model::Feed, Box<dyn Error>> {
    let client = reqwest::Client::new();
    let content = client
        .get(feed)
        .header(USER_AGENT, "blah")
        .send()
        .await?
        .bytes()
        .await?;
    let channel = parser::parse(&content[..])?;
    Ok(channel)
}

fn get_image(item: feed_rs::model::Entry) -> String {
    match item.media.len() {
        0 => "http://192.168.0.45/RSS/noImage.png".to_string(),
        1 => match item.media[0].thumbnails.len() {
            0 => item.media[0].content[0].url.as_ref().unwrap().as_str(),
            _ => item.media[0].thumbnails[0].image.uri.as_str(),
        }
        .to_string(),
        _ => "Something has gone horribly wrong".to_string(),
    }
}

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let feeds = include_str!("../RSS_feeds.txt").lines();
    for feed in feeds {
        let channel = rt.block_on(example_feed(feed));
        match channel {
            Ok(channel) => {
                println!("{}", channel.title.unwrap().content);
                for item in channel.entries {
                    println!("{}", item.id);
                    println!("{}", item.title.as_ref().unwrap().content);
                    println!("{}", item.links[0].href);
                    println!("{}", item.published.as_ref().unwrap());
                    println!("{}", get_image(item));
                    println!("");
                    break;
                }
            }
            Err(error) => println!("{} - {}", feed, error),
        }
    }
}
Running this code I notice a few edge cases that will need fixing, but for the most part everything is working. I originally started to move on to setting up a basic database, before deciding I'd prefer to build an API which interfaces with the database, and then build the feed parser on top of that. So for the moment we're putting this code to one side and looking into APIs.
Are You There Webserver? It's Me, Margaret.
To begin with I checked the book and followed along through the basics, before moving on to look for suitable crates to use. In my search I found this excellent article, which I'm going to shamelessly rip off (sorry, borrow heavily from). I've also been cribbing heavily from Stripe, who are often held up as a great example of API documentation.
To begin with I need to decide what endpoints I want to set up, which means delving back into the past to dredge up some very ugly PHP. Looking through I found the following SQL calls:
- List all valid feeds.
- Set a feed as invalid.
- Add a new feed.
- List all categories with unread articles.
- List all feeds with unread articles for a given category.
- List all unread articles for a given category.
- List all unread articles for a given feed.
- List all unread articles.
- Mark a specific article read.
- Mark all articles from a specific feed/category read.
- Mark all articles read.
- Embarrassingly enough, two additional queries which as far as I can tell were never used and don't work.
Let's see if we can sketch out some endpoints. It's worth noting that this is entirely new code, unrelated to the previous testing; once I've got a handle on it I'll knit the two together. To get all of this running we're going to need to add a few dependencies to Cargo.toml:
[dependencies]
actix-web = "4"
actix-rt = "2.6.0"
serde = { version = "1.0.117", features = ["derive"] }
serde_json = "1.0.59"
rusqlite = "0.26.3"
chrono = { version = "0.4", features = ["serde"] }
main.rs
use actix_web::{web, App, HttpServer};

mod api;
mod db;

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let pool = db::startup();

    println!("starting HTTP server at http://localhost:8080");

    HttpServer::new(move || {
        App::new()
            .app_data(web::Data::new(pool.clone()))
            .service(api::list_endpoints)
            .service(api::feeds_list)
            .service(api::feeds_add)
            .service(api::feeds_mark_valid)
            .service(api::feeds_mark_invalid)
            .service(api::articles_list_all)
            .service(api::articles_list_all_unread)
            .service(api::articles_list_category)
            .service(api::articles_list_category_unread)
            .service(api::articles_list_feed)
            .service(api::articles_list_feed_unread)
            /*.service(api::articles_list_searchterm)
            .service(api::articles_mark_read)
            .service(api::articles_mark_unread)
            .service(api::articles_mark_read_undo)*/
    })
    .bind(("0.0.0.0", 8080))?
    .run()
    .await
}
Here we're calling our database module to set up a connection pool and make sure that the tables we need exist. I was originally planning to do this manually, but quickly found r2d2, which was both clean and simple to implement (and come on, it's called r2d2, what more could you want?). After creating the connection pool we set Actix Web going, which is what we will be using for the actual API. We clone the pool so that Actix can take ownership and pass it on as application data, allowing our future API calls to make use of it.
api.rs
use actix_web::{get, http::header::ContentType, post, put, web, HttpResponse};
use r2d2_sqlite::SqliteConnectionManager;
use serde::{Deserialize, Serialize};

use crate::db;

#[derive(Debug, Deserialize, Serialize)]
pub struct IncomingFeed {
    pub title: String,
    pub category: String,
    pub link: String,
}

#[derive(Debug, Deserialize, Serialize)]
pub struct IncomingFeeds {
    feeds: Vec<IncomingFeed>,
}

#[derive(Debug, Deserialize, Serialize)]
pub struct FeedIds {
    ids: Vec<i32>,
}

#[derive(Debug, Deserialize, Serialize)]
pub struct Category {
    name: String,
}

#[get("/")]
pub async fn list_endpoints() -> HttpResponse {
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(vec![
            "feeds/list",
            "feeds/add",
            "feeds/mark_valid",
            "feeds/mark_invalid",
            "articles/list_all",
            "articles/list_all_unread",
            "articles/list_category",
            "articles/list_category_unread",
            "articles/list_feed",
            "articles/list_feed_unread",
            "articles/list_search",
            "articles/mark_read",
            "articles/mark_unread",
            "articles/mark_read_undo",
        ])
}

#[get("feeds/list")]
pub async fn feeds_list(db: web::Data<r2d2::Pool<SqliteConnectionManager>>) -> HttpResponse {
    let feeds = db::get_feeds(&db);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}

#[post("feeds/add")]
pub async fn feeds_add(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
    incomingfeeds: web::Json<IncomingFeeds>,
) -> HttpResponse {
    db::add_feeds(&db, incomingfeeds.into_inner().feeds);
    HttpResponse::Ok()
        .content_type(ContentType::plaintext())
        .body("Seemed to work")
}

#[put("feeds/mark_valid")]
pub async fn feeds_mark_valid(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
    feedids: web::Json<FeedIds>,
) -> HttpResponse {
    db::mark_feeds_valid(&db, feedids.into_inner().ids);
    HttpResponse::Ok()
        .content_type(ContentType::plaintext())
        .body("Seemed to work")
}

#[put("feeds/mark_invalid")]
pub async fn feeds_mark_invalid(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
    feedids: web::Json<FeedIds>,
) -> HttpResponse {
    db::mark_feeds_invalid(&db, feedids.into_inner().ids);
    HttpResponse::Ok()
        .content_type(ContentType::plaintext())
        .body("Seemed to work")
}

#[get("articles/list_all")]
pub async fn articles_list_all(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
) -> HttpResponse {
    let feeds = db::get_articles(&db, false);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}

#[get("articles/list_all_unread")]
pub async fn articles_list_all_unread(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
) -> HttpResponse {
    let feeds = db::get_articles(&db, true);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}

#[get("articles/list_category")]
pub async fn articles_list_category(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
    category: web::Json<Category>,
) -> HttpResponse {
    let feeds = db::get_articles_by_category(&db, &category.name, false);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}

#[get("articles/list_category_unread")]
pub async fn articles_list_category_unread(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
    category: web::Json<Category>,
) -> HttpResponse {
    let feeds = db::get_articles_by_category(&db, &category.name, true);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}

#[get("articles/list_feed")]
pub async fn articles_list_feed(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
    id: web::Json<i32>,
) -> HttpResponse {
    let feeds = db::get_articles_by_feed(&db, *id, false);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}

#[get("articles/list_feed_unread")]
pub async fn articles_list_feed_unread(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
    id: web::Json<i32>,
) -> HttpResponse {
    let feeds = db::get_articles_by_feed(&db, *id, true);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}
/*
#[get("articles/list_all")]
pub async fn articles_list_all(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
    unread_only: web::Json<UnreadOnly>,
) -> HttpResponse {
    let feeds = db::get_articles(&db, unread_only.value);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}

#[get("articles/list_feed")]
pub async fn articles_list_feed(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
) -> HttpResponse {
    let feeds = db::get_articles(&db, unread_only.value);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}

#[get("articles/list_searchterm")]
pub async fn articles_list_searchterm(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
) -> HttpResponse {
    let feeds = db::get_articles(&db, unread_only.value);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}

#[put("articles/mark_read")]
pub async fn articles_mark_read(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
) -> HttpResponse {
    let feeds = db::get_articles(&db, unread_only.value);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}

#[put("articles/mark_unread")]
pub async fn articles_mark_unread(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
) -> HttpResponse {
    let feeds = db::get_articles(&db, unread_only.value);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}

#[put("articles/mark_read_undo")]
pub async fn articles_mark_read_undo(
    db: web::Data<r2d2::Pool<SqliteConnectionManager>>,
) -> HttpResponse {
    let feeds = db::get_articles(&db, unread_only.value);
    HttpResponse::Ok()
        .content_type(ContentType::json())
        .json(feeds)
}
*/
Here we're defining all API endpoints which will then call database functions from:
db.rs
use r2d2::Pool;
use r2d2_sqlite::SqliteConnectionManager;
use rusqlite::params;
use serde::{Deserialize, Serialize};

use crate::api;

#[derive(Debug, Deserialize, Serialize)]
pub struct Feed {
    id: i32,
    title: String,
    category: String,
    link: String,
    valid: bool,
    updated: String,
}

#[derive(Debug, Deserialize, Serialize)]
pub struct Article {
    id: i32,
    uid: String,
    title: String,
    link: String,
    image: String,
    feed_id: i32,
    read: bool,
    added: String,
    publish_date: String,
}

pub fn startup() -> r2d2::Pool<SqliteConnectionManager> {
    let manager = SqliteConnectionManager::file("rss.sqlite");
    let pool = r2d2::Pool::new(manager).unwrap();
    pool.get()
        .unwrap()
        .execute(
            "CREATE TABLE IF NOT EXISTS feeds (
                id INTEGER PRIMARY KEY,
                title TEXT,
                category TEXT,
                link TEXT,
                valid INTEGER,
                updated TEXT
            )",
            [],
        )
        .unwrap();
    pool.get()
        .unwrap()
        .execute(
            "CREATE TABLE IF NOT EXISTS articles (
                id INTEGER PRIMARY KEY,
                uid TEXT,
                title TEXT,
                link TEXT,
                image TEXT,
                feed_id INTEGER,
                read INTEGER,
                added TEXT,
                publish_date TEXT,
                FOREIGN KEY (feed_id) REFERENCES feeds (id)
            )",
            [],
        )
        .unwrap();
    pool
}

pub fn get_feeds(pool: &Pool<SqliteConnectionManager>) -> Vec<Feed> {
    let pool = pool.clone();
    let conn = pool.get().unwrap();
    let mut stmt = conn.prepare("SELECT * FROM feeds").unwrap();
    stmt.query_map([], |row| {
        Ok(Feed {
            id: row.get(0).unwrap(),
            title: row.get(1).unwrap(),
            category: row.get(2).unwrap(),
            link: row.get(3).unwrap(),
            valid: row.get(4).unwrap(),
            updated: match row.get(5) {
                Ok(v) => v,
                Err(_) => "Never".to_string(),
            },
        })
    })
    .unwrap()
    .map(Result::unwrap)
    .collect()
}

pub fn add_feeds(pool: &Pool<SqliteConnectionManager>, feeds: Vec<api::IncomingFeed>) {
    let pool = pool.clone();
    let mut conn = pool.get().unwrap();
    let transact = conn.transaction().unwrap();
    for feed in feeds {
        transact
            .execute(
                "INSERT INTO feeds (title, category, link, valid, updated) VALUES (?1, ?2, ?3, ?4, ?5)",
                params![feed.title, feed.category, feed.link, 1 as i32, "never".to_string()],
            )
            .unwrap();
    }
    transact.commit().unwrap();
}

pub fn mark_feeds_valid(pool: &Pool<SqliteConnectionManager>, ids: Vec<i32>) {
    let pool = pool.clone();
    let mut conn = pool.get().unwrap();
    let transact = conn.transaction().unwrap();
    for id in ids {
        transact
            .execute("UPDATE feeds SET valid=1 WHERE id = (?)", params![id])
            .unwrap();
    }
    transact.commit().unwrap();
}

pub fn mark_feeds_invalid(pool: &Pool<SqliteConnectionManager>, ids: Vec<i32>) {
    let pool = pool.clone();
    let mut conn = pool.get().unwrap();
    let transact = conn.transaction().unwrap();
    for id in ids {
        transact
            .execute("UPDATE feeds SET valid=0 WHERE id = (?)", params![id])
            .unwrap();
    }
    transact.commit().unwrap();
}

pub fn get_articles(pool: &Pool<SqliteConnectionManager>, unread_only: bool) -> Vec<Article> {
    let pool = pool.clone();
    let conn = pool.get().unwrap();
    let query = if unread_only {
        "SELECT * FROM articles WHERE read=0"
    } else {
        "SELECT * FROM articles LIMIT 200"
    };
    let mut stmt = conn.prepare(query).unwrap();
    stmt.query_map([], |row| {
        Ok(Article {
            id: row.get(0).unwrap(),
            uid: row.get(1).unwrap(),
            title: row.get(2).unwrap(),
            link: row.get(3).unwrap(),
            image: row.get(4).unwrap(),
            feed_id: row.get(5).unwrap(),
            read: row.get(6).unwrap(),
            added: row.get(7).unwrap(),
            publish_date: match row.get(8) {
                Ok(v) => v,
                Err(_) => "none".to_string(),
            },
        })
    })
    .unwrap()
    .map(Result::unwrap)
    .collect()
}

pub fn get_articles_by_category(
    pool: &Pool<SqliteConnectionManager>,
    category: &str,
    unread_only: bool,
) -> Vec<Article> {
    let pool = pool.clone();
    let conn = pool.get().unwrap();
    let query = if unread_only {
        "SELECT * FROM articles WHERE feed_id IN (SELECT id from feeds WHERE category = (?) AND valid = 1 ) and read=0"
    } else {
        "SELECT * FROM articles WHERE feed_id IN (SELECT id from feeds WHERE category = (?) AND valid = 1 ) LIMIT 200"
    };
    let mut stmt = conn.prepare(query).unwrap();
    stmt.query_map([category], |row| {
        Ok(Article {
            id: row.get(0).unwrap(),
            uid: row.get(1).unwrap(),
            title: row.get(2).unwrap(),
            link: row.get(3).unwrap(),
            image: row.get(4).unwrap(),
            feed_id: row.get(5).unwrap(),
            read: row.get(6).unwrap(),
            added: row.get(7).unwrap(),
            publish_date: match row.get(8) {
                Ok(v) => v,
                Err(_) => "none".to_string(),
            },
        })
    })
    .unwrap()
    .map(Result::unwrap)
    .collect()
}

pub fn get_articles_by_feed(
    pool: &Pool<SqliteConnectionManager>,
    feed_id: i32,
    unread_only: bool,
) -> Vec<Article> {
    let pool = pool.clone();
    let conn = pool.get().unwrap();
    let query = if unread_only {
        "SELECT * FROM articles WHERE feed_id IN (SELECT id from feeds WHERE id = (?) AND valid = 1 ) and read=0"
    } else {
        "SELECT * FROM articles WHERE feed_id IN (SELECT id from feeds WHERE id = (?) AND valid = 1 ) LIMIT 200"
    };
    let mut stmt = conn.prepare(query).unwrap();
    stmt.query_map([feed_id], |row| {
        Ok(Article {
            id: row.get(0).unwrap(),
            uid: row.get(1).unwrap(),
            title: row.get(2).unwrap(),
            link: row.get(3).unwrap(),
            image: row.get(4).unwrap(),
            feed_id: row.get(5).unwrap(),
            read: row.get(6).unwrap(),
            added: row.get(7).unwrap(),
            publish_date: match row.get(8) {
                Ok(v) => v,
                Err(_) => "none".to_string(),
            },
        })
    })
    .unwrap()
    .map(Result::unwrap)
    .collect()
}
This module contains the initialisation of our database pool, as mentioned when discussing main.rs. In addition we have defined two structures to mirror our two database tables, for easy transformation between SQL and JSON. It's worth noting that for the moment I am using a copy of the database from my currently running RSS reader. It's about 60MB/500K rows and a little messy in places, but for now it's useful for laying the groundwork. Because of this there are a couple of ugly match statements to catch a few malformed rows. In future the DB will be modified slightly and actual error handling will be implemented properly.
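As a rough idea of what that could eventually look like (a sketch only, reusing the imports and the Feed struct already in db.rs), a query function might return a Result instead of unwrapping at every step, leaving the API layer to turn a failure into an HTTP error rather than a panic:
// Sketch of a non-panicking variant of get_feeds for db.rs: errors bubble up to the
// caller, which decides what HTTP response to send.
pub fn try_get_feeds(
    pool: &Pool<SqliteConnectionManager>,
) -> Result<Vec<Feed>, Box<dyn std::error::Error>> {
    let conn = pool.get()?;
    let mut stmt = conn.prepare("SELECT * FROM feeds")?;
    let feeds = stmt
        .query_map([], |row| {
            Ok(Feed {
                id: row.get(0)?,
                title: row.get(1)?,
                category: row.get(2)?,
                link: row.get(3)?,
                valid: row.get(4)?,
                updated: row.get(5).unwrap_or_else(|_| "Never".to_string()),
            })
        })?
        .collect::<Result<Vec<_>, _>>()?;
    Ok(feeds)
}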
Running this code we can check our API works (at least it does now, after ten minutes of fixing several daft mistakes). Testing from a Pi we get a response time of ~25ms for the feeds endpoints and ~250ms for articles. I'm not thrilled with 250ms, but for half a million rows in SQLite with no indexes it's well within tolerance for now.
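To poke at the endpoints while testing, a throwaway client along these lines does the job (a hypothetical sketch reusing reqwest and tokio from the earlier Cargo.toml, assuming the server is up on localhost:8080; the feed values are just examples):
use std::error::Error;

// Hypothetical smoke test for the feeds endpoints, run as a separate binary while
// the server above is listening on localhost:8080.
async fn smoke_test() -> Result<(), Box<dyn Error>> {
    let client = reqwest::Client::new();

    // POST feeds/add expects JSON shaped like the IncomingFeeds struct.
    let body = r#"{"feeds":[{"title":"99% Invisible","category":"Podcasts","link":"https://feeds.99percentinvisible.org/99percentinvisible"}]}"#;
    let resp = client
        .post("http://localhost:8080/feeds/add")
        .header("Content-Type", "application/json")
        .body(body)
        .send()
        .await?;
    println!("feeds/add -> {}", resp.status());

    // GET feeds/list returns the feeds table serialised as JSON.
    let feeds = client
        .get("http://localhost:8080/feeds/list")
        .send()
        .await?
        .text()
        .await?;
    println!("feeds/list -> {}", feeds);
    Ok(())
}

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    rt.block_on(smoke_test()).unwrap();
}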