
Getting started with Rust on the command line

This post is intended for people with no previous knowledge of Rust. It assumes some programming experience, but none with Rust.

Rust has come a long way over the last two years, from a promising new language to a practical day-to-day tool. Indeed, I have noticed more and more people considering Rust as an alternative to Ruby and Python for everyday programs.

I'd like to show why Rust is a feasible option by writing a small but useful command line tool. We'll go through it in great detail, so don't be intimidated by the length of this post.

I do encourage you to read the source before reading this post and see how much you understand without explanation. I expect it will already be quite a lot!

If you want to know more about Rust, or your company wants to adopt it, remember that we offer courses, development and consulting.

readrust.net

The Rust team recently started a call for community blog posts. A kind soul started collecting all of them on the website readrust.net, which provides both an RSS and a JSON feed.

As a small exercise, I'll show you how to write a small program that fetches the JSON feed, parses it and prints it to the console in a formatted fashion.

If you already have Rust installed and are comfortable with cargo, skip ahead to "Let's get started!".

Getting Rust

The usual way to get Rust for development is through rustup. rustup might also be available in your distribution's package manager.

rustup manages development toolchains. It allows you to change the version of Rust in use, manage additional development tools such as the Rust Language Server, or download the development toolchain for different targets.

For deployment, I recommend checking whether your distribution's package manager ships a Rust compiler, and targeting that version. Assuming you have rustup installed, run:

$ rustup install stable

For this example, you need at least Rust 1.20, because some of the dependencies require it.

Looking around

This installs the Rust compiler, the package and build manager cargo, the documentation and some other tools. Notably, rustup doc opens the documentation for the toolchain in use in your favourite browser. rustup doc --std does the same, but jumps directly to the standard library.

Setting up a project: cargo

cargo manages and builds Rust projects. We want to build a small program, which is why we ask cargo to set up a binary project.

$ cargo init --bin readrust-cli

This will create a directory called readrust-cli. Let's have a look around in that directory:

.
├── Cargo.toml
└── src
    └── main.rs

You will notice that the project structure is quite small: it only contains our program (src/main.rs) and the project manifest (Cargo.toml). Let's have a look at Cargo.toml:

[package]
name = "readrust-cli"
version = "0.1.0"
authors = ["Florian Gilcher <florian.gilcher@asquera.de>"]

[dependencies]

Currently, it only holds a bit of metadata about our project. Note that the dependencies section is still empty. main.rs contains a small "Hello world!" program by default. Let's run it:

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/readrust-cli`
Hello, world!

Nice! Everything works. cargo run automatically invokes the compiler, rustc, builds the program and then runs it. cargo also detects any changes we make to the sources and recompiles when needed. This is really convenient.

Let's get started!

Let's plan ahead. We want to write a proper command line tool. On the other hand, we want to solve our problem without getting sidetracked by too many side problems. Let's look at what we need:

  • A command line argument parser
  • An HTTP client to download the feed
  • A JSON parser to parse the feed
  • A pretty-printer to put stuff on the console

Feature-wise, I'd like to introduce a little bit of flexibility, enough to get started. The program should take an option --count, which makes it print the number of current posts instead of the list. Additionally, it should take an option --number [NUMBER], which prints only the given number of posts instead of all of them.
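To get a feeling for what an argument parser takes off our hands, here is a rough, standard-library-only sketch that handles just these two options. Options and parse_args are hypothetical names for this illustration; unlike clap, it generates no --help or --version output and gives only terse error messages.

```rust
/// The settings our tool understands (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Options {
    count: bool,
    number: Option<usize>,
}

/// Parses the arguments after the program name. Returns an error message
/// for unknown flags, a missing value, or a value that is not a number.
fn parse_args(args: &[String]) -> Result<Options, String> {
    let mut opts = Options { count: false, number: None };
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if arg == "-c" || arg == "--count" {
            opts.count = true;
        } else if arg == "-n" || arg == "--number" {
            // The value is the next argument; `?` turns the &str into a String error.
            let value = iter.next().ok_or("missing value for --number")?;
            opts.number = Some(parse_number(value)?);
        } else if let Some(value) = arg.strip_prefix("--number=") {
            opts.number = Some(parse_number(value)?);
        } else {
            return Err(format!("unknown argument: {}", arg));
        }
    }
    Ok(opts)
}

/// Converts the option value into a count, rejecting junk like "ten".
fn parse_number(value: &str) -> Result<usize, String> {
    value.parse().map_err(|_| format!("not a number: {}", value))
}

fn main() {
    // The first element of env::args() is the program name, so skip it.
    let args: Vec<String> = std::env::args().skip(1).collect();
    match parse_args(&args) {
        Ok(opts) => println!("{:?}", opts),
        Err(msg) => eprintln!("error: {}", msg),
    }
}
```

Even this minimal version needs a loop, string matching and error handling; clap gives us all of that, plus help output, from a short declaration.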

CLAP 👏

CLAP stands for Command Line Argument Parser. That was easy, wasn't it? CLAP comes with extensive documentation, and we're just going to reuse one of its examples.

First, we must add clap to our dependencies. For that, we just specify the name and the version in our Cargo.toml.

[dependencies]
clap = "2.29"

Now, if you run cargo build, clap will be compiled along with the project. To use it, we still need to tell Rust that we are using an external library, a so-called crate. We also need to import the types we are going to use. clap has a very nice API that lets us configure our application to our heart's content. We're going to settle for something minimal.

extern crate clap;

use clap::App;

fn main() {
    let app = App::new("readrust")
        .version("0.1")
        .author("Florian G. <florian.gilcher@asquera.de>")
        .about("Reads readrust.net")
        .args_from_usage("-n, --number=[NUMBER] 'Only print the NUMBER most recent posts'
                          -c, --count           'Show the count of posts'");

    let matches = app.get_matches();
}

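Now, let's compile and run this, passing the flag --help to get the program's help output. With cargo, that is cargo run -- --help: the -- separates cargo's own arguments from those passed to our program.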

readrust 0.1
Florian G. <florian.gilcher@asquera.de>
Reads readrust.net

USAGE:
    readrust [FLAGS] [OPTIONS]

FLAGS:
    -c, --count      Show the count of posts
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -n, --number <NUMBER>    Only print the NUMBER most recent posts

Nice! Just a couple of simple lines and we got proper help instructions for our program!

If you want to learn more about fancy command line parsing, have a look at the extensive docs.

Getting the data

Now, before we can actually use the arguments we just parsed, we need the data to use them on. Let's wrap fetching it into a function with this signature:

fn get_feed() -> String {
    // implementation
}

A good HTTP client for this task is reqwest. There is also hyper, by the same author. hyper is intended as a lower-level library, while reqwest covers "do something reasonable, quickly".

[dependencies]
reqwest = "0.8"

Implementing this function is very easy:

extern crate reqwest;

pub static URL: &str = "http://readrust.net/rust2018/feed.json";

fn get_feed() -> String {
    let client = reqwest::Client::new();
    let mut request = client.get(URL);

    let mut resp = request.send().unwrap();

    assert!(resp.status().is_success());

    resp.text().unwrap()
}

This looks very similar to how you'd do it in most programming languages. The flow is the usual one: we construct a client, which allows us to create a request. We could further configure that request. By calling send(), we send the request and get a response back. Calling text() on the response returns the response body as a String.

Note the use of mut here. In Rust, we need to declare things we later want to change as mutable. Sending the request and reading the response both change the values involved, so the mutability is needed.
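The same effect shows up with plain standard library types. In this sketch, std::io::Cursor over a string stands in for a response body (an assumption for illustration, not reqwest's type): reading advances the cursor, so Read::read_to_string takes &mut self and the binding has to be declared mut.

```rust
use std::io::{Cursor, Read};

/// Reads everything left in `body` into a String.
/// `Read::read_to_string` takes `&mut self`, so we need a mutable borrow.
fn read_body<R: Read>(body: &mut R) -> std::io::Result<String> {
    let mut text = String::new();
    body.read_to_string(&mut text)?;
    Ok(text)
}

fn main() {
    // Without `mut` here, `read_body(&mut body)` would not compile.
    let mut body = Cursor::new("{\"version\": \"1\"}");
    let first = read_body(&mut body).unwrap();
    // Reading moved the cursor to the end, so a second read yields nothing.
    let second = read_body(&mut body).unwrap();
    println!("{:?} then {:?}", first, second);
}
```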

Finally, unwrap and assert. Sending a request or reading the response is a fallible operation (the most obvious failure being the network going away in between). For that reason, send (which sends the request) and text (which reads the response body) return a type called Result, which holds either the expected value or an error. Rust expects us to handle the error case. unwrap makes the program panic if there was an error: it stops, unwinding the stack and cleaning up before exiting. If there was no error, we just get the expected value back. assert! is similar: it panics if the condition doesn't hold. The request might succeed in the sense that the backend answered, while the HTTP status is not a success code (2xx, such as 200 OK), for example because the server had an internal error. The assert guards against that.

In many scripting languages, you would just get an unhandled exception at this point. In Rust, we opt in to stopping execution here. The effect is similar; we're just coming from two different directions.

Don't feel bad about using unwrap everywhere at the beginning: You can refactor away from it and in our context, it's a reasonable approach!
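For comparison, here are the three common ways of dealing with a Result, sketched without any network code. check_status is a hypothetical stand-in for the status assertion that treats 2xx codes as success, and fetch_and_check is a hypothetical caller.

```rust
/// Hypothetical status check: Ok for 2xx codes, Err with a message otherwise.
fn check_status(code: u16) -> Result<u16, String> {
    if (200..300).contains(&code) {
        Ok(code)
    } else {
        Err(format!("unexpected HTTP status {}", code))
    }
}

/// Propagates the error to the caller with `?` instead of panicking.
fn fetch_and_check(code: u16) -> Result<String, String> {
    let status = check_status(code)?; // returns early on Err
    Ok(format!("status {} ok", status))
}

fn main() {
    // 1. unwrap: fine for a quick tool, panics on Err.
    let ok = check_status(200).unwrap();
    println!("unwrap gave {}", ok);

    // 2. match: handle both cases explicitly.
    match check_status(500) {
        Ok(code) => println!("success: {}", code),
        Err(msg) => println!("handled error: {}", msg),
    }

    // 3. `?`: let the caller decide what to do.
    println!("{:?}", fetch_and_check(404));
}
```

Refactoring away from unwrap later usually means moving from option 1 to option 3.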

Parsing JSON: Serde

We're already quite far! Now we need to parse the JSON feed. For that, Rust has serde, a SERialisation/DEserialisation framework. serde can handle not only JSON, but many other formats as well. It also provides convenient ways to define serialisable and deserialisable types, so-called derives.

For that reason, we need to depend on three different libraries: serde, serde_derive and serde_json.

[dependencies]
serde = "1.0"
serde_derive = "1.0"
serde_json = "1.0"

serde_json gives you two ways to handle JSON: either you parse it into a very generic JSON tree, or you tell serde the structure of the data you expect. The second is both faster and more convenient. You should only use the generic structure if you don't know what to expect.
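To illustrate the trade-off without pulling in serde, here is a deliberately tiny, hypothetical JSON tree type. It is loosely inspired by the idea of a generic JSON value but is not serde_json's API. With a generic tree, every lookup can fail and has to be checked; with a typed struct, the compiler knows the fields are there.

```rust
/// A tiny, hypothetical JSON tree: just enough to show the shape of the problem.
#[derive(Debug)]
enum Json {
    Str(String),
    Obj(Vec<(String, Json)>),
}

impl Json {
    /// Looks up a key in an object; None for missing keys or non-objects.
    fn get(&self, key: &str) -> Option<&Json> {
        match self {
            Json::Obj(fields) => fields.iter().find(|(k, _)| k == key).map(|(_, v)| v),
            _ => None,
        }
    }

    /// Returns the string inside a Str node, if this is one.
    fn as_str(&self) -> Option<&str> {
        match self {
            Json::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }
}

/// The typed equivalent: the field is guaranteed to be there.
struct Author {
    name: String,
}

fn main() {
    let tree = Json::Obj(vec![(
        "author".to_string(),
        Json::Obj(vec![("name".to_string(), Json::Str("Ferris".to_string()))]),
    )]);

    // Generic: each step may fail, so we end up with an Option.
    let name = tree.get("author").and_then(|a| a.get("name")).and_then(Json::as_str);
    println!("{:?}", name);

    // Typed: a plain field access.
    let author = Author { name: "Ferris".to_string() };
    println!("{}", author.name);
}
```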

Looking at the feed definition, we find that there are three main types: authors, items and the feed itself. Both the feed and the items have an author. So let's start with that:

extern crate serde;
#[macro_use]
extern crate serde_derive;
extern crate serde_json;

#[derive(Debug, Deserialize, Serialize)]
struct Author {
    name: String,
    url: String,
}

To keep our first iteration brief, we'll model the URL as a plain string; that could be changed later. Note that we just define a plain data structure with two fields here. The names match those in the JSON. The magic happens in the derive part: it instructs the compiler to generate additional code based on that structure. Debug generates a debug representation and is generally recommended; Deserialize and Serialize generate the serde machinery to deserialise and serialise. Note that this is still independent of the format we'll use later.
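What derive(Debug) generates can be tried without serde. This sketch reuses the shape of Author but drops the serde derives so it compiles with the standard library alone; the name and URL are placeholders.

```rust
/// Same shape as Author, minus the serde derives.
#[derive(Debug)]
struct Author {
    name: String,
    url: String,
}

fn main() {
    let author = Author {
        name: "Ferris".to_string(),
        url: "https://example.com".to_string(),
    };
    // {:?} uses the generated Debug implementation.
    println!("{:?}", author);
    // {:#?} pretty-prints the same data over several lines.
    println!("{:#?}", author);
}
```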

Let's continue with the feed items:

#[derive(Debug, Deserialize, Serialize)]
struct FeedItem {
    id: String,
    title: String,
    content_text: String,
    url: String,
    date_published: String,
    author: Author,
}

This is all pretty similar. Again, we interpret dates as Strings for the first iteration. But as you can see, we can just reuse the Author type we defined before. We call the type FeedItem instead of the very generic Item, as this communicates more clearly what the type is. So, let's see what the feed looks like:

#[derive(Debug, Deserialize, Serialize)]
struct Feed {
    version: String,
    title: String,
    home_page_url: String,
    feed_url: String,
    description: String,
    author: Author,
    items: Vec<FeedItem>,
}

There's not much new here, except items: here, we wrap FeedItem in a Vec. A Vec, or "vector", is the standard type in Rust for a list of things. A Vec can hold any type of thing, as long as all its elements have the same type. For those used to languages with generics, the notation is familiar: vectors are generic over the type they contain, and in this case, we choose FeedItem.
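A quick standard-library sketch of what "generic over the type they contain" means in practice. FeedItem here is a trimmed-down stand-in with only a title, and count is a hypothetical helper that works for slices of any element type.

```rust
/// Trimmed-down stand-in for FeedItem (only a title) for this sketch.
#[derive(Debug)]
struct FeedItem {
    title: String,
}

/// Generic over the element type T, just like Vec<T> itself.
fn count<T>(items: &[T]) -> usize {
    items.len()
}

fn main() {
    let mut items: Vec<FeedItem> = Vec::new();
    items.push(FeedItem { title: "First post".to_string() });
    items.push(FeedItem { title: "Second post".to_string() });
    // items.push("just a string"); // would not compile: wrong element type

    let numbers = vec![1, 2, 3];

    println!("{} items, {} numbers", count(&items), count(&numbers));
    println!("first title: {}", items[0].title);
}
```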

Phew. That was a bit verbose, but mostly just busywork. So, let's see how we use it. Let's make get_feed return Feed instead of String.

fn get_feed() -> Feed {
    let client = reqwest::Client::new();
    let mut request = client.get(URL);

    let mut resp = request.send().unwrap();

    assert!(resp.status().is_success());

    let json = resp.text().unwrap();

    serde_json::from_str(&json).unwrap()
}

There are only two things to add: we bind the returned text to a variable json, and then we just call the appropriate parsing function on it. As always, parsing can fail, so we encounter unwrap again.

Rust infers that we want to parse the JSON text into the Feed structure, because we changed the return type of get_feed. If the JSON contains something different, the program will stop with an error. This is very convenient and justifies the busywork of writing down all those types: they not only made serde happy, they also give us a lot more convenience. If readrust.net changes its format or sends us junk, we detect that at the first possible moment.
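serde_json::from_str is not the only function that picks its output type from context. The standard library's str::parse works the same way, so the idea can be tried without any dependencies. parse_port and try_parse_count are hypothetical helpers for this sketch.

```rust
/// The target type of `parse` is never written at the call site:
/// it is inferred from this function's return type.
fn parse_port(text: &str) -> u16 {
    text.trim().parse().unwrap()
}

/// Same idea, but handing the error back instead of panicking.
fn try_parse_count(text: &str) -> Result<usize, std::num::ParseIntError> {
    text.parse()
}

fn main() {
    println!("{}", parse_port(" 8080 "));
    println!("{:?}", try_parse_count("82"));
    // Junk input is rejected at the first possible moment.
    println!("{}", try_parse_count("eighty-two").is_err());
}
```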

Counting

We're close to the end, but we still need to show something to our user. Let's start with the simple part: just show the count of items. This function prints the count of items in a Feed:

fn print_count(feed: &Feed) {
    println!("Number of posts: {}", feed.items.len());
}

Note the &: Rust is a systems language and provides two ways of passing arguments: "owned" and "borrowed". "Owned" means that the caller loses access to the value (they "pass on ownership"). With owned values, you can do anything: destroy them, ignore them, manipulate them. "Borrowed" means that the caller only lets you look at the value and wants it back afterwards. The caller can even decide whether you may manipulate it (&mut) or not (&). In this case, we don't need to, so we take a borrow that doesn't allow manipulation.
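The three ways of passing a value can be seen side by side in a small sketch. Feed here is a trimmed-down, hypothetical stand-in (just a list of titles), not the struct defined earlier, and the function names are made up for illustration.

```rust
/// Trimmed-down stand-in for the Feed type (hypothetical, for this sketch).
struct Feed {
    items: Vec<String>,
}

/// Immutable borrow: can read, cannot change, caller keeps the feed.
fn count(feed: &Feed) -> usize {
    feed.items.len()
}

/// Mutable borrow: can change the feed, caller still keeps it.
fn add_item(feed: &mut Feed, title: &str) {
    feed.items.push(title.to_string());
}

/// Ownership: the caller can no longer use the feed afterwards.
fn into_titles(feed: Feed) -> Vec<String> {
    feed.items
}

fn main() {
    let mut feed = Feed { items: vec!["first".to_string()] };
    println!("{}", count(&feed));
    add_item(&mut feed, "second");
    println!("{}", count(&feed));
    let titles = into_titles(feed);
    // println!("{}", count(&feed)); // error: `feed` was moved
    println!("{:?}", titles);
}
```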

Usage is easy. Back in main, we do:

    let matches = app.get_matches();

    let feed = get_feed();

    if matches.is_present("count") {
        print_count(&feed);
    }

get_matches gave us the matched arguments from CLAP. The call to is_present tells us whether the user requested a count. Note that we have to use the ampersand in the call as well, to indicate at the call site that we want to borrow. This keeps the situation visually clear both at the call site and in the function signature.

Let's run it:

[ skade readrust-cli ] cargo run -- --count
   Compiling readrust v0.1.0 (file:///Users/skade/Code/rust/readrust-cli)
    Finished dev [unoptimized + debuginfo] target(s) in 2.46 secs
     Running `target/debug/readrust --count`
Number of posts: 82

Great!

Prettyprinting

The final hill to climb is a nice output of the items. I decided on a tabular representation. There's a beautiful library for that, prettytable. It comes with a lot of examples and can do fairly complex things.

[dependencies]
prettytable-rs = "0.6"

Let's take one of the examples and turn it into a print function:

#[macro_use]
extern crate prettytable;

fn print_feed_table<'feed, I: Iterator<Item = &'feed FeedItem>>(items: I) {
    let mut table = prettytable::Table::new();

    table.add_row(row!["Title", "Author", "Link"]);

    for item in items {
        // Cut long titles down to 50 characters. char_indices yields the
        // byte offset of each character, so slicing at the 51st character's
        // offset never splits a multi-byte UTF-8 character (which would panic).
        let title = if let Some((end, _)) = item.title.char_indices().nth(50) {
            &item.title[..end]
        } else {
            &item.title[..]
        };

        table.add_row(row![title, item.author.name, item.url]);
    }

    table.printstd();
}

Again, there are a couple of particularities here. The construction of the table and the add_row API should be pretty clear. The for loop is also quite accessible: given a list of items, loop over them. Because some of our blog post authors are not into short titles, we need to shorten some of them. Let's look at the if let expression: in Rust, all expressions return values, similar to Ruby. That means we can assign the result of the if to title. Looking at the two branches, you'll notice the ampersand again. This is called "taking a slice". If the title is longer than 50 characters, we take the first 50 as a subslice of the original String, which avoids copying it. Rust strings are UTF-8 encoded and indexed by bytes, so we use char_indices to find the byte position where the 51st character starts, rather than cutting at a fixed byte offset that could land in the middle of a character such as "ü". If the title is shorter, we just take the whole string as a slice. The similarity to & for borrows isn't accidental: we are borrowing that subslice. It's just that the operation is so common that it has a name.
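Byte-range slicing of a String panics when the end of the range falls inside a multi-byte UTF-8 character, which is easy to hit with non-ASCII titles. A minimal sketch of a truncation helper that always cuts at a character boundary (truncate_chars is a hypothetical name, not a standard library function):

```rust
/// Returns at most the first `max` characters of `s` as a borrowed slice.
/// `char_indices` yields byte offsets at character boundaries, so the
/// slice never splits a multi-byte character.
fn truncate_chars(s: &str, max: usize) -> &str {
    match s.char_indices().nth(max) {
        // The (max+1)-th character starts at `end`: cut right before it.
        Some((end, _)) => &s[..end],
        // Fewer than max+1 characters: keep the whole string.
        None => s,
    }
}

fn main() {
    let title = "Grüße aus Berlin";
    // &title[0..3] would panic: byte 3 lies inside the two-byte 'ü'.
    println!("{}", truncate_chars(title, 3));
    println!("{}", truncate_chars("short", 50));
}
```

Because the helper returns a slice of its input, it still avoids copying, just like the byte-range version.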

And then, the function signature looks special: fn print_feed_table<'feed, I: Iterator<Item = &'feed FeedItem>>(items: I). Functions can be generic in Rust, and I decided to implement print_feed_table so that it takes anything that implements Iterator and hands out borrows of feed items. The type an Iterator hands out is called its Item (an associated type); in our example, it is a borrow of the FeedItem type we introduced earlier. Finally, there's 'feed. Rust checks our borrows so that they always point to something: the data pointed to must be alive. It communicates that through function signatures. In our case, to hand out borrows of the feed items, the feed must still be in memory. Roughly speaking, what <'feed, I: Iterator<Item = &'feed FeedItem>> communicates is: there's something outside of the function, existing for a certain time 'feed, that provides the items we borrow. We take an iterator I that iterates over them, handing out borrows that cannot outlive the data they point to. Lifetimes express that relationship without pulling in all the contextual information. Don't be intimidated by this: you can't break anything by getting it wrong, because the compiler rejects the program instead.
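To see that such a generic signature accepts more than one kind of iterator, here is a trimmed-down sketch using the same bound. FeedItem is reduced to a title, and titles is a hypothetical function that collects the titles instead of printing a table, so its result can be inspected.

```rust
/// Simplified FeedItem for this sketch.
struct FeedItem {
    title: String,
}

/// Same bound as print_feed_table: any iterator handing out borrowed
/// FeedItems that live for 'feed. The returned &str borrows share that
/// lifetime, so they cannot outlive the items they point into.
fn titles<'feed, I: Iterator<Item = &'feed FeedItem>>(items: I) -> Vec<&'feed str> {
    items.map(|item| item.title.as_str()).collect()
}

fn main() {
    let items = vec![
        FeedItem { title: "one".to_string() },
        FeedItem { title: "two".to_string() },
        FeedItem { title: "three".to_string() },
    ];
    // A plain iterator over borrows...
    println!("{:?}", titles(items.iter()));
    // ...or any adapter built on top of it.
    println!("{:?}", titles(items.iter().filter(|i| i.title.len() == 3)));
}
```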

Usage, again, is simple:

    if matches.is_present("count") {
        print_count(&feed);
    } else {
        let iter = feed.items.iter();

        if let Some(string) = matches.value_of("number") {
            let number = string.parse().unwrap();
            print_feed_table(iter.take(number))
        } else {
            print_feed_table(iter)
        }
    }

And here we see the reason for that implementation: to implement the --number option, I take an Iterator over the feed items. If a number is given, I parse it (which, again, can fail if the user passes junk) and turn the iterator into a Take iterator. Take yields up to the given number of items from the original iterator and then finishes. That is robust against feeds that have fewer items than the given number.
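The selection logic from main can be pulled out into a standalone function to see how take behaves. In this sketch, select is a hypothetical helper, and its Option<&str> argument stands in for what matches.value_of("number") returns; like the code above, it panics on a non-numeric value.

```rust
/// Mirrors the --number handling: None means "all items", Some(text)
/// means "at most text.parse() items". Returns the selection for inspection.
fn select<'a, T>(items: &'a [T], number: Option<&str>) -> Vec<&'a T> {
    let iter = items.iter();
    if let Some(string) = number {
        // Panics on junk such as "ten", like the unwrap in main.
        let n: usize = string.parse().unwrap();
        iter.take(n).collect()
    } else {
        iter.collect()
    }
}

fn main() {
    let posts = vec!["a", "b", "c"];
    println!("{:?}", select(&posts, Some("2")));
    // Asking for more than exist is fine: take just stops early.
    println!("{:?}", select(&posts, Some("10")));
    println!("{:?}", select(&posts, None));
}
```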

We're done!

Again, you can find the source code here.

How to go on from here

This gives you a fun project to play around with and extend. Here are some ideas:

  • Do some statistics over the items
  • Filter the titles by regular expression
  • Unify error management and add useful diagnostics
  • Make dates proper dates and filter by them
  • Follow the links and try to get data out of the posts
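As a starting point for the first idea, here is a sketch that counts posts per author with a HashMap from the standard library. It assumes you have already extracted the author names (for example, item.author.name for each item); posts_per_author is a hypothetical helper, and the sample names are made up.

```rust
use std::collections::HashMap;

/// Counts how many posts each author wrote, most prolific first.
/// Ties are broken alphabetically so the output order is deterministic.
fn posts_per_author<'a, I: Iterator<Item = &'a str>>(authors: I) -> Vec<(&'a str, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for name in authors {
        *counts.entry(name).or_insert(0) += 1;
    }
    let mut sorted: Vec<(&str, usize)> = counts.into_iter().collect();
    // Descending by count, then ascending by name.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    sorted
}

fn main() {
    let authors = vec!["alice", "bob", "alice", "carol", "bob", "alice"];
    for (name, count) in posts_per_author(authors.iter().cloned()) {
        println!("{:>3}  {}", count, name);
    }
}
```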

Conclusion

Let's see what we ended up with: a short program of 79 lines in total (23 of which are type definitions). It handles every error, even if our current strategy is "just stop". The JSON is parsed in a safe fashion, rejecting malformed input.

The program is robust and handles many edge cases already. For example, by using iteration rather than indexing, we are safe against changes in feed length. It uses type information efficiently for the programmer's benefit, without making types a chore.

Additionally, it is constructed out of convenient, well-documented libraries that you can quickly get up to speed with. Rust tooling for CLI programs has vastly improved in the last 2 years, with many useful and mature libraries around.

The program is cross-platform and can be compiled for many targets. You can ship it to other people by just copying the binary around.