Parsing a line-based protocol using Rust and nom, part 1

Let’s suppose we have some kind of line protocol and we would like to parse it in Rust. Here is an example of what this protocol might look like:

"#MEAS_NUM;voltage;20.1;V\n"
"#MEAS_TEXT;serial;CAFEBABE\n"
"#INPUT;Is it broken?;YES,NO,MAYBE\n"

As you can see, there are several message types that must be parsed. Let’s create some structs to describe their payloads:

// "#MEAS_NUM;voltage;20.1;V"
#[derive(Debug)]
struct Num {
    name: String,
    value: f32,
    unit: String,
}

// "#MEAS_TEXT;serial;CAFEBABE"
#[derive(Debug)]
struct Str {
    name: String,
    value: String,
}

// "#INPUT;Is it broken?;YES,NO,MAYBE"
#[derive(Debug)]
struct Input {
    message: String,
    variants: Vec<String>,
}

We also need a Message type to combine all these possible messages:

// Debug is derived on every type so parse results can be printed with {:?}.
#[derive(Debug)]
enum Message {
    Number(Num),
    Text(Str),
    Input(Input),
}

This allows us to define a parser function that takes in a line and outputs a Message:

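fn parse(line: &str) -> Result<Message> {
    // Result is anyhow::Result; the full implementation follows below.
    todo!()
}

And this function can be used like this:

for line in input.lines() {
    match parse(line) {
        Ok(Message::Number(n)) => { /* handle a numeric measurement */ }
        Ok(Message::Text(t)) => { /* handle a text measurement */ }
        Ok(Message::Input(i)) => { /* handle an input request */ }
        Err(e) => println!("parser error: {:?}", e),
    }
}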
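Because input.lines() strips the line terminator ("\n", and "\r\n" as well), parse only ever sees bare lines such as "#MEAS_NUM;voltage;20.1;V". A small std-only sketch of that step; headers is a hypothetical helper that just collects the first field of each non-empty line:

```rust
/// Hypothetical helper: returns the header (first ';'-separated field) of
/// every non-empty line. `str::lines` strips both "\n" and "\r\n".
fn headers(input: &str) -> Vec<&str> {
    input
        .lines()
        .filter(|line| !line.is_empty())
        // `split` always yields at least one item, so this never drops a line.
        .filter_map(|line| line.split(';').next())
        .collect()
}
```

For the three sample messages, even with a Windows line ending and a blank line mixed in, this yields the headers "#MEAS_NUM", "#MEAS_TEXT" and "#INPUT".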

Nice and Rusty.

Parsing

Let’s get back to the actual parsing. Notice that all fields are separated by a common delimiter, ";", so we can use split to segment the line.

let line = "#MEAS_NUM;voltage;20.1;V";
let v = line.split(";").collect::<Vec<&str>>();
println!("{:?}", v);

> ["#MEAS_NUM", "voltage", "20.1", "V"]
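It helps to know how split behaves on malformed lines, since the parser relies on it. A minimal sketch; fields is a hypothetical helper name and the doc comment lists the edge cases:

```rust
/// Hypothetical helper: split one protocol line into its ';'-separated fields.
///
/// Edge cases:
/// - a missing field simply produces fewer items ("#MEAS_NUM;voltage;20.1" -> 3 items);
/// - a trailing ';' produces an empty last item;
/// - even "" yields one item, [""], so the first `next()` on `split` is never None.
fn fields(line: &str) -> Vec<&str> {
    line.split(';').collect()
}
```

The last point means that a header check on the first token always receives at least an empty string rather than None.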

Excellent, everything is now separated. So let’s write an actual parser function based on that idea. First, some error handling is needed; to make life easier, we use the anyhow error-handling crate. We also need to convert from &str to f32. Therefore we need these two imports:

use std::str::FromStr;
use anyhow::{Result, anyhow};

fn parse(line: &str) -> Result<Message> {
    let mut token = line.split(";");
    let msg = match token.next() {
        Some(m) if m == "#MEAS_NUM" => {
            let name = token
                .next()
                .ok_or(anyhow!("no name"))
                .map(|s| s.to_string())?;
            let value = token
                .next()
                .ok_or(anyhow!("no value"))
                .and_then(|s| Ok(f32::from_str(s)))??;
            let unit = token
                .next()
                .ok_or(anyhow!("no unit"))
                .map(|s| s.to_string())?;

            Ok(Message::Number(Num {name, value, unit}))
        }
        Some(m) if m == "#MEAS_TEXT" => {
            let name = token
                .next()
                .ok_or(anyhow!("no name"))
                .map(|s| s.to_string())?;
            let value = token
                .next()
                .ok_or(anyhow!("no value"))
                .map(|s| s.to_string())?;

            Ok(Message::Text(Str {name, value}))
        }
        Some(m) if m == "#INPUT" => {
            let message = token
                .next()
                .ok_or(anyhow!("no message"))
                .map(|s| s.to_string())?;
            let variants = token
                .next()
                .ok_or(anyhow!("no variants"))
                .map(|s| s.split(",").map(|v| v.to_string()).collect::<Vec<_>>())?;

            Ok(Message::Input(Input {message, variants}))
        }
        Some(m) => {
            Err(anyhow!("incorrect header {:}", m))
        }
        None => {
            Err(anyhow!("incorrect input {:}", line))
        }
    };

    msg
}
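As an aside before walking through this code: it checks tokens one at a time and never looks past the last expected field, so a line such as "#MEAS_NUM;voltage;20.1;V;extra" is accepted. One alternative, sketched here with slice patterns, collects the fields first and matches on their exact count. This is a swapped-in technique, not the approach used in the rest of the article; parse_fields is a hypothetical name, and errors are plain Strings instead of anyhow::Error to keep the sketch dependency-free:

```rust
#[derive(Debug, PartialEq)]
struct Num { name: String, value: f32, unit: String }

#[derive(Debug, PartialEq)]
struct Str { name: String, value: String }

#[derive(Debug, PartialEq)]
struct Input { message: String, variants: Vec<String> }

#[derive(Debug, PartialEq)]
enum Message { Number(Num), Text(Str), Input(Input) }

/// Alternative parser: match the whole field slice, so both missing
/// and extra fields are rejected. Errors are Strings, not anyhow::Error.
fn parse_fields(line: &str) -> Result<Message, String> {
    let fields: Vec<&str> = line.split(';').collect();
    match fields.as_slice() {
        ["#MEAS_NUM", name, value, unit] => {
            let value = value
                .parse::<f32>()
                .map_err(|e| format!("bad value {:?}: {}", value, e))?;
            Ok(Message::Number(Num {
                name: name.to_string(),
                value,
                unit: unit.to_string(),
            }))
        }
        ["#MEAS_TEXT", name, value] => Ok(Message::Text(Str {
            name: name.to_string(),
            value: value.to_string(),
        })),
        ["#INPUT", message, variants] => Ok(Message::Input(Input {
            message: message.to_string(),
            variants: variants.split(',').map(str::to_string).collect(),
        })),
        // A known header with the wrong number of fields.
        [header @ ("#MEAS_NUM" | "#MEAS_TEXT" | "#INPUT"), ..] => {
            Err(format!("wrong number of fields for {}: {}", header, fields.len()))
        }
        [header, ..] => Err(format!("incorrect header {}", header)),
        // Unreachable in practice (split yields at least one field),
        // but required for the match to be exhaustive.
        [] => Err(format!("incorrect input {}", line)),
    }
}
```

The trade-off is an intermediate Vec per line in exchange for a match that states the expected shape of each message in one place.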

Parsing begins by splitting the line into tokens; then we check whether the first token is one of the protocol’s headers, such as #MEAS_NUM or #MEAS_TEXT.
Let’s suppose it was #MEAS_NUM. Next we must check that the name field is present.

let name = token
    .next()
    .ok_or(anyhow!("no name"))
    .map(|s| s.to_string())?;

It could be that .next() returns None, meaning there is no more data. The parser should report this as an error because it cannot continue. Conveniently, .ok_or converts an Option into a Result, and anyhow!("no name") creates an Error with a custom message. The struct fields are Strings, so .map(|s| s.to_string()) converts the &str into an owned String. Notice the ? at the end: it unwraps the value if everything succeeded, or returns early with the Error if something failed.
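The same next/ok_or/map/? chain is repeated for every field, so it could be factored into a helper. A hedged sketch: next_field and parse_text_fields are hypothetical names, errors are Strings instead of anyhow::Error, and it uses ok_or_else, which builds the error only when the field is actually missing:

```rust
/// Hypothetical helper that factors out the repeated
/// `.next().ok_or(...).map(|s| s.to_string())?` pattern.
fn next_field<'a, I>(tokens: &mut I, what: &str) -> Result<String, String>
where
    I: Iterator<Item = &'a str>,
{
    tokens
        .next()
        // ok_or_else runs the closure only on None; ok_or would build
        // its argument on every call, even when the field is present.
        .ok_or_else(|| format!("no {}", what))
        .map(|s| s.to_string())
}

/// Parse the name and value of a "#MEAS_TEXT" line.
fn parse_text_fields(line: &str) -> Result<(String, String), String> {
    let mut tokens = line.split(';').skip(1); // skip the "#MEAS_TEXT" header
    let name = next_field(&mut tokens, "name")?;
    let value = next_field(&mut tokens, "value")?;
    Ok((name, value))
}
```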

Parsing the value: f32 field is also interesting:

let value = token
    .next()
    .ok_or(anyhow!("no value"))
    .and_then(|s| Ok(f32::from_str(s)))??;

Here we also use the and_then combinator, which runs a closure only if the previous result was Ok, that is, only if .next() returned something. The closure must return a Result with the same error type as the one it is called on (anyhow::Error), but f32::from_str returns Result<f32, ParseFloatError>. Wrapping it in Ok() turns it into Result<Result<f32, ParseFloatError>, anyhow::Error>, which type-checks. The Rust compiler gives us a nice hint here:

40 |                 .and_then(|s| f32::from_str(s))?;
   |                               ^^^^^^^^^^^^^^^^
   |                               |
   |                               expected struct `anyhow::Error`, found struct `std::num::ParseFloatError`
   |                               help: try using a variant of the expected enum: `Ok(f32::from_str(s))`
   |

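Finally, the double ?? unwraps the nested Result<Result<f32, ParseFloatError>, anyhow::Error>: the first ? returns early with the "no value" error if the token is missing, and the second ? converts a ParseFloatError into anyhow::Error and returns early if the text is not a valid number.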
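The nested Result can also be avoided by converting the error inside the closure with map_err, so the closure already returns the outer error type and a single ? suffices. A std-only sketch, with String standing in for anyhow::Error and parse_value as a hypothetical name; it also puts the offending text into the message:

```rust
/// Parse the value field of a "#MEAS_NUM" line without a nested Result.
/// The closure returns Result<f32, String>, the same error type as the
/// outer Result, so and_then produces a flat Result<f32, String>.
fn parse_value(token: Option<&str>) -> Result<f32, String> {
    token
        .ok_or_else(|| "no value".to_string())
        .and_then(|s| {
            s.parse::<f32>()
                .map_err(|e| format!("invalid value {:?}: {}", s, e))
        })
}
```

In the parser this would be called as parse_value(token.next())?, with one ? instead of two.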

Tests

All right, let’s try it out with correct inputs first.

let s = "#MEAS_NUM;voltage;20.1;V";
println!("parsed: {:?}", parse(s));

let s = "#MEAS_TEXT;serial;CAFEBABE";
println!("parsed: {:?}", parse(s));

let s = "#INPUT;Is it broken?;YES,NO,MAYBE";
println!("parsed: {:?}", parse(s));

> parsed: Ok(Number(Num { name: "voltage", value: 20.1, unit: "V" }))
> parsed: Ok(Text(Str { name: "serial", value: "CAFEBABE" }))
> parsed: Ok(Input(Input { message: "Is it broken?", variants: ["YES", "NO", "MAYBE"] }))

Very good, everything is parsed as expected. Now let’s try some incorrect input:

let s = "#MEAS_NUM;voltage;20.1";
println!("parsed: {:?}", parse(s));

let s = "#MEAS_NUM;voltage;twenty;V";
println!("parsed: {:?}", parse(s));

> parsed: Err(no unit)
> parsed: Err(invalid float literal)

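As expected, errors are returned. However, the error messages themselves are not very helpful: "invalid float literal", for example, says neither which field was wrong nor what text it contained.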
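Doing better by hand is possible but takes a fair amount of code. A hedged sketch of one option, a dedicated error type that records which field failed and the offending text; ParseError and parse_num are hypothetical names, and because ParseError implements std::error::Error (and is Send + Sync), it could still be converted into anyhow::Error with ?:

```rust
use std::fmt;
use std::num::ParseFloatError;

/// Hypothetical error type: which field failed and, for numbers, the bad text.
#[derive(Debug)]
enum ParseError {
    MissingField(&'static str),
    BadNumber { field: &'static str, text: String, source: ParseFloatError },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::MissingField(field) => write!(f, "missing field `{}`", field),
            ParseError::BadNumber { field, text, source } => write!(
                f,
                "field `{}`: cannot parse {:?} as a number ({})",
                field, text, source
            ),
        }
    }
}

impl std::error::Error for ParseError {
    // Expose the underlying ParseFloatError as the error's cause.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            ParseError::BadNumber { source, .. } => Some(source),
            ParseError::MissingField(_) => None,
        }
    }
}

/// Parse "#MEAS_NUM;<name>;<value>;<unit>" into (name, value, unit).
fn parse_num(line: &str) -> Result<(String, f32, String), ParseError> {
    // Skip the "#MEAS_NUM" header; this sketch assumes the caller checked it.
    let mut tokens = line.split(';').skip(1);
    let name = tokens.next().ok_or(ParseError::MissingField("name"))?;
    let text = tokens.next().ok_or(ParseError::MissingField("value"))?;
    let value = text.parse::<f32>().map_err(|source| ParseError::BadNumber {
        field: "value",
        text: text.to_string(),
        source,
    })?;
    let unit = tokens.next().ok_or(ParseError::MissingField("unit"))?;
    Ok((name.to_string(), value, unit.to_string()))
}
```

For "#MEAS_NUM;voltage;twenty;V" this reports the field name and the text "twenty" instead of only "invalid float literal".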

What next? Is it possible to improve such a parser? The current version already contains quite a bit of manual error handling. Maybe somebody has already figured out how to do this cleanly and with better error messages?

You betcha! Enter nom, a parser combinator framework for Rust, which I am going to use in part 2.

 