Parsing a line-based protocol using Rust and nom, part 1
Let’s suppose we have some kind of line protocol and we would like to parse it in Rust. Here is an example of what this protocol might look like:
"#MEAS_NUM;voltage;20.1;V\n"
"#MEAS_TEXT;serial;CAFEBABE\n"
"#INPUT;Is it broken?;YES,NO,MAYBE\n"
As you can see, there are different messages that must be parsed. Let’s create some structs to describe their payloads:
// "#MEAS_NUM;voltage;20.1;V"
#[derive(Debug)] // needed later to print parsed values with {:?}
struct Num {
    name: String,
    value: f32,
    unit: String,
}
// "#MEAS_TEXT;serial;CAFEBABE"
#[derive(Debug)]
struct Str {
    name: String,
    value: String,
}
// "#INPUT;Is it broken?;YES,NO,MAYBE"
#[derive(Debug)]
struct Input {
    message: String,
    variants: Vec<String>,
}
We also need a Message type to combine all these possible messages:
#[derive(Debug)]
enum Message {
    Number(Num),
    Text(Str),
    Input(Input),
}
This allows us to define a parser function that takes in a line and outputs a Message:
fn parse(line: &str) -> Result<Message> {
    ...
}
And this function can be used like this:
for line in input.lines() {
    match parse(line) {
        Ok(Message::Number(n)) => {...}
        Ok(Message::Text(t)) => {...}
        Ok(Message::Input(i)) => {...}
        Err(e) => println!("parser error: {:?}", e),
    }
}
Nice and Rusty.
Parsing
Let’s get back to the actual parsing. Notice that all fields are separated by a common delimiter, ;, so we can use split to break a line into its fields.
let line = "#MEAS_NUM;voltage;20.1;V";
let v = line.split(";").collect::<Vec<&str>>();
println!("{:?}", v);
> ["#MEAS_NUM", "voltage", "20.1", "V"]
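One detail worth keeping in mind: the raw protocol lines end with \n, while str::lines already strips the line terminator before we split. Here is a minimal sketch of that behaviour. The helper names split_fields and split_input are made up for illustration and are not part of the parser developed below.

```rust
/// Hypothetical helper: split one protocol line into its `;`-separated fields.
fn split_fields(line: &str) -> Vec<&str> {
    line.split(';').collect()
}

/// Hypothetical helper: split a whole input buffer into per-line field lists.
/// `str::lines` strips the trailing "\n" (and "\r\n"), so no field keeps it.
fn split_input(input: &str) -> Vec<Vec<&str>> {
    input.lines().map(|line| split_fields(line)).collect()
}
```

Note that split keeps empty fields, so a line like "#MEAS_TEXT;;x" yields an empty string as its second field rather than skipping it.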
Excellent, everything is now separated. So let’s write an actual parser function based on that idea. First, some error handling is needed. To make life easier, the anyhow error handling crate is used. We also need to convert from str to f32. Therefore we need these two imports:
use std::str::FromStr;
use anyhow::{Result, anyhow};
fn parse(line: &str) -> Result<Message> {
    let mut token = line.split(";");
    let msg = match token.next() {
        Some(m) if m == "#MEAS_NUM" => {
            let name = token
                .next()
                .ok_or(anyhow!("no name"))
                .map(|s| s.to_string())?;
            let value = token
                .next()
                .ok_or(anyhow!("no value"))
                .and_then(|s| Ok(f32::from_str(s)))??;
            let unit = token
                .next()
                .ok_or(anyhow!("no unit"))
                .map(|s| s.to_string())?;
            Ok(Message::Number(Num {name, value, unit}))
        }
        Some(m) if m == "#MEAS_TEXT" => {
            let name = token
                .next()
                .ok_or(anyhow!("no name"))
                .map(|s| s.to_string())?;
            let value = token
                .next()
                .ok_or(anyhow!("no value"))
                .map(|s| s.to_string())?;
            Ok(Message::Text(Str {name, value}))
        }
        Some(m) if m == "#INPUT" => {
            let message = token
                .next()
                .ok_or(anyhow!("no message"))
                .map(|s| s.to_string())?;
            let variants = token
                .next()
                .ok_or(anyhow!("no variants"))
                .map(|s| s.split(",")
                    .map(|v| v.to_string())
                    .collect::<Vec<_>>())?;
            Ok(Message::Input(Input {message, variants}))
        }
        Some(m) => {
            Err(anyhow!("incorrect header {:}", m))
        }
        None => {
            Err(anyhow!("incorrect input {:}", line))
        }
    };
    msg
}
Parsing begins by splitting the line into tokens. After that, we check whether the first token is one of the protocol headers, such as #MEAS_NUM or #MEAS_TEXT.
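The header check can be looked at in isolation. The sketch below is an illustration only: the Kind enum and the classify function are hypothetical names, and String stands in for the anyhow error type so that the snippet needs nothing beyond the standard library. It matches on string literals directly instead of using match guards.

```rust
/// Hypothetical stand-in for the message kinds, without payloads.
#[derive(Debug, PartialEq)]
enum Kind {
    Number,
    Text,
    Input,
}

/// Look only at the first `;`-separated token and decide the message kind.
fn classify(line: &str) -> Result<Kind, String> {
    match line.split(';').next() {
        Some("#MEAS_NUM") => Ok(Kind::Number),
        Some("#MEAS_TEXT") => Ok(Kind::Text),
        Some("#INPUT") => Ok(Kind::Input),
        Some(other) => Err(format!("incorrect header {}", other)),
        // `str::split` yields at least one item even for an empty string,
        // so this arm is not expected to be reached in practice.
        None => Err(format!("incorrect input {}", line)),
    }
}
```

One consequence: an empty line arrives as Some("") and is reported as an incorrect header, not as incorrect input.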
Let’s suppose it was #MEAS_NUM. Next, we must check that the name field is also there.
let name = token
    .next()
    .ok_or(anyhow!("no name"))
    .map(|s| s.to_string())?;
It could be that .next() produces None, meaning that there isn’t any more data. The parser should report this as an error because we are unable to parse any further. Conveniently, .ok_or can be used to convert from the Option type to the Result type, and anyhow!("no name") creates an Error with a custom message. We are using String as the data type, so .map(|s| s.to_string()) does the conversion from &str to String. Notice that ? is used at the end. It unwraps the value if everything was successful, or returns the Error from the function if something failed.
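Since the same next/ok_or pattern repeats for every field, it could be pulled into a small helper. This is only a sketch under assumptions: next_field and parse_text_payload are hypothetical names, and the error type is a plain String instead of the anyhow Error so that the snippet compiles with the standard library alone.

```rust
/// Hypothetical helper: take the next token or fail with "no <field>".
/// `ok_or_else` builds the error message only when the token is missing.
fn next_field<'a>(
    tokens: &mut impl Iterator<Item = &'a str>,
    field: &str,
) -> Result<&'a str, String> {
    tokens.next().ok_or_else(|| format!("no {}", field))
}

/// Reads the two text fields of a payload such as "serial;CAFEBABE".
fn parse_text_payload(payload: &str) -> Result<(String, String), String> {
    let mut tokens = payload.split(';');
    // `?` returns early with the error if the field is missing.
    let name = next_field(&mut tokens, "name")?.to_string();
    let value = next_field(&mut tokens, "value")?.to_string();
    Ok((name, value))
}
```

The difference between ok_or and ok_or_else is that ok_or evaluates its argument eagerly, so the error value is constructed even when the token is present.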
Parsing value: f32 is also interesting:
let value = token
    .next()
    .ok_or(anyhow!("no value"))
    .and_then(|s| Ok(f32::from_str(s)))??;
Here we also use the and_then combinator, which runs a function if the previous result was successful. So it runs only if .next() has returned something. The return value from f32::from_str(s) is wrapped in Ok() because without that the error types do not match. Notice that the Rust compiler gives us a nice hint here:
40 |         .and_then(|s| f32::from_str(s))?;
   |                       ^^^^^^^^^^^^^^^^
   |                       |
   |                       expected struct `anyhow::Error`, found struct `std::num::ParseFloatError`
   |                       help: try using a variant of the expected enum: `Ok(f32::from_str(s))`
   |
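Finally, the double ?? is used to unwrap the actual value from the nested Result<Result<f32, ParseFloatError>> or stop parsing and return with an error. The first ? handles a missing token; the second converts a ParseFloatError into the anyhow Error type.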
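An alternative to the nested Result is to convert the parse error inside the closure, so that and_then returns a flat Result and a single ? is enough. The sketch below assumes a plain String as the error type to stay within the standard library, and parse_value is a hypothetical name. With anyhow the same shape should work by mapping the error with anyhow::Error::from, but that variant is not shown here.

```rust
/// Sketch: parse the numeric field with a single `?`.
/// `String` stands in for `anyhow::Error` so this compiles with std alone.
fn parse_value(token: Option<&str>) -> Result<f32, String> {
    let value = token
        .ok_or_else(|| "no value".to_string())
        // `s.parse::<f32>()` is the same conversion as `f32::from_str(s)`.
        // Mapping ParseFloatError to our error type here keeps the Result flat.
        .and_then(|s| s.parse::<f32>().map_err(|e| e.to_string()))?;
    Ok(value)
}
```

Here parse_value(Some("20.1")) gives Ok(20.1), parse_value(None) gives Err("no value"), and a non-numeric token gives the text of the ParseFloatError as the error.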
Tests
All right, let’s try it out with correct inputs first.
let s = "#MEAS_NUM;voltage;20.1;V";
println!("parsed: {:?}", parse(s));
let s = "#MEAS_TEXT;serial;CAFEBABE";
println!("parsed: {:?}", parse(s));
let s = "#INPUT;Is it broken?;YES,NO,MAYBE";
println!("parsed: {:?}", parse(s));
> parsed: Ok(Number(Num { name: "voltage", value: 20.1, unit: "V" }))
> parsed: Ok(Text(Str { name: "serial", value: "CAFEBABE" }))
> parsed: Ok(Input(Input { message: "Is it broken?", variants: ["YES", "NO", "MAYBE"] }))
Very good, everything is parsed as expected. And now with incorrect input:
let s = "#MEAS_NUM;voltage;20.1";
println!("parsed: {:?}", parse(s));
let s = "#MEAS_NUM;voltage;twenty;V";
println!("parsed: {:?}", parse(s));
> parsed: Err(no unit)
> parsed: Err(invalid float literal)
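As expected, errors are returned. However, the error messages themselves are not that good: invalid float literal, for example, does not say which field or what text failed to parse.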
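A modest improvement is possible even with the manual approach, by attaching the field name and the offending text to the error. This is a sketch only: parse_value_with_context is a hypothetical helper, and String again stands in for the anyhow Error type.

```rust
/// Sketch: a numeric parse whose errors say which field failed and
/// what the offending text was.
fn parse_value_with_context(field: &str, token: Option<&str>) -> Result<f32, String> {
    let text = token.ok_or_else(|| format!("missing field `{}`", field))?;
    text.parse::<f32>().map_err(|e| {
        // {:?} prints the token in quotes, which makes stray whitespace visible.
        format!("field `{}`: cannot parse {:?} as a number: {}", field, text, e)
    })
}
```

For the token "twenty" in the value field this reports: field `value`: cannot parse "twenty" as a number: invalid float literal. It still means writing such messages by hand for every field.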
What next? Could it be possible to improve such a parser? There is already quite a bit of manual error handling in the current version. Maybe somebody has already figured out how to do it in a clean manner and with better error messages?
You betcha! Enter nom, a Rust parser combinator framework, which I am going to use in part 2.