In my experience, the key to remaining motivated on a hobby programming project is interactivity— The longer I go without the ability to play with what I've written, the more likely that I'll lose interest in the project and stop. This is a particular problem for infrastructure projects, like a game engine, as there are usually a lot of hidden pices that have to fit together before there's much in the way of visible progress.
In particular, we'll need to get some kind of input, output, and logic systems in place before can happen. The final version of these will be quite complicated, and there's quite a large risk of dropping the project due to boredom before these are done. It's worth the effort, then, to start with some temporary I/O systems to use as we develop the versions we'll actually want to use.
The idea of an interactive programming language isn't new. Lisp was one of the first with its Read-Eval-Print Loop (REPL): The main program simply reads a statement, evaluates it, and prints the result in an endless loop. This paradigm has remained popular to the present day, from early home computers that booted into a BASIC interpreter by default to some of the most popular modern programming languages, like Python.
Most modern games also have something similar: the debug console. In it, you can type commands that either inspect hidden state or the state of the game in arbitrary ways as it's running. This allows developers to try different settings quickly before settling on the final values, or to artificially recreate rare situations for testing.
The complicated part of this, of course, is the evaluate step: How do you turn a string of text into stuff happening? This is the piece we'll be discussing in this chapter. For now, all input and output will be via the terminal— Once we have the graphics, keyboard input, and font rendering up and running, we'll revisit this and write an in-game debug console that interacts with the same system.
There are two different interfaces that we need to consider when designing this system: What does the language look like to the user, and how does the programmer integrate the language into their program?
The language should be simple and straightforward. There's no need, for example, for most control structures like loops or conditionals: The user wants to make some specific change now, not write long routines to be executed in the future.
A reasonable model for this is the Bourne shell, where each command is a single line with space-separated arguments. The command will then have the opportunity to interact with the rest of the program, and then produce some textual output for the user's benefit. Additionally, the system should be self-documenting: The user should be able to get a list of available commands as well as documentation about any particular command.
From the programmer's point of view, we want defining commands and evaluating command strings to be as trivial as possible. In particular, defining a new command should look as much like a normal Rust function definition as possible, with the library taking care of the parsing details. It should also be possible to spread command definitions throughout the codebase, so that they can be defined near the code that they interface to.
Global mutable state is problematic in many ways, so the command definitions should accept an &mut
argument to the context they should operate on. Instead of specifying a particular context type at the library level, it should be possible to define different command sets for different context types.
The natural way to invoke a command on a particular context object, is through a trait method which accepts a String
argument and returns another String
representing the result of the operation. There should also be an opportunity for the command to return some structured result data as well.
Objects which can accept commands implement the Dispatch
trait, which provides the dispatch
method. The associated type Dispatch::Status
defines the type of structured information which is returned from each command beside the user-facing message. These two responses are packaged together in the Status
structure; Status::status
will be None
if the user entered an invalid command.
pub trait Dispatch {
type Status;
fn dispatch(&mut self, cmdline: &str)->Status<Self::Status>;
/* ... */
}
pub struct Status<T> {
pub status: Option<T>,
pub message: String
}
To enable definitions spread around the codebase, possibly in disparate crates, I use the linkme
module, which uses linker attributes to combine static
variables from all over the program into a single slice at compile time. Without linkme
, it would be necessary to either define all commands in the same place or have some kind of startup system that collects all of the defined commands at runtime.
Here, this is a slice of command definitions Cmd
, with a function pointer and string fields describing the command name, argument list, and long-form documentation:
pub struct Cmd<T: ?Sized + Dispatch> {
pub key: &'static str,
pub args: &'static str,
pub help: &'static str,
pub cmd: fn(&mut T, &str) -> Status<T::Status>,
}
The Dispatch
trait is always implemented via the derive_dispatch!
macro:
enum CmdResult { ... }
struct MyContext { name: String, ... }
derive_dispatch! { MyContext -> CmdResult;
errfmt(&self, "MyContext({}): ", &self.name)
}
This both defines the Dispatch
implementation and sets up the linkme
distributed slice that holds the commands. The -> CmdResult
clause specifies CmdResult
to be the Dispatch::Status
type; if this is omitted, std::ops::ControlFlow<()>
is used by default. The errfmt
clause is also optional; it defines a prefix that will be added to all error results.
The emitted dispatch
method delegates the main work to a dispatch
free function in the library. This splits off the first portion of the string and uses it as a lookup key. The implementation of the help
command is hardcoded here, so that it has access to the command list; anything else is looked up from the distributed slice:
pub fn dispatch<T:?Sized + Dispatch>(
cmds: &[Cmd<T>],
target: &mut T,
mut buf: &str
) -> Status<T::Status> {
buf = buf.trim();
let (key, args) = buf.split_once(" ").unwrap_or((buf, ""));
let args = args.trim();
if key.eq_ignore_ascii_case("help") { /* return ... */ }
for cmd in cmds.iter() {
if cmd.key.eq_ignore_ascii_case(key) { return (cmd.cmd)(target, args) }
}
Status::err(&*target, format_args!("Unknown command {:?}", key))
}
Instead of defining these Cmd
structures manually, the register!
macro gives the programmer a more natural syntax; a typical invocation is reproduced below. It starts by identifying the target type, which is followed by one or more function definitions that represent commands. The documentation comments are required, and are made available to the user via the help
command. The first argument (with no specified type) is a mutable reference to the context object, and the remaining arguments are parsed from the command string by the library.
// struct Roster(BTreeMap<String, BTreeSet<String>>);
register!{ Roster =>
/// Terminate the program
quit(_roster) { Status::halt("") }
/// Add employee to department
hire(roster, employee:String, dept:String) {
roster.0.entry(dept).or_default().insert(employee);
Status::silent_ok()
}
/* ... */
}
Each command returns a Status
structure which matches the the context object's Dispatch
implementation. Several constructors are provided for the most common situations; in each case, message
is any value which implements Display
:
Status::err(&context, message)
represents a parsing error, and will prepend context
's error format to the messageStatus::silent(status)
represents a completed command with no user-facing messageStatus::msg(status, message)
represents a completed command with the given result messageControlFlow<()>
:Status::ok(message)
represents a Continue
with the given messageStatus::silent_ok()
represents a Continue
with no user-facing messageStatus::halt(message)
represents a Break
with the given messagenom
, a parser-combinator library, to parse the command arguments. Each type that can appear as an argument implements the ParseArg
trait, which provides a nom
parsing function to construct the type:
pub type ArgParser<T> = Box<
dyn FnOnce(&str)->IResult<&str, T, nom::error::Error<&str>>
>;
pub trait ParseArg: Sized {
/// Attempt to parse this argument from the start of `args`.
/// On success, `*args` points to the start of the next argument.
/// On failure, `*args` is unchanged.
fn parse_arg<'i>(args: &mut &'i str)->Result<Self, String> {
/* default implementation ... */
}
fn parser()->ArgParser<Self>;
}
The programmer can add support for their custom types by implementing ParseArg::parser()
, and the library provides implementations for several primitive types:
String
will parse a single bare word or anything enclosed in either single '
or double "
quotes. Currently, no escape sequences are supported.Vec<T>
will parse 0 or more space-delimited T
s surrounded by square brackets []
.Additionally, a number of newtypes are provided with alternate parsing rules:
Rest(pub String)
will consume and return the entirety of the remaining command line. This is particularly useful for delegating part of the command processing to a second dispatch()
call, possibly with a different context object.Braces<T>(pub Vec<T>)
will parse a sequence of T
s surrounded by curly braces {}
Parens<T>(pub Vec<T>)
will parse a sequence of T
s surrounded by parentheses ()
Testing this crate comes in two parts: unit tests that verify individual elements and an intractive example that verifies the design is actually useful.
Unit tests for the different argument parsing routines all follow essentially the same pattern: You list a bunch of strings, and for each you either assert that they produce a parse error or that they parse to a particular, specific value. We can reduce the boilerplate required for these tests by defining a macro that takes care of it for us. Here, for example, is the unit test for i32
parsing:
#[test]
fn parse_i32() {
assert_parse!{i32:
"42" => 42;
"0x40" => 64;
"42\n" => 42;
"0x40 " => 64;
"-42" => -42;
"-0x40" => -64;
! "hello";
! "";
! "2x4";
}
}
This specifies that the type we're testing is i32
, followed by a bunch of clauses in two different forms:
"str" => expected
asserts that the input "str"
parses to the result of the expression expected
, and! "str"
asserts that the input "str"
results in a parse error.The macro is written in two parts: First, the assert_parse_single!
macro describes the assertions that should be emitted for each clause. It has two match arms, one for each clause. In addition to the text of the clause, these also take the type being tested as the first argument:
#[macro_export]
macro_rules! assert_parse_single {
($ty:ty: !$inp:literal) => {
assert!{ <$ty as $crate::ParseArg>::parse_arg(&mut $inp).is_err() }
};
($ty:ty: $inp:literal => $expected:expr) => {
assert_eq!{ Ok($expected), <$ty as $crate::ParseArg>::parse_arg(&mut $inp) };
};
}
With just this macro, the example above could be written like this, which still has more repetition than we'd like:
#[test]
fn parse_i32() {
assert_parse_single!{i32: "42" => 42 };
assert_parse_single!{i32: "0x40" => 64 };
assert_parse_single!{i32: "42\n" => 42 };
assert_parse_single!{i32: "0x40 " => 64 };
assert_parse_single!{i32: "-42" => -42 };
assert_parse_single!{i32: "-0x40" => -64 };
assert_parse_single!{i32: ! "hello" };
assert_parse_single!{i32: ! "" };
assert_parse_single!{i32: ! "2x4" };
}
Collapsing all of these into a single macro invocation that only mentions the type being tested once is the job of a so-called tt
-muncher macro. The idea behind this style of macro is that it fills an accumulator one token at a time until it has collected an entire clause, and then it processes that clause.
As this is a generally useful technique and assert_parse!
is a relatively straightforward example, let's walk through how it works on a sample input of:
assert_parse!{i32:
"42" => 42;
! "hello";
}
The last match arm in assert_parse!
is a catch-all that adds an empty accumulator @()
if there isn't one present; after the first expansion, the code looks like this:
/* Applying:
// No accumulator present, add one and get started
($ty:ty: $($rest:tt)*) => {
$crate::assert_parse!{$ty: @() $($rest)*}
}
*/
assert_parse!{i32: @()
"42" => 42;
! "hello";
}
From this point, the macro will collect tokens into the accumulator until there's a semicolon ;
in the first position.
/* Applying:
// Building an assertion
($ty:ty: @($($tok:tt)*) $next:tt $($rest:tt)* ) => {
$crate::assert_parse!{$ty: @($($tok)* $next) $($rest)*}
};
*/
// Once...
assert_parse!{i32: @("42")
=> 42;
! "hello";
}
// Twice...
assert_parse!{i32: @("42" =>)
42;
! "hello";
}
// Three times...
assert_parse!{i32: @("42" => 42)
;
! "hello";
}
At this point, the semicolon causes a different match arm to apply. This one empties the accumulator and emits a call to assert_parse_single!
:
/* Applying:
// End of one assertion, more to come
($ty:ty: @($($tok:tt)*) ; $($rest:tt)* ) => {
$crate::assert_parse_single!($ty: $($tok)*);
$crate::assert_parse!{$ty: @() $($rest)*}
};
*/
assert_parse_single!(i32: "42" => 42);
assert_parse!{i32: @()
! "hello";
}
This process then repeats, gathering ! "hello
into the accumulator in two steps before the semicolon triggers another invocation of assert_parse_single!
. At this point, the emitted code looks like this:
assert_parse_single!(i32: "42" => 42);
assert_parse_single!(i32: ! "hello");
assert_parse!{i32: @()}
Finally, the base-case match arm recognizes that there's nothing left to do and stops the process:
/* Applying:
// Nothing left to do
($ty:ty: @()) => {};
*/
assert_parse_single!(i32: "42" => 42);
assert_parse_single!(i32: ! "hello");
Unit tests can verify that all of the individual pieces work as expected, but the ultimate test of a library is whether or not it's ergonomic to use for its intended purpose. In this case, that means writing an interactive, command driven program. One of the exercises from the Rust book asks for exactly that:
Using a hash map and vectors, create a text interface to allow a user to add employee names to a department in a company. For example, “Add Sally to Engineering” or “Add Amir to Sales.” Then let the user retrieve a list of all people in a department or all people in the company by department, sorted alphabetically.
Our core datastructure is a Roster
, which contains a map of department names to sets of employee names. As we want to be reporting results in alphabetical order, BTree maps and sets are an appropriate choice for the containers:
#[derive(Default)]
struct Roster(BTreeMap<String, BTreeSet<String>>);
derive_dispatch!{Roster}
Next, we need to set up our REPL so that we can interact with the program. We create a Roster
to work with, and then loop forever printing a prompt, reading input, and dispatching on the read text. We also define a quit
command to gracefully exit the program.
register!{ Roster =>
/// Terminate the program
quit(_roster) { Status::halt("") }
}
fn main() {
let mut roster = Roster::default();
loop {
eprint!("> ");
let mut cmd:String = String::default();
std::io::stdin().read_line(&mut cmd).expect("Read error");
let cmd_out = roster.dispatch(&cmd);
println!("{}", cmd_out.message);
if let Some(ControlFlow::Break(_)) = cmd_out.status { break; }
}
}
At this point, a session with the program might look like this:
> help
Available commands:
Command Desc
------- ----
quit Terminate the program
> help quit
quit
Terminate the program
> quit
With this framework in place, we need to write commands to perform the actions specified in the brief. For example, hiring and firing an employee:
register!{ Roster =>
/// Hire an employee
/// employee: Name of employee to hire
/// dept: Department name
hire(roster, employee:String, dept:String) {
roster.0.entry(dept).or_default().insert(employee);
Status::silent_ok()
}
/// Fire an employee
/// employee: Name of employee to fire
/// dept: Department name
fire(roster, employee:String, dept:String) {
match roster.0.entry(dept) {
Entry::Vacant(e) => Status::ok(format_args!("Unknown department {:?}", e.into_key())),
Entry::Occupied(mut e) => match e.get_mut().remove(&employee) {
true => Status::silent_ok(),
false => Status::ok(format_args!("No employee named {:?} in department", employee))
}
}
}
}
The display routines are similar, and can be specified in a separate register!
block. In the end, we have a program which allows sessions like this one:
> help
Available commands:
Command Desc
------- ----
depts List all departments
fire Fire an employee
hire Hire an employee
print Prints employees in a department
print_all Prints all employees
quit Terminate the program
> hire Alice Engineering
> hire Bob PR
> depts
Engineering
PR
> print_all
Engineering
Alice
PR
Bob
> fire Bob Engineering
No employee named "Bob" in department
> fire Bob PR
> print_all
Engineering
Alice
PR
>
((TODO: Relate to future work; can't do that until I know what the future work is...))