2-71828.com

In my experience, the key to remaining motivated on a hobby programming project is interactivity— The longer I go without the ability to play with what I've written, the more likely that I'll lose interest in the project and stop. This is a particular problem for infrastructure projects, like a game engine, as there are usually a lot of hidden pices that have to fit together before there's much in the way of visible progress.

In particular, we'll need to get some kind of input, output, and logic systems in place before can happen. The final version of these will be quite complicated, and there's quite a large risk of dropping the project due to boredom before these are done. It's worth the effort, then, to start with some temporary I/O systems to use as we develop the versions we'll actually want to use.

The idea of an interactive programming language isn't new. Lisp was one of the first with its Read-Eval-Print Loop (REPL): The main program simply reads a statement, evaluates it, and prints the result in an endless loop. This paradigm has remained popular to the present day, from early home computers that booted into a BASIC interpreter by default to some of the most popular modern programming languages, like Python.

Most modern games also have something similar: the debug console. In it, you can type commands that either inspect hidden state or the state of the game in arbitrary ways as it's running. This allows developers to try different settings quickly before settling on the final values, or to artificially recreate rare situations for testing.

The complicated part of this, of course, is the evaluate step: How do you turn a string of text into stuff happening? This is the piece we'll be discussing in this chapter. For now, all input and output will be via the terminal— Once we have the graphics, keyboard input, and font rendering up and running, we'll revisit this and write an in-game debug console that interacts with the same system.

Design Goals

There are two different interfaces that we need to consider when designing this system: What does the language look like to the user, and how does the programmer integrate the language into their program?

User's View

The language should be simple and straightforward. There's no need, for example, for most control structures like loops or conditionals: The user wants to make some specific change now, not write long routines to be executed in the future.

A reasonable model for this is the Bourne shell, where each command is a single line with space-separated arguments. The command will then have the opportunity to interact with the rest of the program, and then produce some textual output for the user's benefit. Additionally, the system should be self-documenting: The user should be able to get a list of available commands as well as documentation about any particular command.

Programmer's View

From the programmer's point of view, we want defining commands and evaluating command strings to be as trivial as possible. In particular, defining a new command should look as much like a normal Rust function definition as possible, with the library taking care of the parsing details. It should also be possible to spread command definitions throughout the codebase, so that they can be defined near the code that they interface to.

Global mutable state is problematic in many ways, so the command definitions should accept an &mut argument to the context they should operate on. Instead of specifying a particular context type at the library level, it should be possible to define different command sets for different context types.

The natural way to invoke a command on a particular context object, is through a trait method which accepts a String argument and returns another String representing the result of the operation. There should also be an opportunity for the command to return some structured result data as well.

Implementation

Objects which can accept commands implement the Dispatch trait, which provides the dispatch method. The associated type Dispatch::Status defines the type of structured information which is returned from each command beside the user-facing message. These two responses are packaged together in the Status structure; Status::status will be None if the user entered an invalid command.

pub trait Dispatch {
    type Status;
    fn dispatch(&mut self, cmdline: &str)->Status<Self::Status>;

    /* ... */
}

pub struct Status<T> {
    pub status: Option<T>,
    pub message: String
}

To enable definitions spread around the codebase, possibly in disparate crates, I use the linkme module, which uses linker attributes to combine static variables from all over the program into a single slice at compile time. Without linkme, it would be necessary to either define all commands in the same place or have some kind of startup system that collects all of the defined commands at runtime.

Here, this is a slice of command definitions Cmd, with a function pointer and string fields describing the command name, argument list, and long-form documentation:

pub struct Cmd<T: ?Sized + Dispatch> {
    pub key: &'static str,
    pub args: &'static str,
    pub help: &'static str,
    pub cmd: fn(&mut T, &str) -> Status<T::Status>,
}

Implementing Dispatch

The Dispatch trait is always implemented via the derive_dispatch! macro:

enum CmdResult { ... }
struct MyContext { name: String, ... }

derive_dispatch! { MyContext -> CmdResult;
    errfmt(&self, "MyContext({}): ", &self.name)
}

This both defines the Dispatch implementation and sets up the linkme distributed slice that holds the commands. The -> CmdResult clause specifies CmdResult to be the Dispatch::Status type; if this is omitted, std::ops::ControlFlow<()> is used by default. The errfmt clause is also optional; it defines a prefix that will be added to all error results.

The emitted dispatch method delegates the main work to a dispatch free function in the library. This splits off the first portion of the string and uses it as a lookup key. The implementation of the help command is hardcoded here, so that it has access to the command list; anything else is looked up from the distributed slice:

pub fn dispatch<T:?Sized + Dispatch>(
    cmds: &[Cmd<T>],
    target: &mut T,
    mut buf: &str
) -> Status<T::Status> {
    buf = buf.trim();
    let (key, args) = buf.split_once(" ").unwrap_or((buf, ""));
    let args = args.trim();

    if key.eq_ignore_ascii_case("help") { /* return ... */ }

    for cmd in cmds.iter() {
        if cmd.key.eq_ignore_ascii_case(key) { return (cmd.cmd)(target, args) }
    }
    Status::err(&*target, format_args!("Unknown command {:?}", key))
}

Defining Commands

Instead of defining these Cmd structures manually, the register! macro gives the programmer a more natural syntax; a typical invocation is reproduced below. It starts by identifying the target type, which is followed by one or more function definitions that represent commands. The documentation comments are required, and are made available to the user via the help command. The first argument (with no specified type) is a mutable reference to the context object, and the remaining arguments are parsed from the command string by the library.

// struct Roster(BTreeMap<String, BTreeSet<String>>);

register!{ Roster =>
    /// Terminate the program
    quit(_roster) { Status::halt("") }

    /// Add employee to department
    hire(roster, employee:String, dept:String) {
        roster.0.entry(dept).or_default().insert(employee);
        Status::silent_ok()
    }

    /* ... */
}

Each command returns a Status structure which matches the the context object's Dispatch implementation. Several constructors are provided for the most common situations; in each case, message is any value which implements Display:

Defined for all status types:
- Status::err(&context, message) represents a parsing error, and will prepend context's error format to the message
- Status::silent(status) represents a completed command with no user-facing message
- Status::msg(status, message) represents a completed command with the given result message
Defined for the default status type ControlFlow<()>:
- Status::ok(message) represents a Continue with the given message
- Status::silent_ok() represents a Continue with no user-facing message
- Status::halt(message) represents a Break with the given message

Argument Parsing

nom, a parser-combinator library, to parse the command arguments. Each type that can appear as an argument implements the ParseArg trait, which provides a nom parsing function to construct the type:

pub type ArgParser<T> = Box<
    dyn FnOnce(&str)->IResult<&str, T, nom::error::Error<&str>>
>;

pub trait ParseArg: Sized {
    /// Attempt to parse this argument from the start of `args`.
    ///   On success, `*args` points to the start of the next argument.
    ///   On failure, `*args` is unchanged.
    fn parse_arg<'i>(args: &mut &'i str)->Result<Self, String> {
        /* default implementation ... */
    }

    fn parser()->ArgParser<Self>;
}

The programmer can add support for their custom types by implementing ParseArg::parser(), and the library provides implementations for several primitive types:

String will parse a single bare word or anything enclosed in either single ' or double " quotes. Currently, no escape sequences are supported.
Integer types will parse both decimal and hexadecimal representations
Vec<T> will parse 0 or more space-delimited Ts surrounded by square brackets [].

Additionally, a number of newtypes are provided with alternate parsing rules:

Rest(pub String) will consume and return the entirety of the remaining command line. This is particularly useful for delegating part of the command processing to a second dispatch() call, possibly with a different context object.
Braces<T>(pub Vec<T>) will parse a sequence of Ts surrounded by curly braces {}
Parens<T>(pub Vec<T>) will parse a sequence of Ts surrounded by parentheses ()

Testing

Testing this crate comes in two parts: unit tests that verify individual elements and an intractive example that verifies the design is actually useful.

Unit Tests: Argument Parsing

Unit tests for the different argument parsing routines all follow essentially the same pattern: You list a bunch of strings, and for each you either assert that they produce a parse error or that they parse to a particular, specific value. We can reduce the boilerplate required for these tests by defining a macro that takes care of it for us. Here, for example, is the unit test for i32 parsing:

#[test]
fn parse_i32() {
    assert_parse!{i32:
        "42"    =>  42;
        "0x40"  =>  64;
        "42\n"  =>  42;
        "0x40 " =>  64;
        "-42"   => -42;
        "-0x40" => -64;
      ! "hello";
      ! "";
      ! "2x4";
    }
}

This specifies that the type we're testing is i32, followed by a bunch of clauses in two different forms:

"str" => expected asserts that the input "str" parses to the result of the expression expected, and
! "str" asserts that the input "str" results in a parse error.

The macro is written in two parts: First, the assert_parse_single! macro describes the assertions that should be emitted for each clause. It has two match arms, one for each clause. In addition to the text of the clause, these also take the type being tested as the first argument:

#[macro_export]
macro_rules! assert_parse_single {
    ($ty:ty: !$inp:literal) => {
        assert!{ <$ty as $crate::ParseArg>::parse_arg(&mut $inp).is_err() }
    };
    ($ty:ty: $inp:literal => $expected:expr) => {
        assert_eq!{ Ok($expected), <$ty as $crate::ParseArg>::parse_arg(&mut $inp) };
    };
}

With just this macro, the example above could be written like this, which still has more repetition than we'd like:

#[test]
fn parse_i32() {
    assert_parse_single!{i32:   "42"    =>  42 };
    assert_parse_single!{i32:   "0x40"  =>  64 };
    assert_parse_single!{i32:   "42\n"  =>  42 };
    assert_parse_single!{i32:   "0x40 " =>  64 };
    assert_parse_single!{i32:   "-42"   => -42 };
    assert_parse_single!{i32:   "-0x40" => -64 };
    assert_parse_single!{i32: ! "hello"        };
    assert_parse_single!{i32: ! ""             };
    assert_parse_single!{i32: ! "2x4"          };
}

Collapsing all of these into a single macro invocation that only mentions the type being tested once is the job of a so-called tt-muncher macro. The idea behind this style of macro is that it fills an accumulator one token at a time until it has collected an entire clause, and then it processes that clause.

As this is a generally useful technique and assert_parse! is a relatively straightforward example, let's walk through how it works on a sample input of:

assert_parse!{i32:
    "42"    =>  42;
  ! "hello";
}

The last match arm in assert_parse! is a catch-all that adds an empty accumulator @() if there isn't one present; after the first expansion, the code looks like this:

/* Applying:
    // No accumulator present, add one and get started
    ($ty:ty: $($rest:tt)*) => {
        $crate::assert_parse!{$ty: @() $($rest)*}
    }
*/

assert_parse!{i32: @()
    "42"    =>  42;
  ! "hello";
}

From this point, the macro will collect tokens into the accumulator until there's a semicolon ; in the first position.

/* Applying:
    // Building an assertion
    ($ty:ty: @($($tok:tt)*) $next:tt $($rest:tt)* ) => {
        $crate::assert_parse!{$ty: @($($tok)* $next) $($rest)*}
    };
*/

// Once...
assert_parse!{i32: @("42")
    =>  42;
  ! "hello";
}

// Twice...
assert_parse!{i32: @("42" =>)
  42;
  ! "hello";
}

// Three times...
assert_parse!{i32: @("42" => 42)
  ;
  ! "hello";
}

At this point, the semicolon causes a different match arm to apply. This one empties the accumulator and emits a call to assert_parse_single!:

/* Applying:
    // End of one assertion, more to come
    ($ty:ty: @($($tok:tt)*) ; $($rest:tt)* ) => {
        $crate::assert_parse_single!($ty: $($tok)*);
        $crate::assert_parse!{$ty: @() $($rest)*}
    };
*/

assert_parse_single!(i32: "42" => 42);
assert_parse!{i32: @()
  ! "hello";
}

This process then repeats, gathering ! "hello into the accumulator in two steps before the semicolon triggers another invocation of assert_parse_single!. At this point, the emitted code looks like this:

assert_parse_single!(i32: "42" => 42);
assert_parse_single!(i32: ! "hello");
assert_parse!{i32: @()}

Finally, the base-case match arm recognizes that there's nothing left to do and stops the process:

/* Applying:
    // Nothing left to do
    ($ty:ty: @()) => {};
*/

assert_parse_single!(i32: "42" => 42);
assert_parse_single!(i32: ! "hello");

Example: Employee Management

Unit tests can verify that all of the individual pieces work as expected, but the ultimate test of a library is whether or not it's ergonomic to use for its intended purpose. In this case, that means writing an interactive, command driven program. One of the exercises from the Rust book asks for exactly that:

Using a hash map and vectors, create a text interface to allow a user to add employee names to a department in a company. For example, “Add Sally to Engineering” or “Add Amir to Sales.” Then let the user retrieve a list of all people in a department or all people in the company by department, sorted alphabetically.

Our core datastructure is a Roster, which contains a map of department names to sets of employee names. As we want to be reporting results in alphabetical order, BTree maps and sets are an appropriate choice for the containers:

#[derive(Default)]
struct Roster(BTreeMap<String, BTreeSet<String>>);
derive_dispatch!{Roster}

Next, we need to set up our REPL so that we can interact with the program. We create a Roster to work with, and then loop forever printing a prompt, reading input, and dispatching on the read text. We also define a quit command to gracefully exit the program.

register!{ Roster =>
    /// Terminate the program
    quit(_roster) { Status::halt("") }
}

fn main() {
    let mut roster = Roster::default();
    loop {
        eprint!("> ");
        let mut cmd:String = String::default();
        std::io::stdin().read_line(&mut cmd).expect("Read error");
        let cmd_out = roster.dispatch(&cmd);
        println!("{}", cmd_out.message);
        if let Some(ControlFlow::Break(_)) = cmd_out.status { break; }
    }
}

At this point, a session with the program might look like this:

> help
Available commands:

  Command       Desc
  -------       ----
  quit          Terminate the program

> help quit
quit

 Terminate the program

> quit

With this framework in place, we need to write commands to perform the actions specified in the brief. For example, hiring and firing an employee:

register!{ Roster =>
    /// Hire an employee
    ///   employee: Name of employee to hire
    ///   dept:     Department name
    hire(roster, employee:String, dept:String) {
        roster.0.entry(dept).or_default().insert(employee);
        Status::silent_ok()
    }

    /// Fire an employee
    ///   employee: Name of employee to fire
    ///   dept:     Department name
    fire(roster, employee:String, dept:String) {
        match roster.0.entry(dept) {
            Entry::Vacant(e) => Status::ok(format_args!("Unknown department {:?}", e.into_key())),
            Entry::Occupied(mut e) => match e.get_mut().remove(&employee) {
                true => Status::silent_ok(),
                false => Status::ok(format_args!("No employee named {:?} in department", employee))
            }
        }
    }
}

The display routines are similar, and can be specified in a separate register! block. In the end, we have a program which allows sessions like this one:

> help
Available commands:

  Command       Desc
  -------       ----
  depts         List all departments
  fire          Fire an employee
  hire          Hire an employee
  print         Prints employees in a department
  print_all     Prints all employees
  quit          Terminate the program

> hire Alice Engineering

> hire Bob PR

> depts
Engineering
PR

> print_all
Engineering
    Alice

PR
    Bob


> fire Bob Engineering
No employee named "Bob" in department
> fire Bob PR

> print_all
Engineering
    Alice

PR


>