fakeformat: from header-only to dirty to continuous integration


Some time ago I started a little string-formatting rapid prototyping library called fakeformat (fakeformat@github). The motivation was to get .NET String.Format-like string formatting cheaply, without having to pull in a large library. Fakeformat allows simple eager formatting of strings like this:

ff::format("Hello {1}!").with("world").now();

(note the index starting from 1, like in Boost.Locale).

The first header-only version allowed the index as well as the format specifier index and the standard library elements used in fakeformat to be configured.

The implementation is a wrapper around the standard string stream std::stringstream.

Extending format specifiers

Boost.Locale’s format specifiers allow key-value format modifiers. These may come in handy in many formatting tasks. I decided to implement some format modifiers that do not require complex locale information. The first task was to extend the parser of the format specifiers (or placeholders). My first, manual approach to parsing proved futile, so a dedicated parsing component came into play. Parsing could be done either with a dedicated generated parser or with a simple generated state machine.

Parser generators

There are wonderful parser generators, such as ANTLR, Bison, Lemon or my favourite, Coco/R. After looking at the generated code I decided against all of them, still hoping to create something minimal that allows header-only use.

Minimalistic parsers can be made with the Boost libraries Spirit or Xpressive, but if someone is already using Boost, there’s no need for fakeformat.

State machine compilers

Having sketched the state machine for parsing the format specifiers, I decided to generate a state machine and incorporate it into fakeformat. Once again, the choices are numerous, but I had some constraints in mind: while generated code doesn’t have to be “clean”, I’d like it to be. I’d also like it to compile without warnings on modern C++ compilers. Another constraint comes from The Pragmatic Programmer: “Don’t Use Wizard Code You Don’t Understand – Wizards can generate reams of code. Make sure you understand all of it before you incorporate it into your project.”

A typical parser-related state machine compiler is Ragel, but its examples failed to compile without warnings. I had some experience using SMC, but decided to use the old, buggy, but still quite clean-code finite state machine generator by Uncle Bob (pdf). It has a very simple syntax, but getting started is quite bumpy. So, for those trying to figure it out, here’s the command line:

java -cp smc.jar smc.Smc format.sm -g smc.generator.cpp.SMCppGenerator -f

where format.sm is the input file. The finished state machine definition, after many iterations:

Context FormatContext
FSMName FormatParser
Initial General

pragma Header format_context.h
{
    General
    {
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
    }

    ReadingPlaceholder
    {
        ReadRightBrace  General             { ParsePlaceholder
                                              FinishCollectingPlaceholder }
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
        ReadComma       ReadingKey          { ParsePlaceholder StartKey }
    }

    ReadingKey
    {
        ReadRightBrace  General             FinishCollectingPlaceholder
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
        ReadComma       ReadingKey          { AddKey ContinueCollectingKeys }
        ReadEqualsSign  ReadingValue        { AddKey StartAddingValue }
    }

    ReadingValue
    {
        ReadRightBrace  General             { AddValue 
                                              FinishCollectingPlaceholder }
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
        ReadComma       ReadingKey          AddValue
    }
}

which can be read as

State
{
    Transition    Next_State   Actions
}
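To make the notation concrete, here is a hand-written C++ sketch of what the first two states of the table express. It is not the code the generator emits: the Machine struct, the fire method and the recording of action names as strings are assumptions made purely for illustration. The ReadComma transition and the ReadingKey and ReadingValue states are left out, and events without a table entry are silently ignored here, which may differ from what the generated machine does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Two of the four states and two of the events from format.sm.
enum State { General, ReadingPlaceholder };
enum Event { ReadLeftBrace, ReadRightBrace };

// Hypothetical hand-written equivalent of the first two table blocks.
struct Machine {
    State state;
    std::vector<std::string> actions;  // names of the actions fired so far

    Machine() : state(General) {}

    // Each table row "Transition Next_State Actions" becomes one branch.
    void fire(Event e) {
        switch (state) {
        case General:
            if (e == ReadLeftBrace) {
                state = ReadingPlaceholder;
                actions.push_back("StartCollectingPlaceholder");
            }
            break;  // no row for ReadRightBrace: ignored in this sketch
        case ReadingPlaceholder:
            if (e == ReadRightBrace) {
                state = General;
                actions.push_back("ParsePlaceholder");
                actions.push_back("FinishCollectingPlaceholder");
            } else if (e == ReadLeftBrace) {
                // Same next state, the collection simply restarts.
                actions.push_back("StartCollectingPlaceholder");
            }
            break;
        }
    }
};

// Usage example: the events produced by the braces of "{1}".
inline void machine_example() {
    Machine m;
    m.fire(ReadLeftBrace);
    assert(m.state == ReadingPlaceholder);
    m.fire(ReadRightBrace);
    assert(m.state == General);
    assert(m.actions.size() == 3);
}
```

The generated code is organized differently, but the mapping from table rows to transitions is the same idea.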

Dirty code

While working on the state machine I committed a clean-code sin: my testing was manual, observing fancy colored console output produced with the library rlutil.

The fancy coloring code is implemented in the state machine context file (see the source).

Extended state

The parsing state machine needs extended state so that the parsed tokens can be collected. The collection is implemented inside the state machine context. The driver of the state machine is, however, external:

FormatParser f;
f.SetString("bla {1} {2}{}{3,bla,blup}{4,k=akj,nl,jsl=22}{{5}} }}{{");

// Feed the input to the state machine one character at a time
while (!f.IsAtEnd()) {
    char c = f.Step();
    // Map special characters to state machine events
    switch (c) {
        case '{': f.ReadLeftBrace(); break;
        case '}': f.ReadRightBrace(); break;
        case ',': f.ReadComma(); break;
        case '=': f.ReadEqualsSign(); break;
        default : f.Continue(); break;
    }
    std::cout << c;
}

Incorporating parser into the formatter

While still hoping for a header-only library, I wrote a Lua script for embedding the generated state machine and the prepared token-collecting context class into fakeformat.hpp. This way I could keep working on the token collection and use my fakeformat test to restore the functionality that had been broken since I started working on the extension.

Still header only?

Well, the generated parser is meant to be compiled in one translation unit. I haven’t yet come up with a method to translate the text into the template code of the formatter, so for now a fakeformat.cpp has to be compiled. A simple test that instantiates the formatter from a second compilation unit confirms that this works. With some manual labor, though, the generated source file can be moved into the header without loss of functionality.

Formatter structure

Coming to the structure of the formatter:

  • The constructor calls the format string parser. Hence, the constructor

    auto fmt=ff::format("{1}{2}")

    is not trivial and preparses the specifiers.

  • The parameter addition methods with and also_with serialize the parameters eagerly and store them for final formatting. Note that each parameter may be formatted differently a number of times.
  • The final string formatting method now replaces the legal format string placeholders with the serialized parameters.
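
The three steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not fakeformat’s actual code: the sketch namespace, the Formatter and Piece types and the hand-rolled scan for plain {N} placeholders are hypothetical, modifiers and also_with are omitted, and placeholders without a matching parameter are simply dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// One piece of a preparsed format string.
struct Piece {
    std::string text;   // literal text, empty for placeholders
    std::size_t index;  // 1-based parameter index, 0 for literal text
};

class Formatter {
public:
    // Step 1: the constructor preparses the format string.
    explicit Formatter(const std::string& fmt) {
        std::string literal;
        std::size_t i = 0;
        while (i < fmt.size()) {
            std::size_t end = i + 1;
            std::size_t index = 0;
            if (fmt[i] == '{') {  // try to read "{digits}"
                while (end < fmt.size() && fmt[end] >= '0' && fmt[end] <= '9') {
                    index = index * 10 + static_cast<std::size_t>(fmt[end] - '0');
                    ++end;
                }
            }
            if (index > 0 && end < fmt.size() && fmt[end] == '}') {
                flush(literal);
                Piece p;
                p.index = index;
                pieces_.push_back(p);
                i = end + 1;
            } else {
                literal += fmt[i];  // anything else stays literal text
                ++i;
            }
        }
        flush(literal);
    }

    // Step 2: parameters are serialized eagerly and stored.
    template <typename T>
    Formatter& with(const T& value) {
        std::ostringstream out;
        out << value;
        params_.push_back(out.str());
        return *this;
    }

    // Step 3: placeholders are replaced by the stored strings.
    std::string now() const {
        std::string result;
        for (std::size_t k = 0; k < pieces_.size(); ++k) {
            const Piece& p = pieces_[k];
            if (p.index == 0) result += p.text;
            else if (p.index <= params_.size()) result += params_[p.index - 1];
        }
        return result;
    }

private:
    // Stores the pending literal text as a piece, if there is any.
    void flush(std::string& literal) {
        if (literal.empty()) return;
        Piece p;
        p.text = literal;
        p.index = 0;
        pieces_.push_back(p);
        literal.clear();
    }

    std::vector<Piece> pieces_;
    std::vector<std::string> params_;
};

// Usage example mirroring the three steps.
inline void formatter_example() {
    Formatter fmt("Hello {1}!");          // placeholders are parsed here
    fmt.with("world");                    // serialized eagerly
    assert(fmt.now() == "Hello world!");  // substituted only now
}

}  // namespace sketch
```

Because the format string is parsed once in the constructor, the same parameter can be referenced by several placeholders without being serialized again.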

Format modifiers

The following format modifiers are currently supported:

So, here’s a snippet of the Catch test:

REQUIRE(ff::format("{1}{1,width=3}{1}{1,w=0}").with(1).now()=="1  111");
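
The expected string in that test is consistent with a width modifier that pads on the left up to a minimum field width, which is also what std::setw does on a string stream by default. The following is a minimal sketch of that behavior; the helper name serialize_with_width is hypothetical, and whether fakeformat itself uses std::setw internally is an assumption.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: serializes a value right-aligned in a field of
// at least `width` characters. A width of 0 means no padding.
template <typename T>
std::string serialize_with_width(const T& value, int width) {
    std::ostringstream out;
    out << std::setw(width) << value;
    return out.str();
}

// Usage example reproducing the pieces of the test string above:
// {1}, {1,width=3}, {1} and {1,w=0} applied to the value 1.
inline void width_example() {
    std::string s = serialize_with_width(1, 0) + serialize_with_width(1, 3) +
                    serialize_with_width(1, 0) + serialize_with_width(1, 0);
    assert(s == "1  111");
}
```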

Cleaning up

Before cleaning up, I set up Travis CI again (as for hiberlite and undoredo-cpp) → https://travis-ci.org/d-led/fakeformat, deleted the manually generated Visual Studio project files and removed the fancy colors from the embedded parser.

Performance

The parsing of strings into integers is now done via the slowest, but safest version without using Boost or C++11. If performance is needed, changing the string_to_key functions can be helpful. There’s a superb article on options.
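
As an illustration of what a stream-based conversion without Boost or C++11 can look like, here is a minimal sketch. The function name string_to_int_sketch and its signature are hypothetical and not taken from fakeformat; the rejection of trailing characters is an assumption about what “safest” should mean here.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: converts text to an int via a string stream.
// Returns false and leaves `result` untouched if the text is not a
// complete integer (empty, out of range, or followed by other characters).
inline bool string_to_int_sketch(const std::string& text, int& result) {
    std::istringstream in(text);
    int value = 0;
    char leftover;
    if (!(in >> value)) return false;  // no valid number at the start
    if (in >> leftover) return false;  // non-whitespace after the number
    result = value;
    return true;
}

// Usage example.
inline void string_to_int_example() {
    int n = 0;
    assert(string_to_int_sketch("42", n) && n == 42);
    assert(!string_to_int_sketch("4x", n) && n == 42);
}
```

Constructing a stream for every conversion is what makes this approach slow, so a faster replacement would avoid the stream entirely.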

Try it out in the browser

http://ideone.com/kYcGJV

To do

Any ideas for further features or improvements?

Header only?
