Fakeformat
Some time ago I started a little string-formatting rapid prototyping library called fakeformat (fakeformat@github). The motivation was to get .NET String.Format-like string formatting cheaply, without having to use any large library. Fakeformat allows simple, eager formatting of strings like this:
ff::format("Hello {1}!").with("world").now();
(note the index starting from 1, like in Boost.Locale).
The first header-only version allowed the index as well as the format specifier index and the standard library elements used in fakeformat to be configured.
The implementation is a wrapper around the standard string stream std::stringstream.
Extending format specifiers
Boost.Locale’s format specifiers allow key-value format modifiers. These come in handy in many formatting tasks. I decided to implement some format modifiers that do not require complex locale information. The first task was to extend the parser of the format specifiers (or placeholders). My first, manual approach to parsing proved futile, so a dedicated parsing component was needed. Parsing could be done either via a dedicated generated parser or with a simple generated state machine.
Parser generators
There are wonderful parser generators, such as ANTLR, Bison, Lemon or my favourite, Coco/R. After looking at the generated code I decided against all of them, still hoping to create something minimal that allows header-only use.
Minimalistic parsers can be made with the Boost libraries Spirit or Xpressive, but if someone is already using Boost, there’s no need for fakeformat.
State machine compilers
Having sketched the state machine for parsing the format specifiers, I decided to generate a state machine and incorporate it into fakeformat. Once again, the choices are numerous, but I had some constraints in mind: while generated code doesn’t have to be “clean”, I’d like it to be. I’d also like it to compile without warnings on modern C++ compilers. Another constraint comes from The Pragmatic Programmer: “Don’t Use Wizard Code You Don’t Understand – Wizards can generate reams of code. Make sure you understand all of it before you incorporate it into your project.”
A typical parser-related state machine compiler is Ragel. Its examples failed to compile without warnings. I had some experience using SMC, but decided to use the old, buggy, but still quite clean-code finite state machine generator by Uncle Bob (pdf). It has a very simple syntax, but getting started is quite bumpy. So, for those trying to figure it out, here’s the command line:
java -cp smc.jar smc.Smc format.sm -g smc.generator.cpp.SMCppGenerator -f
where format.sm is the input file. The finished state machine definition, after many iterations:
Context FormatContext
FSMName FormatParser
Initial General
pragma Header format_context.h
{
    General
    {
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
    }
    ReadingPlaceholder
    {
        ReadRightBrace  General             { ParsePlaceholder FinishCollectingPlaceholder }
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
        ReadComma       ReadingKey          { ParsePlaceholder StartKey }
    }
    ReadingKey
    {
        ReadRightBrace  General             FinishCollectingPlaceholder
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
        ReadComma       ReadingKey          { AddKey ContinueCollectingKeys }
        ReadEqualsSign  ReadingValue        { AddKey StartAddingValue }
    }
    ReadingValue
    {
        ReadRightBrace  General             { AddValue FinishCollectingPlaceholder }
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
        ReadComma       ReadingKey          AddValue
    }
}
where each entry can be read as
State { Transition Next_State Actions }
Dirty code
While working on the state machine I committed a clean-code sin: my testing was manual, by observing fancy colored console output produced with the rlutil library.
The fancy coloring code is implemented in the state machine context file (see source).
Extended state
The parsing state machine needs extended state so that the parsed tokens can be collected. The collection is implemented inside the state machine context. The driver of the state machine is, however, external:
FormatParser f;
f.SetString("bla {1} {2}{}{3,bla,blup}{4,k=akj,nl,jsl=22}{{5}} }}{{");
while (!f.IsAtEnd()) {
    char c = f.Step();
    switch (c) {
        case '{': f.ReadLeftBrace();  break;
        case '}': f.ReadRightBrace(); break;
        case ',': f.ReadComma();      break;
        case '=': f.ReadEqualsSign(); break;
        default:  f.Continue();       break;
    }
    std::cout << c;
}
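To make the transition table and the external driver a bit more tangible, here is a hand-written sketch that walks the same four states in plain C++. It is not the generated FormatParser and not fakeformat’s context class: the Placeholder struct, the parse_placeholders function and the way tokens are stored are assumptions made for illustration only. In particular, storing a pending bare key on a closing brace and treating an unexpected = as an ordinary character are choices of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical result type: what one "{index,key=value,...}" yields.
struct Placeholder {
    std::string index;                          // text before the first comma
    std::map<std::string, std::string> config;  // key -> value ("" for bare keys)
};

// The four states of the table above.
enum State { General, ReadingPlaceholder, ReadingKey, ReadingValue };

inline std::vector<Placeholder> parse_placeholders(const std::string& text) {
    std::vector<Placeholder> found;
    Placeholder current;
    std::string token;  // characters collected since the last delimiter
    std::string key;    // key waiting for its value
    State state = General;

    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '{') {
            // ReadLeftBrace: every state (re)starts collecting a placeholder.
            current = Placeholder();
            token.clear();
            state = ReadingPlaceholder;
        } else if (state == General) {
            // Ordinary text or a stray '}', ',' or '=': nothing to collect.
        } else if (c == '}') {
            // ReadRightBrace: store what is pending and finish the placeholder.
            if (state == ReadingPlaceholder) current.index = token;
            else if (state == ReadingValue) current.config[key] = token;
            else if (!token.empty()) current.config[token] = "";  // sketch assumption
            found.push_back(current);
            token.clear();
            state = General;
        } else if (c == ',') {
            // ReadComma: close the index, key or value; a key follows.
            if (state == ReadingPlaceholder) current.index = token;
            else if (state == ReadingValue) current.config[key] = token;
            else if (!token.empty()) current.config[token] = "";
            token.clear();
            state = ReadingKey;
        } else if (c == '=' && state == ReadingKey) {
            // ReadEqualsSign: the token so far was a key; a value follows.
            key = token;
            token.clear();
            state = ReadingValue;
        } else {
            token += c;  // Continue: keep collecting
        }
    }
    return found;
}
```

Fed with the test string from the driver above, this sketch yields six placeholders, including one with an empty index for {} and one for the inner {5} of {{5}}; the trailing {{ is never closed and is dropped.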
Incorporating parser into the formatter
While still hoping for a header-only library, I wrote a Lua script for embedding the generated state machine and the prepared token-collecting context class into fakeformat.hpp. This way I could keep working on the token collection and use my fakeformat test to restore the functionality that had been broken since I started working on the extension.
Still header only?
Well, the generated parser is meant to be compiled in one translation unit. I haven’t yet come up with a method to translate the text into the template code of the formatter, so for now the answer is no. But with some manual labor, the generated source file can be transferred into the header without loss of functionality.
Formatter structure
Coming to the structure of the formatter:
- The constructor calls the format string parser. Hence, the constructor auto fmt=ff::format("{1}{2}") is not trivial and preparses the specifiers.
- The parameter addition methods with and also_with serialize the parameters eagerly and store them for final formatting. Note that each parameter may be formatted differently a number of times.
- The final string formatting method now replaces the legal format string placeholders with the serialized parameters.
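The three stages can be sketched in a few lines. The class below is a hypothetical miniature, not fakeformat’s code: mini_format, piece and add_piece are made-up names, modifiers are ignored, and only plain {N} placeholders are recognized. It merely shows the split into preparsing in the constructor, eager serialization through std::stringstream, and late substitution.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical miniature of the formatter structure (not fakeformat itself).
class mini_format {
    struct piece {
        std::string text;   // literal text, or the raw "{N}" of a placeholder
        std::size_t index;  // 0 for literal text, N (1-based) for a placeholder
    };
    std::vector<piece> pieces_;
    std::vector<std::string> args_;

    void add_piece(const std::string& text, std::size_t index) {
        piece p;
        p.text = text;
        p.index = index;
        pieces_.push_back(p);
    }

public:
    // Stage 1: the constructor preparses the format string into pieces.
    explicit mini_format(const std::string& fmt) {
        std::string literal;
        std::size_t pos = 0;
        while (pos < fmt.size()) {
            std::size_t index = 0;
            std::size_t close = std::string::npos;
            if (fmt[pos] == '{') close = fmt.find('}', pos);
            if (close != std::string::npos) {
                std::istringstream digits(fmt.substr(pos + 1, close - pos - 1));
                char leftover;
                // Accept only a number with nothing but whitespace around it.
                if (!(digits >> index) || (digits >> leftover)) index = 0;
            }
            if (index == 0) {  // not a plain "{N}": keep the character as text
                literal += fmt[pos++];
                continue;
            }
            add_piece(literal, 0);
            literal.clear();
            add_piece(fmt.substr(pos, close - pos + 1), index);
            pos = close + 1;
        }
        add_piece(literal, 0);
    }

    // Stage 2: parameters are serialized eagerly and stored as strings.
    template <typename T>
    mini_format& with(const T& value) {
        std::stringstream ss;
        ss << value;
        args_.push_back(ss.str());
        return *this;
    }

    template <typename T>
    mini_format& also_with(const T& value) { return with(value); }

    // Stage 3: legal placeholders are replaced; the rest is left untouched.
    std::string now() const {
        std::string out;
        for (std::size_t i = 0; i < pieces_.size(); ++i) {
            const piece& p = pieces_[i];
            const bool legal = p.index >= 1 && p.index <= args_.size();
            out += legal ? args_[p.index - 1] : p.text;
        }
        return out;
    }
};
```

With this sketch, mini_format("Hello {1}!").with("world").now() produces "Hello world!", and a placeholder whose index has no matching parameter is copied through as written.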
Format modifiers
The following format modifiers are currently supported:
- num or number with parameters: hex, oct, sci or scientific, fix or fixed
- width or w (number parameter required)
- left
- right
- precision (number parameter required)
- fill (a single character is allowed; a literal } is not supported yet)
So, here’s a snippet of the Catch test:
REQUIRE(ff::format("{1}{1,width=3}{1}{1,w=0}").with(1).now()=="1  111");
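Since the formatter wraps std::stringstream, the modifiers listed above map naturally onto standard stream manipulators. The following is only a sketch of such a mapping: apply_modifier is a hypothetical helper, and whether fakeformat translates its modifiers in exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: translate one key/value modifier into stream state.
// Unknown keys and values are silently ignored.
inline void apply_modifier(std::ostream& os,
                           const std::string& key,
                           const std::string& value) {
    if (key == "num" || key == "number") {
        if (value == "hex") os << std::hex;
        else if (value == "oct") os << std::oct;
        else if (value == "sci" || value == "scientific") os << std::scientific;
        else if (value == "fix" || value == "fixed") os << std::fixed;
    } else if (key == "width" || key == "w") {
        os << std::setw(std::atoi(value.c_str()));   // applies to the next output only
    } else if (key == "precision") {
        os << std::setprecision(std::atoi(value.c_str()));
    } else if (key == "left") {
        os << std::left;
    } else if (key == "right") {
        os << std::right;
    } else if (key == "fill" && value.size() == 1) {
        os << std::setfill(value[0]);                // exactly one fill character
    }
}
```

Applying width=3 to a fresh stream and then writing 1 gives two spaces followed by 1, which is the padded part of the expected string in the Catch test above.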
Cleaning up
Before cleaning up, I set up Travis CI again (as for hiberlite and undoredo-cpp) → https://travis-ci.org/d-led/fakeformat, deleted the manually generated Visual Studio project files, and stripped the fancy colors from the embedded parser.
Performance
Strings are currently parsed into integers with the slowest but safest method that needs neither Boost nor C++11. If performance matters, changing the string_to_key functions may help. There’s a superb article on the options.
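To illustrate the kind of trade-off involved, here are two ways of turning a string into an integer without Boost or C++11. Both functions are hypothetical stand-ins, not fakeformat’s actual string_to_key implementations; the strtol variant avoids constructing a stream per call and is usually the cheaper of the two, but needs its error checks spelled out by hand.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <sstream>
#include <string>

// Stream-based conversion: simple and safe, but builds a stream on each call.
inline bool parse_int_stream(const std::string& text, int& result) {
    std::istringstream in(text);
    int value = 0;
    char leftover;
    // Fail if no number was read or if non-whitespace characters remain.
    if (!(in >> value) || (in >> leftover)) return false;
    result = value;
    return true;
}

// strtol-based conversion: no stream involved, explicit error handling.
inline bool parse_int_strtol(const std::string& text, int& result) {
    if (text.empty()) return false;
    char* end = 0;
    errno = 0;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (end == text.c_str() || *end != '\0') return false;  // nothing or not all consumed
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return false;
    result = static_cast<int>(value);
    return true;
}
```

Both return false and leave result untouched for inputs such as "4x" or an empty string, so either could sit behind the same interface while being measured against the other.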
Try it out in the browser
To do
Any ideas for further features or improvements?
Header only?