Author Archives: DLed

fakeformat: from header-only to dirty to continuous integration

Fakeformat

Some time ago I started a little string-formatting rapid prototyping library called fakeformat (fakeformat@github). The motivation was to get .NET String.Format-like string formatting cheaply, without having to use any large library. Fakeformat allows simple eager formatting of strings like this:

ff::format("Hello {1}!").with("world").now();

(note the index starting from 1, like in Boost.Locale).

The first header-only version allowed the index as well as the format specifier index and the standard library elements used in fakeformat to be configured.

The implementation is a wrapper around the standard string stream std::stringstream.

Extending format specifiers

Boost.Locale’s format specifiers allow key-value format modifiers. These come in handy in many formatting tasks. I decided to implement some format modifiers that do not require complex locale information. The first task was to extend the parser of the format specifiers (or placeholders). My first, manual approach to parsing proved futile, so a dedicated parsing component came into view. Parsing could be done either with a dedicated generated parser or with a simple generated state machine.

Parser generators

There are wonderful parser generators, such as ANTLR, Bison, Lemon or my favourite, Coco/R. After looking at the generated code I decided against all of them, still hoping to create something minimal that allows header-only use.

Minimalistic parsers can be made with the Boost libraries Spirit or Xpressive, but if someone is already using Boost, there’s no need for fakeformat.

State machine compilers

Having sketched the state machine for parsing the format specifiers, I decided to generate a state machine and incorporate it into fakeformat. Once again, the choices are numerous, but I had some constraints in mind: while generated code doesn’t have to be “clean”, I’d like it to be. I’d also like it to compile without warnings on modern C++ compilers. Another constraint comes from The Pragmatic Programmer: “Don’t Use Wizard Code You Don’t Understand – Wizards can generate reams of code. Make sure you understand all of it before you incorporate it into your project.”

A typical parser-related state machine compiler is Ragel, but its examples failed to compile without warnings. I had some experience with SMC, but decided to use the old and buggy finite state machine generator by Uncle Bob (pdf), which still produces quite clean code. It has a very simple syntax, but getting started is quite bumpy. So, for those trying to figure it out, here’s the command line:

java -cp smc.jar smc.Smc format.sm -g smc.generator.cpp.SMCppGenerator -f

where format.sm is the input file. The finished state machine definition, after many iterations:

Context FormatContext
FSMName FormatParser
Initial General

pragma Header format_context.h
{
    General
    {
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
    }

    ReadingPlaceholder
    {
        ReadRightBrace  General             { ParsePlaceholder
                                              FinishCollectingPlaceholder }
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
        ReadComma       ReadingKey          { ParsePlaceholder StartKey }
    }

    ReadingKey
    {
        ReadRightBrace  General             FinishCollectingPlaceholder
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
        ReadComma       ReadingKey          { AddKey ContinueCollectingKeys }
        ReadEqualsSign  ReadingValue        { AddKey StartAddingValue }
    }

    ReadingValue
    {
        ReadRightBrace  General             { AddValue 
                                              FinishCollectingPlaceholder }
        ReadLeftBrace   ReadingPlaceholder  StartCollectingPlaceholder
        ReadComma       ReadingKey          AddValue
    }
}

which can be read like

State
{
    Transition    Next_State   Actions
}
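To make the table more concrete, here is a hand-written C++ sketch of the General and ReadingPlaceholder rows only. It is not the code the generator produces: the type TinyFormatFsm, the function collect_placeholders and the bodies of the actions are assumptions for illustration, and a '}' outside a placeholder is simply ignored. Only the state, event and action names come from the definition above.

```cpp
#include <string>
#include <vector>

// Hand-written illustration of two rows of the table; the generated
// FormatParser looks different.
enum class State { General, ReadingPlaceholder };

struct TinyFormatFsm {
    State state = State::General;
    std::string current;                    // text collected since the last '{'
    std::vector<std::string> placeholders;  // finished placeholders

    // Event '{': in both states the table moves to ReadingPlaceholder
    // and runs StartCollectingPlaceholder.
    void ReadLeftBrace() {
        current.clear();                    // StartCollectingPlaceholder
        state = State::ReadingPlaceholder;
    }

    // Event '}': only ReadingPlaceholder has a row for it.
    void ReadRightBrace() {
        if (state != State::ReadingPlaceholder) return;  // ignored in this sketch
        placeholders.push_back(current);    // ParsePlaceholder, FinishCollectingPlaceholder
        state = State::General;
    }

    // Any other character is collected while inside a placeholder.
    void Continue(char c) {
        if (state == State::ReadingPlaceholder) current += c;
    }
};

// Feeds a string through the machine and returns the collected placeholders.
std::vector<std::string> collect_placeholders(const std::string& text) {
    TinyFormatFsm fsm;
    for (char c : text) {
        switch (c) {
            case '{': fsm.ReadLeftBrace(); break;
            case '}': fsm.ReadRightBrace(); break;
            default:  fsm.Continue(c); break;
        }
    }
    return fsm.placeholders;
}
```

With this sketch, a nested "{{5}}" restarts the collection at the second brace, so only "5" is recorded.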

Dirty code

While working on the state machine I committed a clean-code sin: my testing was manual, observing fancy colored console output produced with the library rlutil.

[Screenshot: colored console output of the format state machine]

The fancy coloring code is implemented in the state machine context file (see the source).

Extended state

The parsing state machine needs extended state so that the parsed tokens may be collected. The collection is implemented inside the state machine context. The driver of the state machine is however external:

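FormatParser f;
f.SetString("bla {1} {2}{}{3,bla,blup}{4,k=akj,nl,jsl=22}{{5}} }}{{");

// Feed the string to the state machine one character at a time,
// translating special characters into state machine events
while (!f.IsAtEnd()) {
	char c = f.Step();
	switch (c) {
		case '{': f.ReadLeftBrace(); break;
		case '}': f.ReadRightBrace(); break;
		case ',': f.ReadComma(); break;
		case '=': f.ReadEqualsSign(); break;
		default : f.Continue(); break; // any other character
	}
	std::cout << c; // echo the character just processed
}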

Incorporating parser into the formatter

While still hoping for a header-only library, I wrote a Lua script that embeds the generated state machine and the prepared token-collecting context class into fakeformat.hpp. This way I could keep working on the token collection and use my fakeformat test to restore the functionality that had been broken since I started working on the extension.

Still header only?

Well, the generated parser is meant to be compiled in one translation unit. I haven’t yet come up with a method to translate the text into the template code of the formatter, so for now a fakeformat.cpp has to be compiled. A simple test that instantiates the formatter from a second compilation unit confirms the usability. With some amount of manual labor, though, the generated source file can be transferred into the header without loss of functionality.

Formatter structure

Coming to the structure of the formatter:

  • The constructor calls the format string parser. Hence, the constructor

    auto fmt=ff::format("{1}{2}")

    is not trivial and preparses the specifiers.

  • The parameter addition methods with and also_with serialize the parameters eagerly and store them for final formatting. Note that each parameter may be formatted several times, each time differently.
  • The final string formatting method, now, replaces the legal format string placeholders with the serialized parameters.
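The three steps can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not fakeformat's actual code: the class name mini_format is hypothetical, only plain "{N}" placeholders with a 1-based index are recognized (no modifiers, no brace escaping), and out-of-range indices are silently dropped.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Illustration only: mini_format is a hypothetical stand-in for the formatter.
class mini_format {
    struct piece { std::string literal; std::size_t index; };
    std::vector<piece> pieces_;      // literal text followed by a placeholder
    std::string tail_;               // literal text after the last placeholder
    std::vector<std::string> args_;  // eagerly serialized parameters

public:
    // Step 1: the constructor pre-parses the format string.
    explicit mini_format(const std::string& fmt) {
        std::string literal;
        for (std::size_t i = 0; i < fmt.size(); ++i) {
            std::size_t close =
                (fmt[i] == '{') ? fmt.find('}', i) : std::string::npos;
            if (close == std::string::npos) { literal += fmt[i]; continue; }
            std::istringstream digits(fmt.substr(i + 1, close - i - 1));
            std::size_t index = 0;
            if (digits >> index && digits.eof() && index >= 1) {
                pieces_.push_back(piece{literal, index});
                literal.clear();
                i = close;           // skip past the closing brace
            } else {
                literal += fmt[i];   // not a plain {N}: keep it as text
            }
        }
        tail_ = literal;
    }

    // Step 2: parameters are serialized immediately via a string stream.
    template <typename T>
    mini_format& with(const T& value) {
        std::ostringstream out;
        out << value;
        args_.push_back(out.str());
        return *this;
    }

    template <typename T>
    mini_format& also_with(const T& value) { return with(value); }

    // Step 3: now() stitches literals and serialized parameters together.
    std::string now() const {
        std::string result;
        for (const piece& p : pieces_) {
            result += p.literal;
            if (p.index <= args_.size()) result += args_[p.index - 1];
        }
        return result + tail_;
    }
};
```

Because the parsing happens in the constructor, a formatter object built once could in principle be reused without re-parsing the format string.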

Format modifiers

The following format modifiers are currently supported:

So, here’s a snippet of the Catch test:

REQUIRE(ff::format("{1}{1,width=3}{1}{1,w=0}").with(1).now()=="1  111");
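The test suggests that width=3 pads the value to three characters, right-aligned, while w=0 leaves it unpadded. Since the formatter wraps a string stream, one plausible way to get this behaviour is std::setw. The helper below is a sketch under that assumption; serialize_with_width is a hypothetical name and not part of fakeformat.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: serializes a value with a minimum field width,
// which is what a "width=N" modifier could translate to on a string stream.
template <typename T>
std::string serialize_with_width(const T& value, int width) {
    std::ostringstream out;
    out << std::setw(width) << value;  // a width of 0 means no padding
    return out.str();
}
```

Concatenating the four serializations from the test (no width, width 3, no width, width 0) for the value 1 reproduces the expected string.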

Cleaning up

Before cleaning up, I set up Travis-ci again (as for hiberlite and undoredo-cpp) → https://travis-ci.org/d-led/fakeformat, deleted the manually generated Visual Studio project files and removed the fancy colors from the embedded parser.

[Build status badge]

Performance

Strings are currently parsed into integers using the slowest but safest approach that needs neither Boost nor C++11. If performance is needed, changing the string_to_key functions can help. There’s a superb article on the options.
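A portable way of turning a string into an integer without Boost or C++11 is to go through a string stream. The following sketch shows what such a conversion could look like; the signature is an assumption made for illustration and may differ from the string_to_key functions in fakeformat.

```cpp
#include <sstream>
#include <string>

// Assumed signature for illustration. Returns false and leaves `key`
// untouched unless the text is exactly one integer (surrounding
// whitespace is tolerated).
bool string_to_key(const std::string& text, int& key) {
    std::istringstream in(text);
    int parsed = 0;
    char trailing = 0;
    if (!(in >> parsed)) return false;  // not a number at all
    if (in >> trailing) return false;   // something follows the number, e.g. "12x"
    key = parsed;
    return true;
}
```

A faster replacement would only need to keep the same contract: report failure instead of guessing.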

Try it out in the browser

http://ideone.com/kYcGJV

To do

Any ideas for further features or improvements?

Header only?

Setting up travis-ci with github for a C++ project for the first time

Intro

Here, I’m trying to use travis-ci, c++, github, CATCH, premake together with my undoredo-cpp library to reduce entropy, try out continuous integration and behavior-style tests.

As a “one-man show” programmer, at least at home, I’m trying to keep the discipline of writing tests first. “Growing Object-Oriented Software, Guided by Tests” is perhaps a good, although comparatively dry, book for those who are not yet convinced. The blog post by Phil Nash about the latest version of his C++ single-header testing framework CATCH moved me to finally get my hands on the free continuous integration service travis-ci, along with CATCH, with the goal of rewriting the tests for my undo-redo C++ adventure in a more behavior-driven style.

The undo-redo library is already there, and the tests as well – in gtest (see the master branch). I’d label them “explorative” at the moment since there are just too many assertions per test case, which means I’m repeating myself.

Starting continuous integration at travis-ci for c++

To start my CATCH-“BDD” exploration I first set up the branch: https://github.com/d-led/undoredo-cpp/tree/catchmoci. On my travis-ci landing page the project is switched on, so that the commit hooks are caught:

[Screenshot: travis-ci repository setup]

The following configuration file .travis.yml is placed for travis-ci to know what to do with my non-conforming repo:

language: cpp

branches:
  only:
   - catchmoci

before_script: ls

script:
  - make -C Build

As in all TDD practice, the build fails at first, simply because it doesn’t work at all yet. The status image added to my README shows it:

[Build status badge: failing]

Fixing the build

The makefiles are created using premake4, which is a single-file makefile generator based on Lua. Unfortunately, I couldn’t get the CI virtual machine to execute my premake4 binary, so I had to add the generated makefiles to the repository. Now the make process works, but the tests still don’t compile:

[Build status badge: failing]

Fixing the tests

Once the bulk of the assertions had been rewritten for CATCH, the build still failed due to an ambiguity in serializing std::nullptr_t. Fortunately, Phil has thought of (or rather tested) that, and there is a macro that can be defined for the build to fix it: CATCH_CONFIG_CPP11_NULLPTR.

Voilà, the travis-ci VM is happy:

[Build status badge: passing]

Just checking locally whether test reporting works, by temporarily adding a spurious test:

[Screenshot: local test run reporting the spurious test]

The test rewrite has been successful and all tests pass, the badge is green and I can go to bed.

[Build status badge: passing]

But! It’s not the end of the story! BDD! Mock-objects!