Author Archives: DLed

Batching Data by Time or Count Using Reactive Extensions (Rx)

Motivation

The time-series database InfluxDB provides a HTTP API to write data. Data points (measurements) are inserted via a Line protocol, which allows batching of data by writing multiple points in one HTTP request.

While experimenting for a simple InfluxDB C++ client, I wanted to create an asynchronous fire-and-forget API, so that the data points can be sent over HTTP without blocking the instrumented C++ code. Several “readymade” options to implement concurrency in this scenario are available.

A simple PAIR of ZeroMQ sockets would do the job, but I’d have to implement batching separately. Thus, I turned my attention to a higher-level abstraction: Rx

Rx Window Operator

Quickly looking through the cross-language Reactive Extensions site, I found the right operator: Window.

This operator has luckily been implemented in RxCpp, thus I proceeded with the experiment.

Batching Design

Batching using Rx Winodw Operator

Rx Window Operator (CC BY 3.0 reactivex.io)[1. Source: reactivex.io License: (CC BY 3.0)]

The window operator takes an observable sequence of data and splits it into windows (batches) of observables. To batch requests, the observable windows of data are aggregated to a single value upon the last value from the windows (via other aggregating Rx operators).

A Toy Problem

To validate the approach, the following problem is set:

Given a stream of integers, append the integers into a series of strings, either every second, or every N integers

String appends with integer-to-string conversions in C++ will be done via the {fmt} library.

Batching in One Line of Code

A stream of numbers batched either by time or count:

auto values = rxcpp::observable<>::range(1, 1000'000)
    .window_with_time_or_count(std::chrono::seconds(1), 100'000);

Note, there is an almost one-to-one translation into a C# version:

var values = Observable.Range(1, 1000000)
               .Window(TimeSpan.FromSeconds(1), 100000);

This indicates the power of the Rx abstraction across languages. The Rx website provides just the right sorting of the documentation to be able to translate Rx code from one language to another.

Aggregating the Batches

In order to do something useful with the batched data, the Scan operator is used to gather the data in a string buffer, and after the last value has been received, the string buffer is assembled into a string and processed:

values.subscribe(
    [](rxcpp::observable<int> window) {
        // append the number to the buffer
        window.scan(
            std::make_shared<fmt::MemoryWriter>(),
            [](std::shared_ptr<fmt::MemoryWriter> const& w, int v)
        {
            *w << v;
            return w;
        })

        // what if the window is empty? Provide at least one empty value
        .start_with(std::make_shared<fmt::MemoryWriter>())

        // take the last value
        .last()

        // print something fancy
        .subscribe([](std::shared_ptr<fmt::MemoryWriter> const& w) {
            fmt::print(
                "Len: {} ({}...)\n",
                w->size(),
                w->str().substr(0, 42)
            );
        });            
    }
);

The Tale of Two Bugs

In the initial (non-TDD) spike, the batching seemed to work, however, something caught my attention (the code bites back):

[window 0] Create window
Count in window: 170306
Len: 910731 (123456789101112131415161718192021222324252...)

the window wasn’t capped at 100’000. This could have been either a misunderstanding or a bug, thus I formulated a hesitant issue #277. As it turned out, it indeed was a bug, which was then fixed in no time. However, the first bug has hidden another one: the spike implementation started to crash at the end: when all the windows were capped by count, and not by time, last window was empty, as all values fit exactly into 10 batches.

The Last operator rightly caused an exception due to an empty sequence. Obviously, there’s no last value in an empty sequence. Rubber Ducking and a hint from Kirk Shoop fixed the issue by utilizing the StartWith operator to guarantee, the sequence is never empty. An empty string buffer can be ignored easier downstream.

Active Object

The active object pattern was applied to implement a fire-and-forget asynchronous API. A Rx Subject to bridge between the function call and the “control-inverted” observable:

struct async_api {
    //...
    rxcpp::subjects::subject<line> subj;
    //...

    async_api(...)
    {
        auto incoming_requests = subj
            .get_observable()
            .map([](auto line) {
                return line.to_string();
            });

        incoming_requests
            .window_with_time_or_count(
                window_max_ms,
                window_max_lines,
                // schedule window caps on a new thread
                rxcpp::synchronize_new_thread()
            )
            .subscribe(...)
        ;
    }

    // fire-and-forget
    void insert(line const& line)
    {
        subj
            .get_subscriber()
            .on_next(line);
    }
};

in order not to block the caller (which would be the default behavior), the observable watches the values from each window on a new thread. Here, scheduling on a thread pool (currently missing in RxCpp) would probably be beneficial.

While this implementation might not be an optimal one, the declarative nature of Rx, once the basics are understood, allows to “make it work and make it right” pretty quickly by composing the right operators.

Code

The runnable code of the example can be found at Github: C++ version.

In order to show, how similar the high level code can be between different languages when writing, I’ve “ported” the example to C# [2. The C# version appears to run faster on my windows machine while solving the same toy problem].

C++ version of ruby’s Integer::times via user-defined literals

Motivation

It occurred to me that I was missing Ruby’s 42.times do ... in C++. While a loop is still imperative, the inspiration to write the folowing code and the post came after watching the talk “Declarative Thinking, Declarative Practice” by Kevlin Henney. Thus, let it be a “declarative imperative”. While there’s no explicit intended use for the following code, it could make some tests read a bit less verbose. An implementation should be dependency-free, minimalist and copy-paste-able.

42_times([]{
    do_something();
});

First version

Luckily, C++11/C++14 feels like a new language[1. the original quote by Bjarne Stroustrup is about C++11] and has a nice feature that allows to fulfill wishes: user-defined literals. Thus, having settled on the implementation strategy, here’s how “.times” can look like in C++ as of today:

#include <iostream>

struct execute {
    const unsigned long long n;

    template<typename Callable>
    void operator() (Callable what) {
        for (auto i = 0; i < n; i++)
            what();
    }
};

execute operator"" _times(unsigned long long n) {
    return execute{n};
}

int main() {
    3_times([]{
        std::cout << "bla" << std::endl;
    });

    auto twice = 2_times;

    twice([]{
        std::cout << "blup" << std::endl;
    });
}

g++ -std=c++14 example.cpp & ./a.out

bla
bla
bla
blup
blup

I’d guess, this idea occurred to a number of people already

Update 1: loop index

@Ideone

With two simple options: with a counter and without the counter available in the closure:

#include <iostream>
#include <functional>

struct execute {
    const unsigned long long n;

    void operator() (std::function<void()> what) {
        for (auto i = 0; i < n; i++)
            what();
    }

    void operator() (std::function<void(unsigned long long)> what) {
        for (auto i = 0; i < n; i++)
            what(i);
    }
};

execute operator"" _times(unsigned long long n) {
    return execute{n};
}

int main() {
    3_times([]{
        std::cout << "bla" << std::endl;
    });

    auto twice = 2_times;

    twice([]{
        std::cout << "blup" << std::endl;
    });

    3_times([](unsigned long long i) {
        std::cout << "counting: " << i << std::endl;
    });
}

bla
bla
bla
blup
blup
counting: 0
counting: 1
counting: 2

Update 2: benchmark

A simple benchmark attempt shows that the modern C++ compilers can optimize a for loop that doesn’t produce side effects better than the code above. However, not all code needs to sacrifice readability for performance. If the loop is not on the critical path, it probably doesn’t need to be optimized, especially, if it’s just a test.

After some tinkering to disable optimizations for the function call, here’s a hayai-based benchmark:

struct TimesVsLoop : public ::hayai::Fixture{
    static int v;

    static void do_something() {
        v = v % 41 + 1;
    }

    static void do_something(unsigned long long i) {
        v = (i + v) % 41 + 1;
    }
};

int TimesVsLoop::v = 0;

BENCHMARK_F(TimesVsLoop, Times_Without_Index, 10, 100)
{
    1000000_times([]{
        do_something();
    });
}

BENCHMARK_F(TimesVsLoop, Loop_Without_Index, 10, 100)
{
    for (int i = 0; i < 1000000; i++) {
       do_something();
    }
}

BENCHMARK_F(TimesVsLoop, Times_With_Index, 10, 100)
{
    1000000_times([](unsigned long long i){
        do_something(i);
    });
}

BENCHMARK_F(TimesVsLoop, Loop_With_Index, 10, 100)
{
    for (int i = 0; i < 1000000; i++) {
       do_something(i);
    }
}

int main()
{
    hayai::ConsoleOutputter consoleOutputter;

    hayai::Benchmarker::AddOutputter(consoleOutputter);
    hayai::Benchmarker::RunAllTests();
    return 0;
}

g++ 1_benchmark.cpp -std=c++14 -O3 -I/usr/local/include/hayai && ./a.out

A performance hit is observable, but is probably insignificant in relevant use-cases:

Update 3: a short metaprogram

In order to address the issue of the std::function overhead, Kirk Shoop (@kirkshoop) has posted an alternative version with a small metaprogram:

struct execute {
    const unsigned long long n;

	template<typename F, class Check = decltype((*(F*)nullptr)())>
    void operator() (F what, ...) {
        for (auto i = 0; i < n; i++)
            what();
    }

	template<typename F, class Check = decltype((*(F*)nullptr)(0))>
    void operator() (F what) {
        for (auto i = 0; i < n; i++)
            what(i);
    }
};

This version appears to perform as good, if not sometimes better (an OS fluke, perhaps?) than the raw loop:

Update 4: CppCast & benchmark @ Github

This blog entry was briefly discussed in CppCast’s episode Catch 2 and C++ the Community with Phil Nash. Further ideas, reflections and analyses are welcome!

Benchmark code: cpp_declarative_times/github.com

Update 5: yet more declarative & responses

Jon Kalb has posted a very clear article with an in-depth take on a traditional C++ approach to implementing N_times(...). The Reddit responses point to the Boost.Hana implementation. These responses are in a way in tension with the original intent of writing a minimum sufficient amount of code to achieve the ruby-like syntax. However, indeed, a typical C++ solution involves metaprogramming partly to ensure that other uses apart from the one originally in mind are safe. The discussion of that phenomenon should be part of a separate article.

The other mentioned (metaprogramming) approaches inspire me to create an even more special [2. as in, opposite to generic], but still a “character-lightweight” solution. Coming back to declarative thinking and explicit and clear intent, a much simpler resolution of the two overloads can be achieved by simply creating another operator for the index overload:

struct execute_with_index {
    const unsigned long long count;

    template<typename CallableWithIndex>
    void operator() (CallableWithIndex what)
    {
        for (auto i = 0; i < count; i++)
            what(i);
    }
};

execute_with_index operator"" _times_with_index(unsigned long long count)
{
    return { count };
}

1000000_times_with_index([](unsigned long long i) {
    do_something(i);
});

this addition to the initial version solves the problem with the overloads and avoids both metaprogramming and std::function.

Next update: a more generic loop and CRTP.