Build Agent Infrastructure Testing in GoCD

In this post I would like to describe a simple technique for reducing the waiting time and stress related to build agent environment volatility when using Continuous Integration / Continuous Delivery tools like GoCD, via infrastructure testing.

The Problem

Given a modern CI server, such as GoCD, and a set of dedicated build machines (agents), it is possible to improve software development agility. Automated build/test/deploy pipelines, built to reflect the value stream, bring transparency and focus into the software delivery activities.

CI automation is software itself, and is thus susceptible to errors. Configuration management can streamline the setup of the environment in which the build agents run. However, when computational resources are added to a CI infrastructure, e.g. to parallelize the build and thus reduce feedback times, a missing environment dependency can cause the very stress and pain that CI is trying to eliminate.

Consider a pipeline where a complete cycle (e.g. with long check-outs at the beginning, or slow integration tests followed by a reporting step at the end) takes a significant amount of time. If one of the last tasks fails due to a configuration or an environment issue, the whole stage fails. The computational resources have been wasted just to find out that a compiler is missing. This can easily happen when there is variation in the capabilities of the build agents.

Improvement idea: fail fast! Don’t wait to discover environment or infrastructure mismatches.

GoCD Resources as Requirements and Capabilities

If a step in a build pipeline requires a certain compiler or a particular environment, this can be conveniently expressed in the configuration of GoCD as a build agent resource. A resource can be seen as a requirement of a pipeline step that is fulfilled by a corresponding capability of a build agent.
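The requirement/capability view can be illustrated with a small model. This is only a sketch of the matching idea, not GoCD's implementation: the `Agent` class, the `eligible_agents` function, and the agent and resource names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    """Hypothetical model of a build agent and the resources it advertises."""
    name: str
    resources: frozenset = field(default_factory=frozenset)


def eligible_agents(required, agents):
    """Return the agents whose advertised resources cover every requirement."""
    required = frozenset(required)
    return [agent for agent in agents if required <= agent.resources]


# Assumed two-agent set-up: both advertise "python", only one advertises "gcc".
AGENTS = [
    Agent("windows-agent", frozenset({"python"})),
    Agent("linux-agent", frozenset({"python", "gcc"})),
]

# A job that needs only "python" may land on either agent.
print([agent.name for agent in eligible_agents({"python"}, AGENTS)])
# A job that also needs "gcc" can run only where both are advertised.
print([agent.name for agent in eligible_agents({"python", "gcc"}, AGENTS)])
```

Note that the model only compares what agents advertise. Whether the advertised capability is really installed is exactly what an infrastructure test has to verify.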

Consider the following set-up with two build agents – one running on Windows, another one on Linux. Some tasks could be completely platform-independent, such as text processing, and thus could be potentially performed on any machine with a required interpreter installed.

agents

Build Agent is the Culprit

We have set up our environment, and have successfully tested our first commit, but the second one fails:

another_agent

The code is the same, why did the second build fail? The builds ran on different agents, but I expected them to behave similarly…

build_fails

Oh, that’s embarrassing. While yes, the script is platform-independent, there’s no executable named python2 in my Windows runner environment path.

With a one-file repository and a simple print statement this failure did not cause much damage, but as mentioned earlier, real-life builds that fail because of a missing executable can be costly.

Unhappy Picture

unhappy

Infrastructure Test Pipeline

In order to fail fast in situations where new agents are added to a CI infrastructure, or their environment is volatile, I propose to use a single independent pipeline that checks the assumptions that longer builds depend upon.

If a build step requires python in the path, there should be a test for it that gives this feedback in seconds without much additional waiting time. This can be as easy as calling python --version, which will fail with a non-zero return code if the binary is missing. More fine-grained assertions are possible, but should still remain fast.
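A check of this kind could be sketched as a tiny script. This is a minimal sketch under assumptions: the helper names `tool_responds` and `check_tools` are made up for illustration, and the demo line probes the interpreter running the script so that it works anywhere, whereas an agent test would probe the command the build needs, such as `["python", "--version"]`.

```python
import subprocess
import sys


def tool_responds(command, timeout_seconds=10):
    """Return True if the command can be started and exits with code 0."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing or non-executable binary.
        return False
    return result.returncode == 0


def check_tools(commands):
    """Report every failing command on stderr; return a process exit code."""
    failed = [" ".join(command) for command in commands if not tool_responds(command)]
    for command in failed:
        print("FAILED: " + command, file=sys.stderr)
    return 1 if failed else 0


# Demo: probe the interpreter that runs this script (an assumption made
# so that the sketch works on any machine).
print(check_tools([[sys.executable, "--version"]]))
```

A job would pass the result to `sys.exit`, so that a missing tool turns into a failed task within seconds.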

If a certain binary should not be in the path, this can be asserted as well. The same goes for environment variables and file existence. Dedicated infrastructure testing tools, such as Serverspec, could also be used, but having a response time under a minute is crucial in my view.
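The other kinds of assertions mentioned here might look as follows. Again a sketch only: the function names are hypothetical, and the checks in the demo are placeholders for whatever a real pipeline assumes about its agents.

```python
import os
import shutil
import sys


def binary_present(name):
    """True if an executable with this name (or path) can be found."""
    return shutil.which(name) is not None


def env_var_set(name):
    """True if the environment variable exists and is not empty."""
    return bool(os.environ.get(name))


def file_exists(path):
    """True if the path points to an existing regular file."""
    return os.path.isfile(path)


def failed_checks(checks):
    """checks: list of (description, zero-argument callable returning bool)."""
    return [description for description, check in checks if not check()]


# Placeholder checks; a real list would mirror the build's assumptions.
CHECKS = [
    ("interpreter is available", lambda: binary_present(sys.executable)),
    ("made-up tool is NOT installed",
     lambda: not binary_present("definitely-not-a-real-binary-12345")),
    ("interpreter file exists", lambda: file_exists(sys.executable)),
]

print(failed_checks(CHECKS))
```

Each check is a cheap local lookup, which keeps the whole run well within the sub-minute budget.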

Run on All Agents

In order to validate the consistency of the CI infrastructure, the validation tasks should run on all agents that advertise a corresponding resource. This is where, in my view, the real power of GoCD comes to light, and the concepts used in it fit in the right places.

GoCD will run the test tasks on all agents that fulfill all resource requirements for the task.

run_on_all_g

Test fails

Now that we have all the tests, running them gives quick and precise feedback:

infrastructure_test_fails

Checking out the job run details reveals the offending agent. Note the test duration: under 1 second.

infrastructure_test_agent

Fixing the Infrastructure

resources_modified

Whatever the resolution of the infrastructure problem, when the infrastructure test has a good coverage of the prerequisites for a pipeline, adding new agents to the CI infrastructure should become as much fun as TDD is: write an infrastructure test, see it fail, fix the infrastructure, feel the good hormones. Add new build agents for speed — still works — great!

infrastructure_test_passes

Note how the tests for resources that are available on only one machine run only on that machine.

 

Happy Pictures and Developers

happy pipeline

When to Test

When to test the infrastructure is an open question. With the system being composed of the CI server and agents, the tests should probably run on any global state change, such as

  • added/removed/reconfigured agents
  • automatic OS updates (controversial)
  • restarts
  • network topology changes

It is also possible to schedule a regular environment check. Unfortunately, making the environment test pipeline an upstream input for other pipelines does not protect against the following sequence of events:

  • environment tests pass
  • faulty agents are added
  • downstream pipeline is triggered
  • environment failure causes a pipeline to fail

In any case, there is a REST API available for the GoCD server should automating the automation become a necessity.

Acknowledgments

I would like to thank all the great minds, authors and developers who have worked and are working to make the lives of developers and software users better. Tools and ideas that work and provide value are indispensable. The articles and the software linked in this blog entry are examples of knowledge that brings the software industry forward. I am also very grateful to my current employer for letting me learn, grow and make a positive impact.

 

 

A unit test suite is a spiderweb

A moment’s epiphany in a highly caffeinated brain; a metaphor:

a unit test suite is a spiderweb! 1

Why?

  • It’s lightweight
  • It’s sufficiently robust
  • It catches a casual bug passing by
  • It’s most useful if the maintainer reacts to a bug catch immediately
  • It needs occasional maintenance
  • It’s not watertight, but good enough
  • Some are beautifully designed, some look like a tight cloth, some are a mess
  • …insert your analogy…

There is an already established XP metaphor of a safety net for the people and the business, to which I fully subscribe. This “lightweight net” metaphor, I hope, adds another flavor to the test-suite-as-a-net analogies, especially for lightweight unit test suites, by describing which qualities unit test suites can have.

We now know that small (biological) bugs and other small creatures can cause a lot of harm, just as tiny software bugs can and do. 2

The analogy might have been a cheap trick of my brain. Nonetheless, it is now externalized, and I can go to sleep.

  1. When done well
  2. As a corollary: one can catch larger things by being a large cat, and applying different powerful strategies, but that requires quite a muscle mass, and a good connection to the ground. P.S. no lightweight vs. non-lightweight preference comparison intended.

In Total: 2015 / Balance: 2015

2015 – A year of fast paced change

→ 2014

Why wait forever for the tests? Fast tests of slow software.

Time is volatile

Imagine writing cron-like functionality that should produce some side effect, such as cleanup. The intervals between such actions might be quite long. How does one test that? One can surely reason about the software, but beyond a certain complexity, tests should be written to prove that the important scenarios work as intended.

Software commonly depends on the flow of physical time, as reported by some clock provider. However, setting the clock a year ahead won’t make the CPU work faster and perform all the computations it should have performed within that year. A clock is also a volatile component that can be manipulated. Thus, if time is an issue, it’s probably a good idea not to depend on it directly, following the Stable Dependencies Principle and the Dependency Inversion Principle.
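Not depending on the clock directly could look like the following sketch, where the time source is passed in from outside. `FakeClock` and `CleanupJob` are hypothetical names, and the counter stands in for a real cleanup side effect.

```python
from datetime import datetime, timedelta


class FakeClock:
    """A manually advanced clock standing in for the system clock in tests."""

    def __init__(self, start):
        self._now = start

    def now(self):
        return self._now

    def advance(self, delta):
        self._now += delta


class CleanupJob:
    """Hypothetical cron-like job that cleans up at most once per interval."""

    def __init__(self, now, interval):
        self._now = now              # injected time source, e.g. datetime.now
        self._interval = interval
        self._last_run = now()
        self.runs = 0

    def poll(self):
        if self._now() - self._last_run >= self._interval:
            self._last_run = self._now()
            self.runs += 1           # stands in for the real side effect


clock = FakeClock(datetime(2015, 1, 1))
job = CleanupJob(clock.now, timedelta(days=30))

job.poll()
print(job.runs)  # 0: no time has passed yet

clock.advance(timedelta(days=30))
job.poll()
print(job.runs)  # 1: thirty virtual days later, without waiting
```

In production the same job would receive `datetime.now`, so the code under test stays unchanged.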

Luckily, there is an abstraction for time, at least in Reactive Extensions (Rx), which is the Scheduler.
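To get a feeling for what such an abstraction does, here is a deliberately tiny virtual-time scheduler. It is a sketch of the general idea and not how Rx implements its schedulers; the class and method names are assumptions chosen for this illustration.

```python
import heapq
import itertools


class VirtualTimeScheduler:
    """Runs scheduled actions in virtual time; nothing ever sleeps."""

    def __init__(self):
        self.now = 0                      # virtual seconds since start
        self._queue = []
        self._order = itertools.count()   # tie-breaker keeps FIFO order

    def schedule(self, delay, action):
        """Register an action to run `delay` virtual seconds from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._order), action))

    def advance_by(self, duration):
        """Move virtual time forward, running every action that becomes due."""
        deadline = self.now + duration
        while self._queue and self._queue[0][0] <= deadline:
            due, _, action = heapq.heappop(self._queue)
            self.now = due
            action()
        self.now = deadline


DAY = 24 * 60 * 60  # seconds

scheduler = VirtualTimeScheduler()
fired = []
scheduler.schedule(100 * DAY, lambda: fired.append("done"))

scheduler.advance_by(99 * DAY)
print(fired)  # still empty after 99 virtual days

scheduler.advance_by(1 * DAY)
print(fired)  # the action has run on day 100
```

Code that schedules its work through such an object, instead of sleeping, can be tested in microseconds regardless of the delays involved.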

Slow non-tests

Here’s a slow Groovy non-test, waiting for some output on the console using RxGroovy:

import rx.*
import java.util.concurrent.TimeUnit

def observable = Observable
	.just(1)
	.delay(5, TimeUnit.SECONDS)

observable.subscribe { println 'ah, OK, done! Or not?' }

Observable
	.interval(1,TimeUnit.SECONDS)
	.subscribe { println 'still waiting...' }

println 'starting to wait for the test to complete ...'

observable.toBlocking().last()

Running it produces the following slow-ticking output:

oldnontest 1

Interpreting such tests without color can be somewhat challenging 2.

Fast tests

Now let’s test something ridiculous, such as waiting for a hundred days, using Spock. Luckily, RxJava & RxGroovy also implement a test scheduler, thus enabling fast tests using virtual time:

import spock.lang.Specification

import rx.Observable
import rx.schedulers.TestScheduler
import java.util.concurrent.TimeUnit


class DontWaitForever extends Specification {
    def "why wait?"() {
        setup:
            def scheduler = new TestScheduler()

            // system under test: will tick once after a hundred days
            def observable = Observable
                .just(1)
                .delay(100, TimeUnit.DAYS, scheduler)
            def done = false

        when:
            observable.subscribe {
                done = true
            }

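            // still in the initial state; an explicit assert is needed here,
            // as bare conditions are only checked in then:/expect: blocks
            assert done == false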

        and:
            scheduler.advanceTimeBy 100, TimeUnit.DAYS

        then:
            done == true
    }
}

fasttest 3

Just checking: advancing the time by only 99 days results in a failure:

just_checking

Delightful, groovy colors!

Source

github.com/d-led/dont_wait_forever_for_the_tests

  1. Captured with the wonderful pragmatic tool LICEcap by the Reaper developers
  2. Here, the ‘still waiting’ subscription is terminated after the first subscription ends. Try exchanging the order of the subscribe calls.
  3. Building using Gradle

Deterministic Testing of Concurrent Behavior in RxCpp

A Retrospective

After getting inspired by The Reactive Manifesto, it is hard not to get excited about Reactive Extensions. Such excitement has led to a series of hello-world articles and some code examples. While Reactive Extensions take over the programming world in C#, Java and JavaScript, the world of C++ seems slow to adopt RxCpp.

The new ReactiveX Tutorial link list is a great place to start learning and grokking. This article is an attempt to bring RxCpp closer to C++ developers who might not yet see how a reactive programming model can help in writing better, more robust code.

Testing concurrency with RxCpp

A previous article showed how to test ViewModels in C# by parameterizing the ViewModels with a scheduler. In a UI setting, the scheduler usually involves some kind of synchronization with the GUI thread. Testing keystrokes arriving at a certain speed would require some effort to simulate events, probably leading to brittle tests. With the scheduler abstraction, the concurrent behavior of a component is decoupled from physical time, and thus can be tested repeatably and very fast. This was the C# test:

(new TestScheduler()).With(scheduler =>
{
    var ticker = new BackgroundTicker(scheduler);

    int count = 0;
    ticker.Ticker.Subscribe(_ => count++);
    count.Should().Be(0);

    // full control of the time without waiting for 1 second
    scheduler.AdvanceByMs(1000);
    count.Should().Be(1);
});

Show Me The Code

Without further ado: the C++ version is not very far from the C# version. In a simple test, we can parameterize a sequence of integer values arriving at specified intervals (a ticker) with a coordination (for why it is a coordination and not a scheduler, see the RxCpp developer manual):

auto seq = rxcpp::observable<>::interval(
            std::chrono::milliseconds(1),
            some_scheduler
);

The deterministic test scheduler API is currently available through a worker created on the test scheduler:

auto sc = rxcpp::schedulers::make_test();
auto worker = sc.create_worker();
auto test = rxcpp::identity_same_worker(worker);

The rest should read like English:

int count = 0;

WHEN("one subscribes to an observable sequence on the scheduler") {
  auto seq = rxcpp::observable<>::interval(
              std::chrono::milliseconds(1),
              test // on the test scheduler
             ).filter([](int i) { return i % 2; });

  seq.subscribe([&count](int){
    count++;
  });

  THEN("the sequence is not run at first") {
    worker.sleep(2 /* ms */);

    CHECK(count == 0);

    AND_WHEN("the test scheduler is advanced manually") {

      THEN("the sequence is run as expected") {
        worker.advance_by(8 /* ms */);
        CHECK(count == 5);
      }
    }
  }
}

The full test can be seen @github, and is built on Travis CI.

RxCpp 2

RxCpp 2 and API

The last article on rxcpp was based on a now obsolete version of RxCpp. The key contributor to the library, Kirk Shoop, has kindly provided a rewrite based on the newer, 2.0 API of the library: see the pull request, upon which this article is based.

Since the first article, the project has been enriched with somewhat more readable GIVEN/WHEN/THEN-style tests using Catch 1.

Still Ticking: Scheduler and Coordination in RxCpp 2

The previous articles give examples of managing periodic events, such as ticker ticks and measurements, in C++. The following example creates an event loop that will be used for coordinated output of various events to the console:

auto scheduler = rxcpp::schedulers::make_same_worker(
    rxcpp::schedulers::make_event_loop().create_worker()
);

auto coordination = rxcpp::identity_one_worker(scheduler);

One such sequence of events is some kind of measurement 2:

auto measure = rxcpp::observable<>::interval(
        // when to start
        scheduler.now() + std::chrono::milliseconds(250),
        // measurement frequency
        std::chrono::milliseconds(250),
        coordination)
    // take Hz values instead of a counter
    .map([&FM](int) { return FM.Hz(); });

auto measure_subscription = measure
    .subscribe([](int val) {
        std::cout << val << std::endl;
    });

Why didn’t it tick?

If this code were the end of the main program, there wouldn’t be any observable ticks, as all the objects would be destroyed before the first scheduled event. To see the code in action, we shall wait for some condition that will change when we’re done. This step is not necessary if there’s a GUI toolkit event loop that keeps objects alive, but it has to be simulated for a console example.

To demonstrate the subscription change and wait for some time, we’ll wait twice for an atomic variable to become zero:

std::atomic<long> pending(2);

...

// after all subscriptions defined
while (pending) {
    // wait for ticker and measure to finish (needs <thread> and <chrono>)
    std::this_thread::sleep_for(std::chrono::seconds(1));
}
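The same keep-the-program-alive pattern can be sketched outside of RxCpp. The following Python analogue is an illustration only: `Pending` is a hypothetical helper, and the two daemon timers merely stand in for the ticker and the measurement.

```python
import threading


class Pending:
    """Counts outstanding activities; wait() blocks until all have finished."""

    def __init__(self, count):
        self._count = count
        self._lock = threading.Lock()
        self._all_done = threading.Event()
        if count <= 0:
            self._all_done.set()

    def done(self):
        """Signal that one activity has finished."""
        with self._lock:
            self._count -= 1
            if self._count <= 0:
                self._all_done.set()

    def wait(self, timeout=None):
        """Block until the counter reaches zero; False if the timeout expired."""
        return self._all_done.wait(timeout)


pending = Pending(2)

# Two stand-in background activities. Daemon threads are abandoned when
# the main thread ends, much like the subscriptions in the C++ example.
for delay in (0.05, 0.1):
    timer = threading.Timer(delay, pending.done)
    timer.daemon = True
    timer.start()

# Without this wait, the script could end before either timer fires.
finished = pending.wait(timeout=5)
print(finished)
```

Blocking on an event instead of polling in a sleep loop avoids choosing a polling interval, at the price of a slightly larger helper.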

Tick and Stop

The other ticker will have another period, will only tick 10 times, and then decrement the pending counter:

auto ticker = rxcpp::observable<>::interval(
    scheduler.now() + std::chrono::milliseconds(500),
    std::chrono::milliseconds(500),
    coordination);

ticker
    .take(10)
    .subscribe([](int val) {
        std::cout << "tick " << val << std::endl;
    },[&](){
        --pending; // take completed the ticker
    });

Now we can schedule the termination of the measurement subscription partway through the 10-tick run (two seconds in); the same action also decrements pending. This scheduling is done on the same scheduler that is running all the subscriptions:

scheduler.create_worker().schedule(scheduler.now() + std::chrono::seconds(2), 
    [&](const rxcpp::schedulers::schedulable&) {
        std::cout << "Canceling measurement ..." << std::endl;
        measure_subscription.unsubscribe(); // cancel measurement
        --pending; // signal measurement canceled
    });

The result:

63
tick 1
63
61
tick 2
63
61
tick 3
63
62
Canceling measurement ...
tick 4
tick 5
tick 6
tick 7
tick 8
tick 9
tick 10

Thanks, Kirk & other library contributors!

Code @ github

Next: deterministic testing of concurrent behavior

  1. i.e. create.cpp
  2. Observe the convergence of the API towards the C# version.

Automatic Lua Properties

Automatic Lua Properties?

Starting with an example using the Lua specification and testing framework Busted:

Here is a little exercise in Lua metaprogramming.
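For readers who do not use Lua, the general flavor of such metaprogramming can be shown in Python. This sketch is not a port of the linked implementation; it assumes the goal resembles automatically created nested tables, and `AutoTable` is a hypothetical name.

```python
class AutoTable(dict):
    """A dict that creates nested AutoTables for missing keys on first access."""

    def __missing__(self, key):
        # Called by dict lookup when the key is absent: create and store a child.
        child = AutoTable()
        self[key] = child
        return child

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails; mirrors t.a.b in Lua.
        if name.startswith("__"):
            raise AttributeError(name)
        return self[name]

    def __setattr__(self, name, value):
        # Attribute assignment is stored as a regular dictionary item.
        self[name] = value


config = AutoTable()
config.server.port = 8080          # "server" is created on the fly
print(config["server"]["port"])    # 8080
```

In Lua, the comparable hooks are the `__index` and `__newindex` metamethods of a table's metatable.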

Code

https://github.com/d-led/automatic-lua-property-tables

Spec: autoprop_spec.lua
Implementation: autoprop.lua

P.S. Other implementations: lua-users wiki: Automagic Tables

Presenting at TU-Munich: testing on c++ projects, Thursday, March 26, 2015 7:00 PM

Thank you to all for a superb heated debate!

“no excuses for not testing on c++ projects”

Thursday, March 26, 2015
7:00 PM

details: http://www.meetup.com/MUCplusplus/events/220628575/

If only all tests were comprehensible…

SCENARIO("acquiring wisdom") {

  GIVEN("an oracle") { 
    oracle gus;
    
    WHEN("I ask it to speak") {
      auto answer = gus.speak();

      THEN("wisdom is apparent") {
        CHECK( answer != "bla" );
      }
    }
  }
}

1

→ The code can be found @github, including the presentation slides.