Misadventures with Property-Based TDD: A Lesson Learned

An awkward moment occured while I was demonstrating property-based testing during my workshop at the XP2013 conference. Working out what went wrong surfaced an interesting difference between driving development from properties and from examples. When working from examples, we start with specifics and then generalise, by adding contradictory examples. With property-based tests it seems better to start with very general properties and then specialise.

The code I was writing to demonstrate property-based testing was a function to calculate the surface area of a 3D cylinder.

The test I wrote was:

@forall(r=floats(), h=floats())
def test_cylinder_area(r, h):
    c = cylinder(r,h)

    end_area = pi * r**2
    end_circumference = 2 * pi * r
    side_area = end_circumference * h

    assert_that(c.area, equal_to_float(2*end_area + side_area))

And the code to make it pass:

class cylinder:
    def __init__(self, radius, height):
        self.radius = radius
        self.height = height

    @property
    def area(self):
        return 2 * pi * self.radius * (self.radius + self.height)

Not a great example of a property, I admit. I was trying to demonstrate how a property-based test more clearly expresses a property than a lot of examples that triangulate the property.

For the purposes of demonstrating iterative development, I then threw a new feature into the mix: the system does not allow 2D or 1D shapes, so the area should be 0 when r=0 or h=0. The constraint for r=0 is covered by the existing property, but a change is required for the case when h=0. So, I added an additional property test:

@forall(r=floats(), h=always(0)))
def test_zero_height(r, h):
    c = cylinder(r,h)
    assert_that(c.area, equal_to(0))

And I made that test pass by adding a conditional in the cylinder's area property:

class cylinder:
    def __init__(self, r, h):
        self.r = r
        self.h = h

    @property
    def area(self):
        if self.h == 0:
            return 0
        else:
            return 2*pi*self.r*(self.r + self.h)

However, to my embarassment, I had forgotten to change the test_cylinder_area test to ignore the case when h = 0, but it still passed! What's going on?

The problem is that the floats() generator is so unlikely to generate a height of zero that the test_cylinder_area property was never applied to a combination of r and h that would have hit the special case for zero height.

Even worse, using floats() to generate radius and height is wrong. Floats() can generate negative values, which can result in nonsensical negative areas. But the calculation of the expected area in the test_cylinder_area property allows the area to be negative.

My tests should only generate floats greater than or equal to zero for the radius and height. In factcheck, the floats() generator can be parameterised by a lower bound:

def pos_floats():
    return floats(lower=0)

This will kill two birds with one stone, because when a lower bound is specified factcheck will always generate the lower bound, and that will make the current implementation of test_cylinder_area fail for the h=0 case as it should.

But that's not good enough. Before fixing the generators, I want to see a test fail because they are incorrect.

The fact that I could write nonsensical properties and see them pass makes me think that I made too large a step at the start. I should have started by writing simpler, but more general, properties, that would then act as a safety net, double-checking the more specific but complex properties and implementation I added later.

A simple property that catches my mistake is that the area must always be positive:

@forall(r=floats(), h=floats())
def test_cylinder_area_is_positive(r, h):
    assert cylinder(r,h).area >= 0

That fails, as it should. Now I can start fixing the generators.

Fixing the generators highlights another mistake I made. I used the primitive floats() generator wherever I needed to generate heights and radii. This makes changing the tests to use pos_floats() instead of floats() rather tedious. I should have used generators to capture domain knowledge. Now is a good time to fix that too:

radii = pos_floats
heights = pos_floats

Now I can change the existing properties to use radii() and heights() generators instead of floats():

@forall(r=radii(), h=heights())
def test_cylinder_area_is_positive(r, h):
    assert cylinder(r,h).area >= 0

@forall(r=radii(), h=heights())
def test_cylinder_area(r, h):
    c = cylinder(r,h)

    end_area = pi * r**2
    end_circumference = 2 * pi * r
    side_area = end_circumference * h

    assert_that(c.area, equal_to_float(2*end_area + side_area))

@forall(r=radii(), h=always(0)))
def test_zero_height(r, h):
    c = cylinder(r,h)
    assert_that(c.area, equal_to(0))

Finally the incorrect test_cylinder_area property fails as it should. I can fix it by specifying that it only applies to heights that are greater than zero:

@forall(r=radii(), h=heights(), where=lambda h: h > 0)
def test_cylinder_area(r, h):
    c = cylinder(r,h)

    end_area = pi * r**2
    end_circumference = 2 * pi * r
    side_area = end_circumference * h

    assert_that(c.area, equal_to_float(2*end_area + side_area))

I think that starting with a small but generic first property is key to using property-based testing with incremental design and a significant difference to using example-based testing.

When working from examples, we start with specifics, initially a single specific example, and generalise, by adding contradictory examples, either to "triangulate" a property with examples or to add new behaviour. In fact, if very strict about TDD, you avoid making a distinction between the two.

With property-based tests it seems better to start with very general properties and then specialise. The general properties act as a safety net in case you get later properties or generators wrong.

Mistaeks I Hav Made

Misadventures with Property-Based TDD: A Lesson Learned