How to make the generators surface more edge cases? #40

@chtenb

When using the builtin generators in a naive manner, I find that the probability of generating data that hits interesting edge cases is very low.
Here is a simple but extreme example:

    [Test] public void TestArray() => TestLaws(Gen.Int.Array);
    // Here are other custom object generators

    protected void TestLaws<T>(Gen<T> gen)
        where T : IEquatable<T>
    {
        Gen.Select(gen, gen).Sample((a, b) =>
        {
            // Test the IEquatable interface Equals against the object.Equals
            AreEq(a.Equals(b), Equals(a, b));

            // Test the symmetry
            AreEq(a.Equals(b), b.Equals(a));

            // Test compatibility of the hashcode implementation with the equals implementation.
            if (a.Equals(b))
            {
                var hashA = a.GetHashCode();
                var hashB = b.GetHashCode();
                AreEq(hashA, hashB, $"{a} and {b} are equal while their hash is not: {hashA}, {hashB}");
            }
        }, iter:10000000);
    }

For this test to be meaningful, the generators must produce pairs of objects that are structurally similar (or even the same object).
With some experimentation I find that the probability of generating two arrays with the same elements is extremely low.

I think it is possible to improve the situation with some pragmatic techniques. Some ideas:

  • Have tuple/array generators that generate the same objects for a subset of the elements, by re-using the seed or by returning the same object multiple times.
  • Skew primitive generators more towards builtin edge cases, thereby skewing derived generators too. E.g. if GenInt returns certain numbers a lot more often, a generator like Gen.Select(Gen.DateTime, Gen.TimeSpan, Gen.DateTime, Gen.TimeSpan) is a lot more likely to generate two intervals that are directly adjacent to each other. Statistically, this would otherwise be near impossible to generate.
  • Skew primitive generators towards smaller instances. Large instances tend to be very similar behaviorally, i.e. whenever the code works for the number 2435867, it'll probably also work for 2435868.
  • Add builtin knowledge about common pitfalls in code. E.g. Gen.DateTime could generate data around leap seconds, leap days or DST transitions more frequently. I see that there is a GenSpecial class for floating point numbers, which is a good example. For good coverage, one should probably use Gen.OneOf(Gen.Double.Special, Gen.Double) when writing tests.
  • Skew primitive generators by mixing in generators with different distributions. E.g. a generator that builds a number by adding a large random number [1, 9] and a small random number [0.00001, 0.00009] together is more likely to generate two numbers that are very close together than the default GenDouble. Selecting a generator from a set of skewed generators can statistically perform much better at surfacing edge cases.
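
To make the first two ideas concrete, here is a minimal sketch that deliberately avoids the CsCheck API and uses only System.Random. The names SkewedGenerators, SkewedInt, SkewedArray and SimilarPair are hypothetical, and the edge-case pool and the probabilities are arbitrary illustrative choices, not anything CsCheck does today.

```csharp
using System;

public static class SkewedGenerators
{
    // Hypothetical pool of "interesting" ints; the values are an illustrative assumption.
    static readonly int[] EdgeCases = { 0, 1, -1, int.MinValue, int.MaxValue };

    // About half of the time return an edge case, otherwise a uniformly random int.
    public static int SkewedInt(Random rng)
    {
        if (rng.Next(2) == 0)
            return EdgeCases[rng.Next(EdgeCases.Length)];
        return rng.Next(int.MinValue, int.MaxValue);
    }

    // An array of length 0..maxLength filled with skewed ints.
    public static int[] SkewedArray(Random rng, int maxLength)
    {
        var result = new int[rng.Next(maxLength + 1)];
        for (int i = 0; i < result.Length; i++)
            result[i] = SkewedInt(rng);
        return result;
    }

    // Produce a pair that is, with roughly equal probability:
    //   - the same reference twice,
    //   - two distinct objects built from the same seed (structurally equal),
    //   - two independently generated objects.
    public static (T A, T B) SimilarPair<T>(Random rng, Func<Random, T> gen)
    {
        int seed = rng.Next();
        T a = gen(new Random(seed));
        switch (rng.Next(3))
        {
            case 0: return (a, a);
            case 1: return (a, gen(new Random(seed)));
            default: return (a, gen(new Random(rng.Next())));
        }
    }
}
```

A call such as `SkewedGenerators.SimilarPair(rng, r => SkewedGenerators.SkewedArray(r, 5))` then yields structurally equal arrays in roughly two out of three samples, which is the situation the Equals/GetHashCode test needs in order to be exercised. Re-using the seed only gives equal structures if the supplied generator is a deterministic function of the Random it receives.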

I've used these techniques with reasonable success in my own ad-hoc testing library, but I'm currently investigating CsCheck as a faster and more ergonomic alternative.
