How to make the generators surface more edge cases?

When using the built-in generators naively, I find that the probability of generating data that hits interesting edge cases is very low.
Here is a simple but extreme example:

```cs
    [Test] public void TestArray() => TestLaws(Gen.Int.Array);
    // Here are other custom object generators

    protected void TestLaws<T>(Gen<T> gen)
        where T : IEquatable<T>
    {
        Gen.Select(gen, gen).Sample((a, b) =>
        {
            // Test the IEquatable interface Equals against the object.Equals
            AreEq(a.Equals(b), Equals(a, b));

            // Test the symmetry
            AreEq(a.Equals(b), b.Equals(a));

            // Test compatibility of the hash code implementation with the Equals implementation.
            if (a.Equals(b))
            {
                var hashA = a.GetHashCode();
                var hashB = b.GetHashCode();
                AreEq(hashA, hashB, $"{a} and {b} are equal while their hash is not: {hashA}, {hashB}");
            }
        }, iter:10000000);
    }
```

For this test to be meaningful, the generators must produce pairs of objects that are structurally similar (or even the same object).
With some experimentation, I find that generating two arrays with the same elements is extremely unlikely.
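
To put a rough number on this, here is a back-of-the-envelope sketch. It assumes (this is my assumption, not something I verified in the library) that every element is drawn independently and uniformly from all 2^32 `int` values and that both arrays have the same fixed length. `CollisionEstimate` and `ExpectedEqualPairs` are hypothetical names used only for illustration.

```cs
using System;

// Hypothetical helper, not part of any library.
static class CollisionEstimate
{
    // Expected number of iterations in which two independently generated arrays
    // are element-wise equal, ASSUMING uniform 32-bit elements and a fixed length.
    // Each element matches with probability 2^-32, so a whole array of the given
    // length matches with probability 2^(-32 * length).
    public static double ExpectedEqualPairs(long iterations, int length) =>
        iterations * Math.Pow(2.0, -32.0 * length);
}
```

Under these assumptions, `CollisionEstimate.ExpectedEqualPairs(10_000_000, 1)` is roughly 0.0023, so even with the 10,000,000 iterations used in the test above, the `if (a.Equals(b))` branch would on average not be reached once for single-element arrays, and longer arrays are exponentially worse.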

I think it is possible to improve the situation with some pragmatic techniques. Some ideas:
- Have tuple/array generators that generate the same objects for a subset of the elements, by re-using the seed or by returning the same object multiple times.
- Skew primitive generators more towards built-in edge cases, thereby skewing derived generators too. E.g. if GenInt returns certain numbers a lot more often, a generator like Gen.Select(Gen.DateTime, Gen.TimeSpan, Gen.DateTime, Gen.TimeSpan) is a lot more likely to generate two intervals that are directly adjacent to each other. Otherwise, such a pair is statistically near impossible to generate.
- Skew primitive generators towards smaller instances. Large instances tend to be very similar behaviorally, i.e. whenever the code works for the number 2435867, it'll probably also work for 2435868.
- Add built-in knowledge about common pitfalls in code. E.g. Gen.DateTime could generate data around leap seconds/days or DST transitions more frequently. I see that there is a GenSpecial class for floating point numbers, which is a good example. For good coverage, one should probably use Gen.OneOf(Gen.Double.Special, Gen.Double) when writing tests.
- Skew primitive generators by mixing in generators with different distributions. E.g. a generator that builds a number by adding a large random number from `[1, 9]` and a small random number from `[0.00001, 0.00009]` is more likely to generate two numbers that are very close together than the default GenDouble. Selecting a generator from a set of skewed generators can statistically perform much better at surfacing edge cases.
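
To make the first two ideas concrete, here is a minimal, library-independent sketch built only on `System.Random`. None of the names (`SkewedGen`, `NextSkewedInt`, `NextArrayPair`) are CsCheck APIs, and the weights and edge-case list are arbitrary choices for illustration.

```cs
using System;

// Hypothetical sketch; not CsCheck code and not a proposal for its exact API.
static class SkewedGen
{
    // Arbitrary illustrative list of int edge cases.
    static readonly int[] EdgeCases = { 0, 1, -1, int.MinValue, int.MaxValue };

    // Mixes three distributions: known edge cases (1/4), small values (1/4),
    // and the (almost) full int range (1/2).
    public static int NextSkewedInt(Random rng)
    {
        switch (rng.Next(4))
        {
            case 0: return EdgeCases[rng.Next(EdgeCases.Length)];
            case 1: return rng.Next(-10, 11); // small values in [-10, 10]
            default: return rng.Next(int.MinValue, int.MaxValue); // upper bound is exclusive
        }
    }

    // Returns a pair of arrays where, about half of the time, the second array
    // is a copy of the first, so equality and hash code laws are exercised on
    // structurally equal inputs.
    public static (int[] A, int[] B) NextArrayPair(Random rng, int maxLength)
    {
        var a = NextArray(rng, maxLength);
        var b = rng.Next(2) == 0 ? (int[])a.Clone() : NextArray(rng, maxLength);
        return (a, b);
    }

    // Array of skewed ints with a length in [0, maxLength].
    static int[] NextArray(Random rng, int maxLength)
    {
        var result = new int[rng.Next(maxLength + 1)];
        for (var i = 0; i < result.Length; i++) result[i] = NextSkewedInt(rng);
        return result;
    }
}
```

With a generator like this, at least roughly half of the sampled pairs are element-wise equal, instead of practically none. In CsCheck, I would expect the same mixing to be expressible with existing combinators such as `Gen.OneOf`, but I have not worked out what that would look like.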

I've used these techniques with reasonable success in my own ad-hoc testing library, but I'm currently investigating CsCheck as a faster and more ergonomic alternative.
