When using the built-in generators in a naive manner, I find that the probability of generating data that hits interesting edge cases is very low.
Here is a simple but extreme example:

    [Test] public void TestArray() => TestLaws(Gen.Int.Array);
    // Here are other custom object generators

    protected void TestLaws<T>(Gen<T> gen)
        where T : IEquatable<T>
    {
        Gen.Select(gen, gen).Sample((a, b) =>
        {
            // Test the IEquatable<T>.Equals implementation against object.Equals.
            AreEq(a.Equals(b), Equals(a, b));
            // Test symmetry.
            AreEq(a.Equals(b), b.Equals(a));
            // Test compatibility of the hash code implementation with the Equals implementation.
            if (a.Equals(b))
            {
                var hashA = a.GetHashCode();
                var hashB = b.GetHashCode();
                AreEq(hashA, hashB, $"{a} and {b} are equal while their hashes are not: {hashA}, {hashB}");
            }
        }, iter: 10000000);
    }

For this test to be meaningful, the generators must generate objects which are structurally similar (or even the same object).
With some experimentation I find that generating two arrays with the same elements is extremely unlikely.
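As a rough back-of-the-envelope estimate (not a measurement of CsCheck): assume both arrays happen to have the same length n and that every element is drawn independently and uniformly from all 2^32 `int` values. Both of these are assumptions about the distribution, not something verified against the library. Under them, the chance of a structurally equal pair, and the expected number of such pairs for n = 1 in the 10^7 iterations used above, would be:

```latex
P\bigl(a = b \,\big|\, |a| = |b| = n\bigr) = \left(2^{-32}\right)^{n} = 2^{-32n},
\qquad
\mathbb{E}\bigl[\text{equal pairs in } 10^{7} \text{ samples},\ n = 1\bigr]
  = \frac{10^{7}}{2^{32}} \approx 0.0023
```

So under these assumptions the `if (a.Equals(b))` branch would almost never run for non-empty arrays, and the probability drops further with every additional element.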
I think the situation can be improved with some pragmatic techniques. Some ideas:
- Have tuple/array generators that generate the same objects for a subset of the elements, by reusing the seed or by returning the same object multiple times.
- Skew primitive generators more towards built-in edge cases, thereby skewing derived generators too. E.g. if GenInt returns certain numbers a lot more often, a generator like Gen.Select(Gen.DateTime, Gen.TimeSpan, Gen.DateTime, Gen.TimeSpan) is a lot more likely to generate two intervals that are directly adjacent to each other. This would otherwise be statistically near impossible to generate.
- Skew primitive generators towards smaller instances. Large instances tend to be very similar behaviorally, i.e. if the code works for the number 2435867, it will probably also work for 2435868.
- Add built-in knowledge about common pitfalls in code. E.g. Gen.DateTime could generate data around leap seconds, leap days, or DST transitions more frequently. I see that there is a GenSpecial class for floating point numbers, which is a good example. For good coverage, one should probably use Gen.OneOf(Gen.Double.Special, Gen.Double) when writing tests.
- Skew primitive generators by mixing in generators with different distributions. E.g. a generator that builds a number by adding a large random number in [1, 9] and a small random number in [0.00001, 0.00009] together is more likely to generate two numbers that are very close together than the default GenDouble. Selecting a generator from a set of skewed generators can statistically perform much better at surfacing edge cases.
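To make the first few ideas concrete, here is a minimal sketch that uses only `System.Random` and deliberately avoids the CsCheck API. The `SkewedGen` class, its method names, the edge-case list, and the 20% / 40% / 25% weights are all hypothetical choices made for illustration; they would need tuning and are not how CsCheck works internally.

```csharp
using System;

// Hypothetical helper, not part of CsCheck: mixes several distributions.
public static class SkewedGen
{
    // Assumed set of "interesting" ints; extend as needed.
    static readonly int[] EdgeCases = { int.MinValue, -1, 0, 1, int.MaxValue };

    // Returns an edge case 20% of the time, a small value 40% of the time,
    // and a uniformly distributed int otherwise.
    public static int NextInt(Random rng)
    {
        double roll = rng.NextDouble();
        if (roll < 0.2) return EdgeCases[rng.Next(EdgeCases.Length)];
        if (roll < 0.6) return rng.Next(-16, 17); // small values in [-16, 16]

        // Random.Next() never returns negative numbers, so build a
        // full-range int from four random bytes instead.
        var bytes = new byte[4];
        rng.NextBytes(bytes);
        return BitConverter.ToInt32(bytes, 0);
    }

    // Returns a pair of arrays; 25% of the time the second array is a
    // structurally equal copy of the first.
    public static (int[] A, int[] B) NextArrayPair(Random rng, int maxLength = 8)
    {
        int[] a = NextArray(rng, maxLength);
        if (rng.NextDouble() < 0.25) return (a, (int[])a.Clone());
        return (a, NextArray(rng, maxLength));
    }

    // Array of length 0..maxLength filled with skewed ints.
    static int[] NextArray(Random rng, int maxLength)
    {
        var result = new int[rng.Next(maxLength + 1)];
        for (int i = 0; i < result.Length; i++) result[i] = NextInt(rng);
        return result;
    }
}
```

With a pair source like this, roughly a quarter of the samples would exercise the hash code check in a test such as `TestLaws`, instead of almost none.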
I've used these techniques with reasonable success in my own ad-hoc testing library, but I'm currently investigating CsCheck as a faster and more ergonomic alternative.