Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BLOG] Generating Test Values using JavaScript Generators #18

Open
peter-leonov opened this issue Jan 18, 2025 · 2 comments
Open

[BLOG] Generating Test Values using JavaScript Generators #18

peter-leonov opened this issue Jan 18, 2025 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@peter-leonov
Copy link
Owner

peter-leonov commented Jan 18, 2025

Combining Generators

So I've wrote yet another type predicate (guard) generator for TypeScript. Cool, cool, cool. What's next?

As the Predicate Generator is all about trust, type safety and correctness, why not to go the extra mile and also generate a test suite for the resulting type predicate function right next to the predicate itself!

Before you say it, I tried AI make it all for me first, but the results were unstable though still ok.

Why

Ok, we're generating tests for a type predicate function. Where to start? Let's first take a look at a simple predicate function.

type MyNumber = number

function isMyNumber(value: unknown): value is MyNumber {
  if (typeof value !== "number") {
    return false
  }
  value satisfies MyNumber
  return true
}

You can see that this function uses a typeof operator to check if value is of the primitive type number. In JS/TS all numbers are of this type. Then the function uses the satisfies type operator to verify that the type of value is properly narrowed down to number which satisfies the desired type MyNumber.

That would be great to end the story here, but in TypeScript the body of a type predicate function is not checked to be actually correct in regards of examining the given value. The only requirement is that it returns a boolean. One can easily write a function like below and get no errors from the TypeScript compiler:

function isMyNumber(value: unknown): value is MyNumber {
  // Just trust me!
  return true
}

This is why we have to check the type predicate functions thoroughly to avoid misguiding ourselves that the code does what it does. One way, as you have seen, is to always use the satisfies type operator before actually returning true (even better if there is only one return true in the whole function to be sure). The other way is the topic of this post: generating test values for a given type.

What

Generating test values boils down to creating two sets of values: valid and invalid. These values are then passed to the type predicate function. The function has to return true for all the valid values and false for all the invalid values.

For our simple example the sets of values are as follows:

type MyNumber = number

const valid = [0, 42, 3.14]
const invalid = ["foo", null, true]

Having these values we can generate a unit test file like this:

test("valid values", () => {
  expect(isMyNumber(0)).toBe(true)
  expect(isMyNumber(42)).toBe(true)
  expect(isMyNumber(3.14)).toBe(true)
})

test("invalid values", () => {
  expect(isMyNumber("foo")).toBe(false)
  expect(isMyNumber(null)).toBe(false)
  expect(isMyNumber(true)).toBe(false)
})

Test frameworks offer a way to iterate over a list of values, but for the sake of readability I'm using the explicit syntax.

So far so good. As a next example let's manually generate a set of test values for a more complex type.

type MyObject = {
  a: number,
  b: string
}

To avoid the combinatorial explosion of possible values here we're gonna pick only a few values for every type:

  • for the number type we are checking that 0 and 42 are valid and "42" and undefined are invalid
  • for the string type we're checking that "" and "foo" are valid and undefined and 1 are invalid
  • for the object type itself the valid values are any combination of both valid properties, and the invalid values are where any of a and b attributes are missing or any of them have an invalid value

How

Now we need to somehow generate the test values for the types the predicates check. Given that in the process of the predicates generation we already parse input types into a simple nested structure (see TypeModel) it must be trivial to traverse it and dump a bunch of test values. Right? To a degree.

First things that pops to mind is to write a recursive function (or a recursive interface) that takes a type, goes about its internal specifics–like iterating over the attributes in an object–and returns a test value. Let's try writing a simple example of such a function:

class MyObjectType {
  attributes: Record<string, MyTypes>
}

class MyNumberType {}

class MyStringType {}

type MyTypes = MyObjectType | MyStringType | MyNumberType

function getTestValuesFor(type: MyTypes) {
  if (type instanceof MyNumberType) {
    return 42
  } else if (type instanceof MyStringType) {
    return "foo"
  } else if (type instanceof MyObjectType) {
    const result = {}
    for (const [key, attrType] of Object.entries(type.attributes)) {
        result[key] =getTestValuesFor(attrType)
    }
  }
}

Seems ok, until we find out that the function returns only one combined value with only one test value per attribute. Never fear, we always can return an array of values instead! But, then we still return a nested structure (array of arrays of objects or something) which we'd need to traverse once more to flatten it into an array of final test values suitable for the predicate to test on. Still doable, but what does this function return in turn? It should be a flattened array of all flattened values. Some progress, but it certainly reminds of what we started with. Could we do better and merge all this in one efficient pass?

Yes! To be able to traverse the nested type tree, permute the values any way we want and keep the code readable we just need to find a way to yield one test value at a time instead of returning arrays and merging them on each step.

Yielding a value. One step at a time. This definitely rings a bell… Use a callback? Nope. An iterator? Close. A coroutine? Bullseye! A generator, to be correct as in JavaScript they are known as generator functions. Generators as class of computation in JavaScript allow to pull a single value at a time and also have nice syntax and state management out of the box with unlimited nesting. We also get laziness for free to cover the cases when there is no need to generate all the values at once saving some CPU cycles. Implementing this level of control with the raw JS functions is still possible of course as JS is a Turing complete language and every closure in JS can technically be a coroutine. But it's not gonna be pretty (we're gonna try it anyway a bit later in this post).

Generators to the rescue!

A simple test value generator looks like this:

function* myNumber() {
  yield 0
  yield 42
  yield 3.14
}

Note the star after the function keyword. This star tells the JS engine to compile the function in a special way so that the yield keyword works as in a coroutine. The example above when called gives back an iterator that produces three values 0, 42 and 3.14. We can read all of them at once like this:

console.log([...myNumber()])

…or fetch one by one like this:

for(const value of myNumber()) {
  console.log(value)
}

Ok, it seems that we've got the right tool. Now, how do we build with it? We'd need to find a way to combine several value generators into a compound generator and then just pull values from the top level generator and pass them to our tests. Let's try to implement a union first as it's the simplest of the combining generators:

function* myUnion(members) {
  for (const member of members) {
    for (const value of member) {
      yield value
    }
  }
}

it can also be written in a shorter form:

function* myUnion(members) {
  for (const member of members) {
    yield* member()
  }
}

And that's it. Generator syntax, it feels, is just made for infinite / lazy sequences computation. Let's implement the same logic manually for comparison. It looks something like this:

function myUnion(members) {
  let i = 0;
  let member = undefined;
  return function () {
    for (; i < members.length; i++) {
      if (!member) {
        // "instantiate" a member
        member = members[i]()
      }
      const value = member();
      // no more values in the member iterator
      if (value === undefined) {
        continue;
      }
      return value
    }
    // no more values in our iterator either
    return undefined
  };
}

Already a lot even without the next() method and the rest of the proper iteration API.

I hope, at this point you're as deep in love with generators as I obviously am. Then you'll definitely like the beauty of generating an object using a generator function, our first combinatorially complex type (a product type):

function* myObject(attrA, attrB, attrC) {
  for (const a of attrA())
    for (const b of attrB())
      for (const c of attrC())
        yield { a, b, c }
}

Wow, right? It looks like it's gonna spin N^3 times and produce a ton of values before returning any, but it's not the case. This generator first yields an object with the first value of a, the first value of b and the first value of c that gets returned immediately to the caller. The next value is gonna be still the first value of a, the first value of b and the second value of c. It's a DFS-like execution, yep. Once any of the inner generators is up the next outer progresses by one step. When the most outer is done in its turn the whole generator function returns and the caller finishes reading the test values. Hoorah!

You might have noticed that the myObject() function is a bit simplified. It's implemented for a fixed number of attributes with fixed names. This is an intentional simplification we have to make to demonstrate the raw power of the approach in a clear simple way. Of course, the real life code is around 10x longer, but it covers arbitrary objects with any number of optional attributes and any of those might have invalid test values. Next, take a look at the below example yielding both valid and invalid objects:

function* myObject(attrA, attrB, attrC) {
  for (const a of attrA())
    for (const b of attrB())
      for (const c of attrC()) {
        // valid
        yield { a, b, c }
        // invalid ones
        yield {}
        yield { a }
        yield { b }
        yield { c }
        yield { a, b }
        yield { a, c }
        yield { b, c }
      }
}

Still pretty readable and easy to start with. Just using basic generators like this one already gives a good boost on attacking a combinatorial task at hand.

Finally, let's see how to combine them all into something meaningful.

const values = myObject(
  myUnion([
    myNumber(),
    myString()
  ]),
  myString(),
  myNumber()
);

console.log([...values()])

The output is as follows (using only valid test values for readability):

[
  { a: 0, b: '', c: 0 },
  { a: 0, b: '', c: 42 },
  { a: 0, b: '', c: 3.14 },
  { a: 0, b: 'foo', c: 0 },
  { a: 0, b: 'foo', c: 42 },
  { a: 0, b: 'foo', c: 3.14 },
  { a: 42, b: '', c: 0 },
  { a: 42, b: '', c: 42 },
  { a: 42, b: '', c: 3.14 },
  { a: 42, b: 'foo', c: 0 },
  { a: 42, b: 'foo', c: 42 },
  { a: 42, b: 'foo', c: 3.14 },
  { a: 3.14, b: '', c: 0 },
  { a: 3.14, b: '', c: 42 },
  { a: 3.14, b: '', c: 3.14 },
  { a: 3.14, b: 'foo', c: 0 },
  { a: 3.14, b: 'foo', c: 42 },
  { a: 3.14, b: 'foo', c: 3.14 },
  { a: '', b: '', c: 0 },
  { a: '', b: '', c: 42 },
  { a: '', b: '', c: 3.14 },
  { a: '', b: 'foo', c: 0 },
  { a: '', b: 'foo', c: 42 },
  { a: '', b: 'foo', c: 3.14 },
  { a: 'foo', b: '', c: 0 },
  { a: 'foo', b: '', c: 42 },
  { a: 'foo', b: '', c: 3.14 },
  { a: 'foo', b: 'foo', c: 0 },
  { a: 'foo', b: 'foo', c: 42 },
  { a: 'foo', b: 'foo', c: 3.14 }
]

A familiar combinatorial Christmas tree. Find the full source code of this example below.

I hope you've enjoyed the journey. If you're interested in how the type predicate generator tool deals with the vast amount of combination you're welcome to continue reading the footnotes below.

Footnotes

On the combinatorial explosion

First, let's calculate how many combinations a random everyday type is gonna require to test ALL possible cases. Just a single non-nested object type of 10 optional fields with a single possible value per attribute is 210 = 1024 combinations. And for a slightly bigger usual SaaS API response with 20 optional fields 220 = 1M! Let alone properties with multiple possible values or even nested objects with their own optional attributes. This is a no-go even for fuzz testing. To avoid checking all this almost infinite number of values the test generator takes the white box testing approach. It's when we write only a small number of targeted tests only for the known edge cases of the implementation we know.

The type predicate generator specifically makes two tradeoffs. First, for valid values it uses only the values that can potentially be misinterpreted by a bug in the generator (like an empty string or the number 0). And second, it does not permute all possible combinations of values effectively turning all the product types into sum types. Instead, it picks only one valid value at a time for each position keeping the rest values exactly the same effectively changing only one value at a time for the whole resulting test value. In other words imagine that a test value with all the properties and all the nesting is a tree. Each leaf of the tree is a primitive value in each position. There are not too many such leaves even in a relatively big type. The take each leaf and change it with all possible values for only this leaf and yield them. Then take the other leaf alone and change values for only that leaf not changing the others. And so on. In the end you should be getting at maximum M * N values where M is the number of all the leaves and N is the sum of all individual values of all the leaves. Compare this with NM.

Please leave a comment if you're interested in learning more on how it's implemented. Meanwhile you can take a look at the implementation in the tool itself.

On white box testing

Why white box testing approach is ok here? First and foremost, because we have to either use some kind of theorem proving formal language (which TS is not, yet) or reduce the problem space using a limited subset of the building blocks. Following this logic the test generator assumes that the type predicates use some sort of a DSL instead of the Turing complete JavaScript and tests this subset instead. This way we trust the system formed of the predicate generator and the test generator as a whole. In other terms we're operating in an inherently safe system as far as we control the DSL and its source code, no need to cover for the broad Byzantine environment that the wild JavaScript world can sometimes be.

@peter-leonov peter-leonov added the documentation Improvements or additions to documentation label Jan 18, 2025
@peter-leonov
Copy link
Owner Author

peter-leonov commented Jan 18, 2025

Here is the full code for the final combined generator:

type Value = string | number | object;

type GeneratorConstructor = () => Generator<Value>;

function* myNumber(): Generator<Value> {
  yield 0;
  yield 42;
  yield 3.14;
}

function* myString(): Generator<Value> {
  yield "";
  yield "foo";
}

function myUnion(
  members: GeneratorConstructor[]
): GeneratorConstructor {
  return function* () {
    for (const member of members) {
      yield* member();
    }
  };
}

function* myObject(
  attrA: GeneratorConstructor,
  attrB: GeneratorConstructor,
  attrC: GeneratorConstructor
) {
  for (const a of attrA())
    for (const b of attrB())
      for (const c of attrC()) yield { a, b, c };
}

const values = myObject(
  myUnion([myNumber, myString]),
  myString,
  myNumber
);

console.log([...values]);

Run it here.

@peter-leonov peter-leonov changed the title [WIP] A note on Combining Generators A note on Combining Generators Jan 18, 2025
@peter-leonov peter-leonov changed the title A note on Combining Generators Generating Safe Values using JavaScript Generators Jan 20, 2025
@peter-leonov peter-leonov changed the title Generating Safe Values using JavaScript Generators Generating Test Values using JavaScript Generators Jan 20, 2025
@peter-leonov peter-leonov changed the title Generating Test Values using JavaScript Generators [BLOG] Generating Test Values using JavaScript Generators Jan 20, 2025
@peter-leonov
Copy link
Owner Author

Btw, folks. In case your language of choice does not support coroutines there is a way to follow what JS community used to be doing before the full support for coroutines (and later async / await) got into the main engines. They used transpilation. Here is an example on how TypeScript compiler lowers the code for a generator function down to the old JS standard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

1 participant