Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(checkpoint-validation): add custom checkpointer validation tool #600

Merged

Conversation

benjamincburns
Copy link
Contributor

@benjamincburns benjamincburns commented Oct 15, 2024

What's in the box?

This change adds the package @langgraph/checkpoint-validation-tool, which provides a portable black-box acceptance test suite for validating both internal and external checkpoint saver implementations.

This change fixes #541.

How can I use it?

It can be ran as a CLI tool, or by embedding it into an existing Jest test project. Interfacing between the test suite and custom checkpoint saver code is handled by a user-provided object that contains a set of simple functions that are used for infrastructure and component setup/teardown. Full usage details can be found in the README.md introduced in this PR.

Why is the build failing?

This package is also configured to exercise the existing checkpoint saver implementations during normal test runs (via yarn test). I use TestContainers where needed (MongoDB and PostgreSQL savers) for infrastructure provisioning.

A number of problems were discovered in the existing LangGraph checkpoint saver implementations while working on this PR. I've submitted additional PRs to correct some of the problems that I discovered (#578, #580, #582, #583, #584), and raised GH issues for others (#581, #589, #590, #591, #592, #593, #594, #595).

In the case of the problems that are described in GH issues, I've included implementation-specific carve-outs in the test logic to avoid test failures. These carve-outs are marked with TODO comments in the code.

In the case of the problems for which I've submitted PRs, I have not included any carve-outs in this test suite for those. As a result, the tests introduced by this PR will fail until the above mentioned PRs are accepted.

What's the best way to evaluate this whole mess?

To solve the chicken-and-egg problem that's caused by the dependence of this PR on so many others, I've created a merge branch that contains all of those fixes, along with the changes in this PR. I will endeavor to keep that branch in sync as things change.

@benjamincburns
Copy link
Contributor Author

I should also note that I haven't added any documentation for this beyond what's in the README.md. I'd be happy to add docs, but I wasn't sure where they'd fit best. Guidance would be appreciated.

@benjamincburns benjamincburns force-pushed the checkpoint-acceptance-tests branch 5 times, most recently from fcf1947 to 78309c4 Compare October 15, 2024 01:55
@benjamincburns benjamincburns force-pushed the checkpoint-acceptance-tests branch 4 times, most recently from 957b968 to 34c6b65 Compare October 17, 2024 04:29
@jacoblee93
Copy link
Collaborator

Thanks so much for this! Will dive in today.

Copy link
Collaborator

@jacoblee93 jacoblee93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits but overall I think this looks excellent and well thought out. Will give it a spin a bit later and approve if I'm able to run everything!

Other general nit is that we've generally used snake case for filenames stylistically - would prefer to keep it that way. I can make that change later too though.


// passing via global is ugly, but there's no good alternative for handling the dynamic import here
const initializer = (globalThis as GlobalThis)
.__langgraph_checkpoint_validation_initializer;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: a symbol might be more idiomatic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, although it's not much better. Jest gives each test file a fresh module cache, so there's no way to insert something into globalThis under a globally unique symbol key. I had to fall back to using Symbol.for, which isn't much different from what I had here, apart from being less ugly to read.

parents: {
// I think this is meant to be opaque to checkpoint savers, so I'm just stuffing the data in here to make sure it's stored and retrieved
// I think this is roughly what it'd look like if it were generated by the pregel loop, though
checkpoint_ns: parentCheckpointId,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: checkpoint_ns is a subgraph name pulled from the name of the node in the parent graph rather than a checkpoint id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually a syntax goof on my part. I meant to write [checkpoint_ns]: parentCheckpointId here, as I was basing this off of the TSDoc comment on CheckpointMetadata.parents which reads:

export interface CheckpointMetadata {
   ...
    /**
     * The IDs of the parent checkpoints.
     * Mapping from checkpoint namespace to checkpoint ID.
     */
    parents: Record<string, string>;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to fix my initial syntax goof, but please lmk if I got the semantics wrong.

const baseConfig = {
configurable: {},
};
initializerConfig = mergeConfigs(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use ensureConfig for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think actually I need to just remove this (along with the configure method on CheckpointSaverTestInitializer).

My initial thinking was that it'd be handy to pass custom config into the checkpointer, e.g. like auth details that can be used for multi-tenant isolation. Unfortunately it doesn't seem that the core langgraph code merges the RunnableConfig passed on invoke when calling the checkpointer, so this logic doesn't hold any value in the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to remove this logic altogether.

@@ -0,0 +1,103 @@
# @langchain/langgraph-checkpoint-validation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a section to the LangGraph docs around contributing! I can do that in a separate PR after we merge this.

await runCLI(
{
_: [],
$0: "",
Copy link
Collaborator

@jacoblee93 jacoblee93 Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we avoid the need for the global by passing the initializer filepath here as an extra option, then do the dynamic import within the test suites? Not sure what the API looks like

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find any nice way to do this. If Jest had proper support for ESMs (and therefore allowed top-level await in test files), I could've made it much cleaner.

FWIW, vitest does support this, and migration from Jest isn't usually much of a drama. Definitely not advocating for that as a result of this PR, but it might be worth keeping it in mind if you hit more of these sort of issues in the future.

Copy link
Collaborator

@jacoblee93 jacoblee93 Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to mongodb_initializer.ts for style

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, and applied this naming convention for all other files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move into testUtils.ts or make clearer that this doesn't contain any test files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


**IMPORTANT**: If you kill the test runner early this function may not be called. To avoid manual clean-up, give preference to test infrastructure management tools like [TestContainers](https://testcontainers.com/guides/getting-started-with-testcontainers-for-nodejs/), as these tools are designed to detect when this happens and clean up after themselves once the controlling process dies.

### (Required) `createSaver`: Construct your checkpoint saver
Copy link
Collaborator

@jacoblee93 jacoblee93 Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uber nit: have been generally calling these checkpointer rather than saver

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wasn't sure what to go with there, so I defaulted to the term used in the type definitions. Easy to switch it out, though.

FWIW, I think it'd make the code a bit more searchable/readable if you were to rename the XyzSaver classes (including BaseCheckpointSaver) to XyzCheckpointer next time you had to make a breaking change in the core checkpoint library.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated all names/references/readme text/etc to reflect this naming convention.

@jacoblee93
Copy link
Collaborator

Just ran locally with MemorySaver and it seems great :)

If you don't have time for the style things I'd feel pretty comfortable merging as is/picking them up myself.

@benjamincburns
Copy link
Contributor Author

Just ran locally with MemorySaver and it seems great :)

Glad to hear it! 😅

If you don't have time for the style things I'd feel pretty comfortable merging as is/picking them up myself.

I'll look over the feedback a bit more closely tonight or tomorrow and let you know if I'll need to put any of it off. From a quick glance through I think the only thing that might take a bit of time is the bit about avoiding passing via globals in the dynamic test gen stuff (runner.ts).

Apart from the feedback that you've given, I also have three lingering "nice to haves" that I didn't really have time to address:

  • it'd be nicer if library users didn't need to be using Jest to integrate it directly with their existing test suite
  • it'd also be nicer if the CLI exposed more of the Jest functionality to the user, for configuring things like code coverage, various reporters, etc.
  • It's probably overkill, but it'd give me slightly more confidence if it also ran some e2e tests that exercised the savers at a higher level, perhaps using some code from the examples, etc.

If you agree that these would be useful improvements I can raise some issues for them so they aren't forgotten.

@benjamincburns
Copy link
Contributor Author

benjamincburns commented Oct 18, 2024

Pushed a couple quick fixes for test logic errors that I found while using this to validate my project's custom DynamoDB checkpointer today

  • dropped a put test that was validating an invalid assumption that I made early on 😊
    • it caused my custom checkpointer to throw while building the pending_sends array (parent not found)
    • may be worth adding this back in as a referential integrity failure case, as it seems a bit off that the existing checkpointers allowed this to pass silently?
  • updated a getTuple failure case test to pass an invalid thread_id
    • without this it was essentially duplicating the test just below it

@jacoblee93
Copy link
Collaborator

it'd be nicer if library users didn't need to be using Jest to integrate it directly with their existing test suite

I think it's fine for now, it's up to you but would say wait for a few people to ask

it'd also be nicer if the CLI exposed more of the Jest functionality to the user, for configuring things like code coverage, various reporters, etc.

Same as above

It's probably overkill, but it'd give me slightly more confidence if it also ran some e2e tests that exercised the savers at a higher level, perhaps using some code from the examples, etc.

Yeah this would be nice, perhaps something in line with:

https://github.com/langchain-ai/langgraphjs/blob/main/libs/langgraph/src/tests/pregel.postgres_saver.int.test.ts

@jacoblee93
Copy link
Collaborator

jacoblee93 commented Oct 20, 2024

Seems like some flakiness with testcontainers? Not super familiar with it but am seeing this in CI:

  ● @langchain/langgraph-checkpoint-mongodb › @langchain/langgraph-checkpoint-mongodb#list › {
  description: "should return no tuples when thread_id does not match pushed checkpoint(s), checkpoint_ns matches 'child', limit is 2, before child checkpoint, and filter is undefined",
  thread_id: '1ef8e8ba-72cb-6ca0-fffd-bb7bbb3b207d',
  checkpoint_ns: 'child',
  limit: 2,
  before: [Object],
  filter: undefined,
  expectedCheckpointIds: []
}

    Could not find a working container runtime strategy

      22 |
      23 |   async beforeAll() {
    > 24 |     startedContainer = await container.start();
         |                        ^
      25 |     const connectionString = `mongodb://127.0.0.1:${startedContainer.getMappedPort(
      26 |       27017
      27 |     )}/${dbName}?directConnection=true`;

      at getContainerRuntimeClient (node_modules/testcontainers/src/container-runtime/clients/client.ts:64:9)
      at MongoDBContainer.start (node_modules/testcontainers/src/generic-container/generic-container.ts:81:20)
      at MongoDBContainer.start (node_modules/@testcontainers/mongodb/src/mongodb-container.ts:15:40)
      at Object.beforeAll (libs/checkpoint-validation/src/tests/mongoInitializer.ts:24:24)
      at Object._initializer_beforeAllTimeout (libs/checkpoint-validation/src/spec/index.ts:27:5)

Seems there might be something wrong with the package? I can repro locally, not sure if it's working for you or if it needs additional setup?

@benjamincburns
Copy link
Contributor Author

benjamincburns commented Oct 20, 2024

I can repro locally, not sure if it's working for you or if it needs additional setup?

It does work for me. Do you have docker installed and running when testing locally? If so, you may be encountering a timeout when pulling the image due to the local yarn test applying a 30s timeout via the CLI.

In the GHA log it's failing on the mac runner with the message Could not find a working container runtime strategy. It looks like this is because Docker doesn't ship on the macOS images by default (per actions/runner-images#2150 (comment)). You'll notice that it did pass on the Ubuntu image, however.

In the thread linked above, they did pre-install colima, so if you don't mind the mac build taking an extra minute or two to run, the solution may just be to add brew install docker && colima start to the GHA test run config. That'll install the docker CLI for use with the colima runtime. Otherwise we can ignore these tests in CI when running on mac, mark them as integration tests, or some combination of these options.

@benjamincburns benjamincburns force-pushed the checkpoint-acceptance-tests branch from 9b541b5 to 53211d7 Compare October 20, 2024 21:58
@benjamincburns benjamincburns force-pushed the checkpoint-acceptance-tests branch from 839b908 to aaffbad Compare October 20, 2024 23:13
@benjamincburns
Copy link
Contributor Author

benjamincburns commented Oct 20, 2024

Colima isn't supported on M-series macs in GHA due to a lack of support for nested virtualization (actions/runner-images#9460 (comment)), so I ignored those tests in CI when running on M-series macs.

Unfortunately it seems that they also fail on Windows, likely due to docker desktop not being installed for licensing reasons (edit: yep - windows containers work fine, linux containers don't - actions/runner#904, and TestContainers doesn't support windows containers). I'll see if there's an alternative container runtime available there. If not, I'll also ignore these in CI on Windows.

@benjamincburns benjamincburns force-pushed the checkpoint-acceptance-tests branch 2 times, most recently from b3a4833 to 7611cba Compare October 20, 2024 23:42
@benjamincburns benjamincburns force-pushed the checkpoint-acceptance-tests branch from 7611cba to ef088b9 Compare October 21, 2024 00:42
@benjamincburns
Copy link
Contributor Author

Current HEAD (0ea8824) passes CI. AFAICT this should be ready for re-review.

it'd be nicer if library users didn't need to be using Jest to integrate it directly with their existing test suite

I think it's fine for now, it's up to you but would say wait for a few people to ask

I did wind up taking a small/simple step in this direction by dropping imports from @jest/globals and using the types TS compiler option. That should make it work with any test lib that injects compatible globals.

Copy link
Contributor

@nfcampos nfcampos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, great work!


const writesToParent = [
{
taskId: "pending_sends_task",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the library always uses uuids for taskId, are we using something else here on purpose to test it supports arbitrary strings?

@nfcampos nfcampos merged commit d453cf6 into langchain-ai:main Oct 21, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: Create an acceptance test suite for checkpointers
3 participants