@johnnygreco johnnygreco commented Jan 29, 2026

Experimenting with this plan-first approach!

The main goal is to build a graph of all the cells that need to be generated, along with their dependencies. An execution engine would then fetch work from this graph as soon as each cell's dependencies are satisfied.
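To make the "engine fetches ready work from the graph" idea concrete, here is a minimal sketch in Python. It is not the reference implementation from this PR: `run_graph`, `generate_cell`, and the `(row, column)` cell ids are hypothetical names chosen for illustration, and the graph is a plain dict rather than the memory-efficient structure the plan describes.

```python
import asyncio

# Hypothetical cell id: (row index, column name).
Cell = tuple[int, str]


async def run_graph(deps: dict[Cell, set[Cell]], work) -> list[Cell]:
    """Run each cell once all of its dependencies have completed.

    `deps` maps a cell to the set of cells it depends on.
    `work` is an async callable invoked once per cell.
    Returns the cells in the order they were dispatched.
    """
    done: set[Cell] = set()
    order: list[Cell] = []
    pending = set(deps)
    while pending:
        # A cell is ready when every dependency is already done.
        ready = sorted(c for c in pending if deps[c] <= done)
        if not ready:
            raise ValueError("cycle or missing dependency in graph")
        # Dispatch the whole ready set concurrently.
        await asyncio.gather(*(work(c) for c in ready))
        done.update(ready)
        order.extend(ready)
        pending.difference_update(ready)
    return order


async def generate_cell(cell: Cell) -> None:
    # Stand-in for a real generator call (e.g. an LLM request).
    await asyncio.sleep(0)


example_deps: dict[Cell, set[Cell]] = {
    (0, "topic"): set(),
    (0, "question"): {(0, "topic")},
    (0, "answer"): {(0, "question")},
}
order = asyncio.run(run_graph(example_deps, generate_cell))
```

This sketch dispatches work in waves and waits for each wave to finish. An engine that pulls a cell the moment its last dependency completes would need a queue or per-cell callbacks instead.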

Part of #260
Closes #267

NOTE: The reference implementation is just meant to help build some intuition for what a concrete implementation might look like.

📋 Summary

This PR introduces the Execution Graph Builder plan document, which designs a memory-efficient, graph-based execution framework for async dataset generation. A reference implementation is included to build intuition and validate the design decisions, but the plan document should be the primary focus for review.

🎯 Focus: The Plan Document

The core of this PR is plans/async-engine/execution_graph_builder.plan.md which details:

  • Hybrid graph representation: O(C) memory for millions of records (cell nodes computed on-demand)
  • Generator execution traits: START, CELL_BY_CELL, ROW_STREAMABLE, and BARRIER, each inferred from generator properties
  • Cell-level dependency resolution: Enables fine-grained async execution
  • Thread-safe completion tracking: Separate trackers for asyncio vs thread pool execution
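The bullets above can be illustrated with a small Python sketch. This is one reading of the ideas, not the code in this PR: `ColumnSpec`, `HybridGraph`, `cell_dependencies`, and `ThreadSafeCompletionTracker` are hypothetical names. The trait semantics are also assumptions. Here a BARRIER column is assumed to need every row of its upstream columns, and all other traits are assumed to need only the same row. The plan document is the authority on what each trait means.

```python
import threading
from dataclasses import dataclass
from enum import Enum, auto


class ExecutionTrait(Enum):
    # Trait names come from the PR description; semantics below are assumed.
    START = auto()
    CELL_BY_CELL = auto()
    ROW_STREAMABLE = auto()
    BARRIER = auto()


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    trait: ExecutionTrait
    depends_on: tuple[str, ...] = ()


@dataclass
class HybridGraph:
    """Stores one spec per column; cell nodes are never materialized."""

    columns: dict[str, ColumnSpec]
    num_records: int

    def cell_dependencies(self, row: int, column: str) -> list[tuple[int, str]]:
        """Compute the dependencies of one cell on demand."""
        spec = self.columns[column]
        deps: list[tuple[int, str]] = []
        for parent in spec.depends_on:
            if spec.trait is ExecutionTrait.BARRIER:
                # Assumed: a barrier waits for every row of the parent column.
                deps.extend((r, parent) for r in range(self.num_records))
            else:
                # Assumed: other traits depend only on the same row.
                deps.append((row, parent))
        return deps


class ThreadSafeCompletionTracker:
    """Completion tracker guarded by a lock, for thread pool execution."""

    def __init__(self) -> None:
        self._done: set[tuple[int, str]] = set()
        self._lock = threading.Lock()

    def mark_complete(self, row: int, column: str) -> None:
        with self._lock:
            self._done.add((row, column))

    def is_ready(self, graph: HybridGraph, row: int, column: str) -> bool:
        deps = graph.cell_dependencies(row, column)
        with self._lock:
            return all(dep in self._done for dep in deps)


graph = HybridGraph(
    columns={
        "topic": ColumnSpec("topic", ExecutionTrait.START),
        "question": ColumnSpec("question", ExecutionTrait.CELL_BY_CELL, ("topic",)),
        "summary": ColumnSpec("summary", ExecutionTrait.BARRIER, ("question",)),
    },
    num_records=3,
)
tracker = ThreadSafeCompletionTracker()
```

The graph holds only per-column specs, so its size does not grow with the number of records. The tracker in this sketch stores every completed cell in a set, so it does not share that memory bound. An asyncio-only variant could drop the lock, since all updates would happen on a single event loop thread.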

🔄 Changes

✨ Added

Plan & Exploration:

Reference Implementation (in packages/data-designer-engine/src/data_designer/engine/execution_graph/):

Generator Properties:

Tests (in packages/data-designer-engine/tests/engine/execution_graph/):

  • Comprehensive test suite covering graph building, node iteration, dependency resolution, and completion tracking

🔧 Changed

  • Minor updates to ModelConfig and schema transform processor

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:


🤖 Generated with AI

This introduces a design plan for a memory-efficient execution graph
that models cell-level dependencies for async dataset generation.
The reference implementation is included to help build intuition
about how the concepts could work in practice.

The plan is the primary artifact - the code is exploratory.
@johnnygreco johnnygreco requested a review from a team as a code owner January 29, 2026 21:16
@johnnygreco johnnygreco changed the base branch from main to jena/dev/async-engine January 29, 2026 21:17