feat: add execution graph builder plan with reference implementation #269
Experimenting with this plan-first approach!
The main goal is to build a graph of all the cells that need to be generated, along with their dependencies. The idea is for an execution engine to fetch work from this graph as it becomes ready.
- `plans/async-engine/execution_graph_builder.plan.md`
- `plans/async-engine/explore_execution_graph.ipynb` - Interactive notebook exploring the implementation

Part of #260
Closes #267
NOTE: The reference implementation is just meant to help build some intuition for what a concrete implementation might look like.
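To make the goal above concrete, here is a minimal sketch of a cell-level dependency graph that an engine could poll for ready work. It is not the reference implementation: `TinyCellGraph`, `build_graph`, `drain`, and the example column names are hypothetical, and it assumes each cell depends only on cells in the same row.

```python
from dataclasses import dataclass, field

# Hypothetical names for illustration only; not the PR's actual API.
Cell = tuple[int, str]  # (row index, column name)


@dataclass
class TinyCellGraph:
    """Toy dependency graph over (row, column) cells."""

    deps: dict[Cell, set[Cell]] = field(default_factory=dict)
    done: set[Cell] = field(default_factory=set)

    def add_cell(self, cell: Cell, depends_on: tuple[Cell, ...] = ()) -> None:
        self.deps[cell] = set(depends_on)

    def ready(self) -> list[Cell]:
        """Return unfinished cells whose dependencies have all completed."""
        return sorted(
            cell
            for cell, deps in self.deps.items()
            if cell not in self.done and deps <= self.done
        )

    def mark_done(self, cell: Cell) -> None:
        self.done.add(cell)


def build_graph(num_rows: int, column_deps: dict[str, list[str]]) -> TinyCellGraph:
    """Expand column-level dependencies into one node per (row, column) cell.

    Assumption: a cell depends only on cells in the same row.
    """
    graph = TinyCellGraph()
    for row in range(num_rows):
        for column, upstream in column_deps.items():
            graph.add_cell((row, column), tuple((row, dep) for dep in upstream))
    return graph


def drain(graph: TinyCellGraph) -> list[Cell]:
    """Stand-in for an execution engine: repeatedly fetch and finish ready cells."""
    order: list[Cell] = []
    while batch := graph.ready():
        for cell in batch:
            order.append(cell)  # a real engine would generate the cell here
            graph.mark_done(cell)
    return order


columns = {"topic": [], "question": ["topic"], "answer": ["topic", "question"]}
graph = build_graph(num_rows=2, column_deps=columns)
first_ready = graph.ready()  # only the dependency-free cells
execution_order = drain(graph)
```

This toy version materializes every cell and rescans all of them on each `ready()` call, so it says nothing about the memory-efficient representation the plan targets; it only illustrates the "fetch work when ready" contract.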
📋 Summary
This PR introduces the Execution Graph Builder plan document that designs a memory-efficient graph-based execution framework for async dataset generation. A reference implementation is included to build intuition and validate the design decisions—the plan document should be the primary focus for review.
🎯 Focus: The Plan Document
The core of this PR is `plans/async-engine/execution_graph_builder.plan.md`, which details:

- `START`, `CELL_BY_CELL`, `ROW_STREAMABLE`, `BARRIER` inferred from generator properties

🔄 Changes
✨ Added
Plan & Exploration:
- `plans/async-engine/execution_graph_builder.plan.md` - Detailed design document for the execution graph framework
- `plans/async-engine/explore_execution_graph.ipynb` - Interactive notebook exploring the implementation

Reference Implementation (in `packages/data-designer-engine/src/data_designer/engine/execution_graph/`):
- `node_id.py` - `CellNodeId`, `BarrierNodeId`, and `NodeId` type alias
- `traits.py` - `ExecutionTraits` Flag enum
- `column_descriptor.py` - Column metadata with traits and dependencies
- `graph.py` - `ExecutionGraph` with hybrid representation
- `builder.py` - `GraphBuilder` factory with trait inference
- `completion.py` - `CompletionTracker` and `ThreadSafeCompletionTracker`

Generator Properties:
- `is_row_streamable` property to `ColumnGenerator` base class and overrides in subclasses

Tests (in `packages/data-designer-engine/tests/engine/execution_graph/`):

🔧 Changed
- `ModelConfig` and schema transform processor

🔍 Attention Areas
- `execution_graph_builder.plan.md` - Primary review target: Does the design adequately address async execution requirements?
- `graph.py` - Core hybrid representation logic
- `builder.py#L175-L210` - Trait inference logic (plugin-compatible approach)

🤖 Generated with AI
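
For reviewers looking at the trait inference, the following is a rough sketch of how traits could be derived from generator properties. Only the trait names, the `ExecutionTraits` Flag enum, and the `is_row_streamable` property come from this PR. The inference rules, the `WholeColumnGenerator` subclass, the `infer_traits` function, and the reading of "plugin-compatible" as "read properties with `getattr` instead of checking concrete types" are all assumptions, and the sketch does not attempt to infer `CELL_BY_CELL` because the description does not say which property drives it.

```python
from enum import Flag, auto


class ExecutionTraits(Flag):
    # Member names are taken from the PR description; values are arbitrary.
    START = auto()
    CELL_BY_CELL = auto()
    ROW_STREAMABLE = auto()
    BARRIER = auto()


class ColumnGenerator:
    """Stand-in base class; only `is_row_streamable` is named in the PR."""

    @property
    def is_row_streamable(self) -> bool:
        # Assumption: generators are row-streamable unless they override this.
        return True


class WholeColumnGenerator(ColumnGenerator):
    """Hypothetical subclass that needs every upstream row before it can run."""

    @property
    def is_row_streamable(self) -> bool:
        return False


def infer_traits(generator: object, has_dependencies: bool) -> ExecutionTraits:
    """Assumed rules: no dependencies -> START; streamable or else BARRIER."""
    traits = ExecutionTraits(0)  # empty flag set
    if not has_dependencies:
        traits |= ExecutionTraits.START
    # getattr with a default keeps third-party generators working even if
    # they do not define the property (treated conservatively as a barrier).
    if getattr(generator, "is_row_streamable", False):
        traits |= ExecutionTraits.ROW_STREAMABLE
    else:
        traits |= ExecutionTraits.BARRIER
    return traits


seed_traits = infer_traits(ColumnGenerator(), has_dependencies=False)
barrier_traits = infer_traits(WholeColumnGenerator(), has_dependencies=True)
plugin_traits = infer_traits(object(), has_dependencies=True)
```

Using a `Flag` lets one column carry several traits at once, for example `START | ROW_STREAMABLE`, and membership can be checked with `ExecutionTraits.BARRIER in traits`.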