Architecture
The Dataset Generator is organized as a modular, graph-based pipeline. Each ticket is created by passing a shared context (“Value Objects”) through a directed graph of AI-powered “nodes,” with results cached to avoid re-execution. The high-level flow looks like this:
```mermaid
flowchart TD
    classDef expensiveModel stroke:#d55,stroke-width:2px
    classDef cheapModel stroke:#5d5,stroke-width:2px

    A0[Start] --> TopicGen
    TopicGen[Topic Generation] --> Email
    Email[Email Text] --> Answer
    Answer[Answer Text] --> Tags
    Tags[Add Tags] --> Rephrase
    Rephrase --> Translate
    Translate --> End

    class TopicGen,Email,Answer expensiveModel
    class Tags,Rephrase,Translate cheapModel
```
Graph & Nodes
GraphRunner kicks off execution by listing all end-nodes.
Graph tracks parent→child links; each Node first checks its Cache and, on a miss, delegates to the private _ExecutionNode.
Storage backs the pipeline’s shared context so downstream nodes can load upstream results.
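The caching behavior above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the class names (GraphRunner, Graph via parent links, Node, Cache) come from the description, but every method signature here is an assumption, and a plain callable stands in for the private _ExecutionNode.

```python
from typing import Any, Callable, Dict, List, Optional


class Cache:
    """Stores node results keyed by node name (assumed interface)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._store.get(key)

    def put(self, key: str, value: Any) -> None:
        self._store[key] = value


class Node:
    """Checks the cache first; on a miss, runs the wrapped execution step."""

    def __init__(self, name: str, run: Callable[[Dict[str, Any]], Any],
                 parents: Optional[List["Node"]] = None) -> None:
        self.name = name
        self._run = run          # stand-in for the private _ExecutionNode
        self.parents = parents or []

    def execute(self, cache: Cache, context: Dict[str, Any]) -> Any:
        cached = cache.get(self.name)
        if cached is not None:
            return cached
        # Resolve parents first so the shared context holds upstream results.
        for parent in self.parents:
            context[parent.name] = parent.execute(cache, context)
        result = self._run(context)
        cache.put(self.name, result)
        return result


class GraphRunner:
    """Kicks off execution from the graph's end-nodes."""

    def __init__(self, end_nodes: List[Node]) -> None:
        self.end_nodes = end_nodes

    def run(self) -> Dict[str, Any]:
        cache: Cache = Cache()
        context: Dict[str, Any] = {}
        return {n.name: n.execute(cache, context) for n in self.end_nodes}
```

Because the cache is checked before delegation, a node shared by several end-nodes (e.g. Topic Generation feeding both Email and Answer) executes only once per run.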
Assistants & Data Models
AIAssistant is an abstract wrapper around the LLM; AINode invokes it through a PromptBuilder.
InputType / OutputType implement a uniform dataclass interface (via Pydantic) for JSON serialization.
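A rough sketch of how these pieces could fit together is shown below. The real project uses Pydantic; this dependency-free version substitutes stdlib dataclasses for the uniform JSON-serializable interface, and the concrete model classes (EmailInput, EmailOutput) plus all method names are hypothetical.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass


@dataclass
class EmailInput:
    """Hypothetical InputType: typed, JSON-serializable input."""
    topic: str
    sentiment: str

    def to_json(self) -> str:
        return json.dumps(asdict(self))


@dataclass
class EmailOutput:
    """Hypothetical OutputType with the same uniform interface."""
    subject: str
    body: str

    def to_json(self) -> str:
        return json.dumps(asdict(self))


class PromptBuilder:
    """Renders a typed input into a prompt string (assumed interface)."""

    def build(self, data: EmailInput) -> str:
        return f"Write a {data.sentiment} support email about {data.topic}."


class AIAssistant(ABC):
    """Abstract LLM wrapper; a concrete subclass would call a real model."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class EchoAssistant(AIAssistant):
    """Stand-in backend so the sketch runs without any API access."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def run_node(assistant: AIAssistant, builder: PromptBuilder,
             data: EmailInput) -> EmailOutput:
    """What an AINode does: build the prompt, invoke the assistant, wrap the result."""
    raw = assistant.complete(builder.build(data))
    return EmailOutput(subject=data.topic, body=raw)
```

Keeping inputs and outputs behind one serialization interface is what lets Storage persist any node's result and hand it to downstream nodes unchanged.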
Randomized Inputs
RandomCollection perturbs each base weight by up to a configurable factor, then samples from the resulting distribution.
RandomTable ties one enum key to a collection of other-enum values (e.g. TicketType → random Priority).
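The two randomization helpers could look roughly like this. Only the class names and the described behavior (weight perturbation, enum-keyed lookup) come from the text; the constructor parameters, the `factor` default, and the TicketType/Priority enums are illustrative assumptions.

```python
import random
from enum import Enum
from typing import Dict, Generic, Optional, Sequence, TypeVar

T = TypeVar("T")


class RandomCollection(Generic[T]):
    """Samples values after perturbing each base weight by up to +/- factor."""

    def __init__(self, values: Sequence[T], weights: Sequence[float],
                 factor: float = 0.2,
                 rng: Optional[random.Random] = None) -> None:
        self.values = list(values)
        self.base_weights = list(weights)
        self.factor = factor            # assumed default perturbation range
        self.rng = rng or random.Random()

    def sample(self) -> T:
        perturbed = [w * self.rng.uniform(1 - self.factor, 1 + self.factor)
                     for w in self.base_weights]
        return self.rng.choices(self.values, weights=perturbed, k=1)[0]


class TicketType(Enum):     # hypothetical key enum
    BUG = "bug"
    FEATURE = "feature"


class Priority(Enum):       # hypothetical value enum
    LOW = "low"
    HIGH = "high"


class RandomTable:
    """Ties one enum key to a RandomCollection of another enum's values."""

    def __init__(self,
                 table: Dict[TicketType, RandomCollection[Priority]]) -> None:
        self.table = table

    def sample(self, key: TicketType) -> Priority:
        return self.table[key].sample()
```

The perturbation keeps each generated batch close to the configured distribution while avoiding identical proportions in every run.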
Cost & Usage Analysis
AssistantRun captures token counts and cost for a single run_openai call.
AssistantRunsAnalyzer aggregates runs per assistant or batch.
DatasetGenAnalyzer produces the final, currency-converted summary with percentage shares.
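The aggregation chain above can be sketched as follows. The three class names match the text, but the fields on AssistantRun, the per-assistant grouping, and the USD→EUR conversion rate are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AssistantRun:
    """Tokens and USD cost for one run_openai call (assumed fields)."""
    assistant: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float


@dataclass
class AssistantRunsAnalyzer:
    """Aggregates runs per assistant."""
    runs: List[AssistantRun] = field(default_factory=list)

    def cost_per_assistant(self) -> Dict[str, float]:
        totals: Dict[str, float] = {}
        for run in self.runs:
            totals[run.assistant] = totals.get(run.assistant, 0.0) + run.cost_usd
        return totals


class DatasetGenAnalyzer:
    """Final summary: converts currency and adds percentage shares."""

    def __init__(self, analyzer: AssistantRunsAnalyzer,
                 usd_to_eur: float = 0.9) -> None:
        self.analyzer = analyzer
        self.usd_to_eur = usd_to_eur    # hypothetical conversion rate

    def summary(self) -> Dict[str, Dict[str, float]]:
        totals = self.analyzer.cost_per_assistant()
        grand = sum(totals.values()) or 1.0   # avoid division by zero
        return {name: {"cost_eur": round(cost * self.usd_to_eur, 4),
                       "share_pct": round(100 * cost / grand, 1)}
                for name, cost in totals.items()}
```

Percentage shares make it easy to spot which of the expensive-model nodes (Topic Generation, Email, Answer) dominates the bill.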