Architecture

The Dataset Generator is organized as a modular, graph-based pipeline. Each ticket is created by passing a shared context (“Value Objects”) through a directed graph of AI-powered “nodes,” with results cached to avoid re-execution. The high-level flow looks like this:

flowchart TD
  classDef expensiveModel stroke:#d55,stroke-width:2px
  classDef cheapModel   stroke:#5d5,stroke-width:2px

  A0[Start]           --> TopicGen
  TopicGen[Topic Generation] --> Email
  Email[Email Text]   --> Answer
  Answer[Answer Text] --> Tags
  Tags[Add Tags]      --> Rephrase
  Rephrase            --> Translate
  Translate           --> End

  class TopicGen,Email,Answer expensiveModel
  class Tags,Rephrase,Translate cheapModel

Graph & Nodes

@startuml
package ai_graph {
  package graph {
    class RunInformation <<ValueObject>> {
      + number_batches
      + runs_per_batch
      + total_number_runs
    }
    class GraphRunner {
      + run(): List<Any>
    }
    class Graph {
      - endNodes: List<INode>
    }
    class Node {
      - parents: List<INode>
      - was_executed: bool
      + {async} execute(): Any
    }
    class _ExecutionNode <<private>> {
      + execute(inputs): Any
    }
    class Cache <<private>> {
      - stored_result: Any
    }
    GraphRunner --> Graph
    Graph --> Node
    Node --> _ExecutionNode
    Node --> Cache
  }
  class AINode extends graph.Node {
    - chat_assistant: ChatAssistant
    + {async} execute(inputs)
  }
  class Storage {
    - data: Map<String,Object>
    + save(data)
    + load(): Object
  }
}
@enduml

  1. GraphRunner kicks off execution by listing all end-nodes.

  2. Graph tracks parent→child links; each Node first checks its Cache then delegates to the private _ExecutionNode.

  3. Storage backs the pipeline’s shared context so downstream nodes can load upstream results.
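The cache-then-delegate behavior described in step 2 can be sketched as follows. Class and method names here are illustrative stand-ins, not the actual implementation; real nodes would call an AI assistant instead of formatting a string.

```python
# Hypothetical sketch of the Node caching behavior: each node resolves its
# parents first, runs once, and serves every later call from its Cache.
import asyncio
from typing import Any, List


class Cache:
    """Stores the result of a node's first execution."""

    _UNSET = object()

    def __init__(self) -> None:
        self._stored_result: Any = self._UNSET

    @property
    def is_set(self) -> bool:
        return self._stored_result is not self._UNSET

    def store(self, result: Any) -> None:
        self._stored_result = result

    def load(self) -> Any:
        return self._stored_result


class Node:
    """Executes once; subsequent calls return the cached result."""

    def __init__(self, name: str, parents: List["Node"] = None) -> None:
        self.name = name
        self.parents = parents or []
        self._cache = Cache()

    async def _execute_node(self, inputs: List[Any]) -> Any:
        # Stand-in for the private _ExecutionNode; real nodes invoke an LLM.
        return f"{self.name}({', '.join(map(str, inputs))})"

    async def execute(self) -> Any:
        if self._cache.is_set:
            return self._cache.load()
        inputs = [await parent.execute() for parent in self.parents]
        result = await self._execute_node(inputs)
        self._cache.store(result)
        return result


# Mirror a slice of the pipeline: TopicGen -> Email -> Answer.
topic = Node("TopicGen")
email = Node("Email", parents=[topic])
answer = Node("Answer", parents=[email])
print(asyncio.run(answer.execute()))  # Answer(Email(TopicGen()))
```

Because the end-node pulls its parents recursively, the GraphRunner only needs the end-nodes to trigger the whole graph, which matches step 1 above.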

Assistants & Data Models

@startuml
package ai_graph {
  class AIAssistant {
    + get_response(prompt): string
  }
  class InputType <<DataModel>> {
    + get_description(): string
  }
  class OutputType <<DataModel>> {
    + get_description(): string
  }
}
@enduml

  • AIAssistant is the abstract wrapper around the LLM; AINode invokes it through a PromptBuilder.


  • InputType / OutputType implement a uniform dataclass interface (via Pydantic) for JSON serialization.
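The real models are built on Pydantic; the stdlib sketch below mirrors the same uniform interface (a `get_description()` classmethod plus JSON serialization) with dataclasses. The field names and the `TicketEmail` subclass are illustrative assumptions.

```python
# Stdlib approximation of the uniform InputType/OutputType interface
# (the actual project uses Pydantic models instead of dataclasses).
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class DataModel:
    """Shared base: every input/output type can describe and serialize itself."""

    @classmethod
    def get_description(cls) -> str:
        # Derive a human-readable description from the model's own fields.
        names = ", ".join(f.name for f in fields(cls))
        return f"{cls.__name__}({names})"

    def to_json(self) -> str:
        return json.dumps(asdict(self))


@dataclass
class TicketEmail(DataModel):  # hypothetical concrete output type
    subject: str
    body: str


email = TicketEmail(subject="Printer offline", body="It stopped this morning.")
print(TicketEmail.get_description())  # TicketEmail(subject, body)
print(email.to_json())
```

A uniform interface like this lets prompts embed a model's self-description, so the assistant knows the exact JSON shape it must return.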

Randomized Inputs

@startuml
package random {
  interface IRandom<V> {
    + get_random(): V
  }
  class RandomCollection<V> implements IRandom<V> {
    - items: List<V>
    - weights: List<float>
    + get_random(): V
  }
  class RandomSet<V> implements IRandom<V> {
    - items: List<V>
    - weights: List<float>
    + get_random(): V
  }
  class RandomTable<K,V> implements IRandom<V> {
    - rows: Map<K,RandomCollection<V>>
    + get_random(key:K): V
  }
}
@enduml

  • RandomCollection perturbs each base weight by up to a configurable factor, then samples according to the adjusted weights.

  • RandomTable maps one enum key to a weighted collection of values from another enum (e.g. TicketType → random Priority).
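A minimal sketch of the two helpers, under stated assumptions: the perturbation factor, the enum names, and the concrete weights are all illustrative, not taken from the actual implementation.

```python
# Hedged sketch of RandomCollection (weight jitter + sampling) and
# RandomTable (enum key -> weighted collection of another enum's values).
import random
from enum import Enum
from typing import Dict, Generic, List, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class RandomCollection(Generic[V]):
    """Samples one item after jittering each base weight by up to +/- factor."""

    def __init__(self, items: List[V], weights: List[float], factor: float = 0.2):
        self.items = items
        self.weights = weights
        self.factor = factor  # assumed perturbation bound, e.g. 20%

    def get_random(self) -> V:
        jittered = [
            w * random.uniform(1 - self.factor, 1 + self.factor)
            for w in self.weights
        ]
        return random.choices(self.items, weights=jittered, k=1)[0]


class TicketType(Enum):  # illustrative enums
    INCIDENT = "incident"
    REQUEST = "request"


class Priority(Enum):
    LOW = "low"
    HIGH = "high"


class RandomTable(Generic[K, V]):
    """Each key owns its own weighted collection of values."""

    def __init__(self, rows: Dict[K, "RandomCollection[V]"]):
        self.rows = rows

    def get_random(self, key: K) -> V:
        return self.rows[key].get_random()


table = RandomTable({
    TicketType.INCIDENT: RandomCollection([Priority.HIGH, Priority.LOW], [0.8, 0.2]),
    TicketType.REQUEST: RandomCollection([Priority.LOW, Priority.HIGH], [0.9, 0.1]),
})
print(table.get_random(TicketType.INCIDENT))
```

The jitter keeps the generated dataset close to the configured distribution while avoiding an identical weight profile on every run.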

Cost & Usage Analysis

@startuml
package analysis {
  interface Analyzer {
    + analyze(): AnalysisResult
  }
  class AssistantRun implements Analyzer {
    + get_cost_analysis(): AnalysisResult
  }
  class AssistantRunsAnalyzer implements Analyzer {
    + get_cost_analysis(): AnalysisResult
  }
  class DatasetGenAnalyzer implements Analyzer {
    + get_cost_analysis(): AnalysisResult
  }
  class AnalysisResult {
    + prompt_tokens: int
    + prompts_cost: Money
    + completion_tokens: int
    + completions_cost: Money
  }
}
@enduml

  • AssistantRun captures token counts and cost for a single run_openai call.

  • AssistantRunsAnalyzer aggregates runs per assistant or batch.

  • DatasetGenAnalyzer produces the final, currency-converted summary with percentage shares.
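The aggregation into percentage shares can be sketched as below. Money is simplified to a float in a single currency, and the per-run token and cost figures are made-up assumptions for illustration.

```python
# Illustrative sketch: roll per-run results up per assistant and compute
# each assistant's percentage share of the total cost.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnalysisResult:
    prompt_tokens: int
    prompts_cost: float        # Money simplified to a plain float here
    completion_tokens: int
    completions_cost: float

    @property
    def total_cost(self) -> float:
        return self.prompts_cost + self.completions_cost


def cost_shares(runs_per_assistant: Dict[str, List[AnalysisResult]]) -> Dict[str, float]:
    """Return each assistant's percentage share of the total cost."""
    totals = {
        name: sum(run.total_cost for run in runs)
        for name, runs in runs_per_assistant.items()
    }
    grand_total = sum(totals.values())
    return {name: 100 * cost / grand_total for name, cost in totals.items()}


runs = {  # hypothetical figures
    "email": [AnalysisResult(1200, 0.012, 400, 0.008)],
    "tags": [AnalysisResult(300, 0.003, 50, 0.001)],
}
print(cost_shares(runs))  # email ~83.3%, tags ~16.7%
```

Splitting cost by assistant makes it easy to see which pipeline stage (and hence which model tier, per the flowchart's expensive/cheap classes) dominates spend.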