Features
Dataset Generator comes with a powerful set of features to help you produce high-quality synthetic ticket data:
Graph-Based Pipeline
Directed AI “nodes”: Each processing step (topic → email → tags → paraphrase → translate → response) is represented by a node.
Flexible topology: Easily reorder, add or remove nodes to craft custom pipelines.
Modular Assistants
Configurable prompts: Swap in different system- and user-prompts for each Assistant.
Multiple model support: Use any OpenAI / OpenRouter / Together provider and any chat model (GPT-4, Qwen, LLaMA, etc.).
Rich Ticket Attributes
Subject & Body
Ticket Type (Incident, Request, Problem, Change)
Queue (Technical Support, Billing, HR, …)
Priority (Low, Medium, High)
Language (DE, EN, FR, ES, PT)
Tags (4–8 topic tags per ticket)
First-response simulation by an AI agent
Extensibility & Customization
Randomized inputs: Built-in collections for language, priority, queue descriptions.
Unique ID generator: Guaranteed unique ticket IDs in a 12–13 digit range.
Custom fields: Add your own data classes, enums, or RandomCollection tables.
Cost & Usage Analysis
Per-run cost tracking: Monitors token counts and cost by model (input vs. output).
Currency conversion: Summaries in USD, EUR or your chosen currency.
Automated logging: Warnings/errors when a run exceeds cost thresholds.
Built-in CLI & Config
Config file (config/config.py): Define run size, batch size, translation nodes, time zone.
Dev vs. Prod modes: Toggle between small test runs and full-scale dataset generation.
Get started with the :doc:configuration <config> guide and see the full :doc:architecture overview. Enjoy creating realistic, varied ticket datasets in minutes!```