Features

Dataset Generator comes with a powerful set of features to help you produce high-quality synthetic ticket data:

Graph-Based Pipeline

  • Directed AI “nodes”: Each processing step (topic → email → tags → paraphrase → translate → response) is represented by a node.

  • Flexible topology: Easily reorder, add or remove nodes to craft custom pipelines.

Modular Assistants

  • Configurable prompts: Swap in different system- and user-prompts for each Assistant.

  • Multiple model support: Use any OpenAI / OpenRouter / Together provider and any chat model (GPT-4, Qwen, LLaMA, etc.).

Rich Ticket Attributes

  • Subject & Body

  • Ticket Type (Incident, Request, Problem, Change)

  • Queue (Technical Support, Billing, HR, …)

  • Priority (Low, Medium, High)

  • Language (DE, EN, FR, ES, PT)

  • Tags (4–8 topic tags per ticket)

  • First-response simulation by an AI agent

Extensibility & Customization

  • Randomized inputs: Built-in collections for language, priority, queue descriptions.

  • Unique ID generator: Guaranteed unique ticket IDs in a 12–13 digit range.

  • Custom fields: Add your own data classes, enums, or RandomCollection tables.

Cost & Usage Analysis

  • Per-run cost tracking: Monitors token counts and cost by model (input vs. output).

  • Currency conversion: Summaries in USD, EUR or your chosen currency.

  • Automated logging: Warnings/errors when a run exceeds cost thresholds.

Built-in CLI & Config

  • Config file (config/config.py): Define run size, batch size, translation nodes, time zone.

  • Dev vs. Prod modes: Toggle between small test runs and full-scale dataset generation.

  • One-line dataset build: ```bash python -m ticket_generator


Get started with the :doc:configuration <config> guide and see the full :doc:architecture overview. Enjoy creating realistic, varied ticket datasets in minutes!```