Introduction

Welcome to Dataset Generator, a flexible, extensible pipeline for creating realistic synthetic ticket datasets with AI assistants. Whether you need sample support tickets for testing, machine-learning training, or analytics, Dataset Generator lets you configure every step, from subject generation to multilingual translation and first-response simulation.

Key Concepts

  • Graph-based pipeline: Each ticket is generated by a directed graph of AI “nodes” (assistants). The output of one node feeds into the next, enabling complex workflows such as:
      1. Topic generation
      2. Email text creation
      3. Tag assignment
      4. Paraphrasing & translation
      5. First-response drafting

  • Modular Assistants: Every step is powered by a configurable Assistant class. You can swap models or prompts, or even wrap runs with an injected decorator for cost tracking.

  • Rich Ticket Attributes: Generate tickets with full metadata:
      - Subject & Body
      - Type (Incident, Request, etc.)
      - Queue (Technical Support, Billing, etc.)
      - Priority (low, medium, high)
      - Language (EN, DE, FR, ES, PT)
      - Tags (4–8 relevant topics)
      - First-answer simulation

  • Cost Analysis: Automatically measure token usage and per-token costs across runs, aggregate them by Assistant, and output summaries in your chosen currency.
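
To make the graph and cost ideas above concrete, here is a minimal Python sketch. It is not the project's actual API: the names `Node`, `Usage`, `run_pipeline`, and `summarize_costs` are hypothetical, the "assistants" are plain functions standing in for model calls, token counts are approximated by word counts, and the prices are made-up placeholders.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Usage:
    """Token usage recorded for one node run (hypothetical structure)."""
    assistant: str
    input_tokens: int
    output_tokens: int


@dataclass
class Node:
    """One step in the graph; `run` maps accumulated fields to new fields."""
    name: str
    run: Callable[[dict], dict]
    parents: list[Node] = field(default_factory=list)


def count_words(fields: dict) -> int:
    # Assumption: whitespace-separated words stand in for real token counts.
    return sum(len(str(value).split()) for value in fields.values())


def execute(node: Node, cache: dict, usage_log: list[Usage]) -> dict:
    """Run a node after its parents, merging parent outputs into its input."""
    if node.name in cache:
        return cache[node.name]
    inputs: dict = {}
    for parent in node.parents:
        inputs.update(execute(parent, cache, usage_log))
    outputs = node.run(inputs)
    usage_log.append(Usage(node.name, count_words(inputs), count_words(outputs)))
    cache[node.name] = {**inputs, **outputs}
    return cache[node.name]


def run_pipeline(final: Node) -> tuple[dict, list[Usage]]:
    """Execute everything `final` depends on and return ticket plus usage."""
    usage_log: list[Usage] = []
    ticket = execute(final, {}, usage_log)
    return ticket, usage_log


def summarize_costs(usage_log: list[Usage],
                    input_price_per_1k: float,
                    output_price_per_1k: float) -> dict[str, float]:
    """Aggregate cost per assistant from recorded usage."""
    totals: dict[str, float] = defaultdict(float)
    for u in usage_log:
        totals[u.assistant] += (u.input_tokens * input_price_per_1k
                                + u.output_tokens * output_price_per_1k) / 1000
    return dict(totals)


def build_demo_graph() -> Node:
    # Deterministic stand-ins for AI assistants: topic -> email -> tags.
    topic = Node("topic", lambda f: {"subject": "Printer offline"})
    email = Node(
        "email",
        lambda f: {"body": f"My issue: {f['subject']}. Please help."},
        parents=[topic],
    )
    tags = Node(
        "tags",
        lambda f: {"tags": ["printer", "hardware", "network", "outage"]},
        parents=[email],
    )
    return tags


ticket, usage_log = run_pipeline(build_demo_graph())
costs = summarize_costs(usage_log, input_price_per_1k=0.5, output_price_per_1k=1.5)
print(ticket["subject"], costs)
```

Because each node only sees the merged output of its parents, swapping a step (for example, a different translation assistant) would mean replacing one `Node` without touching the rest of the graph. Keeping usage records separate from the ticket fields is one way to aggregate costs per assistant after the run.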

Getting Started

  1. Configure your project in config/config.py:
      - Number of tickets
      - Translation nodes per ticket
      - Models, prompts, and providers

  2. Build the docs:

     ```bash
     cd docs
     make html
     ```

Dataset Generator on GitHub