Show HN: Misata – synthetic data engine using LLM and Vectorized NumPy
4 days ago
- #synthetic-data
- #machine-learning
- #data-generation
- Misata generates realistic multi-table datasets from natural-language descriptions, with no need to write schemas or provide training data.
- It supports features like auto schema generation, relational integrity, business constraints, and streaming for large datasets (10M+ rows).
- It installs via pip and supports multiple LLM providers, including Groq, OpenAI, and Ollama.
- Users can generate data for various scenarios like SaaS, e-commerce, and fitness apps with customizable options.
- Misata includes advanced features like noise injection, custom distributions, and conditional values for realistic data generation.
- The reported performance metrics indicate high-speed generation that handles millions of rows efficiently.
- It offers a browser-based trial and enterprise support for complex scenarios.
- Developed by Muhammed Rasin and released under the MIT License.
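
To make the ideas in the summary concrete, here is a minimal sketch of vectorized, chunked generation with relational integrity and noise injection, written in plain NumPy. It is not Misata's API: the function names (`generate_users`, `stream_orders`), the column names, the distribution parameters, and the `null_rate` option are all hypothetical and chosen only for illustration.

```python
import numpy as np

def generate_users(n_users, rng):
    """Parent table (hypothetical schema): one row per user, unique primary key."""
    return {
        "user_id": np.arange(1, n_users + 1),
        "signup_day": rng.integers(0, 365, size=n_users),
    }

def stream_orders(users, n_orders, chunk_size, rng, null_rate=0.05):
    """Yield the child table in chunks so the full table never sits in memory."""
    next_id = 1
    remaining = n_orders
    while remaining > 0:
        size = min(chunk_size, remaining)
        # Relational integrity: foreign keys are sampled only from existing parent keys.
        user_id = rng.choice(users["user_id"], size=size)
        # Custom distribution: log-normal order amounts, rounded to cents.
        amount = np.round(rng.lognormal(mean=3.0, sigma=0.8, size=size), 2)
        # Noise injection: blank out a random fraction of the amounts.
        amount[rng.random(size) < null_rate] = np.nan
        yield {
            "order_id": np.arange(next_id, next_id + size),
            "user_id": user_id,
            "amount": amount,
        }
        next_id += size
        remaining -= size

rng = np.random.default_rng(seed=42)
users = generate_users(1_000, rng)

total = 0
for chunk in stream_orders(users, n_orders=25_000, chunk_size=10_000, rng=rng):
    # Every order must point at a user that actually exists.
    assert np.isin(chunk["user_id"], users["user_id"]).all()
    total += len(chunk["order_id"])

print(total)  # 25000
```

Each chunk is built from whole-array operations rather than per-row Python loops, which is what makes this style of generation fast, and yielding chunks keeps memory use bounded by `chunk_size` rather than by the total row count.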