This is a premium, high-fidelity human vocal dataset engineered for training clinical Automatic Speech Recognition (ASR), medical Text-to-Speech (TTS), and healthcare-focused conversational AI models. It contains complex medical terminology, pharmaceutical drug names, clinical jargon, and anatomical terms, performed by professional voice talent in a dedicated, acoustically treated studio.

Technical Specifications

Audio Format: Uncompressed, raw .wav format.
Audio Quality: Studio-grade, 48 kHz sample rate at 24-bit depth.
Vocal Characteristics: Completely dry, natural human speech dynamics, zero digital compression, and an ultra-low room noise floor.
Pipeline Delivery: Packaged as a single .zip archive containing the audio directory and the matching mapping spreadsheet.

Dataset Schema & Structure

The primary mapping spreadsheet follows standard speech-processing framework conventions (LJ Speech equivalent). Every audio file maps to two distinct text layers, which eliminates preprocessing overhead for machine learning teams:

id: Unique alphanumeric identifier that matches the corresponding audio filename.
raw_text: The standard human-readable form of the medical text (retains standard capitalization, punctuation, medical symbols, and numerical values).
normalized_text: The fully expanded text rendering that matches exactly how the clinical terms are spoken (all numbers, symbols, abbreviations, and pharmacological acronyms are spelled out in words).

Available Licensing Tiers

We offer two commercial licensing tiers for this dataset, depending on your organization's infrastructure requirements and user scale.

| Feature / Allowance | 🥉 Tier 1: Indie Developer ($149.00) | 🥇 Tier 2: Enterprise ($1,499.00) |
| --- | --- | --- |
| Deployment Rights | Embedded vocal UI/UX, software dev, video games, local fine-tuning, live client demos. | Advanced IVR networks, smart-assistants, automated medical scribes, software interfaces. |
| Application Limits | Valid for one (1) proprietary application or call-bot system. | Unlimited multi-platform commercial deployment rights. |
| Scaling & Traffic Caps | Strictly capped at 100,000 MAUs or 100,000 call sessions. | Completely uncapped. No MAU limits, concurrency caps, or bottlenecks. |
| Generative TTS Restrictions | Excludes foundational generative multi-tenant TTS model training. | Excludes foundational generative multi-tenant TTS model training. |
| How to Purchase | Buy instantly below via Ouro's secure checkout. | Purchase directly through our official storefront or contact via email. |

Full License Parameters

Tier 1: Indie Developer License ($149.00)

Purchasing this product directly via Ouro grants your business a Tier 1 Indie Developer License:

Scope: Perpetual commercial use for embedded vocal UI/UX, software development, video games, local model fine-tuning, or live client sales demos.
Limitations: Valid for up to one (1) proprietary application or call-bot system. Strictly capped at 100,000 Monthly Active Users (MAUs) or 100,000 call sessions.
Exclusions: Excludes foundational generative multi-tenant TTS model training.

Tier 2: Enterprise License ($1,499.00)

To lift all scaling restrictions and deploy across larger networks, your organization requires a Tier 2 Enterprise License:

Scope: Perpetual, worldwide commercial deployment rights for advanced IVR networks, smart-assistants, automated medical scribes, and software interfaces.
Scale: Lifts all scaling restrictions. No Monthly Active User (MAU) caps, no concurrency limitations, and no infrastructure bottlenecks.
Exclusions: Excludes foundational generative multi-tenant TTS model training.

👉 To secure the Tier 2 Enterprise License ($1,499.00), please purchase directly through our official storefront or contact us at [email protected].
(For complete copyright assignments or full custom model buyouts, please reach out directly via email to initiate terms.)

Licensing & Commercial Usage

License Type: Commercial Use, Non-Exclusive Digital Data License (All Rights Reserved).
Permitted Usage: Granted instantly upon purchase. The Buyer is fully authorized to use this dataset (including all phonetic metadata and master audio assets) for commercial machine learning model training, algorithmic benchmarking, weights development, and internal R&D.
Restrictions: Redistribution, sub-licensing, repackaging, or public resale of the raw audio tracks or text metadata sheets (in whole or in part) is strictly prohibited. All original master recordings remain the sole intellectual property of the creator.
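
For teams that want to sanity-check a download against the schema and audio specifications described in this listing, the following is a minimal Python sketch, not an official tool. It assumes the mapping spreadsheet has been exported as a delimited text file with a header row (called `metadata.csv` here) and that each recording sits at `<audio_dir>/<id>.wav`. The file name, the directory layout, the delimiter, and all function names are hypothetical. Only the three column names and the 48 kHz / 24-bit targets come from the listing itself.

```python
import csv
import wave
from pathlib import Path

EXPECTED_RATE_HZ = 48_000   # from the listing: 48 kHz
EXPECTED_SAMPLE_BYTES = 3   # from the listing: 24-bit, i.e. 3 bytes per sample
COLUMNS = ("id", "raw_text", "normalized_text")  # column names from the listing


def load_metadata(metadata_path, delimiter=","):
    """Read the mapping file into a list of dicts, one per utterance.

    Assumption: a delimited text export with a header row. Pass
    delimiter="|" if the file uses the pipe-separated LJ Speech style.
    """
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        missing = set(COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return [{name: row[name] for name in COLUMNS} for row in reader]


def check_wav(wav_path):
    """Return a list of header mismatches for one WAV file (empty if none)."""
    problems = []
    with wave.open(str(wav_path), "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_BYTES:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples")
    return problems


def validate_dataset(metadata_path, audio_dir, delimiter=","):
    """Map each id to its list of problems; ids with no problems are omitted."""
    report = {}
    for row in load_metadata(metadata_path, delimiter):
        # Assumption: the audio filename is the id plus a .wav extension.
        wav_path = Path(audio_dir) / f"{row['id']}.wav"
        if not wav_path.is_file():
            report[row["id"]] = ["audio file not found"]
            continue
        problems = check_wav(wav_path)
        if problems:
            report[row["id"]] = problems
    return report
```

Calling `validate_dataset("metadata.csv", "wavs")` on an extracted archive would return an empty dict if every row has a matching 48 kHz / 24-bit file. One caveat: the standard-library `wave` module only reads PCM data, and Python versions before 3.12 reject files written with a `WAVE_FORMAT_EXTENSIBLE` header, so a third-party audio reader may be needed depending on how the files were exported.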
Curating ethically sourced datasets for low-latency Text-to-Speech (TTS), voice cloning, and telephony, delivered in LJ Speech format.