SceneLine: AI-Powered Dubbing Practice Platform

I recently released a new project called SceneLine, an AI-powered dubbing practice platform that enables language learners to practice dubbing through immersive film/TV scene dialogues.
Why I Built This
The hardest part of language learning is often not vocabulary or grammar but language sense: that natural, authentic way of expressing yourself. Traditional listening and reading exercises struggle to develop it, but dubbing practice is a great solution:
- Immersive scenarios: Real movie/TV dialogues, not textbook sentences
- Multi-character interactions: Different tones, emotions, and pacing
- Instant feedback: Know whether your pronunciation is accurate
But traditional dubbing practice has a pain point: no feedback. You say the lines along with the video, but don't know if you're doing it right.
SceneLine aims to solve this.
Core Features
Real-time ASR (Speech Recognition)
- Uses FunASR for speech recognition
- Resident process mode keeps the model loaded, for roughly a 10x speedup
- Real-time comparison of your pronunciation against the original
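As a sketch of the comparison step, assuming a simple string-similarity score (illustrative only, not SceneLine's actual scoring): recognize the user's take, normalize both strings, and measure how closely they match.

```python
from difflib import SequenceMatcher

def score_line(reference: str, recognized: str) -> float:
    """Score a recognized take against the reference line, 0-100.

    Normalization here is minimal (case-folding, whitespace collapsing);
    a real pipeline would also strip punctuation and handle locale rules.
    """
    ref = " ".join(reference.casefold().split())
    hyp = " ".join(recognized.casefold().split())
    return round(100 * SequenceMatcher(None, ref, hyp).ratio(), 1)

# The recognized text would come from FunASR, roughly like:
#   from funasr import AutoModel
#   model = AutoModel(model="paraformer-zh")  # loaded once in the resident process
#   recognized = model.generate(input="user_take.wav")[0]["text"]

print(score_line("How you doin'?", "how you doing"))  # high but not perfect match
```

Keeping the model in a resident process means `AutoModel(...)` runs once at startup rather than per request, which is where the speedup comes from.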
40+ TTS Voices
- Microsoft Edge TTS support
- 40+ voice options
- Filter by gender/locale to find the best reference voice
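The gender/locale filter can be sketched as below. The records are a hardcoded sample in the shape the Python edge-tts package reports from `list_voices()` (`ShortName`/`Gender`/`Locale` fields); SceneLine itself goes through node-edge-tts, so this is illustrative only.

```python
def filter_voices(voices, gender=None, locale=None):
    """Filter Edge TTS voice records by gender and/or locale prefix."""
    out = []
    for v in voices:
        if gender and v["Gender"] != gender:
            continue
        if locale and not v["Locale"].startswith(locale):
            continue
        out.append(v)
    return out

# Illustrative subset of real Edge TTS voices:
SAMPLE = [
    {"ShortName": "en-US-AriaNeural", "Gender": "Female", "Locale": "en-US"},
    {"ShortName": "en-GB-RyanNeural", "Gender": "Male", "Locale": "en-GB"},
    {"ShortName": "zh-CN-XiaoxiaoNeural", "Gender": "Female", "Locale": "zh-CN"},
]

print([v["ShortName"] for v in filter_voices(SAMPLE, gender="Female", locale="en")])
# → ['en-US-AriaNeural']
```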
Multi-character Dialogue Practice
- Supports multi-role scenes (like Friends dialogues)
- Individual scoring for each character
- Practice multiple roles by yourself
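Per-character scoring can be sketched as a small aggregation over scored takes; the data shape and averaging rule here are illustrative assumptions, not SceneLine's actual schema.

```python
from collections import defaultdict

def score_by_role(takes):
    """Average line scores per role.

    `takes` is a list of (role, score) pairs, one per practiced line.
    Returns {role: average score, rounded to one decimal place}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for role, score in takes:
        sums[role] += score
        counts[role] += 1
    return {role: round(sums[role] / counts[role], 1) for role in sums}

takes = [("Joey", 92.0), ("Joey", 86.0), ("Chandler", 78.5)]
print(score_by_role(takes))  # → {'Joey': 89.0, 'Chandler': 78.5}
```

Because scores are keyed by role, one person can practice every part in a scene and still see each character tracked separately.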
Practice History & Statistics
- Three view modes: Overview / By Script / Details
- Track your progress curve
- Identify weak areas
Smart Deduplication
- Content hash-based script deduplication
- Automatically merges identical content to avoid repetition
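The dedup idea fits in a few lines: hash a normalized form of the script and treat equal hashes as the same content. The normalization rules below are assumptions; SceneLine's exact scheme may differ.

```python
import hashlib

def script_hash(lines):
    """Stable content hash for a script: normalize each line, then SHA-256."""
    normalized = "\n".join(" ".join(line.split()).casefold() for line in lines)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

a = script_hash(["JOEY: How you doin'?", "RACHEL:  Hi!"])
b = script_hash(["joey: how you doin'?", "rachel: hi!"])
print(a == b)  # → True: same content after normalization, so the scripts merge
```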
Tech Stack
Frontend
- React + Vite: Fast development experience
- Tailwind CSS: Clean UI design
- TypeScript: Type safety
Backend
- Express + TypeScript: API service
- FunASR: Core speech recognition
- node-edge-tts: TTS wrapper
AI/ML
- FunASR: Alibaba DAMO Academy's open-source ASR framework
- ModelScope: Model hub
- faster-whisper: Accelerated Whisper ASR
Quick Start
One-click Launch (Recommended)
git clone https://github.com/hugcosmos/SceneLine.git
cd SceneLine
./start.sh
First launch will:
- Ask if you're in mainland China (auto-configures mirror sources)
- Download ASR model (~2GB, takes 6-9 minutes first time)
Then visit http://localhost:5000
Docker Deployment
docker-compose up -d
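For reference, a minimal docker-compose.yml along these lines would match the project layout; the service name, build context, and container paths here are assumptions, not the repo's actual file.

```yaml
services:
  sceneline:
    build: .                         # build the app image from the repo root
    ports:
      - "5000:5000"                  # web UI served on port 5000
    volumes:
      - ./models:/app/models         # persist the ~2GB ASR model between runs
      - ./tts-cache:/app/tts-cache   # persist generated TTS audio
```

Mounting models/ and tts-cache/ as volumes keeps the large downloads and caches outside the container lifecycle.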
System Requirements
- Node.js: 20+
- Python: 3.9-3.11 (ASR dependencies; torch doesn't support 3.12+)
- Memory: Minimum 4GB (ASR model uses ~2GB)
- Disk: 3GB+ free space
- FFmpeg: For audio format conversion
Project Structure
sceneline/
├── server/            # Backend (Express + TypeScript)
│   ├── lib/           # Core libraries (ASR, TTS)
│   └── routes/        # API routes
├── client/            # Frontend (React + Vite + Tailwind)
│   └── src/pages/     # Page components
├── shared/            # Shared type definitions
├── models/            # ASR model cache
├── tts-cache/         # TTS audio cache
└── docker-compose.yml
License
MIT License. Fully open source; contributions welcome.
Future Plans
- Multi-user mode: Support multiple users practicing simultaneously with real-time comparison
- Streaming ASR: Faster real-time recognition with lower latency
- Intelligent scoring system: More systematic and human-friendly scoring mechanism
- TTS upgrade: Support for richer voice options
- Multiple TTS providers: Integration with more API providers (ElevenLabs, iFlytek, Baidu, etc.)
Links
- GitHub: github.com/hugcosmos/SceneLine
Made with ❤️ by Nicky & AI
Discuss on GitHub