
BirdSR
A machine-learning app that identifies 50 North American bird species from audio recordings. It supports real-time recording and reports 97% accuracy on clean audio. Hosted on HuggingFace Spaces.

Core Features
Audio Processing
- Audio upload (WAV/MP3) and real-time microphone recording
- Intelligent processing with basic and advanced modes
- Automatic format conversion and resampling
- Multi-segment aggregation for longer files
- Quality scoring and intelligent segment selection
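As a rough illustration of how multi-segment aggregation with quality scoring could work, the sketch below keeps the highest-quality segments of a long file and averages their species probabilities, weighted by quality. The function name `aggregate_segments`, the `top_n` parameter, and the weighting scheme are assumptions made for this example; the project's actual selection and aggregation logic is not described here.

```python
from typing import Dict, List, Tuple


def aggregate_segments(
    segments: List[Tuple[float, Dict[str, float]]],
    top_n: int = 3,
) -> Dict[str, float]:
    """Combine per-segment species probabilities into one prediction.

    Each item is (quality_score, {species: probability}). Only the top_n
    highest-quality segments are kept (hypothetical selection rule), and
    their probabilities are averaged using the quality score as the weight.
    """
    if not segments:
        raise ValueError("need at least one segment")

    # Segment selection: keep the best-scoring segments only.
    best = sorted(segments, key=lambda s: s[0], reverse=True)[:top_n]
    total_weight = sum(quality for quality, _ in best)
    if total_weight <= 0:
        raise ValueError("quality scores must sum to a positive value")

    # Quality-weighted average of the per-species probabilities.
    combined: Dict[str, float] = {}
    for quality, probs in best:
        for species, p in probs.items():
            combined[species] = combined.get(species, 0.0) + quality * p
    return {species: v / total_weight for species, v in combined.items()}


if __name__ == "__main__":
    demo = [
        (0.9, {"Northern Cardinal": 0.8, "Blue Jay": 0.2}),
        (0.1, {"Northern Cardinal": 0.2, "Blue Jay": 0.8}),  # noisy segment
        (0.5, {"Northern Cardinal": 0.6, "Blue Jay": 0.4}),
    ]
    result = aggregate_segments(demo, top_n=2)
    print(max(result, key=result.get))
```

Weighting by quality means a noisy segment contributes little even when it is not filtered out entirely.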
Machine Learning
- 88% accuracy on curated segments, 92.4% validation accuracy
- YAMNet backbone fine-tuned for bird sounds
- GRU architecture with batch normalization
- 50 North American bird species support
- Production deployment on HuggingFace Spaces
Technical Stack
Frontend
- Next.js 15, React 19, TailwindCSS 4
- Mobile-friendly responsive design
- Visual feedback and processing insights
- Real-time recording with MediaRecorder API
- Error handling with user-friendly feedback
Backend & ML
- HuggingFace Spaces API integration
- Flask-based REST API for inference
- YAMNet embeddings with GRU architecture
- Audio processing with music-metadata library
- Axios with exponential backoff retry logic
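The retry logic itself lives in the Axios-based client, which is not shown here. The Python sketch below only illustrates the general exponential backoff idea: each retry waits twice as long as the previous one, up to a cap. The names `backoff_delays` and `call_with_retry`, and the values for `base`, `cap`, and `retries`, are assumptions for this example rather than the app's actual settings.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Delay before each retry: base, 2*base, 4*base, ... limited to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 4,
    base: float = 0.5,
    cap: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given (assumed transient) errors with backoff."""
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")


if __name__ == "__main__":
    print(backoff_delays(5))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. A production client would typically also add random jitter to the delays so that many clients do not retry in lockstep.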
Model Training Pipeline
Data Collection & Processing
- 50 North American bird species from eBird recordings
- ~20GB dataset with quality filtering
- 5-second audio segments with 1.5-second stride
- JSON-based metadata tracking
- Automated download pipeline
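The 5-second window with a 1.5-second stride means consecutive segments overlap by 3.5 seconds. A minimal sketch of how the segment boundaries could be computed is shown below; the function name `segment_bounds` is hypothetical, and the handling of recordings shorter than one window (here: no segments) is an assumption, since the pipeline's actual behavior is not stated.

```python
from typing import List, Tuple


def segment_bounds(
    duration_s: float,
    window_s: float = 5.0,
    stride_s: float = 1.5,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for every full window.

    Windows start at 0, stride_s, 2*stride_s, ... and a window is kept
    only if it fits entirely inside the recording.
    """
    if window_s <= 0 or stride_s <= 0:
        raise ValueError("window_s and stride_s must be positive")

    bounds: List[Tuple[float, float]] = []
    i = 0
    # Small tolerance guards against floating-point rounding at the edge.
    while i * stride_s + window_s <= duration_s + 1e-9:
        start = i * stride_s
        bounds.append((start, start + window_s))
        i += 1
    return bounds


if __name__ == "__main__":
    for start, end in segment_bounds(10.0):
        print(f"{start:.1f}s - {end:.1f}s")
```

For a 10-second recording this yields four segments starting at 0.0, 1.5, 3.0, and 4.5 seconds.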
Training & Performance
- 400 epochs with AdamW optimizer (lr=5e-5)
- Audio augmentation (noise, pitch, time stretching)
- Checkpoint-based training with state persistence
- 88% top-1 accuracy, 96.7% top-5 accuracy
- Batch size 32 with comprehensive validation
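For readers unfamiliar with the top-1 and top-5 figures above: a prediction counts as a top-k hit when the true species is among the k highest-scoring classes. The sketch below shows one plain-Python way to compute this; the function name `top_k_accuracy` and the toy scores are illustrative only and are not taken from the project's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Fraction of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


if __name__ == "__main__":
    toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    toy_labels = [1, 1, 0]
    print(top_k_accuracy(toy_scores, toy_labels, k=1))
    print(top_k_accuracy(toy_scores, toy_labels, k=2))
```

Top-5 accuracy is always at least as high as top-1, which is why the two reported figures differ.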
Supported Species (50 total)
The supported species include common backyard birds (Northern Cardinal, American Robin, Blue Jay), raptors (Red-tailed Hawk, Great Horned Owl), warblers, sparrows, water birds, and more specialized species such as Ruby-throated Hummingbird and Cedar Waxwing.