Experience
Multimodal Content Generation System
- Architected an agentic system for multimodal content generation using LangGraph and OpenAI APIs, with a central orchestration agent dynamically routing tasks to image, video, and audio pipelines.
- Developed image outpainting and depth-effect video pipelines leveraging SDXL and ControlNet models. Built music generation modules integrating MusicGen.
- Optimized the processing pipeline with concurrent GPU task execution and state-managed workflows via a Flask backend, reducing end-to-end task times to under 20 seconds.
Cefaly
- Trained a YOLOv8n-pose model for real-time sticker/electrode detection and keypoint estimation, enabling accurate localization for Cefaly's mobile computer vision pipeline.
- Trained the custom pose estimation model on a dataset of ~1,800 images, achieving an mAP50 of 0.992, mAP50-95 of 0.814 (box) and 0.843 (pose), and a peak F1-score of 0.98 at a confidence threshold of 0.565.
- Optimized the real-time inference pipeline for a Unity-based deep-linking application, reducing end-to-end latency to ~60 ms.
Sketchbot
- Created a FastAPI backend with SDXL Turbo, ControlNets, and OpenCV for line-art generation and DexArm G-code optimization, cutting draw times by 70%.
- Integrated MongoDB and S3 for user and asset management; achieved <200 ms API response times under load.
- Deployed on AWS GPU instances, consolidated servers to cut infrastructure costs by 30%, and automated disaster recovery to reduce downtime by 95%.
OnePlus 13s Quest Quiz
- Built scalable Node.js/TypeScript/TypeORM APIs serving 150K+ users with <120 ms latency.
- Developed a Django admin panel and managed GCP Cloud SQL for high-availability production operations.