# Piyush Choudhari - Full Content

> AI & Backend Engineer specialized in Agentic Systems, LLM Evaluation, and High-Performance Backend Design.

## Blog Posts

- [Making Agents Play Pictionary](https://piyushchoudhari.me/blog/Making-Agents-Play-Pictionary) (2026-05-17)
  Summary: A turn-based drawing-and-guessing game where AI agents compete.
- [My Codebases Have an AI Receptionist Now](https://piyushchoudhari.me/blog/My-Codebases-Have-an-AI-Receptionist-Now) (2026-04-08)
  Summary: I built an agentic codebase navigator that lets engineers and recruiters ask anything about my projects: no RAG, no embeddings, just pure read-and-reason.
- [Implementing An SQS Like Message Queue System](https://piyushchoudhari.me/blog/Implementing-An-SQS-Like-Message-Queue-System) (2026-01-08)
  Summary: I built a rudimentary AWS SQS clone with managed queues, visibility timeouts, and dead-letter queues (DLQs).
- [How I Used PostgreSQL, Groq & Vercel's AI SDK To Create A Chatbot For My Portfolio Website](https://piyushchoudhari.me/blog/How-I-Used-Postgres-Groq-And-AI-SDK-To-Create-My-Portfolio-Website-Chatbot) (2025-12-25)
  Summary: My portfolio website contains a decent amount of text data, so I implemented a chatbot to interface with it.
- [Designing a Deterministic, Low Latency Decision Engine with C++](https://piyushchoudhari.me/blog/Designing-a-Deterministic-Low-Latency-Decision-Engine-with-CPP) (2025-12-19)
  Summary: A high-performance, deterministic decision-tree engine built in C++ that evaluates rules over gRPC with low latency, hot-reloadable trees, and production-ready observability.
- [Simulating vLLM's PagedAttention](https://piyushchoudhari.me/blog/Simulating-vLLMs-PagedAttention) (2025-12-08)
  Summary: A simulation of PagedAttention implemented in Python.
- [Making A Peer Review System for My Blogs Using Google-ADK & Mem0](https://piyushchoudhari.me/blog/Making-A-Peer-Review-System-for-My-Blogs-Using-Google-ADK-And-Mem0) (2025-11-26)
  Summary: I needed an automated way to peer review my blogs, so I used Google-ADK and Mem0 to build an end-to-end system.
- [Training a Mixture-of-Experts Router](https://piyushchoudhari.me/blog/Training-a-Mixture-of-Experts-Router) (2025-11-17)
  Summary: I was curious about MoEs, so I implemented and tested one.
- [Building a Vector Database from Scratch - CapybaraDB](https://piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB) (2025-11-04)
  Summary: A custom implementation of a toy vector database.
- [Be Curious About Your Compute](https://piyushchoudhari.me/blog/Be-Curious-About-Your-Compute) (2025-09-24)
  Summary: After running into a hardware-related blocker, I decided to take a deep dive into it. An explainer.
- [How I Built an Automated Social Media Workflow with LangGraph](https://piyushchoudhari.me/blog/How-I-Automated-My-Social-Media-Workflow-with-LangGraph) (2025-09-14)
  Summary: For me, posting on social media is a tedious and repetitive task: draft → edit → get feedback → post. So I used LangGraph and some AI magic sauce to automate these jobs away.
- [SHAP values for GBTs – intuition + how they work internally](https://piyushchoudhari.me/blog/SHAP-values-for-GBTs) (2025-09-01)
  Summary: This blog explores how Gradient Boosted Trees can be explained using PDP, ICE, and SHAP, with TreeSHAP making black-box models transparent and highly interpretable.
- [Welcome to My Blog](https://piyushchoudhari.me/blog/welcome-to-my-blog) (2025-08-29)
  Summary: Introducing my new blog, where I'll share insights about software engineering, AI, and technology.

## Projects

- [CapyNodes](https://piyushchoudhari.me/projects/capynodes) (2026)
  Real-time collaborative system design platform with an AI-powered evaluation engine.
  Stack: Python, LangChain, Groq, Django, PostgreSQL, LLMs (Qwen/Groq), React Flow, WebSockets.
  Outcome: CapyNodes features a multi-stage evaluation pipeline that combines rule-based heuristics with LLM Chain-of-Thought (CoT) reasoning. It uses a hybrid Django + Channels architecture to handle both high-frequency cursor updates and persistent state synchronization.
  - Evaluation Latency (Stage 2): <5s
  - Scoring Consistency: >80%
  - Supported Concurrent Users: 50+
- [CapybaraDB](https://piyushchoudhari.me/projects/capybaradb) (2025)
  CapybaraDB: a toy vector DB implemented from scratch in Python to explore vector DB internals.
  Stack: Python, PyTorch, RAG, Vector Search.
  Outcome: Built from scratch using only NumPy and PyTorch for core operations, providing complete transparency into vector database internals. Implements efficient exact search with linear scaling and sub-10ms query latency even at 5000+ vectors.
  - Average Query Latency: <9ms
  - Throughput: 118+ QPS
  - Retrieval Quality (nDCG@10): 0.979
- [Post Automation Agent](https://piyushchoudhari.me/projects/post-automation-agent) (2025)
  A post-automation pipeline that extracts content from my blog and Obsidian notes and turns it into complete posts for LinkedIn and X, along with a peer review agent.
  Stack: Python, LangGraph, Agents.
  Outcome: Built with LangGraph as a stateful multi-agent system orchestrating 13+ specialized agents. Implements self-evaluation and recovery logic with an automated peer review loop, integrating Obsidian research notes directly into the content generation workflow.
  - Time Saved per Post: 75%
  - Platforms Automated: 3
  - Quality Checks: Automated
- [KnowFlow](https://piyushchoudhari.me/projects/knowflow) (2025)
  KnowFlow is a hybrid Retrieval-Augmented Generation (RAG) system that combines semantic search with knowledge graph capabilities for intelligent document processing and querying.
  Stack: Python, AWS, TypeScript, Neo4j, PostgreSQL, pgvector, Google Gemini, S3, JWT, bcrypt, RAG, Knowledge Graph, Vector Search.
  Outcome: Combines dense vector embeddings with Neo4j knowledge graphs for hybrid retrieval, enabling both semantic similarity search and multi-hop reasoning. Implements query decomposition that automatically extracts entities and relationships while maintaining conversation context.
  - Query Types Supported: 3x
  - Context Preservation: Session-based
  - Document Processing: Multi-format
- [TrackML](https://piyushchoudhari.me/projects/trackml) (2025)
  TrackML is a full-stack tool for tracking, managing, and analyzing machine learning models. It supports organizing models, extracting insights using AI, and integrating with external APIs for automation and enrichment.
  Stack: Python, Flask, React, TypeScript, TailwindCSS, SQLAlchemy, PostgreSQL, AWS, Google Gemini, HuggingFace, JWT, Nginx.
  Outcome: Integrates the HuggingFace API to auto-fill model metadata and Google Gemini for comparative analysis. Features a comprehensive Nginx caching and reverse proxy setup with a multi-level cache hierarchy, optimizing static asset delivery and API response times on AWS EC2.
  - Data Entry Time: 60% reduction
  - Cache Hit Ratio: 85%+
  - API Response Time: <200ms

## Experience

### AI Engineer Intern @ Flytbase (Apr 2026 - Present)

### AI Engineer Intern @ Ronin Labs (Jan 2025 - Apr 2026)

- **Multimodal Content Generation System**
  Architected an agentic system for multimodal content generation using LangGraph and OpenAI APIs, with a central orchestration agent dynamically routing tasks to image, video, and audio pipelines.
  Developed image outpainting and depth-effect video pipelines leveraging SDXL and ControlNet models. Built music generation modules integrating MusicGen. Optimized the processing pipeline with concurrent GPU task execution and state-managed workflows via a Flask backend, reducing end-to-end task times to under 20 seconds.
- **Cefaly**
  Trained a YOLOv8n-pose model for real-time sticker/electrode detection and keypoint estimation, enabling accurate localization for Cefaly's mobile-based computer vision pipeline. The custom pose estimation model, trained on a dataset of ~1,800 images, reached an mAP50 of 0.992, an mAP50-95 of 0.814 (box) and 0.843 (pose), and a peak F1-score of 0.98 at a confidence threshold of 0.565. Optimized the real-time inference pipeline for a Unity-based deep-linking application, reducing end-to-end latency to ~60ms.
- **Sketchbot**
  Created a FastAPI backend with SDXL Turbo, ControlNets, and OpenCV for line-art generation and DexArm G-code optimization, cutting draw times by 70%. Integrated MongoDB and S3 for user and asset management, achieving <200ms API response times under load. Deployed on AWS GPU instances, consolidated servers for 30% lower infrastructure costs, and automated disaster recovery for 95% less downtime.
- **OnePlus 13s Quest Quiz**
  Built scalable Node.js/TypeScript/TypeORM APIs serving 150K+ users with <120ms latency. Developed a Django admin panel and managed GCP Cloud SQL for high-availability production operations.