# Piyush Choudhari - Full Content

> AI & Backend Engineer specialized in Agentic Systems, LLM Evaluation, and High-Performance Backend Design.

## Blog Posts

- [Making Agents Play Pictionary](https://piyushchoudhari.me/blog/Making-Agents-Play-Pictionary) (2026-05-17)
  Summary: A turn-based drawing-and-guessing game where AI agents compete.

- [My Codebases Have an AI Receptionist Now](https://piyushchoudhari.me/blog/My-Codebases-Have-an-AI-Receptionist-Now) (2026-04-08)
  Summary: I built an agentic codebase navigator that lets engineers and recruiters ask anything about my projects: no RAG, no embeddings, just pure read-and-reason.

- [Implementing An SQS Like Message Queue System](https://piyushchoudhari.me/blog/Implementing-An-SQS-Like-Message-Queue-System) (2026-01-08)
  Summary: I built a rudimentary AWS SQS clone with managed queues, visibility timeouts, and dead-letter queues (DLQs).

- [How I Used PostgreSQL, Groq & Vercel's AI SDK To Create A Chatbot For My Portfolio Website](https://piyushchoudhari.me/blog/How-I-Used-Postgres-Groq-And-AI-SDK-To-Create-My-Portfolio-Website-Chatbot) (2025-12-25)
  Summary: My portfolio website contains a decent amount of text data, so I implemented a chatbot to interface with it.

- [Designing a Deterministic, Low Latency Decision Engine with C++](https://piyushchoudhari.me/blog/Designing-a-Deterministic-Low-Latency-Decision-Engine-with-CPP) (2025-12-19)
  Summary: A high-performance, deterministic decision tree engine built in C++ that evaluates rules over gRPC with low latency, hot-reloadable trees, and production-ready observability.

- [Simulating vLLM's PagedAttention](https://piyushchoudhari.me/blog/Simulating-vLLMs-PagedAttention) (2025-12-08)
  Summary: A simulation of PagedAttention implemented in Python.

- [Making A Peer Review System for My Blogs Using Google-ADK & Mem0](https://piyushchoudhari.me/blog/Making-A-Peer-Review-System-for-My-Blogs-Using-Google-ADK-And-Mem0) (2025-11-26)
  Summary: I needed an automated way to peer-review my blogs, so I used Google-ADK and Mem0 to build an end-to-end system.

- [Training a Mixture-of-Experts Router](https://piyushchoudhari.me/blog/Training-a-Mixture-of-Experts-Router) (2025-11-17)
  Summary: I was curious about MoEs, so I implemented and tested one.

- [Building a Vector Database from Scratch - CapybaraDB](https://piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB) (2025-11-04)
  Summary: A from-scratch implementation of a toy vector database.

- [Be Curious About Your Compute](https://piyushchoudhari.me/blog/Be-Curious-About-Your-Compute) (2025-09-24)
  Summary: After hitting a hardware-related blocker, I decided to take a deep dive into it: an explainer.

- [How I Built an Automated Social Media Workflow with LangGraph](https://piyushchoudhari.me/blog/How-I-Automated-My-Social-Media-Workflow-with-LangGraph) (2025-09-14)
  Summary: For me, posting on social media is a tedious and repetitive task: draft → edit → get feedback → post. So I used LangGraph and some AI magic sauce to automate away these chores.

- [SHAP values for GBTs – intuition + how they work internally](https://piyushchoudhari.me/blog/SHAP-values-for-GBTs) (2025-09-01)
  Summary: This blog explores how Gradient Boosted Trees can be explained using PDP, ICE, and SHAP, with TreeSHAP making black-box models transparent and highly interpretable.

- [Welcome to My Blog](https://piyushchoudhari.me/blog/welcome-to-my-blog) (2025-08-29)
  Summary: Introducing my new blog where I'll share insights about software engineering, AI, and technology.

## Projects

- [CapyNodes](https://piyushchoudhari.me/projects/capynodes) (2026)
  Real-time collaborative system design platform with AI-powered evaluation engine.
  Stack: Python, LangChain, Groq, Django, PostgreSQL, LLMs (Qwen/Groq), React Flow, WebSockets.
  Outcome: CapyNodes features a unique multi-stage evaluation pipeline combining rule-based heuristics with LLM Chain-of-Thought (CoT) reasoning. It uses a hybrid Django + Channels architecture to handle both high-frequency cursor updates and persistent state synchronization.
  - Evaluation Latency (Stage 2): <5s
  - Scoring Consistency: >80%
  - Supported Concurrent Users: 50+

- [CapybaraDB](https://piyushchoudhari.me/projects/capybaradb) (2025)
  capybaradb: a toy vector DB implemented from scratch in Python for exploring vector DB internals.
  Stack: Python, PyTorch, RAG, Vector Search.
  Outcome: Built from scratch using only NumPy and PyTorch for core operations, providing complete transparency into vector database internals. Implements exact search with linear scaling and sub-10ms query latency even at 5000+ vectors.
  - Average Query Latency: <9ms
  - Throughput: 118+ QPS
  - Retrieval Quality (nDCG@10): 0.979

- [Post Automation Agent](https://piyushchoudhari.me/projects/post-automation-agent) (2025)
  A post-automation pipeline that extracts content from my blog and Obsidian notes and turns it into complete posts for LinkedIn and X, along with a peer review agent.
  Stack: Python, LangGraph, Agents.
  Outcome: Built with LangGraph as a stateful multi-agent system orchestrating 13+ specialized agents. Implements self-evaluation and recovery logic with automated peer review loop, integrating Obsidian research notes directly into content generation workflow.
  - Time Saved per Post: 75%
  - Platforms Automated: 3
  - Quality Checks: Automated

- [KnowFlow](https://piyushchoudhari.me/projects/knowflow) (2025)
  KnowFlow is a powerful hybrid Retrieval-Augmented Generation (RAG) system that combines semantic search with knowledge graph capabilities for intelligent document processing and querying.
  Stack: Python, AWS, TypeScript, Neo4j, PostgreSQL, pgvector, Google Gemini, S3, JWT, bcrypt, RAG, Knowledge Graph, Vector Search.
  Outcome: Combines dense vector embeddings with Neo4j knowledge graphs for hybrid retrieval, enabling both semantic similarity and multi-hop reasoning. Implements intelligent query decomposition to extract entities and relationships automatically while maintaining conversation context.
  - Query Types Supported: 3x
  - Context Preservation: Session-based
  - Document Processing: Multi-format

- [TrackML](https://piyushchoudhari.me/projects/trackml) (2025)
  TrackML is a full-stack tool for tracking, managing, and analyzing machine learning models. It supports organizing models, extracting insights using AI, and integrating with external APIs for automation and enrichment.
  Stack: Python, Flask, React, TypeScript, TailwindCSS, SQLAlchemy, PostgreSQL, AWS, Google Gemini, HuggingFace, JWT, Nginx.
  Outcome: Integrates HuggingFace API for auto-filling model metadata and Google Gemini for comparative analysis. Features comprehensive Nginx caching and reverse proxy setup with multi-level cache hierarchy, achieving optimized static asset delivery and API response times on AWS EC2.
  - Data Entry Time: 60% reduction
  - Cache Hit Ratio: 85%+
  - API Response Time: <200ms

## Experience

### AI Engineer Intern @ Flytbase (Apr 2026 - Present)


### AI Engineer Intern @ Ronin Labs (Jan 2025 - Apr 2026)

- **Multimodal Content Generation System**
  Architected an agentic system for multimodal content generation using LangGraph and OpenAI APIs, with a central orchestration agent dynamically routing tasks to image, video, and audio pipelines.
  Developed image outpainting and depth-effect video pipelines leveraging SDXL and ControlNet models. Built music generation modules integrating MusicGen.
  Optimized processing pipeline with concurrent GPU task execution and state-managed workflows via a Flask backend, reducing end-to-end task times to under 20 seconds.

- **Cefaly**
  Trained a YOLOv8n-pose model for real-time sticker/electrode detection and keypoint estimation, enabling accurate localization for Cefaly's mobile-based computer vision pipeline.
  Trained the custom pose estimation model on a dataset of ~1800 images, resulting in mAP50 of 0.992, mAP50-95 of 0.814 (box) and 0.843 (pose), with peak F1-score of 0.98 at confidence threshold 0.565.
  Optimized the real-time inference pipeline for a Unity-based deep-linking application, reducing end-to-end latency to ~60ms.

- **Sketchbot**
  Created a FastAPI backend with SDXL Turbo, ControlNets, and OpenCV for line-art generation and DexArm G-code optimization, cutting draw times by 70%.
  Integrated MongoDB and S3 for user and asset management; achieved <200ms API response under load.
  Deployed on AWS GPU instances, consolidated servers for 30% lower infra costs, and automated disaster recovery for 95% less downtime.

- **OnePlus 13s Quest Quiz**
  Built scalable Node.js/TypeScript/TypeORM APIs for 150K+ users with <120ms latency.
  Developed a Django admin panel and managed GCP Cloud SQL for high-availability production ops.