🤖
Feb 2026
LLM Document Q&A — RAG Chatbot
Upload any PDF and ask questions in plain English. Retrieval-Augmented Generation pipeline powered by LangChain, OpenAI, and FAISS. Deployed on Streamlit Cloud.
RAG
LangChain
FAISS
OpenAI
Streamlit
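The retrieval step of a pipeline like this can be sketched with the standard library alone. This is a toy stand-in and not the project's code: bag-of-words counts replace OpenAI embeddings, a brute-force cosine search replaces the FAISS index, and `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 8) -> list[str]:
    """Split text into chunks of `size` words (real pipelines usually overlap chunks)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. An assumption standing in for a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


document = (
    "Invoices are due within thirty days of receipt. "
    "Late payments incur a two percent monthly fee. "
    "Refunds are processed within five business days."
)
question = "When are invoices due?"
chunks = chunk_text(document, size=8)
context = retrieve(question, chunks, k=1)

# The retrieved chunk is placed in the prompt that would be sent to the LLM.
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
```

In a real deployment the ranking would come from a vector index over dense embeddings; only the shape of the flow (chunk, embed, retrieve, build prompt) carries over from this sketch.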
🎯
Mar 2026
AI Resume Job Matcher
Semantic matching system that scores your resume against any job description using Sentence Transformers and LLM-powered skill gap analysis with actionable suggestions.
NLP
Sentence Transformers
GPT-4
Streamlit
☕
Nov 2025 – Jan 2026
AI Coffee Demand Predictor
ML forecasting model integrating weather and event data. Improved forecast accuracy from 60% to 90%, reducing waste by 50% and stockouts by 75%. Demonstrated $12K+ annual savings per store.
scikit-learn
Python
Streamlit
UC Berkeley
🏗️
Aug 2024 – Jun 2025
Azure Retail Data Lakehouse
Full Medallion Architecture (Bronze/Silver/Gold) on Azure using ADLS Gen2, Azure Data Factory, and Databricks. PySpark ETL with partitioning and caching optimizations.
Azure Databricks
PySpark
Delta Lake
ADF
🔍
Apr 2026
AWS-Style Sentiment & NLP Analyzer
Replicates core Amazon Comprehend features: sentiment analysis, key phrase extraction, and entity detection, with AWS-style JSON output. Supports single and batch analysis modes.
NLP
Amazon Comprehend
Sentiment Analysis
Streamlit
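As a rough illustration of AWS-style JSON output, the sketch below returns dictionaries shaped like Amazon Comprehend's `DetectSentiment` and `BatchDetectSentiment` responses (`Sentiment`, `SentimentScore`, `ResultList`, `ErrorList`). The scoring itself is an assumption: a tiny hand-written lexicon with all-or-nothing scores, not the project's model.

```python
import json
import re

# Tiny illustrative lexicons (assumptions, not the project's model).
POSITIVE = {"great", "love", "excellent", "fast", "good"}
NEGATIVE = {"bad", "slow", "broken", "terrible", "hate"}


def detect_sentiment(text: str) -> dict:
    """Classify one text and return a Comprehend-shaped response."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos and neg:
        label = "MIXED"
    elif pos:
        label = "POSITIVE"
    elif neg:
        label = "NEGATIVE"
    else:
        label = "NEUTRAL"
    # Crude all-or-nothing scores; a real model would return calibrated probabilities.
    scores = {"Positive": 0.0, "Negative": 0.0, "Neutral": 0.0, "Mixed": 0.0}
    scores[label.capitalize()] = 1.0
    return {"Sentiment": label, "SentimentScore": scores}


def batch_detect_sentiment(texts: list[str]) -> dict:
    """Batch mode: one result per input, tagged with its position in the input list."""
    results = [{"Index": i, **detect_sentiment(t)} for i, t in enumerate(texts)]
    return {"ResultList": results, "ErrorList": []}


single = detect_sentiment("I love the fast shipping")
batch = batch_detect_sentiment(["Great product", "Slow and broken", "It arrived today"])
print(json.dumps(batch, indent=2))
```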
👥
Apr 2026
Employee Attrition Risk Predictor
ML classifier predicting employee attrition risk with 85%+ accuracy. Models 8 risk factors, generates HR recommendations, and projects $900K+ in annual savings for a 500-person company.
Gradient Boosting
scikit-learn
HR Analytics
Streamlit
⚡
Feb – Sep 2023
Real-Time Streaming Pipeline
Built real-time data ingestion pipelines using Azure Event Hubs and Databricks Structured Streaming. Processed high-volume data with PySpark and stored it in Delta Lake, using checkpointing for fault tolerance, to feed near-real-time dashboards.
Azure Event Hubs
Structured Streaming
Delta Lake
PySpark
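The idea behind checkpointing can be shown in plain Python: progress is committed after each batch, so a restarted job resumes where it left off instead of reprocessing or skipping events. In Structured Streaming this role is played by the `checkpointLocation` option. The sketch below is not the project's pipeline, and `load_offset`, `run_stream`, and the JSON checkpoint file are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_offset(checkpoint: Path) -> int:
    """Return the next offset to process, or 0 if no checkpoint exists yet."""
    if checkpoint.exists():
        return json.loads(checkpoint.read_text())["next_offset"]
    return 0


def run_stream(events, sink, checkpoint, batch_size=2, fail_at=None):
    """Write events to `sink` in batches, committing the offset after each batch."""
    offset = load_offset(checkpoint)
    while offset < len(events):
        if fail_at is not None and offset >= fail_at:
            raise RuntimeError("simulated crash before the batch is written")
        batch = events[offset:offset + batch_size]
        sink.extend(batch)
        offset += len(batch)
        # Commit progress only after the batch has been written.
        checkpoint.write_text(json.dumps({"next_offset": offset}))


events = [f"event-{i}" for i in range(6)]
sink = []
checkpoint = Path(tempfile.mkdtemp()) / "checkpoint.json"

try:
    run_stream(events, sink, checkpoint, fail_at=2)  # crashes after the first batch
except RuntimeError:
    pass

run_stream(events, sink, checkpoint)  # restart: resumes from the committed offset
```

After the restart, `sink` holds each event exactly once. This toy version would duplicate a batch if the crash landed between the write and the commit, which is why production sinks pair checkpoints with idempotent or transactional writes.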
🔄
May – Dec 2022
ETL Migration — Informatica to Databricks
Migrated legacy Informatica PowerCenter ETL workflows to Azure Databricks and ADF. Converted complex mappings to optimized PySpark transformations with full data validation, reconciliation, and ADF orchestration for scheduling and monitoring.
Informatica PowerCenter
Azure Databricks
ADF
PySpark
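One common way to reconcile a migrated table against its legacy source is to compare row counts and per-row hashes keyed on a business key. The sketch below shows the idea on in-memory rows. It is an assumption about how such a check could look, not the project's validation code, and `row_fingerprint`, `reconcile`, and the sample data are hypothetical.

```python
import hashlib


def row_fingerprint(row: dict, columns: list[str]) -> str:
    """Hash the selected columns in a fixed order so both sides hash identically."""
    payload = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def reconcile(source: list[dict], target: list[dict], key: str, columns: list[str]) -> dict:
    """Compare two row sets by key and report count, missing, and mismatched rows."""
    src = {r[key]: row_fingerprint(r, columns) for r in source}
    tgt = {r[key]: row_fingerprint(r, columns) for r in target}
    return {
        "source_count": len(source),
        "target_count": len(target),
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Illustrative sample data only.
legacy_rows = [
    {"order_id": 1, "amount": "10.00"},
    {"order_id": 2, "amount": "25.50"},
    {"order_id": 3, "amount": "7.25"},
]
migrated_rows = [
    {"order_id": 1, "amount": "10.00"},
    {"order_id": 2, "amount": "25.05"},  # value drifted during migration
]

report = reconcile(legacy_rows, migrated_rows, key="order_id", columns=["order_id", "amount"])
print(report)
```

At warehouse scale the same comparison would typically run as a join on the key with hashed columns, rather than in Python dictionaries.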