Get To Know More

About Me

Data Science Intern

Rally Vision
Sept-Dec 2024

M.Tech (Computational and Data Science)

IISc Bangalore
8.4/10 CGPA

B.E. (Chemical Engineering)

BITS Pilani, K K Birla Goa Campus
8.03/10 CGPA

I'm a Data Scientist with a background in Chemical Engineering and an M.Tech in Computational and Data Science from IISc Bangalore.
My expertise lies in building end-to-end machine learning systems, from feature engineering and model development to containerized deployment on cloud infrastructure. I have worked across computer vision, NLP, retrieval-augmented generation (RAG), and forecasting, developing solutions that are technically robust and practical for real-world use.
I enjoy applying AI to practical problems and building systems that are not only accurate but also scalable, deployable, and production-ready.

What I Work With

Technical Skills

Languages

Python SQL TypeScript

Machine Learning

Scikit-learn XGBoost Feature Engineering Model Evaluation Hyperparameter Tuning

Deep Learning

PyTorch Transformers Transfer Learning

Computer Vision

Object Tracking Image Classification Event Detection

NLP & LLMs

RAG DistilBERT Text Embeddings

Audio & Signal Processing

Librosa YAMNet Embeddings

Deployment

FastAPI Docker ONNX AWS EC2 Streamlit

Web Scraping & Retrieval

Scrapy Beautiful Soup ChromaDB BM25 Reciprocal Rank Fusion Cohere Reranking

Data Analysis

Demand Forecasting EDA

Databases

MongoDB

Web Development

Next.js Better Auth

Tools

Git

Browse My Recent

Projects

AI-Powered Job Application Tracker

A full-stack platform for organizing and tracking job applications throughout the hiring process. Users can manage applications, monitor their progress across hiring stages, and maintain a job-search profile.
The platform integrates an AI-powered workflow that generates personalized cover letters and preparation notes from the user's profile data and the job description.
Built with Next.js, TypeScript, MongoDB, and FastAPI, and deployed using Vercel, Docker, and AWS EC2.
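The AI workflow described above can be thought of as a prompt-assembly step. It merges stored profile data with the job description before a language model is called. The Python sketch below is hypothetical. The `Profile` fields, the `build_cover_letter_prompt` helper, and the prompt wording are illustrative assumptions, not the platform's actual code, and the model call itself is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical profile schema; the platform's real MongoDB fields may differ.
    name: str
    headline: str
    skills: list[str] = field(default_factory=list)
    experience: list[str] = field(default_factory=list)


def build_cover_letter_prompt(profile: Profile, job_description: str) -> str:
    """Combine profile data and a job description into one LLM prompt (illustrative)."""
    skills = ", ".join(profile.skills) if profile.skills else "not specified"
    # Empty experience list falls back to a placeholder bullet.
    experience = "\n".join(f"- {item}" for item in profile.experience) or "- not specified"
    return (
        "Write a concise, personalized cover letter.\n\n"
        f"Candidate: {profile.name} ({profile.headline})\n"
        f"Skills: {skills}\n"
        f"Experience:\n{experience}\n\n"
        f"Job description:\n{job_description.strip()}\n"
    )


if __name__ == "__main__":
    p = Profile(
        name="Jane Doe",
        headline="Data Scientist",
        skills=["Python", "SQL"],
        experience=["Built a demand forecasting pipeline"],
    )
    print(build_cover_letter_prompt(p, "  We are hiring an ML engineer.  "))
```

A similar template, with different instructions, could produce the preparation notes.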

CUDA Runtime API RAG Assistant

A domain-specific question-answering system built on the NVIDIA CUDA Runtime API documentation using Retrieval-Augmented Generation (RAG).
The system combines dense retrieval (OpenAI embeddings stored in ChromaDB) with BM25 sparse retrieval and fuses the two ranked lists through Reciprocal Rank Fusion (RRF). Multi-query expansion and Cohere reranking further improve retrieval quality, enabling accurate answers grounded in the official CUDA documentation.
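Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in. Documents that both retrievers rank highly therefore rise to the top, without any need to calibrate dense and BM25 scores against each other. The minimal sketch below rests on a few assumptions. It expects each retriever to return an ordered list of chunk IDs and uses the commonly cited constant k = 60. The chunk IDs are made up, and the project's actual k is not stated.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with RRF.

    Each document receives sum(1 / (k + rank)) over the lists containing it,
    where rank is 1-based. Returns (doc_id, score) pairs, best first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; ties are broken by doc_id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dense = ["chunk_7", "chunk_2", "chunk_9"]   # hypothetical embedding-search results
    sparse = ["chunk_2", "chunk_4", "chunk_7"]  # hypothetical BM25 results
    for doc_id, score in reciprocal_rank_fusion([dense, sparse]):
        print(f"{doc_id}: {score:.5f}")
```

Here `chunk_2` comes out first because it ranks near the top of both lists. With multi-query expansion, the ranked lists from each query variant could simply be appended to `rankings` before fusion, and the fused list passed on to the reranker.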

Retail Demand Forecasting

A machine learning system for forecasting daily retail sales using the Rossmann Store Sales dataset.
The project includes exploratory data analysis, feature engineering, and model development using XGBoost with Optuna-based hyperparameter optimization. Historical sales patterns, promotions, holidays, competition, and store metadata are used to generate forecasts, which are visualized through an interactive Streamlit dashboard.
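The feature-engineering step turns each store-day record into a row of numeric features for XGBoost. The standard-library sketch below shows the idea with calendar, promotion, holiday, competition, and lag features. The `make_features` helper, the missing-value sentinel, and the 7-day lag are illustrative assumptions, not the project's exact feature set.

```python
from datetime import date, timedelta
from typing import Optional, Union


def make_features(
    day: date,
    promo: int,
    state_holiday: Union[str, int],
    competition_distance: Optional[float],
    sales_history: dict,
) -> dict:
    """Build one feature row for a single store and day (illustrative sketch).

    sales_history is a hypothetical mapping from past dates to that store's daily sales.
    """
    return {
        # Calendar features capture weekly and seasonal patterns.
        "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "month": day.month,
        "week_of_year": day.isocalendar()[1],
        # Promotion and holiday flags; str() guards against the dataset's mix of 0 and "0".
        "promo": promo,
        "state_holiday": int(str(state_holiday) != "0"),
        # Assumption: a missing competition distance is filled with a large sentinel value.
        "competition_distance": competition_distance if competition_distance is not None else 1e5,
        # Assumption: same-weekday lag (sales exactly 7 days earlier), NaN when unavailable.
        "sales_lag_7": sales_history.get(day - timedelta(days=7), float("nan")),
    }


if __name__ == "__main__":
    history = {date(2015, 7, 24): 5263.0}  # made-up sales figure
    print(make_features(date(2015, 7, 31), promo=1, state_holiday="0",
                        competition_distance=None, sales_history=history))
```

Rows like these would form the training matrix for XGBoost, and Optuna would then search over the model's hyperparameters rather than the features themselves.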

Fungi Image Classification

An image classification system for fungi images, built with deep learning and transfer learning techniques.
The model is built on EfficientNetV2 with selective fine-tuning of deeper layers and a custom classification head. Data augmentation and class-weighted loss were used to improve generalization on an imbalanced dataset, achieving 90.7% classification accuracy. The final model was exported to ONNX for efficient deployment and inference.
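The description does not say how the class weights were chosen. One common choice is inverse-frequency ("balanced") weighting, w_c = N / (C · n_c), which gives rare classes a larger share of the loss. The plain-Python sketch below computes such weights and a class-weighted cross-entropy. In a PyTorch training loop, weights like these would typically be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`. The helper names here are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Inverse-frequency ("balanced") weights: w_c = N / (C * n_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


def weighted_cross_entropy(
    logits: list[list[float]], targets: list[int], weights: dict[int, float]
) -> float:
    """Class-weighted mean cross-entropy over a batch.

    Each sample's negative log-likelihood is scaled by the weight of its true
    class, and the sum is divided by the total target weight (the same
    normalization PyTorch's CrossEntropyLoss uses with reduction="mean").
    """
    total, weight_sum = 0.0, 0.0
    for row, y in zip(logits, targets):
        # Stable log-sum-exp: subtract the max logit before exponentiating.
        m = max(row)
        log_norm = m + math.log(sum(math.exp(v - m) for v in row))
        total += weights[y] * (log_norm - row[y])
        weight_sum += weights[y]
    return total / weight_sum


if __name__ == "__main__":
    labels = [0, 0, 0, 1]  # toy imbalanced labels: three of class 0, one of class 1
    w = balanced_class_weights(labels)
    print(w)  # the rare class 1 receives the larger weight
    print(weighted_cross_entropy([[2.0, 0.5], [0.1, 1.5]], [0, 1], w))
```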

Movie Reviews Sentiment Analysis

A sentiment analysis system trained on the IMDb movie reviews dataset to classify reviews as positive or negative.
The project compares transformer-based and classical machine learning approaches, including DistilBERT and TF-IDF-based models. DistilBERT achieved 90.86% accuracy, while a TF-IDF + Logistic Regression model (88.03% accuracy) was selected for deployment to provide faster and more cost-efficient inference.
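A TF-IDF + Logistic Regression model is cheap at inference because prediction reduces to a sparse dot product followed by a sigmoid. The plain-Python sketch below mirrors the default weighting of scikit-learn's `TfidfVectorizer`: lowercased tokens from the default `\b\w\w+\b` pattern, raw term counts, smoothed IDF ln((1 + n) / (1 + df)) + 1, and L2 normalization. The helper names and any coefficients passed to `predict_positive` are hypothetical, not the trained model's.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep tokens of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector (raw counts x IDF), L2-normalized; unseen terms are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def predict_positive(vec: dict[str, float], coef: dict[str, float], bias: float) -> float:
    """Logistic regression probability of the positive class: sigmoid(w . x + b)."""
    z = bias + sum(coef.get(t, 0.0) * v for t, v in vec.items())
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    idf = fit_idf(["a great movie", "a terrible movie", "great acting"])
    vec = tfidf_vector("great great movie", idf)
    print(predict_positive(vec, {"great": 2.0, "terrible": -2.0}, 0.0))  # toy coefficients
```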

Music Genre Classification

An audio classification system that predicts music genres from audio clips using both handcrafted features and deep audio embeddings.
The project explores traditional audio features extracted with Librosa alongside pretrained YAMNet embeddings. Multiple machine learning models were evaluated, with YAMNet embeddings achieving 85% accuracy. For deployment, a lightweight Librosa + Logistic Regression pipeline (78% accuracy) was chosen to balance performance with computational efficiency.
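Handcrafted Librosa features are typically computed over short frames and then summarized into one fixed-length vector per clip, which a classifier such as logistic regression can consume. The standard-library sketch below illustrates that idea with two simple features, zero-crossing rate and RMS energy. It is a simplified stand-in for `librosa.feature.zero_crossing_rate` and `librosa.feature.rms`, with no frame centering or padding. The frame and hop lengths of 2048 and 512 samples follow librosa's defaults for those functions. The project's actual feature set is not specified, and the helper names are hypothetical.

```python
import math
import statistics


def frames(signal: list[float], frame_length: int = 2048, hop_length: int = 512) -> list[list[float]]:
    """Split a signal into overlapping frames (no centering or padding)."""
    return [signal[i:i + frame_length] for i in range(0, len(signal) - frame_length + 1, hop_length)]


def zero_crossing_rate(frame: list[float]) -> float:
    """Fraction of adjacent sample pairs whose sign changes."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def clip_features(signal: list[float], frame_length: int = 2048, hop_length: int = 512) -> list[float]:
    """Pool frame-level features into [ZCR mean, ZCR std, RMS mean, RMS std]."""
    fs = frames(signal, frame_length, hop_length)
    if not fs:
        raise ValueError("signal is shorter than one frame")
    zcr = [zero_crossing_rate(f) for f in fs]
    energy = [rms(f) for f in fs]
    return [statistics.mean(zcr), statistics.pstdev(zcr), statistics.mean(energy), statistics.pstdev(energy)]


if __name__ == "__main__":
    sr = 22050  # samples per second
    tone = [0.5 * math.sin(2 * math.pi * 441.0 * n / sr) for n in range(sr)]  # 1 s synthetic tone
    print(clip_features(tone))
```

A fuller pipeline would likely add richer descriptors such as MFCCs. A similar pooling step is one common way to reduce YAMNet's per-frame embeddings to a single clip-level vector.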

Mandeep Singh