AI Data Dictionary Builder (SaaS Automation)

AI · FastAPI · React · OpenAI · AWS

I built an AI-powered SaaS application that automatically generates complete data dictionaries, business glossaries, and metadata summaries from uploaded CSV and Excel files. It uses GPT-4 and Groq embeddings to analyze each dataset and produce business-friendly documentation in under 10 seconds. The full-stack application is deployed with a React frontend on Vercel and a FastAPI backend on AWS Lambda.

I implemented automated schema profiling with Pandas/Polars to analyze each column's type, missing values, cardinality, and statistical distribution. An AI description generator then prompts GPT-4.1 with these column statistics to produce clear, business-friendly descriptions. For the glossary, semantically related columns are grouped by clustering their vector embeddings with cosine similarity, and an LLM automatically names each group.
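The profiling, prompting, and grouping steps described above can be sketched in a few functions. This is a minimal, dependency-free sketch under stated assumptions: it uses plain Python rather than the Pandas/Polars profiling the project used, and the missing-value tokens, the type heuristic, the prompt wording, and the 0.8 similarity threshold are all hypothetical. The toy three-dimensional vectors stand in for real embeddings from a model, and grouping uses greedy leader clustering, which is one simple way to cluster by cosine similarity and not necessarily the project's method.

```python
import math
import statistics
from collections import Counter

# Hypothetical tokens treated as "missing"; a real profiler would make this configurable.
MISSING_TOKENS = {"", "na", "n/a", "null", "none"}


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile_column(name, raw_values):
    """Summarize one column: inferred type, missing %, cardinality, and basic stats."""
    values = [v for v in raw_values if v.strip().lower() not in MISSING_TOKENS]
    numeric = bool(values) and all(is_number(v) for v in values)
    profile = {
        "name": name,
        "type": "numeric" if numeric else "text",
        "missing_pct": round(100 * (len(raw_values) - len(values)) / max(len(raw_values), 1), 1),
        "distinct": len(set(values)),
    }
    if numeric:
        nums = [float(v) for v in values]
        profile.update(min=min(nums), max=max(nums), mean=round(statistics.mean(nums), 2))
    else:
        # Most frequent values give the LLM concrete examples of the column's content.
        profile["top_values"] = [v for v, _ in Counter(values).most_common(3)]
    return profile


def build_description_prompt(profile):
    """Turn a column profile into an LLM prompt (wording is a hypothetical example)."""
    stats = ", ".join(f"{k}={v}" for k, v in profile.items() if k != "name")
    return (
        f"Write one business-friendly sentence describing the column "
        f"'{profile['name']}', given these statistics: {stats}."
    )


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_columns(embeddings, threshold=0.8):
    """Greedy leader clustering: each column joins the first group whose seed
    vector is at least `threshold` similar; otherwise it starts a new group."""
    groups = []  # list of (seed_vector, member_names)
    for name, vec in embeddings.items():
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            groups.append((vec, [name]))
    return [members for _, members in groups]


columns = {
    "order_total": ["19.99", "5.00", "", "42.50"],
    "country": ["US", "us", "DE", "US"],
}
profiles = [profile_column(n, v) for n, v in columns.items()]
print(build_description_prompt(profiles[0]))

# Toy 3-d vectors standing in for real column-name embeddings.
toy_embeddings = {
    "order_total": [0.9, 0.1, 0.0],
    "order_tax": [0.85, 0.2, 0.05],
    "customer_email": [0.0, 0.2, 0.95],
}
print(group_columns(toy_embeddings))  # → [['order_total', 'order_tax'], ['customer_email']]
```

Each resulting group would then be sent to an LLM with a short "name this group of columns" prompt; that call is omitted here because it depends on the provider's client library.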

I also built a data quality engine that flags inconsistent casing, invalid dates, mixed types, and outliers, and automatically suggests corrections. A multi-format export engine generates downloadable PDF, Markdown, and Excel reports with auto-formatted glossaries and metadata summaries.

The system handles datasets of up to 2 million rows, with secure S3 file storage, JWT authentication, and automatic cleanup. It achieved a 90% reduction in manual documentation time, was tested on 20+ real-world datasets from the eCommerce, Finance, and HR domains, and serves PMs, analysts, data engineers, and QA teams.
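The data quality checks and Markdown export described above can be illustrated with a small, dependency-free sketch. All function names, the ISO `%Y-%m-%d` date format, and the 1.5 × IQR (Tukey's fences) outlier rule are assumptions for illustration, not the project's actual rules; PDF and Excel output would need third-party libraries and is omitted.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def check_casing(name, values):
    """Flag values that differ only by letter case; suggest the most frequent spelling."""
    variants = defaultdict(list)
    for v in values:
        variants[v.lower()].append(v)
    issues = []
    for spellings in variants.values():
        distinct = sorted(set(spellings))
        if len(distinct) > 1:
            preferred = max(distinct, key=spellings.count)
            issues.append(f"{name}: inconsistent casing {distinct}; suggest '{preferred}'")
    return issues


def check_dates(name, values, fmt="%Y-%m-%d"):
    """Flag values that do not parse as dates in the expected (assumed ISO) format."""
    bad = []
    for v in values:
        try:
            datetime.strptime(v, fmt)
        except ValueError:
            bad.append(v)
    return [f"{name}: {len(bad)} value(s) not matching {fmt}: {bad}"] if bad else []


def check_mixed_types(name, values):
    """Flag a column that mixes numeric and non-numeric strings."""
    def is_number(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    numeric = sum(is_number(v) for v in values)
    if 0 < numeric < len(values):
        return [f"{name}: mixed types ({numeric} numeric, {len(values) - numeric} text)"]
    return []


def check_outliers(name, numbers, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    if len(numbers) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in numbers if x < low or x > high]
    if not outliers:
        return []
    return [f"{name}: outliers {outliers} outside [{low:g}, {high:g}]; review or cap them"]


def issues_to_markdown(issues):
    """Render flagged issues as a Markdown table for the downloadable report."""
    lines = ["| # | Issue |", "| --- | --- |"]
    lines += [f"| {i} | {issue} |" for i, issue in enumerate(issues, start=1)]
    return "\n".join(lines)


issues = (
    check_casing("country", ["US", "us", "US", "DE"])
    + check_dates("signup_date", ["2025-08-01", "2025-13-01", "01/08/2025"])
    + check_mixed_types("zip", ["10001", "94105", "N/A"])
    + check_outliers("order_total", [10, 12, 11, 13, 12, 250])
)
print(issues_to_markdown(issues))
```

Returning human-readable issue strings keeps each check independent, so new checks can be appended to the list without changing the report renderer.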

Background

Raunak skipped presentations and built real AI products.

Raunak Pandey was part of the August 2025 cohort at Curious PM, alongside 15 other talented participants.