CONTEXTO AI
Link to open source: https://github.com/malikamiss628/contexto.ai
Link to Live Project: https://malikamiss628.github.io/contexto.ai/
"Enterprise data is dark."
Every company relies on databases, but the documentation explaining them is usually outdated, non-existent, or trapped in technical jargon. When business teams need answers, they wait days for data engineers to translate cryptic column names like cust_ltv_q4_flg into actual insights. This documentation debt creates a massive bottleneck for decision-making.
"Contexto AI is an autonomous data sense-making engine."
We’ve built a platform that plugs directly into enterprise databases, extracts the raw schema, runs statistical health checks, and uses AI to automatically write business-ready documentation. We are turning static, confusing databases into an interactive, conversational asset where anyone can simply ask questions about their data.
The Foundation (Backend & Extraction): We built a high-performance backend using Python and FastAPI for rapid, asynchronous processing. Using SQLAlchemy, we established secure, database-agnostic connectors that can ingest INFORMATION_SCHEMA metadata seamlessly from sources like PostgreSQL or Snowflake.
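The extraction step can be sketched as follows. This is a minimal illustration, not the project's connector: it uses Python's standard-library sqlite3 module as a stand-in so that it runs without dependencies, and reads column metadata through SQLite's PRAGMA table_info. Against PostgreSQL or Snowflake, the equivalent information would come from INFORMATION_SCHEMA.COLUMNS via SQLAlchemy. The ColumnInfo class, the extract_schema function, and the customers table are hypothetical names chosen for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ColumnInfo:
    table: str
    name: str
    data_type: str
    nullable: bool


def extract_schema(conn: sqlite3.Connection) -> list[ColumnInfo]:
    """Return one ColumnInfo per column of every user table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    columns = []
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, col_type, notnull, _default, _pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            columns.append(ColumnInfo(table, name, col_type, not notnull))
    return columns


# Hypothetical demo table, created in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER NOT NULL, cust_ltv_q4_flg INTEGER, email TEXT)"
)
schema = extract_schema(conn)
for col in schema:
    print(col)
```

Each ColumnInfo record is the kind of raw metadata that would then be handed to the enrichment step.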
The Intelligence (Google Gemini 1.5 Flash): Speed is critical for a conversational interface. We chose Gemini 1.5 Flash because its low-latency reasoning is perfect for instantly translating technical metadata into plain English and automatically tagging sensitive data (PII) on the fly.
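One way the enrichment step might look is sketched below. It only shows the parts around the model call: building a prompt from column metadata and validating a JSON reply that carries a description and a PII flag per column. The prompt wording, the JSON shape, and the function names build_prompt and parse_enrichment are assumptions made for illustration, and the reply is a hand-written stand-in rather than real Gemini output.

```python
import json

# Hypothetical prompt; the project's actual system prompt is not shown in this write-up.
PROMPT_TEMPLATE = """You are documenting a database for business users.
For each column below, return a JSON array of objects with the keys
"column", "description" (one plain-English sentence) and "pii" (true or false).

Columns:
{columns}
"""

REQUIRED_KEYS = {"column", "description", "pii"}


def build_prompt(columns: list[tuple[str, str]]) -> str:
    """Render (name, type) pairs into the prompt text."""
    lines = "\n".join(f"- {name} ({col_type})" for name, col_type in columns)
    return PROMPT_TEMPLATE.format(columns=lines)


def parse_enrichment(raw: str) -> dict[str, dict]:
    """Validate the model's JSON reply and index it by column name."""
    docs = {}
    for item in json.loads(raw):
        # Reject entries that are missing a field rather than guessing.
        if not REQUIRED_KEYS <= set(item):
            raise ValueError(f"incomplete entry: {item}")
        docs[item["column"]] = {
            "description": item["description"],
            "pii": bool(item["pii"]),
        }
    return docs


prompt = build_prompt([("cust_ltv_q4_flg", "INTEGER"), ("email", "TEXT")])

# Hand-written stand-in for a model reply; a real call would go through the Gemini API.
canned_reply = json.dumps([
    {"column": "cust_ltv_q4_flg",
     "description": "Flag related to customer lifetime value in Q4.",
     "pii": False},
    {"column": "email",
     "description": "Customer email address.",
     "pii": True},
])
docs = parse_enrichment(canned_reply)
```

Validating the reply before storing it means a malformed response raises an error instead of ending up in the documentation.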
The Lab (Google AI Studio): To improve accuracy, we used Google AI Studio to rapidly prototype, test, and refine our system prompts against complex database schemas before hardcoding the API calls into our backend.
The Engine (RAG Pipeline): Standard LLMs hallucinate. To limit this, we implemented a Retrieval-Augmented Generation (RAG) architecture. We embed the AI-generated documentation into a vector database. When a user asks a question, the AI pulls only from this highly structured, verified metadata.
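The retrieval step described above can be sketched as follows. This is a toy illustration under stated assumptions: word-count vectors with cosine similarity stand in for a real embedding model, an in-memory list stands in for the vector database, and the DocIndex class, the grounded_prompt function, and the table and column names are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class DocIndex:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, str, Counter]] = []

    def add(self, key: str, text: str) -> None:
        self._items.append((key, text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[2]), reverse=True)
        return [(key, text) for key, text, _ in ranked[:k]]


def grounded_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Build a prompt that restricts the model to the retrieved documentation."""
    context = "\n".join(f"[{key}] {text}" for key, text in hits)
    return (
        "Answer using only the documentation below. "
        "If the answer is not there, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical documentation entries, keyed by schema.table.column.
index = DocIndex()
index.add("sales.orders.revenue_q4", "Total revenue booked in Q4 for each order.")
index.add("crm.customers.email", "Customer contact email address.")
index.add("ops.tickets.opened_at", "Timestamp when a support ticket was opened.")

question = "Where is the Q4 revenue data?"
hits = index.search(question, k=1)
prompt = grounded_prompt(question, hits)
print(prompt)
```

Only the retrieved entries reach the model, which is what keeps the answer tied to the stored documentation instead of the model's general knowledge.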
To prove this architecture works in a realistic, messy enterprise environment, we validated our extraction and AI enrichment tools against the GDG Cloud New Delhi Hackfest 2.0 dataset. By processing these internal recommendation documents and mock schemas, we successfully demonstrated the platform's ability to map complex data into searchable, business-ready insights.
With Contexto AI, business users can finally ask, "Where is the Q4 revenue data?" and get an instant, accurate answer with a data trust score and suggested SQL. We are eliminating documentation debt and democratizing data access.
This build was uploaded as a hackathon project.


