Overview
Analyzing business data typically requires writing queries, cleaning datasets, and manually interpreting results. This project builds an Agentic Data Analyst: a system that lets users ask business questions in natural language and receive answers backed by real data analysis. Instead of relying on static dashboards, the system behaves like an automated analyst that interprets questions, performs analysis, and returns concise insights.
Dataset
This project is built around the Superstore dataset, a retail analytics dataset that is widely used and available on Kaggle. It includes fields such as order dates, product categories, regions, sales, profit, and customer information. Because the published dataset is already clean, I manually created a messy version of it to demonstrate how the agentic data analyst handles preprocessing. The dataset provides a realistic foundation for exploring common business questions such as identifying top-performing regions and analyzing trends over time.
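To illustrate the kind of preprocessing a messy copy of the data calls for, here is a minimal sketch using only the Python standard library. The column names, the mix of date formats, and the specific kinds of mess (stray whitespace, inconsistent casing, currency symbols, duplicates, unparseable values) are assumptions for illustration and are not taken from the project's actual code or data.

```python
from datetime import datetime

# Assumed date formats that a hand-made messy file might mix together.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%m-%Y")


def parse_date(text):
    """Try each assumed format in turn; return None if none of them match."""
    text = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Turn strings like ' $1,234.50 ' into a float; return None if not numeric."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def clean_rows(rows):
    """Normalize text, parse dates and amounts, drop incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {
            "order_date": parse_date(row["order_date"]),
            "region": row["region"].strip().title(),
            "category": row["category"].strip().title(),
            "sales": parse_amount(row["sales"]),
            "profit": parse_amount(row["profit"]),
        }
        # Skip rows where any field could not be parsed.
        if None in record.values():
            continue
        # Skip rows that are identical to an earlier row once normalized.
        key = tuple(record.values())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned
```

Dropping unparseable rows is the simplest policy. A real pipeline might instead flag them for review or fill in missing values, depending on the question being asked.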
System Design
The core of the project is an agent-based architecture that connects a language model with structured analysis tools. Instead of following a fixed pipeline, the system dynamically interprets the user's question, selects the appropriate analysis function, executes calculations on the dataset, and generates a clear, business-focused response. This makes the system flexible and capable of handling a variety of analytical queries without hardcoding each one.
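The interpret, select, execute, respond loop described above can be sketched roughly as follows. This is not the project's implementation: the tool names, the toy rows, and especially `choose_tool` are hypothetical. In the real system a language model makes the selection, while here simple keyword rules stand in for that call so the example runs on its own.

```python
from collections import defaultdict

# Toy rows standing in for the dataset (values are made up for illustration).
ROWS = [
    {"region": "West", "category": "Furniture", "sales": 100.0, "profit": 10.0},
    {"region": "East", "category": "Technology", "sales": 250.0, "profit": 60.0},
    {"region": "West", "category": "Technology", "sales": 200.0, "profit": 50.0},
]


def total_by(rows, group_field, value_field):
    """Sum value_field for each distinct value of group_field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


# Registry mapping a tool name to the function that runs the analysis.
TOOLS = {
    "sales_by_region": lambda rows: total_by(rows, "region", "sales"),
    "profit_by_category": lambda rows: total_by(rows, "category", "profit"),
}


def choose_tool(question):
    """Stand-in for the language model's tool choice: plain keyword rules."""
    q = question.lower()
    if "region" in q:
        return "sales_by_region"
    if "category" in q or "profitable" in q:
        return "profit_by_category"
    return None


def answer(question, rows):
    """Pick a tool, run it on the data, and phrase the top result."""
    name = choose_tool(question)
    if name is None:
        return "No matching analysis tool for this question."
    result = TOOLS[name](rows)
    top, value = max(result.items(), key=lambda item: item[1])
    return f"{top} leads with {value:,.2f} (tool: {name})."


print(answer("Which region sells the most?", ROWS))
print(answer("What category is most profitable?", ROWS))
```

Keeping the tools in a registry means a new kind of question is supported by adding one entry, and the numbers in the answer always come from the computation rather than from the model.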
Analysis Engine
The backend includes a set of reusable analysis tools designed to answer common business questions: total sales and profit, sales by region, profit by category, monthly sales trends, and dataset summaries. These tools act as the foundation for the agent, allowing it to perform real computations rather than relying solely on language model responses.
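As a rough sketch of what such tools could look like, the functions below implement the listed analyses with the Python standard library. The function names, the row layout (dicts with `order_date`, `region`, `category`, `sales`, `profit`), and the sample values are assumptions for illustration. The project's own tools may be structured differently, for example around a dataframe library.

```python
from collections import defaultdict
from datetime import date

# Made-up sample rows in the assumed layout.
SAMPLE_ROWS = [
    {"order_date": date(2023, 1, 5), "region": "West", "category": "Furniture", "sales": 100.0, "profit": 10.0},
    {"order_date": date(2023, 1, 20), "region": "East", "category": "Technology", "sales": 250.0, "profit": 60.0},
    {"order_date": date(2023, 2, 3), "region": "West", "category": "Technology", "sales": 200.0, "profit": -5.0},
]


def total_sales_and_profit(rows):
    """Overall sales and profit across all rows."""
    return {
        "sales": sum(row["sales"] for row in rows),
        "profit": sum(row["profit"] for row in rows),
    }


def total_by(rows, group_field, value_field):
    """Sum value_field per group, e.g. sales by region or profit by category."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


def monthly_sales_trend(rows):
    """Sales summed per calendar month, keyed 'YYYY-MM' in chronological order."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["order_date"].strftime("%Y-%m")] += row["sales"]
    return dict(sorted(totals.items()))


def dataset_summary(rows):
    """Row count, date range, and the distinct regions and categories present."""
    dates = [row["order_date"] for row in rows]
    return {
        "rows": len(rows),
        "first_order": min(dates),
        "last_order": max(dates),
        "regions": sorted({row["region"] for row in rows}),
        "categories": sorted({row["category"] for row in rows}),
    }


print(total_sales_and_profit(SAMPLE_ROWS))
print(total_by(SAMPLE_ROWS, "region", "sales"))
print(monthly_sales_trend(SAMPLE_ROWS))
```

Each function takes rows and returns a plain dictionary, which keeps the results easy for an agent to turn into a sentence or a chart.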
Results & Behavior
The system can respond to questions like “Which region sells the most?”, “What category is most profitable?”, and “How do sales change over time?” For each query, the agent selects the relevant analysis, runs it on the dataset, and returns a structured answer. In some cases, it can also generate charts to support the response.
The result is a workflow where insights are generated dynamically based on the user's question rather than predefined outputs.