Beyond the Chatbot: Building Agentic Workflows for Real-World Due Diligence

#AI #LLMs #Python #Agentic Workflows #Stellula

In the last year, "RAG" (Retrieval-Augmented Generation) became the buzzword of the industry. Every company built a "Chat with your Data" prototype. You upload a PDF, ask a question, and the bot answers.

For 90% of use cases, that's fine. But when we started building the technical infrastructure for Stellula, a Venture Builder focused on high-stakes investment decisions, "fine" wasn't enough.

We needed to automate Due Diligence—a process that requires skepticism, cross-referencing, and multi-step reasoning. A simple chatbot would just summarize the founder's pitch deck. We needed an agent that could interrogate it.

Here is how we moved beyond simple RAG to build true Agentic Workflows using Python and LLMs.

The Problem with "One-Shot" Inference

If you paste a 50-page financial report into an LLM and ask "Is this a good investment?", you will get a generic, hallucinated, or overly optimistic answer.

LLMs are probabilistic engines, not truth machines. If you give them a complex task in a single prompt, they tend to "lazy reason"—they take the path of least resistance.

To solve this, we stopped treating the LLM as an Oracle and started treating it as a Processor within a larger architecture.

The Agentic Architecture

Instead of one giant prompt, we architected a workflow of specialized agents. We didn't just ask for an answer; we engineered a manufacturing line of thought.

1. The Decomposition Agent

The first step isn't answering the question. It's understanding it. When a user asks "Analyze this startup," our first agent breaks this down into strict sub-tasks:

  • "Extract the Market Size claims."
  • "Identify the Competitors mentioned."
  • "Calculate the burn rate based on the P&L."

2. The Researcher (Tool Use)

This is where Python shines. The LLM identifies what it needs, but Python executes the retrieval. We gave the agents tools—not just vector search, but the ability to perform precise math or structured data extraction.

This separates "creative writing" (LLM) from "fact-checking" (Code).

3. The Critic (Self-Correction)

This was our breakthrough. We introduced a "Critic Agent" whose only job is to review the output of the "Researcher Agent."

  • Researcher: "The company grew 300% YoY."
  • Critic: "Citation needed. Please point to the specific page in the document or flag this as an assumption."

This loop reduces hallucinations dramatically. It mimics a Senior Engineer reviewing a Junior's PR.

Structured Output is King

One major lesson: Never let the LLM just "talk".

At the end of the workflow, we force the output into strict JSON schemas (using libraries like Pydantic). We don't want a poem about the startup's finances; we want a structured object that our frontend can render as charts and risk scores.
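A stdlib-only sketch of that final validation step (in production you would reach for Pydantic models, as noted above; the field names here are illustrative):

```python
# Structured-output sketch: reject anything that isn't the schema we asked for.
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class DiligenceReport:
    company: str
    risk_score: float       # 0.0 (safe) .. 1.0 (high risk)
    red_flags: list[str]

def parse_report(raw: str) -> DiligenceReport:
    data = json.loads(raw)
    report = DiligenceReport(**data)  # TypeError on missing or extra keys
    if not 0.0 <= report.risk_score <= 1.0:
        raise ValueError("risk_score out of range")
    return report

raw = '{"company": "Acme", "risk_score": 0.7, "red_flags": ["No audited P&L"]}'
report = parse_report(raw)
```

The frontend then renders `risk_score` and `red_flags` directly; if the model drifts from the schema, the workflow fails fast instead of shipping prose.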

Conclusion: Architecture Over Prompts

The transition from "Senior Engineer" to "AI Engineer" isn't about learning how to write clever prompts. It's about applying software engineering principles—modularity, testing, and separation of concerns—to non-deterministic models.

At Stellula, the "Magic" isn't the model (GPT-4 or Claude). The magic is the Workflow that orchestrates the model to produce reliable, business-critical outputs.

The future isn't just chatting with AI. It's delegating work to it.