How I Built an AI Agent That Books 2x More Meetings

A step-by-step breakdown of how I built an AI lead qualification agent for a 40-person SaaS company. Architecture, implementation, and results.

Saksham Solanki
AI Systems Architect · 12 min

Most AI chatbots fail at lead qualification because they're built as conversation tools, not decision systems.

I took a different approach for a 40-person SaaS company that was losing qualified leads to slow response times. The result: 2x more meetings booked, with faster qualification and zero additional headcount.

Results after 60 days of the AI qualification agent in production:

  • 2.1x more meetings booked (after 60 days in production)
  • 47s response time (down from 4.2 hours)
  • 89% qualification accuracy (validated by SDR team)
  • $340/mo API cost (vs $5,000/mo for an SDR)

The Problem

The company had a solid inbound flow, about 200 leads per week. But their SDR team of three couldn't respond fast enough. Average response time: 4.2 hours. By the time they reached out, 60% of leads had gone cold or talked to a competitor.

The obvious solution was "hire more SDRs." The smarter solution was to build a system that qualifies and routes leads in under 2 minutes.

Before (Manual SDRs) vs. After (AI Agent). With the agent: response time 0.78 min, leads handled by AI 85%, monthly cost $340.

The Architecture

I designed a three-layer agent system:

Three-Layer Agent Architecture

  • Layer 1: Intake Processing (Form Submission Trigger, CRM Context Pull, Clearbit Enrichment, ICP Scoring)
  • Layer 2: Qualification Logic (Structured BANT Flow, Real-Time Score Updates, Deterministic Rules, LLM Conversation)
  • Layer 3: Routing & Booking (Territory-Based SDR Routing, Calendly Link Delivery, 24-Hour Follow-Up)

Layer 1: Intake Processing. Every form submission triggers the agent. It pulls context from the CRM, enriches the company data via Clearbit, and scores the lead against the ICP criteria.

Layer 2: Qualification Logic. The agent runs a structured qualification flow, not a generic chatbot conversation. It asks specific questions mapped to the company's qualification framework (budget, authority, need, timeline). Each response updates the lead score in real-time.

Layer 3: Routing & Booking. Qualified leads get routed to the right SDR based on territory and availability. The agent sends a Calendly link and follows up if no booking is made within 24 hours.

The 5 Qualification Signals My Agent Tracks

Most BANT-style chatbots ask 4 generic questions and call it a day. The agent I built tracks five signals, each with its own scoring logic and threshold. Together they form a composite qualification score that has held up at 89% accuracy when validated against SDR review.

Signal 1: Company Size Match. The ICP is 50-500 employees. The agent doesn't just ask the lead. It pulls the truth from Clearbit (which usually has the right number) and only asks the lead if Clearbit returns ambiguous data. This single change cut the average conversation length by 35% because the agent stopped asking questions it could already answer.

Signal 2: Role Authority. Job title alone is a weak signal because titles don't translate across companies. The agent combines the stated title with self-described responsibilities ("I lead the ops team" vs "I report on metrics for ops"). The scoring is: decision maker (full points), strong influencer (partial points), researcher (low points). Researchers still get qualified, but they get routed to a content nurture flow rather than directly to an SDR.

Signal 3: Intent Timing. "I'm exploring options" and "we have budget approved for Q2" are wildly different leads. The agent extracts a timing band (immediate, 30-90 days, 3-6 months, exploratory) and weights the score accordingly. Immediate-timing leads jump the queue. Exploratory leads get a longer nurture path with no SDR pressure.

Signal 4: Technical Readiness. For this client, the product needed an existing CRM (HubSpot, Salesforce, or Pipedrive). Leads without one weren't a fit no matter how senior they were. The agent asks about current stack early in the conversation. If the answer fails the technical fit, the agent politely declines and offers a self-serve resource. Saves the SDR team a meeting they would have ended on slide 3.

Signal 5: Budget Alignment. The hardest signal to extract gracefully. The agent doesn't ask "what's your budget" because that kills the conversation. It asks proxy questions: "what tools have you evaluated," "what's your team's spend on similar tools today." Combined with company size and role, this gives a strong-enough budget signal to qualify or disqualify without ever directly asking.

The composite score is computed deterministically from the 5 signals. The LLM extracts the values; the math happens in code.
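To make "the math happens in code" concrete, here is a minimal sketch of a deterministic composite score. The weights, point tables, and field names are hypothetical (the post does not publish the real ones); only the five signals, the 50-500 employee ICP, the role tiers, the timing bands, and the supported CRMs come from the text above.

```python
from dataclasses import dataclass

# Hypothetical weights and point tables, for illustration only.
WEIGHTS = {
    "company_size": 0.20,
    "role_authority": 0.25,
    "intent_timing": 0.25,
    "technical_readiness": 0.15,
    "budget_alignment": 0.15,
}
ROLE_POINTS = {"decision_maker": 1.0, "strong_influencer": 0.6, "researcher": 0.2}
TIMING_POINTS = {"immediate": 1.0, "30-90 days": 0.7, "3-6 months": 0.4, "exploratory": 0.1}
SUPPORTED_CRMS = {"hubspot", "salesforce", "pipedrive"}


@dataclass
class Signals:
    """Structured values the LLM is assumed to have extracted."""
    employees: int
    role: str
    timing: str
    crm: str
    budget_fit: float  # 0.0-1.0, derived from the proxy questions


def signal_scores(s: Signals) -> dict[str, float]:
    """Map each raw signal to a 0.0-1.0 score with plain rules."""
    return {
        "company_size": 1.0 if 50 <= s.employees <= 500 else 0.0,
        "role_authority": ROLE_POINTS.get(s.role, 0.0),
        "intent_timing": TIMING_POINTS.get(s.timing, 0.0),
        "technical_readiness": 1.0 if s.crm.lower() in SUPPORTED_CRMS else 0.0,
        "budget_alignment": max(0.0, min(1.0, s.budget_fit)),
    }


def composite_score(s: Signals) -> float:
    """Weighted sum of the five signal scores on a 0-100 scale."""
    scores = signal_scores(s)
    return round(100 * sum(WEIGHTS[name] * value for name, value in scores.items()), 1)
```

Because the function is pure, the same extracted values always produce the same score, which is what makes the result easy to audit against SDR review.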

How the BANT Flow Actually Works in Production

A sample conversation excerpt (anonymized) showing how the agent handles a real qualified lead. The lead's messages are in quotation marks; the agent's responses follow.

"Hey, we're looking at automating our sales pipeline. About 60 of us, mostly in ops and CS."

Got it. With ops and CS at 60 people, you're squarely in the range I help most with. Quick question to point you to the right next step: are you currently running HubSpot, Salesforce, or another CRM?

"HubSpot. Pro tier, just upgraded last month."

Perfect. HubSpot Pro is exactly what most of my client setups run on. One more: when you say "looking at automating," is this something you'd want to move on this quarter, or is it more exploratory?

"This quarter ideally. We have budget approved through end of Q2."

Great. Based on what you've shared (60-person team, HubSpot Pro, Q2 timeline) you're a strong fit. The fastest path is a 30-minute call with our team. Here's the calendar to grab a slot that works: [Calendly link]. If you want, share what you're hoping to automate first, and I'll make sure the call is ready with the right context.

What's happening under the hood: every reply runs through Layer 2 to extract one structured field (company size, current CRM, timing, etc.). The deterministic policy module updates the score and decides the next state. The agent's response is generated by Claude Sonnet 4.6 conditioned on the next-state instruction. The conversation feels natural, but the qualification logic is fully deterministic.

When the score hits the qualified threshold, the routing layer fires. When it falls below the disqualified threshold (wrong CRM, no timing, no authority), the agent gracefully declines with a self-serve resource and the conversation ends.
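The threshold behavior described here could be sketched as a small pure function. The threshold values, the question order, the state names, and the intermediate "nurture" outcome are all assumptions for illustration, not the production policy.

```python
# Hypothetical thresholds and question order, for illustration only.
QUALIFIED_THRESHOLD = 70.0
DISQUALIFIED_THRESHOLD = 30.0
QUESTION_ORDER = ["crm", "timing", "role", "budget_fit"]
SUPPORTED_CRMS = {"hubspot", "salesforce", "pipedrive"}


def next_state(collected: dict, score: float) -> str:
    """Decide the next conversation state from collected fields and the current score."""
    # Hard disqualifier: a known CRM that is not supported ends the flow early.
    crm = collected.get("crm")
    if crm is not None and crm.lower() not in SUPPORTED_CRMS:
        return "decline"

    # A strong enough score routes immediately, even if some fields are missing.
    if score >= QUALIFIED_THRESHOLD:
        return "route_to_sdr"

    # Otherwise keep asking for the first field that has not been collected.
    for field in QUESTION_ORDER:
        if field not in collected:
            return f"ask_{field}"

    # Every field is known: apply the lower threshold.
    if score < DISQUALIFIED_THRESHOLD:
        return "decline"
    return "nurture"
```

The LLM would then only be asked to phrase a reply for the returned state, so it never decides on its own whether a lead qualifies.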

The Build

Total build time: 11 days. Here's what I used:

  • LLM: Claude Sonnet 4.6 for qualification conversations, Claude Haiku 4.5 for routing decisions
  • Orchestration: Custom Python agent framework (not LangChain, too much overhead for this use case)
  • CRM Integration: HubSpot API for lead data and deal creation
  • Enrichment: Clearbit API for company data
  • Scheduling: Calendly API for meeting booking
  • Deployment: AWS Lambda + API Gateway

The key architectural decision was separating the qualification logic from the conversation. The LLM handles natural language, but the actual qualification rules are deterministic. This gives us reliability without sacrificing conversation quality.

Integration Architecture in Detail

The stack is unremarkable on purpose. Boring is what stays up.

Webhook ingress (HubSpot → Lambda). HubSpot's workflow engine fires a webhook to an API Gateway endpoint on every form submission. The Lambda is a thin entrypoint that validates the signature, drops the event into an SQS queue, and returns 200 within 50ms. The reason: HubSpot will retry if the webhook is slow, and you don't want duplicate events. The queue decouples ingress speed from agent processing speed.
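A rough sketch of such a thin entrypoint is below. The header name `x-signature` and the "HMAC-SHA256 over the raw body" scheme are placeholders, not HubSpot's actual signature format, which should be taken from HubSpot's request-validation docs. The `enqueue` callable is also hypothetical; in a real Lambda it would typically wrap an SQS `send_message` call.

```python
import base64
import hashlib
import hmac
from typing import Callable


def signature_is_valid(secret: str, body: bytes, signature: str) -> bool:
    """Compare a base64 HMAC-SHA256 of the body against the supplied signature."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)


def handle_webhook(event: dict, secret: str, enqueue: Callable[[str], None]) -> dict:
    """Validate, enqueue, and return immediately; no agent work happens here."""
    body = event.get("body") or ""
    signature = (event.get("headers") or {}).get("x-signature", "")
    if not signature_is_valid(secret, body.encode(), signature):
        return {"statusCode": 401, "body": "invalid signature"}
    enqueue(body)  # hand the raw event to the queue for the worker
    return {"statusCode": 200, "body": "ok"}
```

Keeping the handler this small is what makes a fast 200 realistic: the only work on the request path is one hash and one queue write.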

Enrichment layer (Clearbit + HubSpot CRM). A worker Lambda picks up the SQS event and runs enrichment in parallel: Clearbit company-by-domain lookup (timeout 800ms, fall through if slow) and HubSpot contact-by-email lookup. Both results are written into a normalized event schema. Total enrichment latency: typically 600-900ms.
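The "parallel with a fall-through timeout" pattern can be sketched with `asyncio`. The two fetch functions are hypothetical stand-ins for the Clearbit and HubSpot clients, and the output dict is an assumed shape rather than the actual normalized schema; only the 800ms budget comes from the text.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

CLEARBIT_TIMEOUT_S = 0.8  # the 800ms budget mentioned above


async def with_timeout(awaitable: Awaitable[Any], timeout_s: float) -> Optional[Any]:
    """Return the result, or None if the call exceeds its time budget."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


async def enrich(
    domain: str,
    email: str,
    fetch_company: Callable[[str], Awaitable[dict]],
    fetch_contact: Callable[[str], Awaitable[dict]],
    company_timeout_s: float = CLEARBIT_TIMEOUT_S,
) -> dict:
    """Run both lookups concurrently and merge them into one event."""
    company, contact = await asyncio.gather(
        with_timeout(fetch_company(domain), company_timeout_s),
        fetch_contact(email),
    )
    # A missing company record is allowed; the agent can ask the lead instead.
    return {"domain": domain, "email": email, "company": company, "contact": contact}
```

With this shape, a slow company lookup costs at most its timeout, and the conversation can start with whatever data arrived in time.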

Conversation orchestration (Claude Sonnet 4.6 + state store). Each conversation has a conversation_id keyed in DynamoDB. State includes current step, signals collected so far, score, and history of LLM extractions. Every turn loads state, calls the LLM with a structured-output prompt for the current state's extraction, updates state, and persists. The LLM call uses prompt caching on the policy/system prompt, which cut token spend by ~40% once volume stabilized.
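The load, extract, update, persist cycle might look roughly like this. A plain dict stands in for the DynamoDB table, `extract` is a hypothetical callable wrapping the structured-output LLM call, and the step names and order are assumptions; scoring is left out to keep the loop visible.

```python
from typing import Callable

STEPS = ["crm", "timing", "role"]  # hypothetical extraction order


def new_state() -> dict:
    """Initial state for a conversation that has no stored record yet."""
    return {"step": STEPS[0], "signals": {}, "history": []}


def run_turn(
    store: dict,
    conversation_id: str,
    message: str,
    extract: Callable[[str, str], dict],
) -> dict:
    """Process one lead message: load state, extract, update, persist."""
    state = store.get(conversation_id) or new_state()
    step = state["step"]

    # The extractor returns structured fields for the current step only.
    extracted = extract(step, message)
    state["signals"].update(extracted)
    state["history"].append({"step": step, "message": message, "extracted": extracted})

    # Move to the first signal that is still missing, or finish.
    remaining = [s for s in STEPS if s not in state["signals"]]
    state["step"] = remaining[0] if remaining else "done"

    store[conversation_id] = state  # persist before replying
    return state
```

Since every turn starts from stored state, any Lambda invocation can pick up any conversation, which fits a stateless worker model.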

Routing decisions (Claude Haiku 4.5). Once qualified, a smaller, cheaper model handles territory and SDR assignment. Haiku 4.5 is more than capable of "given this lead profile and current SDR loads, who gets it" while costing a fraction of Sonnet per call. This is the only place I run a multi-model split, and it's because the routing decision is a pure structured task with no language nuance.

CRM write-back (HubSpot API). The agent updates the HubSpot contact with all 5 signal scores, the composite score, and a "qualified by AI" stamp. It also creates a deal in the right pipeline stage and assigns the owner based on territory.

Calendly integration. The agent uses Calendly's API to fetch the SDR's available windows for the next 5 business days, builds a single-use scheduling link, and includes it in the response. When the lead books, Calendly fires a webhook back into the system, which logs the booking against the deal and stops the 24-hour follow-up timer.

Follow-up scheduler. A simple EventBridge schedule triggers a check every 6 hours: any qualified lead without a booking after 24 hours gets one follow-up message. After 72 hours with no response, the deal is reassigned to the SDR for human follow-up. No spam, no chasing.
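The rule the scheduled check applies can be written as one small function. The function name, argument names, and return labels are illustrative; the 24-hour and 72-hour cutoffs and the single follow-up come from the description above.

```python
from datetime import datetime, timedelta, timezone

FOLLOW_UP_AFTER = timedelta(hours=24)
REASSIGN_AFTER = timedelta(hours=72)


def follow_up_action(
    qualified_at: datetime,
    booked: bool,
    follow_up_sent: bool,
    now: datetime,
) -> str:
    """Return what the periodic check should do for one qualified lead."""
    if booked:
        return "none"  # a booking stops the timer
    age = now - qualified_at
    if age >= REASSIGN_AFTER:
        return "reassign_to_sdr"  # hand over to a human after 72 hours
    if age >= FOLLOW_UP_AFTER and not follow_up_sent:
        return "send_follow_up"  # exactly one automated nudge
    return "none"


# Example: a lead qualified 30 hours ago with no booking and no nudge yet.
_now = datetime(2025, 1, 3, 12, 0, tzinfo=timezone.utc)
example = follow_up_action(_now - timedelta(hours=30), False, False, _now)
```

Because the check only runs every 6 hours, the follow-up actually goes out somewhere between 24 and 30 hours after qualification.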

API rate limits matter at scale. HubSpot Pro tier allows 100 requests/10s per private app, which is generous for this volume. Calendly's API is the bottleneck: I hit their rate limit twice in the first month and had to add a token-bucket throttle on outbound Calendly calls.
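For reference, a token-bucket throttle of the kind mentioned here fits in a few lines. This is a generic single-process sketch, not the production throttle; the rate and capacity in the example are made up and would need to match Calendly's published limits.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        now = self.clock()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical limits: at most 2 calls in a burst, 1 call per second sustained.
calendly_bucket = TokenBucket(rate_per_s=1.0, capacity=2)
```

Callers that get `False` can delay and retry, or push the work back onto the queue. Note that across many concurrent Lambda invocations the bucket state would need to live somewhere shared, which this in-memory sketch does not cover.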

Cost Breakdown: Real Numbers from 6 Months in Production

The headline number is $340/month. Here's how that decomposes after 6 months at steady state, processing roughly 800 leads/month:

| Line Item | Monthly Cost | Notes |
|---|---|---|
| Claude Sonnet 4.6 API | $172 | ~7M input tokens, ~1.7M output tokens with prompt caching enabled |
| Claude Haiku 4.5 API | $13 | Routing-only calls, ~2M input tokens |
| AWS Lambda + API Gateway | $24 | Modest invocation count, low memory footprint |
| DynamoDB (state store) | $14 | On-demand pricing, ~12M reads/4M writes |
| SQS + EventBridge | $6 | Negligible at this volume |
| Clearbit API | $79 | 800 enrichments at ~$0.10 each |
| Calendly API | $0 | Included in existing Calendly subscription |
| HubSpot API | $0 | Included in existing HubSpot Pro |
| Monitoring (Datadog) | $32 | Trace ingestion + dashboards |

Total: ~$340/month.

Per-conversation unit economics: about $0.43/conversation across all infrastructure. For comparison, a single SDR-handled qualification conversation costs the company roughly $11 in fully-loaded SDR time. The agent is ~25x cheaper per conversation, with a faster response time and a measurable lift in conversion.
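The arithmetic behind these figures is easy to re-check from the cost table. The dictionary keys below are just labels; all numbers are taken from the post.

```python
# Monthly line items from the cost table, in dollars.
line_items = {
    "sonnet": 172, "haiku": 13, "lambda_api_gateway": 24, "dynamodb": 14,
    "sqs_eventbridge": 6, "clearbit": 79, "calendly": 0, "hubspot": 0, "datadog": 32,
}
leads_per_month = 800
sdr_cost_per_conversation = 11  # fully-loaded SDR time, per the post

monthly_total = sum(line_items.values())
cost_per_conversation = monthly_total / leads_per_month
cost_ratio = sdr_cost_per_conversation / cost_per_conversation

print(f"total=${monthly_total}, per conversation=${cost_per_conversation:.3f}, ratio={cost_ratio:.1f}x")
```

The total comes to $340 and the per-conversation cost to $0.425, which the post quotes as about $0.43. The resulting ratio lands between 25x and 26x, in line with the ~25x figure.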

The vector DB line is intentionally absent. The agent doesn't use one. Knowledge needed for the conversation lives in the prompt (small) or in the structured CRM data (already loaded). Adding a vector store for "context" would have added complexity and cost without value, because the conversation scope is narrow enough that retrieval isn't needed.

What This Agent Does Not Do

Worth being explicit about the limits, because they're a feature, not a gap.

It does not handle inbound questions outside the qualification flow. If a lead asks "how does your billing work" mid-conversation, the agent acknowledges, says it'll have an SDR follow up with that, and continues with qualification. It is not a general assistant. Asking it to be one is how chatbots fail.

It does not negotiate, quote prices, or commit to deliverables. Code-level guardrails block any output containing dollar amounts, "guaranteed," or "promise." Pricing happens with a human SDR on the call.
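A guardrail of this kind can be a plain regex filter that runs on every draft reply before it is sent. The patterns and the fallback wording below are illustrative assumptions, not the production rules.

```python
import re

# Illustrative patterns: dollar amounts plus the commitment words named above.
BLOCKED_PATTERNS = [
    re.compile(r"\$\s?\d"),
    re.compile(r"\b\d[\d,.]*\s?(?:usd|dollars?)\b", re.IGNORECASE),
    re.compile(r"\bguarantee[sd]?\b", re.IGNORECASE),
    re.compile(r"\bpromise[sd]?\b", re.IGNORECASE),
]

# Hypothetical fallback wording.
FALLBACK_REPLY = "That's best covered with the team on a call. Want me to share the calendar?"


def violates_guardrails(text: str) -> bool:
    """True if the draft contains a price or a commitment word."""
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


def safe_reply(draft: str) -> str:
    """Replace a violating draft with a fixed, pre-approved message."""
    return FALLBACK_REPLY if violates_guardrails(draft) else draft
```

Running the check in code rather than relying on the prompt means a violating draft never reaches the lead, even when the model ignores its instructions.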

It does not handle existing-customer support. The HubSpot context check ensures the agent only engages with new leads. Existing customers who land on a contact form get routed directly to support, no AI in between.

It does not run outbound or initiate cold conversations. The agent only responds to inbound triggers. The outbound system is a separate set of tools (covered in my AI B2B lead generation playbook) and intentionally lives outside this agent's scope.

It does not make hire/fire decisions about SDRs. The agent's metrics are reviewed weekly with the SDR team. They keep ownership of qualification quality and routing rules. The agent is a tool they configure, not a replacement for them. This is non-negotiable in any deployment I run.

The Results

After 60 days in production:

  • Response time: 4.2 hours → 47 seconds
  • Meetings booked: 2.1x increase
  • Qualification accuracy: 89% (validated by SDR team)
  • Cost: $340/month in API and infrastructure costs vs $5,000/month for an additional SDR

The system now handles 85% of initial lead qualification. The SDR team focuses on high-value conversations with pre-qualified leads instead of chasing cold form submissions.

Build Timeline: 11 Days Total

| Days | Phase | Work |
|---|---|---|
| Days 1-2 | Architecture Design | Three-layer system design and API mapping |
| Days 3-5 | Intake + Enrichment | CRM integration, Clearbit API, ICP scoring rules |
| Days 6-8 | Qualification Engine | BANT flow, LLM conversation layer, deterministic rules |
| Days 9-10 | Routing + Booking | Territory routing, Calendly integration, follow-up logic |
| Day 11 | Testing + Deploy | AWS Lambda deployment, end-to-end validation |

What I'd Do Differently

If I built this again, I'd add a feedback loop from closed-won deals back to the qualification model. The current system qualifies based on input criteria, but it doesn't learn which qualification patterns actually convert. That's the next iteration.

The lesson: AI agents work best when they handle specific, well-defined workflows, not when they try to be general-purpose assistants. Scope the problem tightly, build the system around the decision logic, and let the LLM handle the language part.

Frequently Asked Questions

Can I build this with no-code tools?

Partially. n8n v2.11.4 (or Make, Zapier) plus a Claude API node can handle the orchestration for a simpler version of this agent: webhook in, enrichment, single LLM call, write back. Where no-code falls short is multi-turn conversation state and structured-output reliability. If you need a real BANT flow with 5+ turns and accurate signal extraction, you'll hit the limits of no-code around step 3 of the conversation. My rule of thumb: no-code is fine for single-shot qualification (one-question screening); custom code is worth it for multi-turn flows.

What if my CRM is not HubSpot?

The architecture is CRM-agnostic. I've shipped variants on Salesforce, Pipedrive, and Attio. The work that changes is the integration layer (different API shapes, different field mappings) and the territory/owner-assignment logic (different CRMs model territory differently). Plan for an extra 2-3 days of integration work for non-HubSpot CRMs. The conversation, qualification logic, and Calendly integration are all reusable as-is.

How accurate is the qualification?

89% agreement with the SDR team's manual qualification on the same leads, validated by parallel review during the first 30 days. The 11% disagreement was almost entirely on edge cases (very small companies that the SDR team would have engaged anyway, or unusual industries that didn't fit the ICP cleanly). Accuracy improves with each iteration of the signal-scoring rules. The baseline 89% is what's achievable in the first month with no fine-tuning.

What happens at the human handoff?

When the agent qualifies a lead and the lead books a meeting, the SDR receives a Slack notification with: the lead's contact info, all 5 signal scores, the composite score, the full conversation transcript, the enrichment data, and a one-paragraph summary written by the agent. The SDR walks into the call already knowing who they're talking to. Average prep time before a call dropped from ~12 minutes to ~2.

Can this work for high-ticket B2B sales?

Yes, with adjustments. For deals above $50K ACV, the agent's role narrows: it qualifies fit and books the call, but it does not run BANT discovery. BANT happens with the human on the call. The agent's job is "is this person worth a call" and nothing more. For deals below $50K ACV, the agent can run a fuller qualification including budget proxy questions because the deal economics support a more abbreviated human conversation.

How do you prevent the agent from sounding robotic?

Three things. First, the system prompt for Claude Sonnet 4.6 explicitly establishes a voice that matches how the SDR team writes (I extracted that from real Slack and email threads with permission). Second, the agent never repeats the same phrasing twice in a conversation; the prompt forbids it explicitly and the policy layer flags repeats. Third, the agent acknowledges what the lead said before asking the next question, which is what a human SDR does and what most chatbots forget. Real test: 4 out of 5 leads in a 200-lead post-launch survey thought the agent might be a human. That's not the goal (the agent introduces itself as AI when asked), but it's a useful proxy for naturalness.


Related reads: The architecture above follows my 3-layer agent architecture pattern. For the full outbound system this agent plugs into, read my AI B2B lead generation playbook. And if you want to score leads before they hit the agent, here's how I build AI lead scoring systems.

See a similar system in production: SaaS support triage case study and CRM pipeline automation.


Saksham Solanki

AI Systems Architect

I build production-grade AI systems for B2B companies. 50+ systems deployed, $2M+ in client ROI across 16+ industries. I write about what I build, not what I theorize about.
