How a Query Works: Step-by-Step Pipeline¶
When a user types a question like "What is the VL suppression rate by district?" in the chat interface, here is exactly what happens end-to-end.
Architecture¶
User Question
│
▼
┌─────────────────────────────────────────────────────┐
│ Slim PHP API │
│ │
│ ChatController.ask() │
│ │ │
│ ▼ │
│ QueryService.processQuery() │
│ │ │
│ ├─ 1. Validate against business rules │
│ ├─ 2. Check conversation context │
│ ├─ 3. Detect intent (LLM) │
│ ├─ 4. Select relevant tables (LLM) │
│ ├─ 5. Retrieve RAG context (Qdrant) │
│ ├─ 6. Build strict RAG allowlist │
│ ├─ 7. Generate SQL (LLM) │
│ ├─ 8. Validate SQL (privacy + grounding) │
│ │ │
│ ▼ │
│ DatabaseService.execute() │
│ │ (runs SQL against vlsm) │
│ ▼ │
│ ChartService.suggest() │
│ │ (heuristic + optional LLM) │
│ ▼ │
│ JSON Response │
└─────────────────────────────────────────────────────┘
Step 1: Validate Against Business Rules¶
File: QueryService::validateQueryAgainstBusinessRules()
Before any LLM call, the system checks the question against hard-coded business rules in config/business-rules.php:
- Privacy check — Rejects questions asking for patient-level data (names, IDs, phone numbers, addresses). These are blocked before the LLM ever sees them.
- Forbidden patterns — Detects and blocks attempts to ask for individual patient records, DELETE/UPDATE/DROP statements, etc.
- Length validation — Rejects queries that are too short (< 3 chars) or too long.
If validation fails, an error is returned immediately without calling the LLM.
Step 2: Check Conversation Context¶
File: ConversationContextService::getContextForNewQuery()
The system checks if this question references a previous conversation turn. It looks for:
- Pronouns: "these", "those", "them", "it"
- Continuations: "of those", "among them", "from those"
- Drill-downs: "break down by province", "by facility" (when no test type specified)
- Refinements: "but only", "just the", "narrow to", "in Littoral"
- Follow-ups: "what about", "how about", "what percentage"
- Implicit: short questions (< 6 words) without a table/test-type keyword
If context is detected, the system builds a rich context block containing previous queries, their SQL, result summaries, and extracted filters. This block is injected into the LLM prompt in Step 7.
Example context block:
CONVERSATION CONTEXT:
Q1: "How many viral load tests done in last 12 months?"
SQL: SELECT COUNT(*) AS total_tests FROM form_vl WHERE sample_tested_datetime >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
Result: Found 45,000 records
Filters: time_period=12 MONTH, table=form_vl
If the user says "these", "those", etc., they mean the results above.
CARRY FORWARD all filters and ADD the new conditions.
Step 3: Detect Intent¶
File: QueryService::detectQueryIntentWithBusinessRules()
The system calls the LLM (via the sidecar) to classify the question's intent. The LLM is given: - The user question - Business rules context - Conversation context (if any)
It returns a structured JSON with:
- intent — One of: count, list, aggregate, trend, comparison, distribution, performance, correlation, geographic, metadata, general
- intent_details — Specific dimensions and filters detected in the question
- confidence — How confident the classification is
This intent guides table selection and prompt construction in later steps.
Step 4: Select Relevant Tables¶
File: QueryService::selectRelevantTablesWithBusinessRules()
Using the detected intent, the system asks the LLM to select which database tables are needed to answer the question. The LLM is given:
- The question
- The intent analysis
- A list of all available tables (from var/schema.json)
It returns 1-3 table names that should be used in the SQL query (e.g., form_vl, facility_details).
Step 5: Retrieve RAG Context¶
File: QueryService::retrieveIntentContexts()
The system performs semantic search against the Qdrant vector database via the RAG API. It sends multiple queries to find the most relevant snippets:
- Primary search — The user question itself, filtered to snippet types relevant to the intent
- Table-specific search — Search filtered to the selected tables, retrieving column definitions
- Rule search — Search for business rules relevant to the question
The RAG API embeds each query using BAAI/bge-small-en-v1.5 and finds the nearest neighbors in the vector space. Results are scored by cosine similarity.
Typically returns 15-24 snippets covering: table schemas, column semantics, business rules, clinical thresholds, and query patterns.
Step 6: Build Strict RAG Allowlist¶
File: QueryService::buildStrictRagPack()
The raw RAG results are compressed into a compact 24-item "allowlist" — a structured block of text that tells the LLM exactly what tables, columns, rules, and patterns it is allowed to use.
Format:
ALLOWLIST (you may ONLY use items from this list):
[TABLE] form_vl — Viral load test results
Columns: sample_tested_datetime, facility_name, result_value_absolute, ...
[TABLE] facility_details — Facility metadata
Columns: facility_name, facility_state, facility_district, ...
[RULE] Suppressed VL: result_value_absolute < 1000 copies/ml
[RULE] Always exclude rejected samples: IFNULL(is_sample_rejected, 'no') = 'no'
[RULE] Never expose patient-level data
[PATTERN] VL suppression rate: COUNT(CASE WHEN result_value_absolute < 1000 ...) / COUNT(*)
This allowlist is the grounding mechanism — the LLM can only use tables, columns, and patterns listed here. Anything not in the allowlist is forbidden.
Step 7: Generate SQL¶
File: QueryService::callLLM()
This is where the LLM generates the actual SQL query. It receives a two-part prompt:
System prompt:
You are a strict MySQL SQL generator for a medical laboratory database.
- Only use tables and columns from the ALLOWLIST
- Never return patient-identifying information
- Always exclude rejected samples unless asked
- Use aggregate functions, never return raw patient rows
- Respond ONLY with JSON: {"s":"<SQL>","ok":true,"conf":0.85,"why":"...","cit":["..."]}
User prompt:
ALLOWLIST:
[the compact RAG pack from Step 6]
CONVERSATION CONTEXT:
[context block from Step 2, if applicable]
QUESTION: "What is the VL suppression rate by district?"
Return ONLY one JSON object.
The LLM responds with JSON containing:
- s — The generated SQL query
- ok — Whether it was able to answer (true/false)
- conf — Confidence score (0-1)
- why — Human-readable explanation of what the query does
- cit — Citations (which RAG snippets were used)
Example LLM output:
{
"s": "SELECT fd.facility_state AS district, COUNT(*) AS total_tests, COUNT(CASE WHEN fv.result_value_absolute < 1000 THEN 1 END) AS suppressed, ROUND(COUNT(CASE WHEN fv.result_value_absolute < 1000 THEN 1 END) * 100.0 / COUNT(*), 1) AS suppression_rate FROM form_vl fv JOIN facility_details fd ON fv.facility_id = fd.facility_id WHERE fv.sample_tested_datetime >= DATE_SUB(NOW(), INTERVAL 12 MONTH) AND IFNULL(fv.is_sample_rejected, 'no') = 'no' GROUP BY fd.facility_state ORDER BY suppression_rate DESC",
"ok": true,
"conf": 0.92,
"why": "Calculates VL suppression rate (< 1000 copies/ml) by district using form_vl joined with facility_details, excluding rejected samples, for the last 12 months.",
"cit": ["VL suppression threshold", "form_vl columns", "facility_details columns"]
}
The LLM call goes through the LLM Sidecar (LlmClient → HTTP → llm-sidecar:3100) which routes to the configured model (e.g., Claude, GPT-4, DeepSeek).
Step 8: Validate SQL¶
File: QueryService::validateSql() and QueryService::enforceGrounding()
Before executing, the generated SQL is validated:
Privacy validation¶
- No forbidden columns (patient_name, phone_number, etc.) in SELECT
- No patient-level queries without aggregation
- No DELETE, UPDATE, DROP, ALTER statements
Grounding validation¶
- Every table in the SQL must be in the allowed tables list
- Every column must exist in the schema
- The SQL must be syntactically valid
If validation fails, the query is rejected with an explanation. The system may retry with adjusted prompts (up to 2 retries).
Step 9: Execute SQL¶
File: DatabaseService::execute()
The validated SQL is executed against the vlsm database using a raw PDO connection. This is intentionally separate from the Eloquent ORM used for app data.
Returns:
- columns — Column names from the result set
- rows — Array of result rows (associative arrays)
- count — Total row count
- execution_time_ms — How long the query took
Step 10: Store Conversation Context¶
File: ConversationContextService::addQuery()
After execution, the full turn is stored in the session: - Original question - Generated SQL - Detected intent - Tables used - Extracted filters (time range, facility, province, etc.) - Result summary - Sample rows (first 5)
This enables the next question to reference "those results" or "break that down by facility".
Step 11: Suggest Chart¶
File: ChartService::suggest()
The system analyzes the result set to recommend a visualization:
Heuristic pass (fast, no LLM)¶
- Single row → table/KPI
- Temporal + numeric columns → line/area chart
- Single categorical + single numeric → pie (few rows) or bar (many rows)
- Two numeric columns → scatter plot
- Multiple categorical + numeric → stacked bar
- Many columns → table
LLM fallback¶
If heuristics are inconclusive, the LLM is asked via structured output to recommend a chart type with axis configuration.
Step 12: Return Response¶
File: ChatController::ask()
The final JSON response is assembled:
{
"sql": "SELECT ...",
"verification": {
"ok": true,
"conf": 0.92,
"why": "Calculates VL suppression rate by district..."
},
"citations": ["VL suppression threshold", "form_vl columns"],
"data": {
"columns": ["district", "total_tests", "suppressed", "suppression_rate"],
"rows": [...],
"count": 10
},
"chart": {
"recommended": "bar",
"alternatives": ["horizontal_bar", "table"],
"config": {"x_axis": "district", "y_axis": "suppression_rate"},
"reasoning": "Single categorical dimension with many categories"
},
"meta": {
"execution_time_ms": 3200,
"detected_intent": "aggregate",
"sql_execution_time_ms": 45,
"session_id": "abc-123"
},
"debug": {
"tables_used": ["form_vl", "facility_details"],
"conversation_context": []
}
}
The frontend renders this as bento cards: explanation, data table, chart, and query details.
Multi-Turn Conversation Example¶
Q1: "How many VL tests in the last 12 months?"
→ SQL: SELECT COUNT(*) FROM form_vl WHERE sample_tested_datetime >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
→ Result: 45,000
Q2: "How many of those were in Littoral province?"
→ Context detected: "those" references Q1
→ SQL: SELECT COUNT(*) FROM form_vl fv JOIN facility_details fd ON ... WHERE ... AND fd.facility_state = 'Littoral'
→ Result: 12,300
→ Filters carried forward: 12 MONTH time range + form_vl table
Q3: "Break those down by facility"
→ Context detected: drill-down on Q2
→ SQL: SELECT fd.facility_name, COUNT(*) FROM form_vl fv JOIN ... WHERE ... AND fd.facility_state = 'Littoral' GROUP BY fd.facility_name
→ Result: 45 facilities listed
→ All filters carried forward: 12 MONTH + Littoral + form_vl
Error Handling¶
- Privacy violation → Blocked at Step 1, user sees "This question asks for restricted patient-level data"
- LLM refusal →
ok: falsein Step 7, user sees the LLM's explanation of why it can't answer - SQL validation failure → Blocked at Step 8, retried up to 2 times, then user sees validation error
- Database error → Caught in Step 9, user sees "Database error: ..."
- RAG unavailable → Step 5 returns empty results, LLM generates SQL without RAG context (less accurate)