How a leading oncology center improved medical retrieval accuracy by 40% — benchmarked across 7 models with FloTorch
+40%
Retrieval accuracy improvement
7 models
Benchmarked in a single study
40% faster
AWS vs. Azure on latency
2–3×
Cost savings vs. competing cloud stack
INDUSTRY
Healthcare – Oncology
REGION
United States
USE CASE
Clinical Decision Support RAG
The Challenge
A leading oncology center set out to build a clinical AI assistant that could help patients and clinicians access treatment plans, drug options, and evidence-backed medical insights — in real time. The challenge was significant: no ground truth dataset existed, the content was highly domain-specific, and the system had to perform reliably within strict cost and latency constraints. Without a structured evaluation framework, choosing the right retrieval stack was guesswork.
The Solution
- Multi-model benchmarking — FloTorch evaluated 7 embedding models across 2,000 queries, measuring NDCG@k, recall@k, cost, and latency across AWS and Azure infrastructure
- LLM-as-a-Judge evaluation — With no ground truth available, FloTorch deployed LLM-as-a-Judge to generate unbiased pseudo-ground-truth and ensure cross-model fairness
- Text normalization layer — Discovered that Azure and open-source embeddings were highly sensitive to Unicode inconsistencies; implemented canonical normalization (Unicode standardization, symbol cleanup, spacing) that produced dramatic accuracy gains
- Cost & latency profiling — Generated a full cost-per-million-query breakdown and latency comparison across both cloud stacks to support production decision-making
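To make the text normalization layer concrete, here is a minimal Python sketch of the three steps named above (Unicode standardization, symbol cleanup, spacing). The case study does not say which Unicode normalization form or symbol list was used, so the choice of NFKC, the contents of `SYMBOL_MAP`, and the function name `normalize_text` are assumptions made for illustration only.

```python
import re
import unicodedata

# Hypothetical symbol map: the case study does not list the symbols that were
# cleaned, so these entries are illustrative assumptions.
SYMBOL_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u200b": None,  # zero-width space (removed)
})


def normalize_text(text: str) -> str:
    """Return a canonical form of `text` to use before embedding."""
    # 1. Unicode standardization. NFKC (an assumed choice) folds compatibility
    #    characters such as ligatures, full-width digits, and non-breaking
    #    spaces into their plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    # 2. Symbol cleanup: map typographic quotes and dashes to ASCII.
    text = text.translate(SYMBOL_MAP)
    # 3. Spacing: collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```

For the comparison to be fair, the same function would need to run on both the indexed documents and the incoming queries, so that every embedding model sees identical input.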
KEY RESULTS
- NDCG@1 improvement (MedEmbed): ↑ +40.22 points
- NDCG@1 improvement (Azure T3-Large): ↑ +32.65 points
- AWS vs. Azure latency: ↑ 40% faster
- AWS vs. Azure cost per 1M queries: ↓ 2–3× lower
- Engineers needed to switch stacks: 0
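For readers less familiar with the retrieval metrics reported here, the following Python sketch shows one common way to compute NDCG@k and recall@k for a single query. It uses the linear-gain form of DCG. The case study does not state which gain formulation FloTorch used or how LLM-as-a-Judge outputs were mapped to relevance grades, so the function names and the `relevance` dictionary are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (linear gain)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k):
    """NDCG@k for one query.

    ranked_ids: document ids in the order the retriever returned them.
    relevance:  dict of doc id -> graded relevance (assumed here to come
                from judge-generated pseudo-ground-truth labels).
    """
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def recall_at_k(ranked_ids, relevance, k):
    """Fraction of relevant documents that appear in the top k results."""
    relevant = {doc_id for doc_id, grade in relevance.items() if grade > 0}
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked_ids[:k])) / len(relevant)
```

In this formulation, NDCG@1 only checks whether the top-ranked result is the most relevant one, which is why it is a demanding metric for a clinical assistant. Per-query scores are typically averaged over the full query set.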
FLOTORCH STACK USED
- Multi-Embedding Benchmarking Framework
- LLM-as-a-Judge Evaluation
- Cost & Latency Analytics Dashboard
Want similar results?
Talk to our team about deploying a RAG blueprint for your use case.
Book a free scoping call →


