How a leading oncology center improved medical retrieval accuracy by 40% — benchmarked across 7 models with FloTorch

+40%

Retrieval accuracy improvement

7 models

Benchmarked in a single study

40% faster

AWS vs. Azure on latency

2–3×

Cost savings vs. competing cloud stack

INDUSTRY

Healthcare – Oncology

REGION

United States

USE CASE

Clinical Decision Support RAG

The Challenge

A leading oncology center set out to build a clinical AI assistant that could help patients and clinicians access treatment plans, drug options, and evidence-backed medical insights — in real time. The challenge was significant: no ground truth dataset existed, the content was highly domain-specific, and the system had to perform reliably within strict cost and latency constraints. Without a structured evaluation framework, choosing the right retrieval stack was guesswork.

The Solution

Multi-model benchmarking — FloTorch evaluated 7 embedding models across 2,000 queries, measuring NDCG@k, recall@k, cost, and latency across AWS and Azure infrastructure
LLM-as-a-Judge evaluation — With no ground truth available, FloTorch deployed LLM-as-a-Judge to generate unbiased pseudo-ground-truth and ensure cross-model fairness
Text normalization layer — Discovered that Azure and open-source embeddings were highly sensitive to Unicode inconsistencies; implemented canonical normalization (Unicode standardization, symbol cleanup, spacing) that produced dramatic accuracy gains
Cost & latency profiling — Generated a full cost-per-million-query breakdown and latency comparison across both cloud stacks to support production decision-making
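
To make the text normalization step concrete, here is a minimal Python sketch of a canonical normalization pass. It is an illustration under stated assumptions, not FloTorch's actual implementation: the function name normalize_text, the choice of NFKC, and the symbol map are all hypothetical, since the case study names only the three stages (Unicode standardization, symbol cleanup, spacing).

```python
import re
import unicodedata

# Hypothetical symbol map: the case study does not list the actual substitutions.
SYMBOL_MAP = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
}
_TRANSLATION = str.maketrans(SYMBOL_MAP)
_ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")
_WHITESPACE = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Return a canonical form of `text` to pass to an embedding model."""
    # 1. Unicode standardization: NFKC folds compatibility characters
    #    (ligatures, the micro sign, non-breaking spaces) into one form.
    text = unicodedata.normalize("NFKC", text)
    # 2. Symbol cleanup: drop zero-width characters, map typographic symbols.
    text = _ZERO_WIDTH.sub("", text)
    text = text.translate(_TRANSLATION)
    # 3. Spacing: collapse whitespace runs and trim the ends.
    return _WHITESPACE.sub(" ", text).strip()
```

In a setup like this, the same function would be applied to both the indexed documents and the incoming queries, so that strings such as "5 µg" (micro sign) and "5 μg" (Greek mu) reach the embedding model as identical text.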

KEY RESULTS

NDCG@1 improvement (MedEmbed)

↑ +40.22 points

NDCG@1 improvement (Azure T3-Large)

↑ +32.65 points

AWS vs. Azure latency

↑ 40% faster

AWS vs. Azure cost per 1M queries

↓ 2–3× lower

Engineers needed to switch stacks
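
For readers less familiar with the ranking metrics reported above, the following Python sketch shows one common way to compute NDCG@k and recall@k for a single query. It is an assumption-laden illustration, not the study's evaluation code: the function names are hypothetical, the relevance grades are assumed to come from the LLM-as-a-Judge step, gain is taken as linear, and the ideal ranking is built only from the documents that were actually judged.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded results."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the best possible ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def recall_at_k(relevances, k, threshold=1):
    """Share of relevant judged documents that appear in the top k."""
    total = sum(1 for rel in relevances if rel >= threshold)
    if total == 0:
        return 0.0
    hits = sum(1 for rel in relevances[:k] if rel >= threshold)
    return hits / total
```

Here relevances is the list of judged grades in the order the retriever returned the documents. Assuming scores are averaged over all queries on a 0 to 1 scale, an improvement "in points" would be 100 × (score after normalization − score before).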

FLOTORCH STACK USED

◈

Multi-Embedding Benchmarking Framework

◈

LLM-as-a-Judge Evaluation

◈

Cost & Latency Analytics Dashboard

Want Similar Results?

Talk to our team about deploying a RAG blueprint for your use case.

Book a free scoping call →
