
Getting Started with FloTorch-core: Building Modular RAG Pipelines

FloTorch-core is a modular and extensible Python framework designed for building LLM-powered Retrieval-Augmented Generation (RAG) pipelines. It offers plug-and-play components for embeddings, chunking, retrieval, gateway-based LLM calls, and RAG evaluation.

In this blog, we'll explore how to get started with FloTorch-core, covering installation, core components, and practical code examples.

🚀 Installation

To install the latest version of FloTorch-core, use pip:
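Assuming the package is published on PyPI under the name used throughout this post (worth verifying against the official FloTorch docs), installation is a single pip command:

```shell
pip install FloTorch-core
```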

For development dependencies:
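If the project defines a dev extra (an assumption; check the repository's packaging files for the exact extra name), development dependencies can be pulled in like this:

```shell
pip install "FloTorch-core[dev]"
```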

📁 Project Structure Overview

FloTorch-core is structured into modular components, each handling a specific part of the RAG pipeline:

  • reader/: Handles input parsing from JSON or PDF files.
  • chunking/: Responsible for splitting raw text into manageable chunks for downstream processing.
  • embedding/: Integrates embedding models from Bedrock or SageMaker for vector representation of text.
  • storage/: Interfaces with vector databases and storage backends such as OpenSearch, S3, and DynamoDB.
  • rerank/: Provides mechanisms for reordering retrieved documents based on relevance.
  • inferencer/: Connects to Bedrock or SageMaker-hosted LLMs to generate responses based on input queries and retrieved context.
  • guardrails/: Supports policy enforcement and safety mechanisms during inference.
  • evaluator/: Enables RAG pipeline evaluation using RAGAS metrics like faithfulness and context relevance.

🛠️ Prerequisites

Before proceeding, ensure you have the following:

  • AWS Account with access to Amazon Bedrock
    Make sure your AWS account has been granted access to Amazon Bedrock service. If you do not have access, request it through the Amazon Bedrock Access Request Form.

  • Model Access in Bedrock
    Within Bedrock, you must have enabled access to the specific foundation model (e.g., Anthropic Claude, Amazon Titan, or AI21). Navigate to the Model Access page in the AWS Console and ensure the model you want to use is listed under "Granted Access".

🖥️ Where Can You Run FloTorch Pipelines?

You can run FloTorch pipelines in the following environments:

1. Locally on your machine

To run locally, ensure:

  • Python 3.9+ is installed.
  • AWS CLI is installed and configured with appropriate credentials.

You’ll need to provide:

  • AWS Access Key ID
  • AWS Secret Access Key
  • Default region
  • Output format (e.g., json)
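These are exactly the four values the interactive `aws configure` command prompts for:

```shell
aws configure
# AWS Access Key ID [None]: <your-access-key-id>
# AWS Secret Access Key [None]: <your-secret-access-key>
# Default region name [None]: us-east-1
# Default output format [None]: json
```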

Make sure the configured user has access to Amazon Bedrock, SageMaker (if used), and the other services you're invoking, such as S3, DynamoDB, and OpenSearch.

2. AWS SageMaker Notebooks

If you're using SageMaker Studio or Notebook instances:

  • Choose a kernel with Python 3.x (preferably Conda-based for better package isolation).
  • Ensure the attached IAM role has the necessary permissions to access Bedrock, S3, and other services.

🧾 Provide Experiment Configuration

The `exp_config_data` dictionary below provides a configuration example, containing key parameters for executing a RAG pipeline with either Bedrock or SageMaker.
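The snippet below is a minimal illustration of such a dictionary; the key names are assumptions on this sketch's part, so align them with the fields your FloTorch-core version actually expects.

```python
# Illustrative configuration; key names are assumptions, not the confirmed
# FloTorch-core schema. Align them with your installed version.
exp_config_data = {
    "aws_region": "us-east-1",
    "retrieval_service": "bedrock",                  # or "sagemaker"
    "embedding_model": "amazon.titan-embed-text-v2:0",
    "retrieval_model": "us.amazon.nova-lite-v1:0",   # LLM used for answers
    "knn_num": 5,                                    # neighbours fetched per query
    "temperature": 0.1,
}
```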

🧾 Reading Data

FloTorch-core provides readers to ingest data from various sources. Here is a sample code snippet for reading JSON data from an S3 bucket and loading it, using the ground-truth JSON file attached to this post.

Question Chunking: Each question from the input JSON is transformed into a `Chunk` object using the `get_chunk()` method defined in the Question class. This conversion ensures compatibility with FloTorch's data structures.
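The pattern described above can be sketched in plain Python. The `Question` and `Chunk` classes below are simplified stand-ins for FloTorch-core's data structures, not the real API:

```python
import json
from dataclasses import dataclass

# Simplified stand-ins for FloTorch-core's Chunk and Question classes.
@dataclass
class Chunk:
    data: str

@dataclass
class Question:
    question: str
    answer: str

    def get_chunk(self) -> Chunk:
        # Wrap the raw question text so it flows through the pipeline
        # like any other chunk of text.
        return Chunk(data=self.question)

# A ground-truth file is typically a list of question/answer records.
raw = json.loads('[{"question": "What is RAG?", '
                 '"answer": "Retrieval-Augmented Generation."}]')
questions = [Question(**item) for item in raw]
chunks = [q.get_chunk() for q in questions]
print(chunks[0].data)  # -> What is RAG?
```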

🗃️ Vector Storage Options

FloTorch-core is compatible with various vector storage options, such as Amazon Bedrock Knowledge Bases and OpenSearch.

Bedrock Knowledge Base

To set up Bedrock Knowledge Base for vector storage with VectorStorageFactory, you can supply configuration values dynamically, as illustrated below.
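A configuration along these lines could be supplied to the factory; the field names and the commented-out factory call are assumptions, so check your FloTorch-core version for VectorStorageFactory's exact signature:

```python
# Field names below are assumptions, not the confirmed FloTorch-core schema.
kb_storage_config = {
    "knowledge_base_id": "<your-kb-id>",   # from the Bedrock console
    "aws_region": "us-east-1",
    "knn_num": 5,                          # top-k documents per query
}

# Hypothetical factory call; verify the real method name and signature:
# vector_storage = VectorStorageFactory.create_vector_storage(**kb_storage_config)
```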

Please follow this link to create a Bedrock Knowledge Base, and this notebook to upload data with different chunking mechanisms.

🧠 Bedrock Reranker Integration

To enhance response relevance after document retrieval from vector storage, FloTorch-core offers reranking capabilities. For instance, the Bedrock Reranker can be employed for a secondary ranking of the initial results.
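To make the reordering idea concrete, here is a self-contained toy reranker; it is not the Bedrock Reranker API, just a term-overlap score that shows what a secondary ranking pass does:

```python
# Toy reranker: rescores retrieved documents against the query, best first.
# NOT the Bedrock Reranker API; purely illustrative.

def _tokens(text: str) -> set[str]:
    # Lowercase and strip trailing punctuation so "context?" matches "context".
    return {t.strip(".,?!").lower() for t in text.split()}

def rerank(query: str, documents: list[str]) -> list[str]:
    return sorted(documents,
                  key=lambda d: len(_tokens(query) & _tokens(d)),
                  reverse=True)

docs = [
    "Bedrock hosts foundation models.",
    "Unrelated text about weather.",
    "RAG retrieves context for LLMs.",
]
print(rerank("How does RAG use context?", docs)[0])  # -> RAG retrieves context for LLMs.
```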

🧬 Inferencer Options

FloTorch-core enables response generation from LLMs using various inferencer backends. For instance, the Bedrock Inferencer can be used and configured through environment variables or a configuration file.
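An environment-variable configuration might look like the following; the variable names here are illustrative assumptions, so check which names your FloTorch-core version actually reads:

```shell
# Variable names are illustrative assumptions, not confirmed FloTorch-core names.
export AWS_REGION=us-east-1
export INFERENCE_MODEL_ID=us.amazon.nova-lite-v1:0
export INFERENCE_TEMPERATURE=0.1
```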

RAG with FloTorch Utility

FloTorch-core bundles these stages into a single RAG utility that runs retrieval, optional reranking, and inference end to end for a set of questions.

Steps involved:

  1. Initialization: The utility takes as input a configuration for the experiment, a vector storage instance, an optional reranker, an inferencer (LLM), and a set of question-answer pairs for evaluation.
  2. Vector Retrieval: The provided vector storage is queried using each question's embedding to perform a k-Nearest Neighbors (KNN) search. This retrieves relevant documents from the underlying vector database.
  3. Context Reranking (Optional): If a reranking model (such as Bedrock Reranker) is supplied, the retrieved documents are passed through it. This step aims to refine the relevance of the retrieved context for more accurate answer generation.
  4. Answer Generation: The inferencer, which is a Large Language Model (LLM), processes the original question along with the retrieved (and potentially reranked) context documents to generate a final answer.
  5. Metadata Logging: For each question, the utility collects and stores valuable information. This includes metadata about the inference process, the generated answer, the expected (ground truth) answer, the original question, and the documents retrieved as context. This data is crucial for evaluating the performance of the RAG pipeline.
  6. Iterative Processing: Steps 2-5 are repeated for every question provided in the input. Finally, all the generated responses and associated metadata are compiled into a comprehensive list of results.
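The steps above can be sketched as a self-contained loop. The functions below are stubs standing in for FloTorch-core's vector storage, reranker, and inferencer; they are not the real API:

```python
# Self-contained sketch of the RAG utility's loop, using stub components.

def retrieve(question: str, k: int = 3) -> list[str]:
    corpus = ["Doc about RAG.", "Doc about Bedrock.", "Doc about chunking."]
    return corpus[:k]  # step 2: stand-in for the KNN search

def rerank(question: str, docs: list[str]) -> list[str]:
    return docs  # step 3 (optional): identity reranker here

def generate(question: str, context: list[str]) -> str:
    return f"Answer to {question!r} using {len(context)} documents"  # step 4: stub LLM

def run_rag(qa_pairs: list[dict]) -> list[dict]:
    results = []
    for pair in qa_pairs:  # step 6: iterate over every question
        docs = rerank(pair["question"], retrieve(pair["question"]))
        results.append({
            "question": pair["question"],
            "expected": pair["answer"],   # ground truth for evaluation
            "generated": generate(pair["question"], docs),
            "retrieved_docs": docs,       # step 5: metadata logging
        })
    return results

results = run_rag([{"question": "What is RAG?",
                    "answer": "Retrieval-Augmented Generation."}])
print(results[0]["generated"])  # -> Answer to 'What is RAG?' using 3 documents
```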

🔁 Executing the RAG Workflow for Multiple Inference Models

Let’s now run the RAG pipeline for the following models:

  • us.amazon.nova-lite-v1:0
  • us.amazon.nova-micro-v1:0
  • us.anthropic.claude-3-5-haiku-20241022-v1:0
  • us.anthropic.claude-3-5-sonnet-20241022-v2:0

Each model will go through the same setup steps involving Vector Storage, Reranker, and Inferencer, and their outputs will be collected for comparison.
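Structurally, that is a loop over model IDs collecting one result set per model. `run_rag_for_model` below is a stub standing in for building the storage, reranker, and inferencer and running the utility:

```python
# Sketch of running the same pipeline for several Bedrock model IDs.

MODEL_IDS = [
    "us.amazon.nova-lite-v1:0",
    "us.amazon.nova-micro-v1:0",
    "us.anthropic.claude-3-5-haiku-20241022-v1:0",
    "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
]

def run_rag_for_model(model_id: str) -> list[dict]:
    # Stub: a real run would configure the inferencer with model_id
    # and execute the full RAG workflow.
    return [{"model": model_id, "answer": "..."}]

responses = {model_id: run_rag_for_model(model_id) for model_id in MODEL_IDS}
print(sorted(responses))
```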

📊 The responses for each model will be stored in a dictionary for further analysis.

Here is the results JSON file generated from the ground-truth JSON file attached above.

Evaluating Multiple Models with FloTorch using Ragas

🧠 Ragas Introduction

Ragas provides a powerful evaluation framework for RAG pipelines; internally, it uses LLMs to assess quality metrics such as faithfulness, answer relevance, and context precision.

🧾 Sample Evaluation Configuration JSON

The `evaluation_config_data` dictionary holds the settings necessary for configuring and assessing the embedding and retrieval pipeline for evaluation. These settings are crucial for testing various embedding models and retrieval methods.
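An illustrative version of that dictionary is shown below, using the inference and embedding models referenced later in this post; the key names themselves are assumptions, so align them with your FloTorch-core version:

```python
# Illustrative evaluation configuration; key names are assumptions.
evaluation_config_data = {
    "aws_region": "us-east-1",
    "eval_service": "bedrock",
    "eval_embedding_model": "amazon.titan-embed-text-v2:0",
    "eval_retrieval_model": "us.amazon.nova-pro-v1:0",  # LLM used by Ragas
    "knn_num": 5,
}
```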

🧠 Embedding Model Initialization

FloTorch-core utilizes an `embedding_registry` for the dynamic selection and initialization of embedding models as per the specified configuration. This design facilitates the effortless interchangeability of different embedding models, streamlining the evaluation process without necessitating alterations to the fundamental pipeline structure.

🤖 Inferencers

FloTorch offers a consistent way to set up an LLM-based inferencer, leveraging either Amazon Bedrock or SageMaker. This allows for adaptable deployment of diverse foundation models to conduct inference on retrieved documents.

📊 Initialize RAG Evaluator

FloTorch integrates with Ragas, allowing the use of the RagasEvaluator to assess RAG pipeline performance. This utility applies standard metrics like Faithfulness, Answer Relevance, and Context Precision to evaluate retrieved documents and generated responses.

✅ Evaluate RAG Performance

After setting up the evaluator, RAG evaluation is performed on each model in the dataset using the RagasEvaluator. This process calculates crucial performance metrics and structures them for subsequent analysis.
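The per-model evaluation loop can be sketched as follows. This `RagasEvaluator` is a stub with fixed scores; the real FloTorch/Ragas integration computes them by calling an LLM:

```python
# Sketch of evaluating each model's responses; the evaluator is a stub.

class RagasEvaluator:
    def evaluate(self, record: dict) -> dict:
        # A real evaluator scores the generated answer against the
        # retrieved context and the expected (ground truth) answer.
        return {"faithfulness": 1.0, "answer_relevance": 1.0,
                "context_precision": 1.0}

evaluator = RagasEvaluator()
responses = {
    "us.amazon.nova-lite-v1:0": [
        {"question": "q", "generated": "a", "expected": "a",
         "retrieved_docs": []},
    ],
}
metrics = {
    model_id: [evaluator.evaluate(record) for record in records]
    for model_id, records in responses.items()
}
print(metrics["us.amazon.nova-lite-v1:0"][0]["faithfulness"])  # -> 1.0
```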

Here is the final evaluation results JSON file, produced with the ‘us.amazon.nova-pro-v1:0’ inference model and the ‘amazon.titan-embed-text-v2:0’ embedding model.

✅ Plotting RAG Evaluation Metrics

To visualize the metrics from the final evaluation using the plot_grouped_bar function, you can first convert the JSON into a DataFrame, then select the desired metrics to plot. Here's the complete code that does that and produces a grouped bar chart:
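A sketch of that workflow is shown below. The `plot_grouped_bar` name comes from the article, but this implementation is an assumption, and the metric values are synthetic placeholders, not the real evaluation results:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Synthetic placeholder scores; replace with your evaluation results JSON.
results_json = [
    {"model": "nova-lite",  "faithfulness": 0.91,
     "answer_relevance": 0.88, "context_precision": 0.84},
    {"model": "nova-micro", "faithfulness": 0.87,
     "answer_relevance": 0.85, "context_precision": 0.81},
]

def plot_grouped_bar(df: pd.DataFrame, metrics: list[str],
                     out: str = "metrics.png") -> None:
    # One group of bars per model, one bar per metric.
    df.set_index("model")[metrics].plot(kind="bar", rot=0)
    plt.ylabel("score")
    plt.tight_layout()
    plt.savefig(out)

df = pd.DataFrame(results_json)
plot_grouped_bar(df, ["faithfulness", "answer_relevance", "context_precision"])
```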

📊 Plot on multiple models with Ragas metrics

Here is the plot showing multiple models on the X-axis and evaluation metrics on the Y-axis.

🌐 Additional Resources

📝 Conclusion

FloTorch-core offers a modular approach to building and evaluating RAG pipelines with LLMs. By leveraging its components for data ingestion, embedding, vector storage, inferencing, and evaluation, developers can construct robust and scalable AI solutions.
