Use Knowledge Bases for Amazon Bedrock to perform metadata filtering on tabular data

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API. To equip FMs with up-to-date and proprietary information, organizations use Retrieval Augmented Generation (RAG), a technique that fetches data from company data sources and enriches the prompt to provide more relevant and accurate responses. Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation. However, information about one dataset can reside in another dataset, called metadata. Without metadata, the retrieval process can return irrelevant results, which lowers FM accuracy and increases the cost of FM prompt tokens.

On March 27, 2024, Amazon Bedrock announced a key new feature called metadata filtering, along with a change to the default engine. This change allows you to use metadata fields during the retrieval process. However, the metadata fields must be configured during the knowledge base ingestion process. Often, you have tabular data where details about one field are available in another field. You may also need to cite the exact text document or text field to prevent hallucinations. In this post, we show you how to use the new metadata filtering feature with Knowledge Bases for Amazon Bedrock for this kind of tabular data.

Solution overview

The solution consists of the following high-level steps:

Prepare data for metadata filtering.
Create and ingest data and metadata into the knowledge base.
Retrieve data from the knowledge base using metadata filtering.

Prepare data for metadata filtering

As of this writing, Knowledge Bases for Amazon Bedrock supports Amazon OpenSearch Serverless, Amazon Aurora, Pinecone, Redis Enterprise, and MongoDB Atlas as underlying vector store providers. In this post, we use the Amazon Bedrock Boto3 SDK to create and access an OpenSearch Serverless vector store. For more details, see Set up a vector index for your knowledge base in a supported vector store.

For this post, we create a knowledge base using the public dataset Food.com – Recipes and Reviews. The following screenshot shows an example of the dataset.

The TotalTime field is in ISO 8601 format. You can convert it to minutes using the following logic:

import re

# Function to convert an ISO 8601 duration to minutes
def convert_to_minutes(duration):
    hours = 0
    minutes = 0

    # Find hours and minutes using regex
    match = re.match(r'PT(?:(\d+)H)?(?:(\d+)M)?', duration)

    if match:
        if match.group(1):
            hours = int(match.group(1))
        if match.group(2):
            minutes = int(match.group(2))

    # Convert total time to minutes
    total_minutes = hours * 60 + minutes
    return total_minutes

df['TotalTimeInMinutes'] = df['TotalTime'].apply(convert_to_minutes)
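The conversion logic covers only the hour and minute components. As a minimal sketch, assuming some durations may also carry a day component (for example P1DT2H), a variant could look like the following. The name iso8601_to_minutes is hypothetical and not part of the original notebook, and unparseable values are mapped to 0 as a simplifying assumption.

```python
import re

# Hypothetical helper: handles days, hours, and minutes only (no seconds, weeks, or months).
_DURATION = re.compile(r'P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?')

def iso8601_to_minutes(duration):
    """Return the duration in whole minutes, or 0 if it cannot be parsed."""
    match = _DURATION.fullmatch(str(duration))
    if not match:
        return 0
    # Missing components come back as None and count as zero.
    days, hours, minutes = (int(g) if g else 0 for g in match.groups())
    return days * 24 * 60 + hours * 60 + minutes

print(iso8601_to_minutes("PT1H30M"))
print(iso8601_to_minutes("P1DT2H"))
```

Using fullmatch instead of match rejects strings with trailing text, so malformed values do not silently produce partial results.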

After converting some of the features, such as CholesterolContent, SugarContent, and RecipeInstructions, the data frame looks like the following screenshot.

To enable the FM to point to a specific menu with a link (that is, cite the document), we split each row of the tabular data into its own text file. Each file contains RecipeInstructions as the data field, with TotalTimeInMinutes, CholesterolContent, and SugarContent as metadata. The metadata must be kept in a separate JSON file with the same name as the data file and .metadata.json appended to its name. For example, if the data file name is 100.txt, the metadata file name should be 100.txt.metadata.json. For more details, see Add metadata to your files to allow for filtering. The content of the metadata file should be in the following format:

{
    "metadataAttributes": {
        "${attribute1}": "${value1}",
        "${attribute2}": "${value2}",
        ...
    }
}
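As a rough sketch of the naming convention and file format described above, the following helper writes one data file and its metadata sidecar. The function name write_record, the sample text, and the sample values are hypothetical. The sketch also assumes that numeric attributes are stored as JSON numbers rather than strings so that comparison filters such as lessThan can treat them numerically; verify this against the documentation for your vector store.

```python
import json
import os
import tempfile

def write_record(folder, record_id, text, attributes):
    """Write <record_id>.txt and its <record_id>.txt.metadata.json sidecar."""
    data_path = os.path.join(folder, f"{record_id}.txt")
    meta_path = data_path + ".metadata.json"
    with open(data_path, "w") as f:
        f.write(text)
    with open(meta_path, "w") as f:
        # Assumption: numbers stay numbers so numeric filters can compare them.
        json.dump({"metadataAttributes": attributes}, f)
    return data_path, meta_path

# Illustrative values only; a temporary folder keeps the example self-contained.
folder = tempfile.mkdtemp()
data_path, meta_path = write_record(
    folder, 100, "Mix and bake.",
    {"TotalTimeInMinutes": 25, "CholesterolContent": 0.0},
)
print(os.path.basename(meta_path))
```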

For simplicity, we process only the first 2,000 rows to create the knowledge base.

After importing the required libraries, create a local directory using the following Python code:

import pandas as pd
import os, json, tqdm, boto3

metafolder = "multi_file_recipe_data"
os.mkdir(metafolder)

Iterate over the first 2,000 rows to create the data and metadata files and store them in the local folder:

for i in tqdm.trange(2000):
    desc = str(df['RecipeInstructions'][i])
    meta = {
        "metadataAttributes": {
            "Title": str(df['Name'][i]),
            "TotalTimeInMinutes": str(df['TotalTimeInMinutes'][i]),
            "CholesterolContent": str(df['CholesterolContent'][i]),
            "SugarContent": str(df['SugarContent'][i]),
        }
    }
    # Write the recipe instructions as the data file
    filename = metafolder + '/' + str(i+1) + '.txt'
    with open(filename, 'w') as f:
        f.write(desc)
    # Write the metadata next to it as <data file>.metadata.json
    metafilename = filename + '.metadata.json'
    with open(metafilename, 'w') as f:
        json.dump(meta, f)

Create an Amazon Simple Storage Service (Amazon S3) bucket named recipe-kb and upload the files:

# Upload data to S3
s3_client = boto3.client("s3")
bucket_name = "recipe-kb"
data_root = metafolder + '/'

def uploadDirectory(path, bucket_name):
    for root, dirs, files in os.walk(path):
        for file in tqdm.tqdm(files):
            s3_client.upload_file(os.path.join(root, file), bucket_name, file)

uploadDirectory(data_root, bucket_name)
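Because filtering depends on every data file having a matching metadata file, it can help to check the local folder before uploading. The following is a minimal sketch of such a check. The function name find_missing_metadata is hypothetical, and the example uses a temporary folder with empty placeholder files rather than the recipe data.

```python
import os
import tempfile

def find_missing_metadata(folder):
    """Return data files in folder that have no matching .metadata.json sidecar."""
    names = set(os.listdir(folder))
    return sorted(
        name for name in names
        if name.endswith(".txt") and name + ".metadata.json" not in names
    )

# Placeholder files for illustration: 2.txt deliberately has no sidecar.
folder = tempfile.mkdtemp()
for name in ("1.txt", "1.txt.metadata.json", "2.txt"):
    open(os.path.join(folder, name), "w").close()

missing = find_missing_metadata(folder)
print(missing)  # ['2.txt']
```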

Create and ingest data and metadata into the knowledge base

When the S3 folder is ready, you can create the knowledge base in Amazon Bedrock using the SDK, following this example notebook.

Retrieve data from the knowledge base using metadata filtering

Now let's retrieve some data from the knowledge base. For this post, we use Anthropic Claude Sonnet on Amazon Bedrock as the FM, but you can choose from a variety of Amazon Bedrock models. First, you need to set the following variables, where kb_id is the ID of your knowledge base. The knowledge base ID can be found programmatically, as shown in the example notebook, or on the Amazon Bedrock console by navigating to the individual knowledge base, as shown in the following screenshot.

Set the required Amazon Bedrock parameters using the following code:

import boto3
import pprint
from botocore.client import Config
import json

pp = pprint.PrettyPrinter(indent=2)
session = boto3.session.Session()
region = session.region_name
bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime', region_name=region)
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                                    config=bedrock_config, region_name=region)
kb_id = "EIBBXVFDQP"
model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'

# Retrieve API call that fetches only the relevant context, without any metadata filter

query = "Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10"

relevant_documents = bedrock_agent_client.retrieve(
    retrievalQuery={
        'text': query
    },
    knowledgeBaseId=kb_id,
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 2
        }
    }
)
pp.pprint(relevant_documents["retrievalResults"])

The following is the output from the knowledge base retrieval without metadata filtering for the query "Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10." The two recipes returned have preparation times of 30 and 480 minutes, respectively, and cholesterol contents of 86 and 112.4, respectively. The retrieval therefore does not follow the query accurately.
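To compare results against the query constraints, it can be convenient to pull only the relevant metadata out of each retrieval result. The following is a minimal sketch, assuming each entry of retrievalResults carries a metadata dictionary with the attributes that were ingested. The helper name summarize_results is hypothetical, and the sample entry is a made-up fragment for illustration, not real output.

```python
def summarize_results(retrieval_results, keys):
    """Return one dict per result with its score and the selected metadata keys."""
    rows = []
    for result in retrieval_results:
        metadata = result.get("metadata", {})
        row = {"score": result.get("score")}
        for key in keys:
            # Missing attributes are reported as None rather than raising.
            row[key] = metadata.get(key)
        rows.append(row)
    return rows

# Made-up response fragment for illustration only.
sample = [
    {"content": {"text": "..."}, "score": 0.5,
     "metadata": {"TotalTimeInMinutes": "45", "CholesterolContent": "12.0"}},
]
rows = summarize_results(sample, ["TotalTimeInMinutes", "CholesterolContent"])
print(rows)
```

With a real response, the same helper could be called as summarize_results(relevant_documents["retrievalResults"], [...]).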

The following code demonstrates how to use the Retrieve API with the metadata filters set to a cholesterol content of less than 10 and a preparation time of less than 30 minutes for the same query:

def retrieve(query, kbId, numberOfResults=5):
    return bedrock_agent_client.retrieve(
        retrievalQuery={
            'text': query
        },
        knowledgeBaseId=kbId,
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': numberOfResults,
                # Both conditions must hold for a document to be returned
                'filter': {
                    'andAll': [
                        {
                            'lessThan': {
                                'key': 'CholesterolContent',
                                'value': 10
                            }
                        },
                        {
                            'lessThan': {
                                'key': 'TotalTimeInMinutes',
                                'value': 30
                            }
                        }
                    ]
                }
            }
        }
    )

query = "Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10"
response = retrieve(query, kb_id, 2)
retrievalResults = response['retrievalResults']
pp.pprint(retrievalResults)

In the following results, the two recipes have preparation times of 27 and 20 minutes, respectively, and cholesterol contents of 0 and 0, respectively. With metadata filtering, we get more accurate results.
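The filter in the Retrieve call is a plain nested dictionary, so it can be generated from a small set of thresholds instead of being written by hand. The following is a minimal sketch. The helper name build_filter is hypothetical, and the sketch assumes that andAll needs at least two conditions, so a single condition is returned on its own.

```python
def build_filter(upper_bounds):
    """Build a retrieval filter from {attribute: exclusive upper bound}."""
    conditions = [
        {"lessThan": {"key": key, "value": value}}
        for key, value in upper_bounds.items()
    ]
    if not conditions:
        # No thresholds: the caller should omit the filter entirely.
        return None
    # Assumption: andAll expects two or more conditions.
    if len(conditions) == 1:
        return conditions[0]
    return {"andAll": conditions}

metadata_filter = build_filter({"CholesterolContent": 10, "TotalTimeInMinutes": 30})
print(metadata_filter)
```

The resulting dictionary has the same shape as the filter value under vectorSearchConfiguration in the preceding code.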

The following code shows how to get accurate output using the same metadata filtering with the retrieve_and_generate API. First, we set the prompt, then we set up the API with metadata filtering:

prompt = f"""
Human: You have great knowledge about food, so provide answers to questions by using fact. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.

A:"""

def retrieve_and_generate(query, kb_id, model_id, numberOfResults=10):
    return bedrock_agent_client.retrieve_and_generate(
        input={
            'text': query,
        },
        retrieveAndGenerateConfiguration={
            'knowledgeBaseConfiguration': {
                'generationConfiguration': {
                    'promptTemplate': {
                        'textPromptTemplate': f"{prompt} $search_results$"
                    }
                },
                'knowledgeBaseId': kb_id,
                'modelArn': model_id,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': numberOfResults,
                        'overrideSearchType': 'HYBRID',
                        # Same metadata filter as in the Retrieve API call
                        'filter': {
                            'andAll': [
                                {
                                    'lessThan': {
                                        'key': 'CholesterolContent',
                                        'value': 10
                                    }
                                },
                                {
                                    'lessThan': {
                                        'key': 'TotalTimeInMinutes',
                                        'value': 30
                                    }
                                }
                            ]
                        },
                    }
                }
            },
            'type': 'KNOWLEDGE_BASE'
        }
    )

query = "Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10"
response = retrieve_and_generate(query, kb_id, model_id, numberOfResults=10)
pp.pprint(response['output']['text'])

As the following output shows, the model returns a detailed recipe that follows the specified metadata filtering: a preparation time of less than 30 minutes and a cholesterol content of less than 10.
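Because each recipe is stored as its own file, the generated answer can be traced back to the documents it drew on. The following is a minimal sketch, assuming the response includes a citations list whose retrievedReferences carry an S3 location. The helper name cited_uris is hypothetical, and the sample response is a made-up fragment with an illustrative URI, not real output.

```python
def cited_uris(response):
    """Collect the unique S3 URIs of the documents cited in a response."""
    uris = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            # Keep first-seen order and skip duplicates or missing locations.
            if uri and uri not in uris:
                uris.append(uri)
    return uris

# Made-up response fragment for illustration only.
sample_response = {
    "output": {"text": "..."},
    "citations": [
        {"retrievedReferences": [
            {"location": {"type": "S3", "s3Location": {"uri": "s3://recipe-kb/100.txt"}}},
            {"location": {"type": "S3", "s3Location": {"uri": "s3://recipe-kb/100.txt"}}},
        ]},
    ],
}
print(cited_uris(sample_response))
```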

Clean up

If you plan to use the knowledge base you created for building your RAG application, make sure to comment out the following section. If you only wanted to try creating a knowledge base using the SDK, make sure to delete all the resources that were created, because you incur costs for storing documents in the OpenSearch Serverless index. See the following code:

bedrock_agent_client.delete_data_source(dataSourceId=ds["dataSourceId"], knowledgeBaseId=kb['knowledgeBaseId'])
bedrock_agent_client.delete_knowledge_base(knowledgeBaseId=kb['knowledgeBaseId'])
oss_client.indices.delete(index=index_name)
aoss_client.delete_collection(id=collection_id)
aoss_client.delete_access_policy(type="data", name=access_policy['accessPolicyDetail']['name'])
aoss_client.delete_security_policy(type="network", name=network_policy['securityPolicyDetail']['name'])
aoss_client.delete_security_policy(type="encryption", name=encryption_policy['securityPolicyDetail']['name'])
# Delete roles and policies
iam_client.delete_role(RoleName=bedrock_kb_execution_role)
iam_client.delete_policy(PolicyArn=policy_arn)

Conclusion

In this post, we explained how to split a large tabular dataset into rows to set up a knowledge base with metadata for each record, and how to retrieve outputs with metadata filtering. We also showed that retrieval with metadata filtering is more accurate than retrieval without it. Lastly, we showed how to use the filtered results with an FM to get accurate answers.

To further explore the capabilities of Knowledge Bases for Amazon Bedrock, refer to the following resources:

About the author

Tanai Chowdhury is a data scientist at the Amazon Web Services Generative AI Innovation Center. He helps customers solve their business problems using generative AI and machine learning.
