AWS Machine Learning Blog

Amazon Personalize is excited to announce automatic training for solutions. Solution training is fundamental to maintain the effectiveness of a model and make sure recommendations align with users’ evolving behaviors and preferences. As data patterns and trends change over time, retraining the solution with the latest relevant data enables the model to learn and adapt, enhancing its predictive accuracy. Automatic training generates a new solution version, mitigating model drift and keeping recommendations relevant and tailored to end-users’ current behaviors while including the newest items. Ultimately, automatic training provides a more personalized and engaging experience that adapts to changing preferences.
Amazon Personalize accelerates your digital transformation with machine learning (ML), making it effortless to integrate personalized recommendations into existing websites, applications, email marketing systems, and more. Amazon Personalize enables developers to quickly implement a customized personalization engine, without requiring ML expertise. Amazon Personalize provisions the necessary infrastructure and manages the entire ML pipeline, including processing the data, identifying features, using the appropriate algorithms, and training, optimizing, and hosting the customized models based on your data. All your data is encrypted to be private and secure.
In this post, we guide you through the process of configuring automatic training, so your solutions and recommendations maintain their accuracy and relevance.
Solution overview
A solution refers to the combination of an Amazon Personalize recipe, customized parameters, and one or more solution versions (trained models). When you create a custom solution, you specify a recipe matching your use case and configure training parameters. For this post, you configure automatic training in the training parameters.
Prerequisites
To enable automatic training for your solutions, you first need to set up Amazon Personalize resources. Start by creating a dataset group, schemas, and datasets representing your items, interactions, and user data. For instructions, refer to Getting Started (console) or Getting Started (AWS CLI).
After you finish importing your data, you are ready to create a solution.
Create a solution
To set up automatic training, complete the following steps:

On the Amazon Personalize console, create a new solution.
Specify a name for your solution, choose the type of solution you want to create, and choose your recipe.
Optionally, add any tags. For more information about tagging Amazon Personalize resources, see Tagging Amazon Personalize resources.
To use automatic training, in the Automatic training section, select Turn on and specify your training frequency.

Automatic training is enabled by default to train one time every 7 days. You can configure the training cadence to suit your business needs, ranging from one time every 1–30 days.

If your recipe generates item recommendations or user segments, optionally use the Columns for training section to choose the columns Amazon Personalize considers when training solution versions.
In the Hyperparameter configuration section, optionally configure any hyperparameter options based on your recipe and business needs.
Provide any additional configurations, then choose Next.
Review the solution details and confirm that your automatic training is configured as expected.
Choose Create solution.

Amazon Personalize will automatically create your first solution version. A solution version refers to a trained ML model. When a solution version is created for the solution, Amazon Personalize trains the model backing the solution version based on the recipe and training configuration. It can take up to 1 hour for the solution version creation to start.
The following is sample code for creating a solution with automatic training using the AWS SDK:

import boto3

personalize = boto3.client('personalize')

solution_config = {
    "autoTrainingConfig": {
        "schedulingExpression": "rate(3 days)"
    }
}

recipe_arn = "arn:aws:personalize:::recipe/aws-similar-items"
name = "test_automatic_training"

# dataset_group_arn is the ARN of the dataset group you created in the prerequisites
response = personalize.create_solution(
    name=name,
    recipeArn=recipe_arn,
    datasetGroupArn=dataset_group_arn,
    performAutoTraining=True,
    solutionConfig=solution_config
)

solution_arn = response['solutionArn']
print(solution_arn)

After a solution is created, you can confirm whether automatic training is enabled on the solution details page.

You can also use the following sample code to confirm via the AWS SDK that automatic training is enabled:

response = personalize.describe_solution(solutionArn=solution_arn)
print(response)

Your response will contain the fields performAutoTraining and autoTrainingConfig, displaying the values you set in the CreateSolution call.
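As a quick check, the following is a minimal sketch that reads those values from the response; the exact field layout is assumed to mirror the CreateSolution request:

# Sketch: field names assumed to mirror the CreateSolution request
solution = response['solution']
print(solution['performAutoTraining'])
print(solution['solutionConfig']['autoTrainingConfig']['schedulingExpression'])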
On the solution details page, you will also see the solution versions that are created automatically. The Training type column specifies whether the solution version was created manually or automatically.

You can also use the following sample code to return a list of solution versions for the given solution:

response = personalize.list_solution_versions(solutionArn=solution_arn)['solutionVersions']
print("List Solution Version response\n")
for val in response:
    print(f"SolutionVersion: {val}")
    print("\n")

Your response will contain the field trainingType, which specifies whether the solution version was created manually or automatically.
When your solution version is ready, you can create a campaign for your solution version.
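If you want to wait for training to finish programmatically, the following is a minimal polling sketch; it assumes solution_version_arn holds the ARN of a version returned by the list call above:

import time

# Poll the solution version (ARN taken from the list call above) until training completes
while True:
    status = personalize.describe_solution_version(
        solutionVersionArn=solution_version_arn
    )['solutionVersion']['status']
    print(f"Solution version status: {status}")
    if status in ('ACTIVE', 'CREATE FAILED'):
        break
    time.sleep(60)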
Create a campaign
A campaign deploys a solution version (trained model) to generate real-time recommendations. With Amazon Personalize, you can streamline your workflow and automate the deployment of the latest solution version to campaigns via automatic syncing. To set up auto sync, complete the following steps:

On the Amazon Personalize console, create a new campaign.
Specify a name for your campaign.
Choose the solution you just created.
Select Automatically use the latest solution version.
Set the minimum provisioned transactions per second.
Create your campaign.

The campaign is ready when its status is ACTIVE.
The following is sample code for creating a campaign with syncWithLatestSolutionVersion set to true using the AWS SDK. You must also append the suffix $LATEST to the solutionArn in solutionVersionArn when you set syncWithLatestSolutionVersion to true.

campaign_config = {
    "syncWithLatestSolutionVersion": True
}
resource_name = "test_campaign_sync"
# Replace <solution-name> with the name of the solution you created
solution_version_arn = "arn:aws:personalize:::solution/<solution-name>/$LATEST"

response = personalize.create_campaign(
    name=resource_name,
    solutionVersionArn=solution_version_arn,
    campaignConfig=campaign_config
)
campaign_arn = response['campaignArn']
print(campaign_arn)

On the campaign details page, you can see whether the campaign selected has auto sync enabled. When enabled, your campaign will automatically update to use the most recent solution version, whether it was automatically or manually created.

Use the following sample code to confirm via the AWS SDK that syncWithLatestSolutionVersion is enabled:

response = personalize.describe_campaign(campaignArn=campaign_arn)
print(response)

Your response will contain the field syncWithLatestSolutionVersion under campaignConfig, displaying the value you set in the CreateCampaign call.
You can enable or disable the option to automatically use the latest solution version on the Amazon Personalize console after a campaign is created by updating your campaign. Similarly, you can enable or disable syncWithLatestSolutionVersion with UpdateCampaign using the AWS SDK.
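The following is a minimal sketch of toggling this setting with the SDK; it assumes the campaign_arn and the $LATEST solution version ARN from the earlier example:

# Sketch: enable automatic syncing on an existing campaign
response = personalize.update_campaign(
    campaignArn=campaign_arn,
    solutionVersionArn=solution_version_arn,  # solution ARN with the /$LATEST suffix, as above
    campaignConfig={"syncWithLatestSolutionVersion": True}
)
print(response['campaignArn'])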
Conclusion
With automatic training, you can mitigate model drift and maintain recommendation relevance by streamlining your workflow and automating the deployment of the latest solution version in Amazon Personalize.
For more information about optimizing your user experience with Amazon Personalize, see the Amazon Personalize Developer Guide.

About the authors
Ba’Carri Johnson is a Sr. Technical Product Manager working with AWS AI/ML on the Amazon Personalize team. With a background in computer science and strategy, she is passionate about product innovation. In her spare time, she enjoys traveling and exploring the great outdoors.
Ajay Venkatakrishnan is a Software Development Engineer on the Amazon Personalize team. In his spare time, he enjoys writing and playing soccer.
Pranesh Anubhav is a Senior Software Engineer for Amazon Personalize. He is passionate about designing machine learning systems to serve customers at scale. Outside of his work, he loves playing soccer and is an avid follower of Real Madrid.

AWS Machine Learning Blog

We are excited to announce a new version of the Amazon SageMaker Operators for Kubernetes using the AWS Controllers for Kubernetes (ACK). ACK is a framework for building Kubernetes custom controllers, where each controller communicates with an AWS service API. These controllers allow Kubernetes users to provision AWS resources like buckets, databases, or message queues simply by using the Kubernetes API.
Release v1.2.9 of the SageMaker ACK Operators adds support for inference components, which until now were only available through the SageMaker API and the AWS Software Development Kits (SDKs). Inference components can help you optimize deployment costs and reduce latency. With the new inference component capabilities, you can deploy one or more foundation models (FMs) on the same Amazon SageMaker endpoint and control how many accelerators and how much memory is reserved for each FM. This helps improve resource utilization, reduces model deployment costs on average by 50%, and lets you scale endpoints together with your use cases. For more details, see Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency.
The availability of inference components through the SageMaker controller enables customers who use Kubernetes as their control plane to take advantage of inference components while deploying their models on SageMaker.
In this post, we show how to use SageMaker ACK Operators to deploy SageMaker inference components.
How ACK works
To demonstrate how ACK works, let’s look at an example using Amazon Simple Storage Service (Amazon S3). In the following diagram, Alice is our Kubernetes user. Her application depends on the existence of an S3 bucket named my-bucket.

The workflow consists of the following steps:

Alice issues a call to kubectl apply, passing in a file that describes a Kubernetes custom resource describing her S3 bucket. kubectl apply passes this file, called a manifest, to the Kubernetes API server running in the Kubernetes controller node.
The Kubernetes API server receives the manifest describing the S3 bucket and determines if Alice has permissions to create a custom resource of kind s3.services.k8s.aws/Bucket, and that the custom resource is properly formatted.
If Alice is authorized and the custom resource is valid, the Kubernetes API server writes the custom resource to its etcd data store.
It then responds to Alice that the custom resource has been created.
At this point, the ACK service controller for Amazon S3, which is running on a Kubernetes worker node within the context of a normal Kubernetes Pod, is notified that a new custom resource of kind s3.services.k8s.aws/Bucket has been created.
The ACK service controller for Amazon S3 then communicates with the Amazon S3 API, calling the S3 CreateBucket API to create the bucket in AWS.
After communicating with the Amazon S3 API, the ACK service controller calls the Kubernetes API server to update the custom resource’s status with information it received from Amazon S3.

Key components
The new inference capabilities build upon SageMaker’s real-time inference endpoints. As before, you create the SageMaker endpoint with an endpoint configuration that defines the instance type and initial instance count for the endpoint. The model is configured in a new construct, an inference component. Here, you specify the number of accelerators and amount of memory you want to allocate to each copy of a model, together with the model artifacts, container image, and number of model copies to deploy.
You can use the new inference capabilities from Amazon SageMaker Studio, the SageMaker Python SDK, AWS SDKs, and AWS Command Line Interface (AWS CLI). They are also supported by AWS CloudFormation. Now you also can use them with SageMaker Operators for Kubernetes.
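For comparison with the Kubernetes manifests that follow, here is a minimal boto3 sketch of the underlying SageMaker API call; the inference component, endpoint, variant, and model names are placeholders:

import boto3

sagemaker = boto3.client("sagemaker")

# Placeholder names; the endpoint, variant, and model must already exist
response = sagemaker.create_inference_component(
    InferenceComponentName="my-inference-component",
    EndpointName="my-endpoint",
    VariantName="AllTraffic",
    Specification={
        "ModelName": "my-model",
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "NumberOfCpuCoresRequired": 2,
            "MinMemoryRequiredInMb": 1024,
        },
    },
    RuntimeConfig={"CopyCount": 1},
)
print(response["InferenceComponentArn"])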
Solution overview
For this demo, we use the SageMaker controller to deploy a copy of the Dolly v2 7B model and a copy of the FLAN-T5 XXL model from the Hugging Face Model Hub on a SageMaker real-time endpoint using the new inference capabilities.
Prerequisites
To follow along, you should have a Kubernetes cluster with the SageMaker ACK controller v1.2.9 or above installed. For instructions on how to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with Amazon Elastic Compute Cloud (Amazon EC2) Linux managed nodes using eksctl, see Getting started with Amazon EKS – eksctl. For instructions on installing the SageMaker controller, refer to Machine Learning with the ACK SageMaker Controller.
You need access to accelerated instances (GPUs) for hosting the LLMs. This solution uses one instance of ml.g5.12xlarge; you can check the availability of these instances in your AWS account and request these instances as needed via a Service Quotas increase request, as shown in the following screenshot.

Create an inference component
To create your inference component, define the EndpointConfig, Endpoint, Model, and InferenceComponent YAML files, similar to the ones shown in this section. Use kubectl apply -f <file-name> to create the Kubernetes resources.
You can list the status of a resource via kubectl describe <resource-type>; for example, kubectl describe inferencecomponent.
You can also create the inference component without a model resource. Refer to the guidance provided in the API documentation for more details.
EndpointConfig YAML
The following is the code for the EndpointConfig file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: EndpointConfig
metadata:
  name: inference-component-endpoint-config
spec:
  endpointConfigName: inference-component-endpoint-config
  executionRoleARN: <your-sagemaker-execution-role-arn>
  productionVariants:
    - variantName: AllTraffic
      instanceType: ml.g5.12xlarge
      initialInstanceCount: 1
      routingConfig:
        routingStrategy: LEAST_OUTSTANDING_REQUESTS

Endpoint YAML
The following is the code for the Endpoint file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Endpoint
metadata:
  name: inference-component-endpoint
spec:
  endpointName: inference-component-endpoint
  endpointConfigName: inference-component-endpoint-config

Model YAML
The following is the code for the Model file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Model
metadata:
  name: dolly-v2-7b
spec:
  modelName: dolly-v2-7b
  executionRoleARN: <your-sagemaker-execution-role-arn>
  containers:
    - image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04
      environment:
        HF_MODEL_ID: databricks/dolly-v2-7b
        HF_TASK: text-generation

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Model
metadata:
  name: flan-t5-xxl
spec:
  modelName: flan-t5-xxl
  executionRoleARN: <your-sagemaker-execution-role-arn>
  containers:
    - image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04
      environment:
        HF_MODEL_ID: google/flan-t5-xxl
        HF_TASK: text-generation

InferenceComponent YAMLs
In the following YAML files, given that the ml.g5.12xlarge instance comes with 4 GPUs, we allocate 2 GPUs, 2 CPU cores, and 1,024 MB of memory to each model:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: InferenceComponent
metadata:
  name: inference-component-dolly
spec:
  inferenceComponentName: inference-component-dolly
  endpointName: inference-component-endpoint
  variantName: AllTraffic
  specification:
    modelName: dolly-v2-7b
    computeResourceRequirements:
      numberOfAcceleratorDevicesRequired: 2
      numberOfCPUCoresRequired: 2
      minMemoryRequiredInMb: 1024
  runtimeConfig:
    copyCount: 1

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: InferenceComponent
metadata:
  name: inference-component-flan
spec:
  inferenceComponentName: inference-component-flan
  endpointName: inference-component-endpoint
  variantName: AllTraffic
  specification:
    modelName: flan-t5-xxl
    computeResourceRequirements:
      numberOfAcceleratorDevicesRequired: 2
      numberOfCPUCoresRequired: 2
      minMemoryRequiredInMb: 1024
  runtimeConfig:
    copyCount: 1

Invoke models
You can now invoke the models using the following code:

import boto3
import json

sm_runtime_client = boto3.client(service_name="sagemaker-runtime")
payload = {"inputs": "Why is California a great place to live?"}

response_dolly = sm_runtime_client.invoke_endpoint(
    EndpointName="inference-component-endpoint",
    InferenceComponentName="inference-component-dolly",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)
result_dolly = json.loads(response_dolly['Body'].read().decode())
print(result_dolly)

response_flan = sm_runtime_client.invoke_endpoint(
    EndpointName="inference-component-endpoint",
    InferenceComponentName="inference-component-flan",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)
result_flan = json.loads(response_flan['Body'].read().decode())
print(result_flan)

Update an inference component
To update an existing inference component, you can update the YAML files and then use kubectl apply -f <file-name>. The following is an example of an updated file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: InferenceComponent
metadata:
  name: inference-component-dolly
spec:
  inferenceComponentName: inference-component-dolly
  endpointName: inference-component-endpoint
  variantName: AllTraffic
  specification:
    modelName: dolly-v2-7b
    computeResourceRequirements:
      numberOfAcceleratorDevicesRequired: 2
      numberOfCPUCoresRequired: 4 # Update the numberOfCPUCoresRequired.
      minMemoryRequiredInMb: 1024
  runtimeConfig:
    copyCount: 1

Delete an inference component
To delete an existing inference component, use the command kubectl delete -f <file-name>.
Availability and pricing
The new SageMaker inference capabilities are available today in AWS Regions US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Jakarta, Mumbai, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Stockholm), Middle East (UAE), and South America (São Paulo). For pricing details, visit Amazon SageMaker Pricing.
Conclusion
In this post, we showed how to use SageMaker ACK Operators to deploy SageMaker inference components. Fire up your Kubernetes cluster and deploy your FMs using the new SageMaker inference capabilities today!

About the Authors
Rajesh Ramchander is a Principal ML Engineer in Professional Services at AWS. He helps customers at various stages in their AI/ML and GenAI journey, from those that are just getting started all the way to those that are leading their business with an AI-first strategy.
Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.
Suryansh Singh is a Software Development Engineer at AWS SageMaker and works on developing ML-distributed infrastructure solutions for AWS customers at scale.
Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.
Johna Liu is a Software Development Engineer in the Amazon SageMaker team. Her current work focuses on helping developers efficiently host machine learning models and improve inference performance. She is passionate about spatial data analysis and using AI to solve societal problems.

AWS Machine Learning Blog

In Part 1 of this series, we presented a solution that used the Amazon Titan Multimodal Embeddings model to convert individual slides from a slide deck into embeddings. We stored the embeddings in a vector database and then used the Large Language-and-Vision Assistant (LLaVA 1.5-7b) model to generate text responses to user questions based on the most similar slide retrieved from the vector database. We used AWS services including Amazon Bedrock, Amazon SageMaker, and Amazon OpenSearch Serverless in this solution.
In this post, we demonstrate a different approach. We use the Anthropic Claude 3 Sonnet model to generate text descriptions for each slide in the slide deck. These descriptions are then converted into text embeddings using the Amazon Titan Text Embeddings model and stored in a vector database. Then we use the Claude 3 Sonnet model to generate answers to user questions based on the most relevant text description retrieved from the vector database.
You can test both approaches for your dataset and evaluate the results to see which approach gives you the best results. In Part 3 of this series, we evaluate the results of both methods.
Solution overview
The solution provides an implementation for answering questions using information contained in text and visual elements of a slide deck. The design relies on the concept of Retrieval Augmented Generation (RAG). Traditionally, RAG has been associated with textual data that can be processed by large language models (LLMs). In this series, we extend RAG to include images as well. This provides a powerful search capability to extract contextually relevant content from visual elements like tables and graphs along with text.
This solution includes the following components:

Amazon Titan Text Embeddings is a text embeddings model that converts natural language text, including single words, phrases, or even large documents, into numerical representations that can be used to power use cases such as search, personalization, and clustering based on semantic similarity.
Claude 3 Sonnet is the next generation of state-of-the-art models from Anthropic. Sonnet is a versatile tool that can handle a wide range of tasks, from complex reasoning and analysis to rapid outputs, as well as efficient search and retrieval across vast amounts of information.
OpenSearch Serverless is an on-demand serverless configuration for Amazon OpenSearch Service. We use OpenSearch Serverless as a vector database for storing embeddings generated by the Amazon Titan Text Embeddings model. An index created in the OpenSearch Serverless collection serves as the vector store for our RAG solution.
Amazon OpenSearch Ingestion (OSI) is a fully managed, serverless data collector that delivers data to OpenSearch Service domains and OpenSearch Serverless collections. In this post, we use an OSI pipeline API to deliver data to the OpenSearch Serverless vector store.

The solution design consists of two parts: ingestion and user interaction. During ingestion, we process the input slide deck by converting each slide into an image, generating descriptions and text embeddings for each image. We then populate the vector data store with the embeddings and text description for each slide. These steps are completed prior to the user interaction steps.
In the user interaction phase, a question from the user is converted into text embeddings. A similarity search is run on the vector database to find a text description corresponding to a slide that could potentially contain answers to the user question. We then provide the slide description and the user question to the Claude 3 Sonnet model to generate an answer to the query. All the code for this post is available in the GitHub repo.
The following diagram illustrates the ingestion architecture.

The workflow consists of the following steps:

Slides are converted to image files (one per slide) in JPG format and passed to the Claude 3 Sonnet model to generate text description.
The data is sent to the Amazon Titan Text Embeddings model to generate embeddings. In this series, we use the slide deck Train and deploy Stable Diffusion using AWS Trainium & AWS Inferentia from the AWS Summit in Toronto, June 2023 to demonstrate the solution. The sample deck has 31 slides, therefore we generate 31 sets of vector embeddings, each with 1536 dimensions. We add additional metadata fields to perform rich search queries using OpenSearch’s powerful search capabilities.
The embeddings are ingested into an OSI pipeline using an API call.
The OSI pipeline ingests the data as documents into an OpenSearch Serverless index. The index is configured as the sink for this pipeline and is created as part of the OpenSearch Serverless collection.

The following diagram illustrates the user interaction architecture.

The workflow consists of the following steps:

A user submits a question related to the slide deck that has been ingested.
The user input is converted into embeddings using the Amazon Titan Text Embeddings model accessed using Amazon Bedrock. An OpenSearch Service vector search is performed using these embeddings. We perform a k-nearest neighbor (k-NN) search to retrieve the most relevant embeddings matching the user query.
The metadata of the response from OpenSearch Serverless contains a path to the image and description corresponding to the most relevant slide.
A prompt is created by combining the user question and the image description. The prompt is provided to Claude 3 Sonnet hosted on Amazon Bedrock.
The result of this inference is returned to the user.

We discuss the steps for both stages in the following sections, and include details about the output.
Prerequisites
To implement the solution provided in this post, you should have an AWS account and familiarity with FMs, Amazon Bedrock, SageMaker, and OpenSearch Service.
This solution uses the Claude 3 Sonnet and Amazon Titan Text Embeddings models hosted on Amazon Bedrock. Make sure that these models are enabled for use by navigating to the Model access page on the Amazon Bedrock console.
If models are enabled, the Access status will state Access granted.

If the models are not available, enable access by choosing Manage model access, selecting the models, and choosing Request model access. The models are enabled for use immediately.
Use AWS CloudFormation to create the solution stack
You can use AWS CloudFormation to create the solution stack. If you have created the solution for Part 1 in the same AWS account, be sure to delete that before creating this stack.

Launch stack links are available for the us-east-1 and us-west-2 Regions.

After the stack is created successfully, navigate to the stack’s Outputs tab on the AWS CloudFormation console and note the values for MultimodalCollectionEndpoint and OpenSearchPipelineEndpoint. You use these in the subsequent steps.

The CloudFormation template creates the following resources:

IAM roles – The following AWS Identity and Access Management (IAM) roles are created. Update these roles to apply least-privilege permissions, as discussed in Security best practices.

SMExecutionRole with Amazon Simple Storage Service (Amazon S3), SageMaker, OpenSearch Service, and Amazon Bedrock full access.
OSPipelineExecutionRole with access to the S3 bucket and OSI actions.

SageMaker notebook – All code for this post is run using this notebook.
OpenSearch Serverless collection – This is the vector database for storing and retrieving embeddings.
OSI pipeline – This is the pipeline for ingesting data into OpenSearch Serverless.
S3 bucket – All data for this post is stored in this bucket.

The CloudFormation template sets up the pipeline configuration required to configure the OSI pipeline with HTTP as source and the OpenSearch Serverless index as sink. The SageMaker notebook 2_data_ingestion.ipynb displays how to ingest data into the pipeline using the Requests HTTP library.
The CloudFormation template also creates network, encryption and data access policies required for your OpenSearch Serverless collection. Update these policies to apply least-privilege permissions.
The CloudFormation template name and OpenSearch Service index name are referenced in the SageMaker notebook 3_rag_inference.ipynb. If you change the default names, make sure you update them in the notebook.
Test the solution
After you have created the CloudFormation stack, you can test the solution. Complete the following steps:

On the SageMaker console, choose Notebooks in the navigation pane.
Select MultimodalNotebookInstance and choose Open JupyterLab.
In File Browser, traverse to the notebooks folder to see notebooks and supporting files.

The notebooks are numbered in the sequence in which they run. Instructions and comments in each notebook describe the actions performed by that notebook. We run these notebooks one by one.

Choose 1_data_prep.ipynb to open it in JupyterLab.
On the Run menu, choose Run All Cells to run the code in this notebook.

This notebook will download a publicly available slide deck, convert each slide into the JPG file format, and upload these to the S3 bucket.

Choose 2_data_ingestion.ipynb to open it in JupyterLab.
On the Run menu, choose Run All Cells to run the code in this notebook.

In this notebook, you create an index in the OpenSearch Serverless collection. This index stores the embeddings data for the slide deck. See the following code:

session = boto3.Session()
credentials = session.get_credentials()
auth = AWSV4SignerAuth(credentials, g.AWS_REGION, g.OS_SERVICE)

os_client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    pool_maxsize=20
)

index_body = """
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "vector_embedding": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": {
          "name": "hnsw",
          "engine": "nmslib",
          "parameters": {}
        }
      },
      "image_path": {
        "type": "text"
      },
      "slide_text": {
        "type": "text"
      },
      "slide_number": {
        "type": "text"
      },
      "metadata": {
        "properties": {
          "filename": {
            "type": "text"
          },
          "desc": {
            "type": "text"
          }
        }
      }
    }
  }
}
"""
index_body = json.loads(index_body)

try:
    response = os_client.indices.create(index_name, body=index_body)
    logger.info(f"response received for the create index -> {response}")
except Exception as e:
    logger.error(f"error in creating index={index_name}, exception={e}")

You use the Claude 3 Sonnet and Amazon Titan Text Embeddings models to convert the JPG images created in the previous notebook into vector embeddings. These embeddings and additional metadata (such as the S3 path and description of the image file) are stored in the index. The following code snippet shows how Claude 3 Sonnet generates image descriptions:

def get_img_desc(image_file_path: str, prompt: str):
    # read the file and base64-encode it for the Bedrock messages API
    # (maximum supported image size is 2048 x 2048 pixels); requires `import base64`
    with open(image_file_path, "rb") as image_file:
        input_image_b64 = base64.b64encode(image_file.read()).decode('utf-8')

    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/jpeg",
                                "data": input_image_b64
                            },
                        },
                        {"type": "text", "text": prompt},
                    ],
                }
            ],
        }
    )

    response = bedrock.invoke_model(
        modelId=g.CLAUDE_MODEL_ID,
        body=body
    )

    resp_body = json.loads(response['body'].read().decode("utf-8"))
    resp_text = resp_body['content'][0]['text'].replace('"', "'")

    return resp_text

The image descriptions are then passed to the Amazon Titan Text Embeddings model to generate vector embeddings. The following code snippet shows the call to the Amazon Titan Text Embeddings model:

def get_text_embedding(bedrock: botocore.client, prompt_data: str) -> np.ndarray:
    body = json.dumps({
        "inputText": prompt_data,
    })
    try:
        response = bedrock.invoke_model(
            body=body, modelId=g.TITAN_MODEL_ID, accept=g.ACCEPT_ENCODING, contentType=g.CONTENT_ENCODING
        )
        response_body = json.loads(response['body'].read())
        embedding = response_body.get('embedding')
    except Exception as e:
        logger.error(f"exception={e}")
        embedding = None

    return embedding

The data is ingested into the OpenSearch Serverless index by making an API call to the OSI pipeline. The following code snippet shows the call made using the Requests HTTP library:

data = json.dumps([{
    "image_path": input_image_s3,
    "slide_text": resp_text,
    "slide_number": slide_number,
    "metadata": {
        "filename": obj_name,
        "desc": ""
    },
    "vector_embedding": embedding
}])

r = requests.request(
    method='POST',
    url=osi_endpoint,
    data=data,
    auth=AWSSigV4('osis')
)

Choose 3_rag_inference.ipynb to open it in JupyterLab.
On the Run menu, choose Run All Cells to run the code in this notebook.

This notebook implements the RAG solution: you convert the user question into embeddings, find a similar image description from the vector database, and provide the retrieved description to Claude 3 Sonnet to generate an answer to the user question. You use the following prompt template:

llm_prompt: str = """

Human: Use the summary to provide a concise answer to the question to the best of your abilities. If you cannot answer the question from the context then say I do not know, do not make up an answer.

{question}

{summary}

Assistant:"""

The following code snippet provides the RAG workflow:

def get_llm_response(bedrock: botocore.client, question: str, summary: str) -> str:
    prompt = llm_prompt.format(question=question, summary=summary)

    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                    ],
                }
            ],
        })

    try:
        response = bedrock.invoke_model(
            modelId=g.CLAUDE_MODEL_ID,
            body=body)

        response_body = json.loads(response['body'].read().decode("utf-8"))
        llm_response = response_body['content'][0]['text'].replace('"', "'")

    except Exception as e:
        logger.error(f"exception while slide_text={summary[:10]}, exception={e}")
        llm_response = None

    return llm_response

# create prompt and convert to embeddings
question: str = "How does Inf2 compare in performance to comparable EC2 instances? I need numbers."
text_embedding = get_text_embedding(bedrock, question)

# vector db search
vector_db_response: Dict = find_similar_data(text_embedding)

# download image for local notebook display
s3_img_path = vector_db_response.get('hits', {}).get('hits')[0].get('_source').get('image_path')
logger.info(f'going to answer the question="{question}" using the image "{s3_img_path}"')
!aws s3 cp {s3_img_path} .
local_img_path = os.path.basename(s3_img_path)
display(filename=local_img_path)

# Ask Claude 3 Sonnet
slide_text = vector_db_response.get('hits', {}).get('hits')[0].get('_source').get('slide_text')
llm_response = get_llm_response(bedrock, question, slide_text)
print(llm_response)
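
The find_similar_data helper is not shown above; the following is a minimal sketch of what such a k-NN query could look like, assuming the os_client and index name from notebook 2_data_ingestion.ipynb:

def find_similar_data(text_embedding, k: int = 1):
    # k-NN search against the vector_embedding field populated during ingestion
    query = {
        "size": k,
        "query": {
            "knn": {
                "vector_embedding": {
                    "vector": text_embedding,
                    "k": k
                }
            }
        }
    }
    return os_client.search(index=index_name, body=query)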

Results
The following are some user questions and the responses generated by our implementation. Each answer is the textual response generated by Claude 3 Sonnet from the k-NN slide match returned by the OpenSearch Serverless vector search (the matched slide image for each question).
Multimodal RAG results

Question: How does Inf2 compare in performance to comparable EC2 instances? I need numbers.
Answer: Based on the information provided in the summary, the key performance comparison numbers given for the Amazon EC2 Inf2 instances compared to other instances are: 1. Up to 4x higher throughput. 2. Up to 10x lower latency. The summary does not provide specific numbers comparing the Inf2 instances to other EC2 instance types. However, it highlights the throughput and latency improvements as the main performance advantages of the Inf2 instances over comparable instances for running generative AI models.

Question: As per the AI/ML flywheel, what do the AWS AI/ML services provide?
Answer: Based on the summary, the AWS AI/ML services provide a range of capabilities that fuel an AI/ML flywheel. This flywheel consists of innovation in developing new and better algorithms, enabling more use-cases and applications, driving wider adoption and demand, which in turn leads to further investment in research and optimization to drive more innovation. The AWS AI/ML services seem to offer the tools, resources, and infrastructure to support this continuous cycle of innovation, application development, adoption, and reinvestment in the field of artificial intelligence and machine learning.

Question: Compared to GPT-2, how many more parameters does GPT-3 have? What is the numerical difference between the parameter size of GPT-2 and GPT-3?
Answer: According to the information provided in the summary, GPT-3 from 2020 had 175B (175 billion) parameters, while GPT-2 from 2019 had 1.5B (1.5 billion) parameters. Therefore, the numerical difference between the parameter size of GPT-2 and GPT-3 is 173.5 billion parameters. GPT-3 has approximately 115 times more parameters than GPT-2.

Question: What are quarks in particle physics?
Answer: Unfortunately, the given summary does not contain any information about quarks in particle physics. The summary describes an image related to the progression of natural language processing and generative AI technologies, but it does not mention anything about particle physics or the concept of quarks.

Query your index
You can use OpenSearch Dashboards to interact with the OpenSearch API to run quick tests on your index and ingested data.
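If you prefer a programmatic check, the following is a minimal sketch using the os_client and index name defined in notebook 2_data_ingestion.ipynb:

# Count the ingested documents and inspect one of them
print(os_client.count(index=index_name))

sample = os_client.search(index=index_name, body={"size": 1, "query": {"match_all": {}}})
print(sample['hits']['hits'][0]['_source']['slide_text'])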

Cleanup
To avoid incurring future charges, delete the resources. You can do this by deleting the stack using the AWS CloudFormation console.

Conclusion
Enterprises generate new content all the time, and slide decks are a common way to share and disseminate information internally within the organization and externally with customers or at conferences. Over time, rich information can remain buried and hidden in non-text modalities like graphs and tables in these slide decks.
You can use this solution and the power of multimodal FMs such as the Amazon Titan Text Embeddings and Claude 3 Sonnet to discover new information or uncover new perspectives on content in slide decks. You can try different Claude models available on Amazon Bedrock by updating the CLAUDE_MODEL_ID in the globals.py file.
This is Part 2 of a three-part series. We used the Amazon Titan Multimodal Embeddings and the LLaVA model in Part 1. In Part 3, we will compare the approaches from Part 1 and Part 2.
Portions of this code are released under the Apache 2.0 License.

About the authors
Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.
Manju Prasad is a Senior Solutions Architect at Amazon Web Services. She focuses on providing technical guidance in a variety of technical domains, including AI/ML. Prior to joining AWS, she designed and built solutions for companies in the financial services sector and also for a startup. She is passionate about sharing knowledge and fostering interest in emerging talent.
Archana Inapudi is a Senior Solutions Architect at AWS, supporting a strategic customer. She has over a decade of cross-industry expertise leading strategic technical initiatives. Archana is an aspiring member of the AI/ML technical field community at AWS. Prior to joining AWS, Archana led a migration from traditional siloed data sources to Hadoop at a healthcare company. She is passionate about using technology to accelerate growth, provide value to customers, and achieve business outcomes.
Antara Raisa is an AI and ML Solutions Architect at Amazon Web Services, supporting strategic customers based out of Dallas, Texas. She also has previous experience working with large enterprise partners at AWS, where she worked as a Partner Success Solutions Architect for digital-centered customers.

AWS Machine Learning Blog

This is a guest post co-written with the leadership team of Iambic Therapeutics.
Iambic Therapeutics is a drug discovery startup with a mission to create innovative AI-driven technologies to bring better medicines to cancer patients, faster.
Our advanced generative and predictive artificial intelligence (AI) tools enable us to search the vast space of possible drug molecules faster and more effectively. Our technologies are versatile and applicable across therapeutic areas, protein classes, and mechanisms of action. Beyond creating differentiated AI tools, we have established an integrated platform that merges AI software, cloud-based data, scalable computation infrastructure, and high-throughput chemistry and biology capabilities. The platform both enables our AI—by supplying data to refine our models—and is enabled by it, capitalizing on opportunities for automated decision-making and data processing.
We measure success by our ability to produce superior clinical candidates to address urgent patient need, at unprecedented speed: we advanced from program launch to clinical candidates in just 24 months, significantly faster than our competitors.
In this post, we focus on how we used Karpenter on Amazon Elastic Kubernetes Service (Amazon EKS) to scale AI training and inference, which are core elements of the Iambic discovery platform.
The need for scalable AI training and inference
Every week, Iambic performs AI inference across dozens of models and millions of molecules, serving two primary use cases:

Medicinal chemists and other scientists use our web application, Insight, to explore chemical space, access and interpret experimental data, and predict properties of newly designed molecules. All of this work is done interactively in real time, creating a need for inference with low latency and medium throughput.
At the same time, our generative AI models automatically design molecules targeting improvement across numerous properties, searching millions of candidates, and requiring enormous throughput and medium latency.

Guided by AI technologies and expert drug hunters, our experimental platform generates thousands of unique molecules each week, and each is subjected to multiple biological assays. The generated data points are automatically processed and used to fine-tune our AI models every week. Initially, our model fine-tuning took hours of CPU time, so a framework for scaling model fine-tuning on GPUs was imperative.
Our deep learning models have non-trivial requirements: they are gigabytes in size, are numerous and heterogeneous, and require GPUs for fast inference and fine-tuning. Looking to cloud infrastructure, we needed a system that allows us to access GPUs, scale up and down quickly to handle spiky, heterogeneous workloads, and run large Docker images.
We wanted to build a scalable system to support AI training and inference. We use Amazon EKS and were looking for the best solution to auto scale our worker nodes. We chose Karpenter for Kubernetes node auto scaling for a number of reasons:

Ease of integration with Kubernetes, using Kubernetes semantics to define node requirements and pod specs for scaling
Low-latency scale-out of nodes
Ease of integration with our infrastructure as code tooling (Terraform)

The node provisioners support effortless integration with Amazon EKS and other AWS resources such as Amazon Elastic Compute Cloud (Amazon EC2) instances and Amazon Elastic Block Store volumes. The Kubernetes semantics used by the provisioners support directed scheduling using Kubernetes constructs such as taints or tolerations and affinity or anti-affinity specifications; they also facilitate control over the number and types of GPU instances that may be scheduled by Karpenter.
Solution overview
In this section, we present a generic architecture that is similar to the one we use for our own workloads, which allows elastic deployment of models using efficient auto scaling based on custom metrics.
The following diagram illustrates the solution architecture.

The architecture deploys a simple service in a Kubernetes pod within an EKS cluster. This could be a model inference, data simulation, or any other containerized service, accessible by HTTP request. The service is exposed behind a reverse-proxy using Traefik. The reverse proxy collects metrics about calls to the service and exposes them via a standard metrics API to Prometheus. The Kubernetes Event Driven Autoscaler (KEDA) is configured to automatically scale the number of service pods, based on the custom metrics available in Prometheus. Here we use the number of requests per second as a custom metric. The same architectural approach applies if you choose a different metric for your workload.
Karpenter monitors for any pending pods that can’t run due to lack of sufficient resources in the cluster. If such pods are detected, Karpenter adds more nodes to the cluster to provide the necessary resources. Conversely, if there are more nodes in the cluster than what is needed by the scheduled pods, Karpenter removes some of the worker nodes and the pods get rescheduled, consolidating them on fewer instances. The number of HTTP requests per second and number of nodes can be visualized using a Grafana dashboard. To demonstrate auto scaling, we run one or more simple load-generating pods, which send HTTP requests to the service using curl.
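The demo's load-generating pods use curl; purely as an illustration, an equivalent minimal Python loop (the service URL is a placeholder) could look like the following:

import time
import requests

SERVICE_URL = "http://traefik.hpa-example.svc.cluster.local/"  # placeholder service URL

# Send roughly one request per second to the service behind the reverse proxy
while True:
    try:
        print(requests.get(SERVICE_URL, timeout=5).status_code)
    except requests.RequestException as err:
        print(f"request failed: {err}")
    time.sleep(1.0)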
Solution deployment
In the step-by-step walkthrough, we use AWS Cloud9 as an environment to deploy the architecture. This enables all steps to be completed from a web browser. You can also deploy the solution from a local computer or EC2 instance.
To simplify deployment and improve reproducibility, we follow the principles of the do-framework and the structure of the depend-on-docker template. We clone the aws-do-eks project and, using Docker, we build a container image that is equipped with the necessary tooling and scripts. Within the container, we run through all the steps of the end-to-end walkthrough, from creating an EKS cluster with Karpenter to scaling EC2 instances.
For the example in this post, we use the following EKS cluster manifest:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: do-eks-yaml-karpenter
  version: '1.28'
  region: us-west-2
  tags:
    karpenter.sh/discovery: do-eks-yaml-karpenter
iam:
  withOIDC: true
addons:
  - name: aws-ebs-csi-driver
    version: v1.26.0-eksbuild.1
    wellKnownPolicies:
      ebsCSIController: true
managedNodeGroups:
  - name: c5-xl-do-eks-karpenter-ng
    instanceType: c5.xlarge
    instancePrefix: c5-xl
    privateNetworking: true
    minSize: 0
    desiredCapacity: 2
    maxSize: 10
    volumeSize: 300
    iam:
      withAddonPolicies:
        cloudWatch: true
        ebs: true

This manifest defines a cluster named do-eks-yaml-karpenter with the EBS CSI driver installed as an add-on. A managed node group with two c5.xlarge nodes is included to run system pods that are needed by the cluster. The worker nodes are hosted in private subnets, and the cluster API endpoint is public by default.
You could also use an existing EKS cluster instead of creating one. We deploy Karpenter by following the instructions in the Karpenter documentation, or by running the following script, which automates the deployment instructions.
The following code shows the Karpenter configuration we use in this example:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        cluster-name: do-eks-yaml-karpenter
      annotations:
        purpose: karpenter-example
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - spot
            - on-demand
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values:
            - c
            - m
            - r
            - g
            - p
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values:
            - '2'
  disruption:
    consolidationPolicy: WhenUnderutilized
    #consolidationPolicy: WhenEmpty
    #consolidateAfter: 30s
    expireAfter: 720h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "do-eks-yaml-karpenter"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "do-eks-yaml-karpenter"
  role: "KarpenterNodeRole-do-eks-yaml-karpenter"
  tags:
    app: autoscaling-test
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 80Gi
        volumeType: gp3
        iops: 10000
        deleteOnTermination: true
        throughput: 125
  detailedMonitoring: true

We define a default Karpenter NodePool with the following requirements:

Karpenter can launch instances from both spot and on-demand capacity pools
Instances must be from the "c" (compute optimized), "m" (general purpose), "r" (memory optimized), or "g" and "p" (GPU accelerated) computing families
Instance generation must be greater than 2; for example, g3 is acceptable, but g2 is not

The default NodePool also defines disruption policies. Underutilized nodes will be removed so pods can be consolidated to run on fewer or smaller nodes. Alternatively, we can configure empty nodes to be removed after the specified time period. The expireAfter setting specifies the maximum lifetime of any node, before it is stopped and replaced if necessary. This helps reduce security vulnerabilities as well as avoid issues that are typical for nodes with long uptimes, such as file fragmentation or memory leaks.
By default, Karpenter provisions nodes with a small root volume, which can be insufficient for running AI or machine learning (ML) workloads. Some of the deep learning container images can be tens of GB in size, and we need to make sure there is enough storage space on the nodes to run pods using these images. To do that, we define EC2NodeClass with blockDeviceMappings, as shown in the preceding code.
Karpenter is responsible for auto scaling at the cluster level. To configure auto scaling at the pod level, we use KEDA to define a custom resource called ScaledObject, as shown in the following code:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: keda-prometheus-hpa
  namespace: hpa-example
spec:
  scaleTargetRef:
    name: php-apache
  minReplicaCount: 1
  cooldownPeriod: 30
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server.prometheus.svc.cluster.local:80
        metricName: http_requests_total
        threshold: '1'
        query: rate(traefik_service_requests_total{service="hpa-example-php-apache-80@kubernetes",code="200"}[2m])

The preceding manifest defines a ScaledObject named keda-prometheus-hpa, which is responsible for scaling the php-apache deployment and always keeps at least one replica running. It scales the pods of this deployment based on the metric http_requests_total available in Prometheus obtained by the specified query, and targets to scale up the pods so that each pod serves no more than one request per second. It scales down the replicas after the request load has been below the threshold for longer than 30 seconds.
The deployment spec for our example service contains the following resource requests and limits:

resources:
  limits:
    cpu: 500m
    nvidia.com/gpu: 1
  requests:
    cpu: 200m
    nvidia.com/gpu: 1

With this configuration, each of the service pods will use exactly one NVIDIA GPU. When new pods are created, they will be in Pending state until a GPU is available. Karpenter adds GPU nodes to the cluster as needed to accommodate the pending pods.
A load-generating pod sends HTTP requests to the service with a pre-set frequency. We increase the number of requests by increasing the number of replicas in the load-generator deployment.
A full scaling cycle with utilization-based node consolidation is visualized in a Grafana dashboard. The following dashboard shows the number of nodes in the cluster by instance type (top), the number of requests per second (bottom left), and the number of pods (bottom right).

We start with just the two c5.xlarge CPU instances that the cluster was created with. Then we deploy one service instance, which requires a single GPU. Karpenter adds a g4dn.xlarge instance to accommodate this need. We then deploy the load generator, which causes KEDA to add more service pods and Karpenter adds more GPU instances. After optimization, the state settles on one p3.8xlarge instance with 8 GPUs and one g5.12xlarge instance with 4 GPUs.
When we scale the load-generating deployment to 40 replicas, KEDA creates additional service pods to maintain the required request load per pod. Karpenter adds g4dn.metal and g4dn.12xlarge nodes to the cluster to provide the needed GPUs for the additional pods. In the scaled state, the cluster contains 16 GPU nodes and serves about 300 requests per second. When we scale down the load generator to 1 replica, the reverse process takes place. After the cooldown period, KEDA reduces the number of service pods. Then as fewer pods run, Karpenter removes the underutilized nodes from the cluster and the service pods get consolidated to run on fewer nodes. When the load generator pod is removed, a single service pod on a single g4dn.xlarge instance with 1 GPU remains running. When we remove the service pod as well, the cluster is left in the initial state with only two CPU nodes.
We can observe this behavior when the NodePool has the setting consolidationPolicy: WhenUnderutilized.
With this setting, Karpenter dynamically configures the cluster with as few nodes as possible, while providing sufficient resources for all pods to run and also minimizing cost.
The scaling behavior shown in the following dashboard is observed when the NodePool consolidation policy is set to WhenEmpty, along with consolidateAfter: 30s.

In this scenario, nodes are stopped only when there are no pods running on them after the cool-off period. The scaling curve appears smooth, compared to the utilization-based consolidation policy; however, it can be seen that more nodes are used in the scaled state (22 vs. 16).
Overall, combining pod and cluster auto scaling makes sure that the cluster scales dynamically with the workload, allocating resources when needed and removing them when not in use, thereby maximizing utilization and minimizing cost.
Outcomes
Iambic used this architecture to enable efficient use of GPUs on AWS and migrate workloads from CPU to GPU. By using EC2 GPU powered instances, Amazon EKS, and Karpenter, we were able to enable faster inference for our physics-based models and fast experiment iteration times for applied scientists who rely on training as a service.
The following table summarizes some of the time metrics of this migration.

Task: Inference using diffusion models for physics-based ML models
  CPUs: 3,600 seconds
  GPUs: 100 seconds (due to inherent batching of GPUs)

Task: ML model training as a service
  CPUs: 180 minutes
  GPUs: 4 minutes

The following table summarizes some of our time and cost metrics.

Task: ML model training
  Performance/Cost (CPUs): 240 minutes average, $0.70 per training task
  Performance/Cost (GPUs): 20 minutes average, $0.38 per training task

Summary
In this post, we showcased how Iambic used Karpenter and KEDA to scale our Amazon EKS infrastructure to meet the latency requirements of our AI inference and training workloads. Karpenter and KEDA are powerful open source tools that help auto scale EKS clusters and workloads running on them. This helps optimize compute costs while meeting performance requirements. You can check out the code and deploy the same architecture in your own environment by following the complete walkthrough in this GitHub repo.

About the Authors
Matthew Welborn is the director of Machine Learning at Iambic Therapeutics. He and his team leverage AI to accelerate the identification and development of novel therapeutics, bringing life-saving medicines to patients faster.
Paul Whittemore is a Principal Engineer at Iambic Therapeutics. He supports delivery of the infrastructure for the Iambic AI-driven drug discovery platform.
Alex Iankoulski is a Principal Solutions Architect, ML/AI Frameworks, who focuses on helping customers orchestrate their AI workloads using containers and accelerated computing infrastructure on AWS.

MIT News – Artificial intelligence

To build AI systems that can collaborate effectively with humans, it helps to have a good model of human behavior to start with. But humans tend to behave suboptimally when making decisions.

This irrationality, which is especially difficult to model, often boils down to computational constraints. A human can’t spend decades thinking about the ideal solution to a single problem.

Researchers at MIT and the University of Washington developed a way to model the behavior of an agent, whether human or machine, that accounts for the unknown computational constraints that may hamper the agent’s problem-solving abilities.

Their model can automatically infer an agent’s computational constraints by seeing just a few traces of their previous actions. The result, an agent’s so-called “inference budget,” can be used to predict that agent’s future behavior.

In a new paper, the researchers demonstrate how their method can be used to infer someone’s navigation goals from prior routes and to predict players’ subsequent moves in chess matches. Their technique matches or outperforms another popular method for modeling this type of decision-making.

Ultimately, this work could help scientists teach AI systems how humans behave, which could enable these systems to respond better to their human collaborators. Being able to understand a human’s behavior, and then to infer their goals from that behavior, could make an AI assistant much more useful, says Athul Paul Jacob, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.

“If we know that a human is about to make a mistake, having seen how they have behaved before, the AI agent could step in and offer a better way to do it. Or the agent could adapt to the weaknesses that its human collaborators have. Being able to model human behavior is an important step toward building an AI agent that can actually help that human,” he says.

Jacob wrote the paper with Abhishek Gupta, assistant professor at the University of Washington, and senior author Jacob Andreas, associate professor in EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Learning Representations.

Modeling behavior

Researchers have been building computational models of human behavior for decades. Many prior approaches try to account for suboptimal decision-making by adding noise to the model. Instead of the agent always choosing the correct option, the model might have that agent make the correct choice 95 percent of the time.

However, these methods can fail to capture the fact that humans do not always behave suboptimally in the same way.

Others at MIT have also studied more effective ways to plan and infer goals in the face of suboptimal decision-making.

To build their model, Jacob and his collaborators drew inspiration from prior studies of chess players. They noticed that players took less time to think before acting when making simple moves and that stronger players tended to spend more time planning than weaker ones in challenging matches.

“At the end of the day, we saw that the depth of the planning, or how long someone thinks about the problem, is a really good proxy of how humans behave,” Jacob says.

They built a framework that could infer an agent’s depth of planning from prior actions and use that information to model the agent’s decision-making process.

The first step in their method involves running an algorithm for a set amount of time to solve the problem being studied. For instance, if they are studying a chess match, they might let the chess-playing algorithm run for a certain number of steps. At the end, the researchers can see the decisions the algorithm made at each step.

Their model compares these decisions to the behaviors of an agent solving the same problem. It will align the agent’s decisions with the algorithm’s decisions and identify the step where the agent stopped planning.

From this, the model can determine the agent’s inference budget, or how long that agent will plan for this problem. It can use the inference budget to predict how that agent would react when solving a similar problem.

An interpretable solution

This method can be very efficient because the researchers can access the full set of decisions made by the problem-solving algorithm without doing any extra work. This framework could also be applied to any problem that can be solved with a particular class of algorithms.

“For me, the most striking thing was the fact that this inference budget is very interpretable. It is saying tougher problems require more planning or being a strong player means planning for longer. When we first set out to do this, we didn’t think that our algorithm would be able to pick up on those behaviors naturally,” Jacob says.

The researchers tested their approach in three different modeling tasks: inferring navigation goals from previous routes, guessing someone’s communicative intent from their verbal cues, and predicting subsequent moves in human-human chess matches.

Their method either matched or outperformed a popular alternative in each experiment. Moreover, the researchers saw that their model of human behavior matched up well with measures of player skill (in chess matches) and task difficulty.

Moving forward, the researchers want to use this approach to model the planning process in other domains, such as reinforcement learning (a trial-and-error method commonly used in robotics). In the long run, they intend to keep building on this work toward the larger goal of developing more effective AI collaborators.

This work was supported, in part, by the MIT Schwarzman College of Computing Artificial Intelligence for Augmentation and Productivity program and the National Science Foundation.
19/04/2024 – 06:08 / Adam Zewe | MIT News

MIT News – Artificial intelligence

Although the troposphere is often thought of as the closest layer of the atmosphere to the Earth’s surface, the planetary boundary layer (PBL) — the lowest layer of the troposphere — is actually the part that most significantly influences weather near the surface. In the 2018 planetary science decadal survey, the PBL was raised as an important scientific issue that has the potential to enhance storm forecasting and improve climate projections.  

“The PBL is where the surface interacts with the atmosphere, including exchanges of moisture and heat that help lead to severe weather and a changing climate,” says Adam Milstein, a technical staff member in Lincoln Laboratory’s Applied Space Systems Group. “The PBL is also where humans live, and the turbulent movement of aerosols throughout the PBL is important for air quality that influences human health.” 

Although vital for studying weather and climate, important features of the PBL, such as its height, are difficult to resolve with current technology. In the past four years, Lincoln Laboratory staff have been studying the PBL, focusing on two different tasks: using machine learning to make 3D-scanned profiles of the atmosphere, and resolving the vertical structure of the atmosphere more clearly in order to better predict droughts.  

This PBL-focused research effort builds on more than a decade of related work on fast, operational neural network algorithms developed by Lincoln Laboratory for NASA missions. These missions include the Time-Resolved Observations of Precipitation structure and storm Intensity with a Constellation of Smallsats (TROPICS) mission as well as Aqua, a satellite that collects data about Earth’s water cycle and observes variables such as ocean temperature, precipitation, and water vapor in the atmosphere. These algorithms retrieve temperature and humidity from the satellite instrument data and have been shown to significantly improve the accuracy and usable global coverage of the observations over previous approaches. For TROPICS, the algorithms help retrieve data that are used to characterize a storm’s rapidly evolving structures in near-real time, and for Aqua, they have helped improve forecasting models, drought monitoring, and fire prediction.

These operational algorithms for TROPICS and Aqua are based on classic “shallow” neural networks to maximize speed and simplicity, creating a one-dimensional vertical profile for each spectral measurement collected by the instrument over each location. While this approach has improved observations of the atmosphere down to the surface overall, including the PBL, laboratory staff determined that newer “deep” learning techniques that treat the atmosphere over a region of interest as a three-dimensional image are needed to improve PBL details further.

“We hypothesized that deep learning and artificial intelligence (AI) techniques could improve on current approaches by incorporating a better statistical representation of 3D temperature and humidity imagery of the atmosphere into the solutions,” Milstein says. “But it took a while to figure out how to create the best dataset — a mix of real and simulated data; we needed to prepare to train these techniques.”

The team collaborated with Joseph Santanello of the NASA Goddard Space Flight Center and William Blackwell, also of the Applied Space Systems Group, in a recent NASA-funded effort showing that these retrieval algorithms can improve PBL detail, including more accurate determination of the PBL height than the previous state of the art. 

While improved knowledge of the PBL is broadly useful for increasing understanding of climate and weather, one key application is prediction of droughts. According to a Global Drought Snapshot report released last year, droughts are a pressing planetary issue that the global community needs to address. Lack of humidity near the surface, specifically at the level of the PBL, is the leading indicator of drought. While previous studies using remote-sensing techniques have examined the humidity of soil to determine drought risk, studying the atmosphere can help predict when droughts will happen.  

In an effort funded by Lincoln Laboratory’s Climate Change Initiative, Milstein, along with laboratory staff member Michael Pieper, is working with scientists at NASA’s Jet Propulsion Laboratory (JPL) to use neural network techniques to improve drought prediction over the continental United States. While the work builds on existing operational work JPL has done incorporating (in part) the laboratory’s operational “shallow” neural network approach for Aqua, the team believes that this work and the PBL-focused deep learning research can be combined to further improve the accuracy of drought prediction.

“Lincoln Laboratory has been working with NASA for more than a decade on neural network algorithms for estimating temperature and humidity in the atmosphere from space-borne infrared and microwave instruments, including those on the Aqua spacecraft,” Milstein says. “Over that time, we have learned a lot about this problem by working with the science community, including learning about what scientific challenges remain. Our long experience working on this type of remote sensing with NASA scientists, as well as our experience with using neural network techniques, gave us a unique perspective.”

According to Milstein, the next step for this project is to compare the deep learning results to datasets from the National Oceanic and Atmospheric Administration, NASA, and the Department of Energy collected directly in the PBL using radiosondes, a type of instrument flown on a weather balloon. “These direct measurements can be considered a kind of ‘ground truth’ to quantify the accuracy of the techniques we have developed,” Milstein says.

This improved neural network approach holds promise to demonstrate drought prediction that can exceed the capabilities of existing indicators, Milstein says, and to be a tool that scientists can rely on for decades to come.
18/04/2024 – 21:00 / Haley Wahl | MIT Lincoln Laboratory
