Tag Archives: AI News

Advertisements

This post is co-written with HyeKyung Yang, Jieun Lim, and SeungBum Shim from LotteON.

LotteON aims to be a platform that not only sells products, but also provides a personalized recommendation experience tailored to your preferred lifestyle. LotteON operates various specialty stores, including fashion, beauty, luxury, and kids, and strives to provide a personalized shopping experience across all aspects of customers’ lifestyles.

To enhance the shopping experience of LotteON’s customers, the recommendation service development team is continuously improving the recommendation service to provide customers with the products they are looking for or may be interested in at the right time.

In this post, we share how LotteON improved their recommendation service using Amazon SageMaker and machine learning operations (MLOps).

Problem definition

Traditionally, the recommendation service was mainly provided by identifying the relationship between products and providing products that were highly relevant to the product selected by the customer. However, it was necessary to upgrade the recommendation service to analyze each customer’s taste and meet their needs. Therefore, we decided to introduce a deep learning-based recommendation algorithm that can identify not only linear relationships in the data, but also more complex relationships. For this reason, we built the MLOps architecture to manage the created models and provide real-time services.

Another requirement was to build a continuous integration and continuous delivery (CI/CD) pipeline that can be integrated with GitLab, a code repository used by existing recommendation platforms, to add newly developed recommendation models and create a structure that can continuously improve the quality of recommendation services through periodic retraining and redistribution of models.

In the following sections, we introduce the MLOps platform that we built to provide high-quality recommendations to our customers and the overall process of inferring a deep learning-based recommendation algorithm (Neural Collaborative Filtering) in real time and introducing it to LotteON.

Solution architecture

The following diagram illustrates the solution architecture for serving Neural Collaborative Filtering (NCF) algorithm-based recommendation models as MLOps. The main AWS services used are SageMaker, Amazon EMR, AWS CodeBuild, Amazon Simple Storage Service (Amazon S3), Amazon EventBridge, AWS Lambda, and Amazon API Gateway. We’ve combined several AWS services using Amazon SageMaker Pipelines and designed the architecture with the following components in mind:

Data preprocessing
Automated model training and deployment
Real-time inference through model serving
CI/CD structure

The preceding architecture shows the MLOps data flow, which consists of three decoupled passes:

Code preparation and data preprocessing (blue)
Training pipeline and model deployment (green)
Real-time recommendation inference (brown)

Code preparation and data preprocessing

The preparation and preprocessing phase consists of the following steps:

The data scientist publishes the deployment code containing the model and the training pipeline to GitLab, which is used by LotteON, and Jenkins uploads the code to Amazon S3.
The EMR preprocessing batch runs through Airflow according to the specified schedule. The preprocessing data is loaded into MongoDB, which is used as a feature store along with Amazon S3.

Training pipeline and model deployment

The model training and deployment phase consists of the following steps:

After the training data is uploaded to Amazon S3, CodeBuild runs based on the rules specified in EventBridge.
The SageMaker pipeline predefined in CodeBuild runs, and sequentially runs steps such as preprocessing including provisioning, model training, and model registration.
When training is complete (through the Lambda step), the deployed model is updated to the SageMaker endpoint.

Real-time recommendation inference

The inference phase consists of the following steps:

The client application makes an inference request to the API gateway.
The API gateway sends the request to Lambda, which makes an inference request to the model in the SageMaker endpoint to request a list of recommendations.
Lambda receives the list of recommendations and provides them to the API gateway.
The API gateway provides the list of recommendations to the client application using the Recommendation API.

Recommendation model using NCF

NCF is an algorithm based on a paper presented at the International World Wide Web Conference in 2017. It is an algorithm that covers the limitations of linear matrix factorization, which is often used in existing recommendation systems, with collaborative filtering based on the neural net. By adding non-linearity through the neural net, the authors were able to model a more complex relationship between users and items. The data for NCF is interaction data where users react to items, and the overall structure of the model is shown in the following figure (source: https://arxiv.org/abs/1708.05031).

Although NCF has a simple model architecture, it has shown a good performance, which is why we chose it to be the prototype for our MLOps platform. For more information about the model, refer to the paper Neural Collaborative Filtering.

In the following sections, we discuss how this solution helped us build the aforementioned MLOps components:

Data preprocessing
Automating model training and deployment
Real-time inference through model serving
CI/CD structure

MLOps component 1: Data preprocessing

For NCF, we used user-item interaction data, which requires significant resources to process the raw data collected at the application and transform it into a form suitable for learning. With Amazon EMR, which provides fully managed environments like Apache Hadoop and Spark, we were able to process data faster.

The data preprocessing batches were created by writing a shell script to run Amazon EMR through AWS Command Line Interface (AWS CLI) commands, which we registered to Airflow to run at specific intervals. When the preprocessing batch was complete, the training/test data needed for training was partitioned based on runtime and stored in Amazon S3. The following is an example of the AWS CLI command to run Amazon EMR:

aws emr create-cluster –release-label emr-6.0.0
–name “CLUSTER_NAME”
–applications Name=Hadoop Name=Hive Name=Spark
–tags ‘Name=EMR-DATA-PREP’ ‘Owner=MODEL’ ‘Service=LOTTEON’
–ec2-attributes ‘{“KeyName”:”keyname”,”InstanceProfile”:”DefaultRole”,”ServiceAccessSecurityGroup”:”sg-xxxxxxxxxxxxxx”,”SubnetId”:”subnet- xxxxxxxxxxxxxx “,”EmrManagedSlaveSecurityGroup”:”sg- xxxxxxxxxxxxxx “,”EmrManagedMasterSecurityGroup”:”sg-xxxxxxxxxxxxxx “}’
–instance-groups ‘[{“InstanceCount”:1,”InstanceGroupType”:”MASTER”,”InstanceType”:”r5.xlarge”,”Name”:”Master Instance Group”},{“InstanceCount”:2,”InstanceGroupType”:”CORE”,”InstanceType”:”r5.xlarge”,”Name”:”Core Instance Group”},{“InstanceCount”:2,”BidPrice”:”OnDemandPrice”,”InstanceGroupType”:”TASK”,”InstanceType”:”r5.xlarge”,”Name”:”Task Instance Group”}]’
–service-role EMR_DefaultRole
–region ap-northeast-2
–steps Type=CUSTOM_JAR,Name=DATA_PREP,ActionOnFailure=CONTINUE,Jar=s3://ap-northeast-2.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[“s3://bucket/prefix/data_prep_batch.sh”]
–auto-terminate

MLOps component 2: Automated training and deployment of models

In this section, we discuss the components of the model training and deployment pipeline.

Event-based pipeline automation

After the preprocessing batch was complete and the training/test data was stored in Amazon S3, this event invoked CodeBuild and ran the training pipeline in SageMaker. In the process, the version of the result file of the preprocessing batch was recorded, enabling dynamic control of the version and management of the pipeline run history. We used EventBridge, Lambda, and CodeBuild to connect the data preprocessing steps run by Amazon EMR and the SageMaker learning pipeline on an event-based basis.

EventBridge is a serverless service that implements rules to receive events and direct them to destinations, based on the event patterns and destinations you establish. The initial role of EventBridge in our configuration was to invoke a Lambda function on the S3 object creation event when the preprocessing batch stored the training dataset in Amazon S3. The Lambda function dynamically modified the buildspec.yml file, which is indispensable when CodeBuild runs. These modifications encompassed the path, version, and partition information of the data that needed training, which is crucial for carrying out the training pipeline. The subsequent role of EventBridge was to dispatch events, instigated by the alteration of the buildspec.yml file, leading to running CodeBuild.

CodeBuild was responsible for building the source code where the SageMaker pipeline was defined. Throughout this process, it referred to the buildspec.yml file and ran processes such as cloning the source code and installing the libraries needed to build from the path defined in the file. The Project Build tab on the CodeBuild console allowed us to review the build’s success and failure history, along with a real-time log of the SageMaker pipeline’s performance.

SageMaker pipeline for training

SageMaker Pipelines helps you define the steps required for ML services, such as preprocessing, training, and deployment, using the SDK. Each step is visualized within SageMaker Studio, which is very helpful for managing models, and you can also manage the history of trained models and endpoints that can serve the models. You can also set up steps by attaching conditional statements to the results of the steps, so you can adopt only models with good retraining results or prepare for learning failures. Our pipeline contained the following high-level steps:

Model training
Model registration
Model creation
Model deployment

Each step is visualized in the pipeline in Amazon SageMaker Studio, and you can also see the results or progress of each step in real time, as shown in the following screenshot.

Let’s walk through the steps from model training to deployment, using some code examples.

Train the model

First, you define a PyTorch Estimator to use for training and a training step. This requires you to have the training code (for example, train.py) ready in advance and pass the location of the code as an argument of the source_dir. The training step runs the training code you pass as an argument of the entry_point. By default, the training is done by launching the container in the instance you specify, so you’ll need to pass in the path to the training Docker image for the training environment you’ve developed. However, if you specify the framework for your estimator here, you can pass in the version of the framework and Python version to use, and it will automatically fetch the version-appropriate container image from Amazon ECR.

When you’re done defining your PyTorch Estimator, you need to define the steps involved in training it. You can do this by passing the PyTorch Estimator you defined earlier as an argument and the location of the input data. When you pass in the location of the input data, the SageMaker training job will download the train and test data to a specific path in the container using the format /opt/ml/input/data/<channel_name> (for example, /opt/ml/input/data/train).

In addition, when defining a PyTorch Estimator, you can use metric definitions to monitor the learning metrics generated while the model is being trained with Amazon CloudWatch. You can also specify the path where the results of the model artifacts after training are stored by specifying estimator_output_path, and you can use the parameters required for model training by specifying model_hyperparameters. See the following code:

from sagemaker.pytorch import PyTorch
metric_definitions=[
{‘Name’: ‘HR’, ‘Regex’: ‘HR=(.*?);’},
{‘Name’: ‘NDCG’, ‘Regex’: ‘NDCG=(.*?);’},
{‘Name’: ‘Loss’, ‘Regex’: ‘Loss=(.*?);’}
]
estimator_output_path = f’s3://{bucket}/{prefix}’
model_hyperparameter = {‘epochs’: 10,
‘lr’: 0.001,
‘batch_size’: 256,
‘top_k’ : 10,
‘dropout’ : 0.3,
‘factor_num’ : 32,
‘num_layers’ : 3
}
s3_code_uri = ‘s3://code_location/source.tar.gz’

host_estimator = PyTorch(
entry_point=”train.py”,
source_dir = s3_code_uri,
output_path = estimator_output_path,
role=aws_role,
framework_version=’1.8.1′,
py_version=’py3′,
instance_count=1,
instance_type=’ml.p3.2xlarge’,
session = pipeline_session,
hyperparameters=model_hyperparameter,
metric_definitions = metric_definitions
)

from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep
data_loc = f’s3://{bucket}/{prefix}’
step_train = TrainingStep(
name= “NCF-Training”,
estimator=host_estimator,
inputs={
“train”: TrainingInput(s3_data=data_loc),
“test”: TrainingInput(s3_data=data_loc),
}
)

Create a model package group

The next step is to create a model package group to manage your trained models. By registering trained models in model packages, you can manage them by version, as shown in the following screenshot. This information allows you to reference previous versions of your models at any time. This process only needs to be done one time when you first train a model, and you can continue to add and update models as long as they declare the same group name.

See the following code:

import boto3
model_package_group_name = ‘NCF’
sm_client = boto3.client(“sagemaker”)
model_package_group_input_dict = {
“ModelPackageGroupName” : model_package_group_name,
“ModelPackageGroupDescription” : “Model Package Group”
}
response = sm_client.list_model_package_groups(NameContains=model_package_group_name)
if len(response[‘ModelPackageGroupSummaryList’]) == 0:
create_model_pacakge_group_response = sm_client.create_model_package_group(**model_package_group_input_dict)

Add a trained model to a model package group

The next step is to add a trained model to the model package group you created. In the following code, when you declare the Model class, you get the result of the previous model training step, which creates a dependency between the steps. A step with a declared dependency can only be run if the previous step succeeds. However, you can use the DependsOn option to declare a dependency between steps even if the data is not causally related.

After the trained model is registered in the model package group, you can use this information to manage and track future model versions, create a real-time SageMaker endpoint, run a batch transform job, and more.

from sagemaker.workflow.model_step import ModelStep
from sagemaker.model import Model

inference_image_uri = ‘763104351884.dkr.ecr.ap-northeast-2.amazonaws.com/pytorch-inference:1.8.1-gpu-py3’
model = Model(
image_uri=inference_image_uri,
model_data = step_train.properties.ModelArtifacts.S3ModelArtifacts,
role=role,
sagemaker_session=pipeline_session,
)

register_model_step_args = model.register(
content_types=[“text/csv”],
response_types=[“text/csv”],
model_package_group_name=model_package_group_name,
approval_status=’Approved’,
)

step_model_registration = ModelStep(
name=”RegisterModel”,
step_args=register_model_step_args
)

Create a SageMaker model

To create a real-time endpoint, an endpoint configuration and model is required. To create a model, you need two basic elements: an S3 address where the model’s artifacts are stored, and the path to the inference Docker image that will run the model’s artifacts.

When creating a SageMaker model, you must pay attention to the following steps:

Provide the result of the model training step, step_train.properties.ModelArtifacts.S3ModelArtifacts, which will be converted to the S3 path where the model artifact is stored, as an argument of the model_data.
Because you specified the PyTorchModel class, framework_version, and py_version, you use this information to get the path to the inference Docker image through Amazon ECR. This is the inference Docker image that is used for model deployment. Make sure to enter the same PyTorch framework, Python version, and other details that you used to train the model. This means keeping the same PyTorch and Python versions for training and inference.
Provide the inference.py as the entry point script to handle invocations.

This step will set a dependency on the model package registration step you defined via the DependsOn option.

from sagemaker.pytorch.model import PyTorchModel
from sagemaker.workflow.model_step import ModelStep

model_name = ‘NCF-MODEL’
s3_code_uri = ‘s3://code_location/source.tar.gz’

model_inference = PyTorchModel(
name = model_name,
model_data = step_train.properties.ModelArtifacts.S3ModelArtifacts,
image_uri= image_uri,
role=role,
entry_point= ‘inference.py’,
source_dir = s3_code_uri,
framework_version=’1.8.1′,
py_version=’py3′,
model_server_workers=1,
sagemaker_session=pipeline_session
)
step_model_create = ModelStep(
name=”ModelCreation”,
step_args=model_inference.create(instance_type = ‘ml.p3.2xlarge’),
depends_on=step_model_registration
)

Create a SageMaker endpoint

Now you need to define an endpoint configuration based on the created model, which will create an endpoint when deployed. Because the SageMaker Python SDK doesn’t support the step related to deployment (as of this writing), you can use Lambda to register that step. Pass the necessary arguments to Lambda, such as instance_type, and use that information to create the endpoint configuration first. Because you’re calling the endpoint based on endpoint_name, you need to make sure that variable is defined with a unique name. In the following Lambda function code, based on the endpoint_name, you update the model if the endpoint exists, and deploy a new one if it doesn’t:

# lambda_deploy_model.py
import json
import boto3
def lambda_handler(event, context):
sm_client = boto3.client(“sagemaker”)
model_name = event[“model_name”]
endpoint_config_name = event[“endpoint_config_name”]
endpoint_name = event[“endpoint_name”]
instance_type = event[“instance_type”]

create_endpoint_config_response = sm_client.create_endpoint_config(
EndpointConfigName=endpoint_config_name,
ProductionVariants=[
{
“InstanceType”: instance_type,
“InitialVariantWeight”: 1,
“InitialInstanceCount”: 1,
“ModelName”: model_name,
“VariantName”: “AllTraffic”,
}
],
)
print(f”create_endpoint_config_response: {create_endpoint_config_response}”)
existing_endpoints = sm_client.list_endpoints(NameContains=endpoint_name)[‘Endpoints’]
if len(existing_endpoints[“Endpoints”]) > 0:
sm_client.update_endpoint(
EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
else:
sm_client.create_endpoint(
EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
return {“statusCode”: 200, “body”: json.dumps(“Endpoint Created Successfully”)}

To get the Lambda function into a step in the SageMaker pipeline, you can use the SDK associated with the Lambda function. By passing the location of the Lambda function source as an argument of the function, you can automatically register and use the function. In conjunction with this, you can define LambdaStep and pass it the required arguments. See the following code:

from sagemaker.lambda_helper import Lambda
from sagemaker.workflow.lambda_step import (LambdaStep, LambdaOutput, LambdaOutputTypeEnum)
endpoint_name = ‘NCF-ENDPOINT’
endpoint_config_name = ‘NCF-CONF’
deploy_script_path = ‘s3://code_location/lambda_deploy_model.py’
deploy_model_func = Lambda(
function_name=’lambda-deploy-step’,
execution_role_arn=role,
script=deploy_script_path,
handler=”lambda_deploy_model.lambda_handler”
)
output_param_1 = LambdaOutput(output_name=”statusCode”, output_type=LambdaOutputTypeEnum.String)
output_param_2 = LambdaOutput(output_name=”body”, output_type=LambdaOutputTypeEnum.String)

step_deploy_lambda = LambdaStep(
name=”LambdaDeployStep”,
lambda_func=deploy_model_func,
inputs={
“model_name”: step_model_create.properties.ModelName,
“endpoint_config_name”: endpoint_config_name,
“endpoint_name”: endpoint_name,
“instance_type”: ‘ml.p3.2xlarge’,
},
outputs=[output_param_1, output_param_2]
)

Create a SageMaker pipeline

Now you can create a pipeline using the steps you defined. You can do this by defining a name for the pipeline and passing in the steps to be used in the pipeline as arguments. After that, you can run the defined pipeline through the start function. See the following code:

from sagemaker.workflow.pipeline import Pipeline
pipeline_name = ‘NCF-pipeline’
pipeline = Pipeline(
name=pipeline_name,
steps=[step_train, step_model_registration, step_model_create, step_deploy_lambda],
sagemaker_session=pipeline_session,
)

pipeline.start()

After this process is complete, an endpoint is created with the trained model and is ready for use based on the deep learning-based model.

MLOps component 3: Real-time inference with model serving

Now let’s see how to invoke the model in real time from the created endpoint, which can also be accessed using the SageMaker SDK. The following code is an example of getting real-time inference values for input values from an endpoint deployed via the invoke_endpoint function. The features you pass as arguments to the body are passed as input to the endpoint, which returns the inference results in real time.

import boto3
sagemaker_runtime = boto3.client(“sagemaker-runtime”)
endpoint_name=’NCF-ENDPOINT’

response = sagemaker_runtime.invoke_endpoint(
EndpointName=endpoint_name,
Body=bytes(“‘features’: ‘{“user”: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], “item”: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}’}”)
)
print(response[‘Body’].read())

When we configured the inference function, we had it return the items in the order that the user is most likely to like among the items passed in. The preceding example returns items from 1–25 in order of likelihood of being liked by the user at index 0.

We added business logic to the feature, configured it in Lambda, and connected it with an API gateway to implement the API’s ability to return recommended items in real time. We then conducted performance testing of the online service. We load tested it with Locust using five g4dn.2xlarge instances and found that it could be reliably served in an environment with 1,000 TPS.

MLOps component 4: CI/CD structure

A CI/CD structure is a fundamental part of DevOps, and is also an important part of organizing an MLOps environment. AWS CodeCommit, AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline collectively provide all the functionality you need for CI/CD, from code shaping to deployment, build, and batch management. The services are not only linked to the same code series, but also to other services such as GitHub and Jenkins, so if you have an existing CI/CD structure, you can use them separately to fill in the gaps. Therefore, we expanded our CI/CD structure by linking only the CodeBuild configuration described earlier to our existing CI/CD pipeline.

We linked our SageMaker notebooks with GitLab for code management, and when we were done, we replicated them to Amazon S3 via Jenkins. After that, we set the S3 path to the default repository path of the NCF CodeBuild project as described earlier, so that we could build the project with CodeBuild.

Conclusion

So far, we’ve seen the end-to-end process of configuring an MLOps environment using AWS services and providing real-time inference services based on deep learning models. By configuring an MLOps environment, we’ve created a foundation for providing high-quality services based on various algorithms to our customers. We’ve also created an environment where we can quickly proceed with prototype development and deployment. The NCF we developed with the prototyping algorithm was also able to achieve good results when it was put into service. In the future, the MLOps platform can help us quickly develop and experiment with models that match LotteON data to provide our customers with a progressively higher-quality recommendation experience.

Using SageMaker in conjunction with various AWS services has given us many advantages in developing and operating our services. As model developers, we didn’t have to worry about configuring the environment settings for frequently used packages and deep learning-related frameworks because the environment settings were configured for each library, and we felt that the connectivity and scalability between AWS services using AWS CLI commands and related SDKs were great. Additionally, as a service operator, it was good to track and monitor the services we were running because CloudWatch connected the logging and monitoring of each service.

You can also check out the NCF and MLOps configuration for hands-on practice on our GitHub repo (Korean).

We hope this post will help you configure your MLOps environment and provide real-time services using AWS services.

About the Authors

SeungBum Shim is a data engineer in the Lotte E-commerce Recommendation Platform Development Team, responsible for discovering ways to use and improve recommendation-related products through LotteON data analysis, and developing MLOps pipelines and ML/DL recommendation models.

HyeKyung Yang is a research engineer in the Lotte E-commerce Recommendation Platform Development Team and is in charge of developing ML/DL recommendation models by analyzing and utilizing various data and developing a dynamic A/B test environment.

Jieun Lim is a data engineer in the Lotte E-commerce Recommendation Platform Development Team and is in charge of operating LotteON’s personalized recommendation system and developing personalized recommendation models and dynamic A/B test environments.

Jesam Kim is an AWS Solutions Architect and helps enterprise customers adopt and troubleshoot cloud technologies and provides architectural design and technical support to address their business needs and challenges, especially in AIML areas such as recommendation services and generative AI.

Gonsoo Moon is an AWS AI/ML Specialist Solutions Architect and provides AI/ML technical support. His main role is to collaborate with customers to solve their AI/ML problems based on various use cases and production experience in AI/ML.

GRN Roundup, Technology

What’s Your Story: Jacki O’Neill

May 16, 2024

Advertisements

In the Microsoft Research Podcast series What’s Your Story, Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. A systems expert whose 10 years with Microsoft spans research and product, Gehrke talks to members of the company’s research community about what motivates their work and how they got where they are today.

In this episode, Gehrke is joined by Jacki O’Neill, director of Microsoft Research Africa, Nairobi (formerly the Microsoft Africa Research Institute, or MARI) in Kenya. O’Neill pitched the idea for the lab after seeing an opportunity to expand the Microsoft research portfolio. She shares how a desire to build tech that can have global societal impact and a familial connection to the continent factored into the decision; how a belief that life is meant to be exciting has allowed her to take big personal and professional swings; and how her team in Nairobi is applying their respective expertise in human-computer interaction, machine learning, and data science to pursue globally equitable AI.

To learn more about the global impact of AI, efforts to make AI more equitable, and related topics, register for Microsoft Research Forum (opens in new tab), a series of panel discussions and lightning talks around science and technology research in the era of general AI.

Learn more:

Jacki O’Neill at Microsoft Research

Microsoft Research Africa, Nairobi (formerly MARI)

Editor’s note, May 16, 2024 – Since the recording of this podcast episode, the name of the Microsoft Africa Research Institute (MARI) has changed. The name of the lab is now Microsoft Research Africa, Nairobi.

Transcript

[TEASER]

[MUSIC PLAYS UNDER DIALOGUE]

JACKI O’NEILL: I love living in different places, and those experiences are what help us innovate better and design things that are, like, taking another point of view, more creative, I think. Just sparks things in your, in your head. And, I mean, it’s so much fun.

[TEASER ENDS]

JOHANNES GEHRKE: Microsoft Research works at the cutting edge. But how much do we know about the people behind the science and technology that we create? This is What’s Your Story, and I’m Johannes Gehrke. In my 10 years with Microsoft, across product and research, I’ve been continuously excited and inspired by the people I work with, and I’m curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now, I’m sharing their stories with you. In this podcast series, you’ll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.

[MUSIC FADES]

In this episode, I’m talking with Jacki O’Neill, director of the Microsoft Africa Research Institute—or MARI, for short—in Nairobi, Kenya. Jacki’s decadelong career at Microsoft began at the company’s India research lab, where she applied her ethnographic and human-computer interaction expertise to advancing equity in the country.

After the opening of two Microsoft software engineering centers in Africa, Jacki made the case for a research lab on the continent. She now leads the MARI team in making technology more inclusive, a role that allows her to pursue her goal of positive local change with global impact. Here’s my conversation with Jacki, beginning with her time growing up in Plymouth, England.

GEHRKE: We just had a discussion maybe a couple of years ago, right, when you were just in transition to Africa. So it’s really great to have you here and both learn a little bit what’s happening there, but also to learn a bit more about your story. Where did you grow up, and how did you end up here at Microsoft?

O’NEILL: Yeah, thanks for asking that. I’ve had a very, well, it’s definitely not been a straight road to get here, but the windy roads are the most interesting ones. I grew up in Plymouth, which is a dockyard and naval town in the southwest of England, so a socially deprived working-class town. So when I was growing up, it was a thriving working-class town, but of course with those industries, you know, they didn’t, they didn’t pass so well through those years. So, you know, by the time I was leaving school, it was quite a deprived city and still is. I think that it’s really important to be in those type of places, though, because you get a very rich view of life, and I left them as soon as I could, [LAUGHS] so …

GEHRKE: When you went to university?

O’NEILL: Went to, well, I went and I was a cook for a year in the Lake District, which is a very beautiful part of the UK, and then went to university.

GEHRKE: Where’s the Lake District?

O’NEILL: It is northwest, and it’s all hills. It’s, like, Wordsworth Country. It’s all hills and poetry and beautiful houses. And, yeah, it was a fantastic time working as a cook there. And then I went to Manchester to do my degree.

GEHRKE: OK. And what is your degree in?

O’NEILL: Ah, so, yes, I had, I did a social science degree to start with. I started at the time when you could get a degree in anything and get any job at the end of it. But by the time I came out of my degree, it was a recession.

GEHRKE: But did you have, did you have specific plans while you were studying of what you want, you know, what profession you wanted to go into?

O’NEILL: Not really. I didn’t. I think I’d, I think like many young people, I didn’t really know, but I felt that I would find something interesting when I came out. And then, you know, I just worked lots of different jobs. [LAUGHS]

GEHRKE: What is your favorite college course?

O’NEILL: My favorite college course—in my degree? Gosh, that’s a good question. It was all so long ago. [LAUGHS]

GEHRKE: OK …

O’NEILL: My favorite, I guess, yeah, no, I, so, I did … my degree was in psychology. I worked, and then I did my master’s in computer science and then my PhD in human-computer interaction.

GEHRKE: That’s quite a change, right, from psychology into computer science, then.

O’NEILL: Yes, yes. And I just, you know, I’d always just wanted to do computing, but when I was at school, it was … we had one computer in the school, and so it was, like, a computer at home or you don’t do computer science. So, you know, I didn’t do it.

GEHRKE: Right.

O’NEILL: So then as computers became more prominent, more available, you know, I was working in libraries, and they started computerizing, and I worked on that project, and then that led me to do a master’s. And so I was like, hey, this is the opportunity to really get into this area, and I loved it. It was fantastic. And Manchester’s computer science department is one of the top departments, and I had an amazing … Carole Goble was my thesis supervisor. She was absolutely amazing and strong for women in computing. But at the end of it, I was like, OK, so I didn’t want to do pure social science and I didn’t want to do pure computer science. What I want to do is do human-computer science, so where you really merge the two. And that’s how I got into HCI, and I think that’s where I started finding my favorite courses. You know, I loved the research methods. I loved those types of things.

GEHRKE: And what is your PhD about?

O’NEILL: Ooh, it was very boring. [LAUGHTER] My PhD was in computer-supported cooperative work [CSCW], and …

GEHRKE: OK. Oh, yeah. Very relevant now, right?

O’NEILL: Yeah, very relevant now. And that was a really exciting time for CSCW, as well, because there were so many different labs. There were Sun Systems, there was Xerox, there was Microsoft—all doing really cool, like, collaborative technologies. So it seemed like a brilliant area to go into. But I was looking at, can we support networking events for businesses?

GEHRKE: Wow. Uh-huh …

O’NEILL: So it was just at the time of the first, you know, things like Webex and things, you know, the first collaborative seminar-y …

GEHRKE: Yeah, so you’re way ahead of the social networks, right, and everything, right?

O’NEILL: Yeah, yeah.

GEHRKE: And there was a whole conference at that point in time, right? CSCW, I think I remember. Wasn’t there …

O’NEILL: Yes, yes, yes.

GEHRKE: So it was and still is, I think, a really big field.

O’NEILL: Yes, it’s a, it’s a, it’s really interesting. And I think one of the things that’s interesting with the foundational models now is many of the things that people like me, HCI people, have been wanting to happen—”Oh, if only we can enable people to interact with technology like this”—are now suddenly possible, which is quite exciting.

GEHRKE: Yeah, so we’ll get to that in a little bit because I think, you know, as you said, the whole field of HCI is now changing with foundational models and what the interfaces are, will be. I think it’s a really interesting, deep research question right now. So, so, OK, so you got your PhD; you’re in Manchester. What’s the next step in your career? Where did you go next?

O’NEILL: Yeah, I actually got a job before I finished my PhD. So I took quite a long time to do my PhD. I think it was seven years in the end, partly because I was teaching. When I was doing—like, lecturing when I was doing my PhD, and I also had a job as a consultant occasionally, working with, I think, I worked with the Co-op Bank. I worked with some usability companies, and you could, I could make enough money to live for a term on, like, two weeks’ consultancy because I didn’t have very high costs. [LAUGHS]

GEHRKE: Right. You lived as a grad student, right?

O’NEILL: Yes. Yeah. Yeah. And, actually, you know, I was living in Manchester. I was living in a squat, so I wasn’t paying any rent, [LAUGHS] so …

GEHRKE: Oh, really?

O’NEILL: Yes. So I didn’t have very many costs.

GEHRKE: OK.

O’NEILL: Which was very handy. So I didn’t have any real incentive to finish my PhD until I got a job, you know. When I finished my master’s, I looked at the job market, and with my computer science master’s, the main job was database manager, [LAUGHS] which didn’t appeal.

GEHRKE: That sounds now really interesting. [LAUGHTER]

O’NEILL: Yeah. So I, actually, that’s why I ended up doing a PhD, because I was like, I don’t want to go back to work yet. You know, I’ve been working for five years before. So, so, yeah, I just was enjoying doing a PhD and doing pieces of work here and there. And then I got a job at Xerox in Cambridge, and then that’s when I got motivated to finish my PhD because working and doing a PhD at the same time is not much fun.

GEHRKE: Right, right. So you got your PhD, had your job lined up, and then you’re starting at Xerox. What were you doing in Xerox?

O’NEILL: Human-computer interaction. Yeah, it was a really exciting time. There was so much going on in the industry. I was so delighted. It was like my dream job to be in industry and to maybe create cool interfaces and, you know, cool collaborative systems. So … and then they closed the lab [LAUGHS] within six months. It wasn’t my fault.

GEHRKE: So quickly?

O’NEILL: Mm-hmm.

GEHRKE: Wow. And what did you do then? I mean, this is your first big job, and …

O’NEILL: Yes …

GEHRKE: … such a quick setback.

O’NEILL: They offered me a job in their lab in France. So I stayed in the UK for a while and worked half in France, half in the UK, and then I shifted to France full time.

GEHRKE: OK. Oh, wow. So do you … where in France did you live then?

O’NEILL: Grenoble.

GEHRKE: OK, yeah. In the middle of …

O’NEILL: In the French Alps.

GEHRKE: … the French Alps. Exactly. Beautiful place.

O’NEILL: Absolutely … yes. Yeah. Skiing, climbing, hiking. So much fun.

GEHRKE: And, OK, so you’re at Xerox PARC in the French Alps. What’s, what’s next?

O’NEILL: They were opening, Xerox was opening a research lab in India. And I’d always wanted to travel. You know, I’d always wanted … and I never really had the money or the opportunity to travel. So when they said they were opening it, I just went to my boss and said, hey, I don’t know what you’d want me to do, but if there’s any opportunities for me to do anything to help …

GEHRKE: Wow.

O’NEILL: … the opening of India, I’d love to. And I went out for a month and then I went out for three months.

GEHRKE: I mean, both of these sound like really bold steps to me. First of all, I mean, Grenoble is probably pure French speaking, right? And, I don’t know, did you have high school French or you were good … [LAUGHS]

O’NEILL: I had high school French, yes, and then we drove, we drove from the UK to Grenoble listening to “learn French” tapes [LAUGHS] …

GEHRKE: OK, wow … [LAUGHS]

O’NEILL: …in the car. Yeah.

GEHRKE: Wow. And that was enough then to get by with a daily …

O’NEILL: Actually, so it was great in France because they expect you to learn the language, so you have French lessons at work. And then, actually, I did an evening class, as well, that was paid for by work, a really intensive one-month, like two hours a night, every night of the week. And that really helped. Yeah, it was, it’s fantastic.

GEHRKE: Wow, that’s really great. And then, and then you took the even bigger step to move to India, right. How was that like, and what was your experience there?

O’NEILL: Yeah, India is just magical. You know, initially, I just went for one month, then three months, and it was just—the people, the culture, the work I was doing, the research I was doing was like no research … you know, I’d spent a lot of time in call centers around Europe doing studies, ethnographic studies, and designing technology. Lots of time looking at photocopiers because I was with Xerox. [LAUGHS] And then so going to India, suddenly, you know, I’m looking at social enterprises. I’m looking at all sorts of businesses and different ways of life and different people. And it was just so rich and so amazing that I was like, OK, I really want to do this. And that’s actually when I applied to Microsoft because Microsoft had the Technology for Emerging Markets group there, which is world-class research in that space. So I was like, OK, if I want to keep on doing this, then that’s what I’m going to apply to. And luckily enough, I got the job, and that’s how I joined Microsoft.

GEHRKE: Wow. So, so, OK, so you’re now at Microsoft in India. That was in Bangalore, right, where our research lab there is?

O’NEILL: Mm-hmm.

GEHRKE: And so what, what were you working on there for the next few years?

O’NEILL: Yeah. So initially, I looked at a few different things. I joined some existing projects. So I was on MEC, which was the educational platform, looking at whether we could bring the power of MOOCs [Massive Open Online Courses] to Indian education to improve the level of education because they have amazing colleges at the top, but, actually, the vast majority of students go to these intermediate colleges, and the teaching level really varies. And so the idea was, can you help with blended learning? Can you help the teachers teach better? That turns out to be really challenging. And, actually, the system ended up being used by the students to teach themselves.

GEHRKE: Oh, like for independent learning?

O’NEILL: Mm-hmm. Mm-hmm. And that was really, so that was interesting, doing some studies there. I looked at … Indrani [Medhi Thies] had done an amazing project where they’d built “Facebook for Farmers.” So I did a study of that, which was really, really fun. And then I worked in financial inclusion, one of my big areas. I spent about five years working with auto-rickshaw drivers in Bangalore, designing technologies to help them understand the loans they’d taken out, which was really, really fun. They’re a very great community to work [with]. You don’t get any nonsense from an auto-rickshaw driver. [LAUGHS]

GEHRKE: Well, I was just thinking, what was it like to, like, live in India and just move there and start out there?

O’NEILL: Uh, it was, I mean, it was fantastic. It’s a great place to live. The people are amazing. The food is amazing. Moving with Microsoft makes it very easy because Microsoft takes care of you when you move so you’re not, you know, some of the stresses that you might have around the move are taken care of. I had a young family. I had a 2-year-old son when we moved out there and within a year had another one, which was not 100 percent planned, because you don’t usually move to a new company and then have a baby. You’re like, oh, sorry. [LAUGHS] But that was all fine. Yeah.

GEHRKE: And, and, you know, you worked with all of these different communities in India, right. How did you connect to the communities? I mean, these were teachers …

O’NEILL: Yeah, you need to, you really need to go with people, so you have to convince some organization that what you’re going to do is going to be beneficial to them and useful for them. And then if they’re trusted by the community, they give you access. And that’s really great because you do have access that you wouldn’t otherwise have. You know, if you’re really wanting to build technologies to support people, you really need to understand what they care about—what do they want help with?—and you only get that if you’ve got a trusted relationship with them. So we worked with, there was one organization that worked with the auto-rickshaw drivers’ wives. It was about empowering women, and we got access to the drivers initially through that organization.

GEHRKE: That’s amazing. I mean, you know, I’ve visited India many times, but I can only imagine how it is to live there, actually. So do you have some of the stories of what is, sort of, most surprising for you given that you’ve lived there?

O’NEILL: Yeah … what’s most surprising? I think, so one thing is, one thing is people want to tell you what they think you want to hear. So if you’re lost, you need to ask quite a few people for directions and then make some sort of assessment about whether the person was just saying “yes, yes, that way” because he knew the way or “yes, yes that way” because he just didn’t want to tell you that he didn’t know. And so you have to, sort of, judge. [LAUGHS] So that’s one, like, useful piece of …

GEHRKE: So the first few times you went in the wrong direction? [LAUGHS]

O’NEILL: Yes, exactly. And then you’re like, “But they said …”; you ask someone else, and they’re like, “No, it’s over there.” And then someone … so that’s … the most useful piece of advice I could give to anyone who’s visiting India, is when you cross the road, just find someone else who’s already crossing the road and cross with them.

GEHRKE: Because it’s so dangerous if you go by yourself potentially?

O’NEILL: Yes, yeah. You get used to it quite quickly, and there’s obviously something that changes in you when you’ve been there a while. You know, when you first go there, all the auto-rickshaw drivers are going to overcharge you and drive around the block twice and all of those things. And I find after about four to five weeks when you’ve been there, they know, like, there must be something that changes in your attitude because they actually know that you’re there longer term and you’re not going to take any nonsense.

GEHRKE: So, so do you behave differently? What’s the change there?

O’NEILL: I don’t know. That’s, I’ve tried to think about this, but I think, I don’t know, it must be just an air of confidence or an air of certainty or something. But, yeah, it’s like something just clicks or changes.

GEHRKE: That’s so interesting. Is it only for the drivers, or is it in other aspects of your life, as well, where, sort of, you get treated differently because you suddenly have become a native?

O’NEILL: I think you notice it most in the drivers because they’re the ones that you’re interacting so much with to get about, you know, to get … you’re always getting a tuk-tuk to go from here to there. And they really do, you know, if they can make extra money out of you, they are going to make extra money out of you.

GEHRKE: They smell it, that you’re a tourist.

O’NEILL: Yeah, yeah, yes. [LAUGHS]

GEHRKE: And then so you were in India and then another opportunity came along. So tell us a little bit about that opportunity, where you ended up now.

O’NEILL: Yes, yes. So when I heard that the ADCs were opening—the Africa Development Center, so our software engineering center in Nairobi and Lagos—I thought that that was a great time to pitch for research in Africa for Microsoft. It seemed like a bit of a hole in our portfolio. I have family connections to Africa. So, actually, one of the reasons for joining Microsoft was partly because I thought there might be opportunities eventually in Africa because we had a great Africa startup program, for example. So, you know, but there wasn’t any research there. And so when I heard the ADCs were open, I just put together a, like, pitch for setting up research in Africa within the ADCs, and, you know, all sorts of people really helped me hone that pitch. And then I flew at the end of February 2020. I flew …

GEHRKE: Oh, just right before the pandemic.

O’NEILL: Mm-hmm. I flew to … I was in Barcelona for a Future of Work event, and then I flew to Nairobi and then Lagos to meet the people who were running the ADCs and to think about where, which one I would want to set up research in if such a thing were to happen. And I did that. I decided that Nairobi was the right one. And when I went there, Jack Ngare ran the ADC, and he was so enthusiastic about having research there. So I did a pitch and got some funding just—I think if it had been two weeks later, I’m not sure. But, you know, it was just before we knew how bad COVID was going to be, so I was very lucky with timing.

GEHRKE: And, I mean, you’ve made these amazing moves throughout your career, right. You, sort of, raised your hand for India when the lab was open; now here in Africa. Why, and how? I’m just, I mean, so curious because people make the most unexpected turns in their careers from time to time. But it’s more like because, you know, they lose their current job or they, their manager moves away and they really think about their career. But you, like, raise your hand from time to time and make these really bold and amazing moves.

O’NEILL: Yeah, I mean, life’s meant to be exciting, isn’t it?

GEHRKE: OK …

O’NEILL: I think. You know, life’s meant to be exciting. I love living in different places and, you know, as an ethnographer, as a person interested in human-computer interaction, it’s, like, those experiences are what help us innovate better and design things that are, like, taking another point of view, more creative, I think. Like, just sparks things in your, in your head. And, I mean, it’s so much fun. Like, I don’t understand why everyone doesn’t do it. [LAUGHS]

GEHRKE: So it’s just really amazing. So if I think about, you know, India, where you said, right, the experience for you was that the drivers were treating you suddenly differently. Did you have a similar experience in Africa, or what is one of the or a few of the defining experiences and stories there?

O’NEILL: Yeah, I think … so the animals are amazing in Kenya. They’ve done such an amazing job at conservation. I imagine that they would, you would only see, like, these big animals in the national parks, but—they’re not everywhere. They’re not going to be, you’re not going to find a hippo walking down the road in Nairobi. But they are all over the place. So you can go camping in Lake Naivasha, which is just an hour and a half from Nairobi, and I was camping with a friend, and the kids were in their tent, and my friend was in her tent, and I was just sitting by the fire. It’s about 10 o’clock. I said, yeah, I might go to bed in a minute. And then I just heard this snort, and I get up with my torch, and I look, and there’s a hippo, [LAUGHS] like, probably less than a meter and a half …

GEHRKE: Wow …

O’NEILL: … away from me. So I carefully went and sat back down by the fire and waited for a while before I moved. [LAUGHS]

GEHRKE: So are they dangerous in that aspect, if you’ve startled them or so … ?

O’NEILL: Yeah, I think … they say that you should never get between a hippo and the water. So, luckily, I was on the other side of the, [LAUGHS] of the hippo and the water. But they are big. I mean, they can be very grumpy.

GEHRKE: And so you should, just, shouldn’t startle them or … ? I’m just trying to understand what’s the recommended behavior. Don’t get between the hippo and the water.

O’NEILL: Yes, that’s recommended, and don’t, yeah, don’t startle them, and just, you know, stay very, stay very calm. So, actually, when you’re camping, if you don’t have an electric fence around the campsite, then you shouldn’t come out of your tent at night. So don’t drink too much beer before you go to bed, [LAUGHTER] because it’s the “zip.” When you unzip it, you can really startle … If there’s any wild animals, lions, or whatever around, then you can really scare them. And you don’t want to scare a lion.

GEHRKE: Yeah, I was thinking, just, actually, about the lions or so, right. I mean, they could be probably even more dangerous than the hippos or, or not really?

O’NEILL: Hippos are actually more dangerous than lions. Yeah, lions will generally not attack you. And apparently, the thing—I haven’t had to try this, I’m glad to say—but the thing you should do if you encounter a lion is just look them in the eye, and then they’ll go off.

GEHRKE: Stare them down.

O’NEILL: Mm-hmm.

GEHRKE: OK.

O’NEILL: I hope I never have to try that because they are quite scary … [LAUGHS]

GEHRKE: I hope I never have to do that but good advice …

O’NEILL: Yes, yeah, yeah. I think hippos are more likely to charge at you. Like, a lion’s more likely to go off in the other direction.

GEHRKE: And what’s the daily life like, you know, living in Nairobi, right? I mean, is it, I mean, it must be very, very different from living in both India, as well as, you know, Great Britain or here.

O’NEILL: Yeah. I mean it is very different. The traffic’s bad but not as crazy as India. Like, I drive in Kenya. I didn’t drive in India because it was a bit too scary with the bikes and everything. It’s a really, it’s a really nice pace, I think, in Nairobi. It’s a beautiful city. There’s nightlife, and there’s cafes and restaurants, but you’ve got countryside so close. You know, compared to Bangalore, it’s quite a small city. And the weather is amazing, and the people are really friendly and kind, and, you know, it’s just, it’s a very nice, it’s a very nice place to live.

GEHRKE: That’s amazing, and you now are leading the Microsoft Africa Research Institute there, right?

O’NEILL: Yes.

GEHRKE: What is the focus of the institute, and what are you studying there?

O’NEILL: Mm-hmm. Yeah, we’re mainly focused on foundational models. It won’t be a surprise to anybody. [LAUGHS] Which actually, you know, it’s worked out very well for us because, you know, we have a mixed disciplinary team. We have HCI and AI and ML and data science.

GEHRKE: And all local?

O’NEILL: All local. Yeah. And, yeah, we’re looking at multilingual languages in models. So we’re working with MSR [Microsoft Research] India, thinking about how can you benchmark these models for different languages. And we’re thinking all the way along the scale from your high-resource, you know, French and German, to your mid-resource Swahili, Hindi, all the way to your low-resource languages because, you know, the vast majority of training data is in English. So we’ve been working a lot. That’s nice because we’re having, you know, in a very short amount of time, you know, four or five months, we’re having both scientific impact with papers but also product impact, working with the Copilot Language Globalization team as they’re rolling out Copilot in different languages.

GEHRKE: I see. So the research that you have will go into, let’s say, Word or PowerPoint or so to make it available in some of the languages from the continent.

O’NEILL: Yes, exactly. Because it’s not just about translation. It’s also if you think about RAI, responsible AI, you know, a lot of that is language based. And so how do … you can’t just translate this to words. You have to find the right list of words in those languages. And then what about things like tone and stuff? So that’s one area. And then related to that, it’s in a much bigger space of equity, the models and equity. You know, what’s going to happen to the digital divide with these models? In some ways, you could imagine that they may be flattening it, but in other ways, they could be increasing it. So we really are trying to map out how … the different elements of the digital divide as it plays out in these models. Because you obviously have your traditional things like access to devices, access to, you know, infrastructure, and things like that. But there’s also the data divide. So not only is most of the training material in English; it’s also mostly from America and the Global North. So it embodies very particular world views. And if you think about data on Africa, data on Africa tends to be collected by particular organizations. So there’s lots of data on poverty and disease and forced migration and things like that. Not much data on, like, the stories, the creativity, wealth, innovation. So what does that mean? Even if the models can speak perfectly, which they can’t yet, but they’ll eventually get quite good at, you know, even smaller languages like Luo, if that model is just translating English content into Luo, that’s not necessarily what we want from a model. So there’s some really interesting questions there to be answered.

GEHRKE: Well, it seems to me like it’s clearly also a question of, like, getting the right kind of data. So where do you get the data, and how do you get the data?

O’NEILL: Yeah, that’s a big question. And it was already a challenge, you know, before these models. You know, many people have been working with Masakhane, which is one of the African NLP communities which is around creating datasets in African languages for training the models. So that was, you know, getting good quality training data is already a challenge. Sriram [Rajamani] from MSR India, though, was telling me of a really interesting project they’ve got going on in India with the Indian government where they are trying to collect data from each region of India so that they can use it to train the OpenAI models, which would be really cool. And we should think about, is that what we can do for different African countries and contexts?

GEHRKE: Exactly. It seems to be very much like a citizen science project, right, where you, sort of, involve the citizens that speak different dialects and then involve them in collecting the right kind of data.

O’NEILL: Yeah, yeah. And maybe collecting the stories, you know, and the cultural attributes and assets from different places.

GEHRKE: That’ll be really, really exciting probably also about preservation of the culture and history, right.

O’NEILL: Yes, yes. But challenging.

GEHRKE: But challenging. [LAUGHTER]

O’NEILL: Yeah.

GEHRKE: So that’s one big aspect of the work. Anything else that’s happening there?

O’NEILL: Yeah. So we’re doing a lot of work, you’ll be unsurprised to hear, on Future of Work and AI. And so we’ve got a project on modern work and LLMs, so looking at the work that enterprise workers, frontline and knowledge workers, are doing and then what bits of their job they would like to get rid of if they could and what bits they would keep and how we can use LLMs to support them. And we’ve also, like, Maxamed [Axmed] on my team, also worked with The Garage to train them up in foundational models, both the LLMs and the vision models, and then they’ve introduced them to a whole load of small businesses in Kenya.

GEHRKE: Oh, wow.

O’NEILL: So that’s really interesting. You got everyone from like car salespeople to lawyers who are now using, like, LLMs as part of their everyday work, which is amazing.

GEHRKE: As part of like composing messages or part of … what’s …

O’NEILL: Yeah. Writing contracts, sales documents for cars, all sorts of really interesting things.

GEHRKE: Oh, wow.

O’NEILL: So we’re going to go out and look at what they’re doing and think about how, you know, what else is needed, what, what more do they need.

GEHRKE: What’s the prevalent form factor in terms of if I think about, like, a computer there? Is it my, is it a mobile phone? Is it a tablet?

O’NEILL: Yeah.

GEHRKE: It’s a mobile phone?

O’NEILL: It’s a mobile phone. Yeah.

GEHRKE: So you have to rethink also, probably, all the interfaces.

O’NEILL: Yes, I mean …

GEHRKE: You mentioned that early on, right, as you think about the next generation of HCI with AI in it, right.

O’NEILL: Yes, yes. I mean conversational interfaces. The idea that you can talk to your phone or enter existing text, you know. If you look at small businesses, a lot of their interactions with customers are on chat. If you can enter that chat into an LLM and extract structured data from it, then suddenly you’ve got all this data that’s been lost to the business becomes usable. So it’s a really exciting space, and I think voice interfaces are going to become really, really, really big. And that’s why there’s opportunities for leapfrogging, because suddenly everyone with a mobile phone potentially has a really powerful office productivity tool in their hand and can do things … you know, many of the small businesses, they don’t employ a designer; they don’t employ an accountant. But now they could maybe have an accountant or a designer in their pocket, which enables them to do more, which is definitely the more positive side of the future of work than some of the …

GEHRKE: Right. You know, this whole enablement story of people is just really amazing, what you can do with LLMs and especially with voice interfaces, as well. Let me conclude maybe with a question about your career. I mean, it seems like you’ve always amazingly managed to somewhat align your career moves with your passion. You moved to India because you’re just excited to live in India. You moved then to, you know, Microsoft Research, but then you moved to Africa again for, what I hear, is a little bit the adventure, as well, right?

O’NEILL: Yes.

GEHRKE: So what’s your advice for people who want to, sort of, align these two and who want to not only work but also want to work on something they’re really passionate about? How do you manage to create that alignment?

O’NEILL: That is a good question. I don’t know. It just, sort of, happens. I mean, I think you have to, you have to be passionate about it; you have to talk about it and decide what you want to do. You know, I never really imagined MARI would happen. But I just started talking to people, and people were saying, before I did the pitch, people were saying to me, oh, what would you like to do in five years, Jacki? And I was like, oh, you know what? If I had my way, I’d love to run a research center in Africa. And then within a couple of years … it was nothing more than an idea in my head. So I think that you have to have the ideas, verbalize it, and maybe it can happen.

GEHRKE: And why a research center in Africa? What’s personal for you there?

O’NEILL: So my children are African; my children are Cameroonian. So I wanted them to grow, spend some time on the continent, and, you know, as a family, we’d always had that idea of moving to the continent eventually. So that was part, that was a personal motivation in there as well as the passion. Yeah.

GEHRKE: So it’s, well, sort of, the confluence of, I guess, opportunity but then also drive on your side? Because that’s what I’ve heard. Very often in careers, that it’s not only about, well, this is what I finally want to do but also watching out for that opportunity.

O’NEILL: Yes.

GEHRKE: So it seems like that played a big role here, as well. And so when you heard about, you know, that there was an Africa Development Center, how did you, what were your next steps then? I mean, you must have been excited, but you also had to take some action.

O’NEILL: Yeah, I mean, I created, [LAUGHS] I created a small pitch, a small set of slides, and then I just started talking to everybody I knew who was doing anything. I didn’t have any contact with the ADCs.

GEHRKE: So you created that energy and excitement about it?

O’NEILL: I just started to, you know, every time anyone would come to India, you know, I was just like, oh, this is what I’d like to do. And you just almost talk it into being, I think.

GEHRKE: And were there some setbacks, or was it just like a straight line from, sort of, the excitement all the way up to realization?

O’NEILL: No, I mean, I didn’t, I don’t think I ever really imagined it would happen, you know. But you’re just doing it, and you’re plugging away, and then taking the, you know, taking the advice of people.

GEHRKE: Really an awesome story. So maybe as a last question, where do you see the center being in like three to five years? I mean, you’re starting off right now, but I’m sure you have really big ambitions for the center, and there’s so much to do on the whole continent.

O’NEILL: No, absolutely. I think that I have a few ambitions. So the most important, I think, I want it to be really established as this thing that’s really beneficial to Microsoft, that Microsoft is like, really, “Yeah, the guys at MARI, they’re doing great research. We really like them.” So that it, sort of, exists without me, you know. At the moment, I think I’m the driver of it. I would …

GEHRKE: So you want to grow the next generation that is basically going to be the next generation of leaders?

O’NEILL: Yes, exactly, exactly. And then I think also grow, I would love to help in growing Microsoft’s market in Africa. We don’t have a particularly big market in Africa, but I think there’s a lot of opportunity, especially now with these, with these large language models. I think that we … so that would be really exciting, you know, if we can help. I don’t see our success only being about growing the African market, but I think it’s part of what we can do, and if we can grow that market, as well as do research that’s relevant for Redmond and relevant globally, that’s really, that’s really exciting, I think, you know. So everything we do, I think, has to have a relevance globally. And I think, you know, at the beginning I was talking about different ways of viewing the world and how that leads to innovation. I think by having researchers who are African, based in Africa, doing this great research, we can create better products for everyone.

GEHRKE: That’s such a great finishing note. Thank you so much for the great conversation, Jacki.

O’NEILL: Thank you, Johannes. It’s been fun.

[MUSIC]

To learn more about Jacki or to see photos of Jacki living and working abroad, visit aka.ms/ResearcherStories (opens in new tab).

[MUSIC FADES]

Opens in a new tab

The post What’s Your Story: Jacki O’Neill appeared first on Microsoft Research.

GRN Roundup, Technology

How to Set your Self Free of the Matrix of Ideology

May 16, 2024

Advertisements

Originally Published on Stefan Speaks

Continue reading on Becoming Human: Artificial Intelligence Magazine »

GRN Roundup, Technology

Scientists use generative AI to answer complex questions in physics

May 16, 2024

Advertisements

When water freezes, it transitions from a liquid phase to a solid phase, resulting in a drastic change in properties like density and volume. Phase transitions in water are so common most of us probably don’t even think about them, but phase transitions in novel materials or complex physical systems are an important area of study.

To fully understand these systems, scientists must be able to recognize phases and detect the transitions between. But how to quantify phase changes in an unknown system is often unclear, especially when data are scarce.

Researchers from MIT and the University of Basel in Switzerland applied generative artificial intelligence models to this problem, developing a new machine-learning framework that can automatically map out phase diagrams for novel physical systems.

Their physics-informed machine-learning approach is more efficient than laborious, manual techniques which rely on theoretical expertise. Importantly, because their approach leverages generative models, it does not require huge, labeled training datasets used in other machine-learning techniques.

Such a framework could help scientists investigate the thermodynamic properties of novel materials or detect entanglement in quantum systems, for instance. Ultimately, this technique could make it possible for scientists to discover unknown phases of matter autonomously.

“If you have a new system with fully unknown properties, how would you choose which observable quantity to study? The hope, at least with data-driven tools, is that you could scan large new systems in an automated way, and it will point you to important changes in the system. This might be a tool in the pipeline of automated scientific discovery of new, exotic properties of phases,” says Frank Schäfer, a postdoc in the Julia Lab in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and co-author of a paper on this approach.

Joining Schäfer on the paper are first author Julian Arnold, a graduate student at the University of Basel; Alan Edelman, applied mathematics professor in the Department of Mathematics and leader of the Julia Lab; and senior author Christoph Bruder, professor in the Department of Physics at the University of Basel. The research is published today in Physical Review Letters.

Detecting phase transitions using AI

While water transitioning to ice might be among the most obvious examples of a phase change, more exotic phase changes, like when a material transitions from being a normal conductor to a superconductor, are of keen interest to scientists.

These transitions can be detected by identifying an “order parameter,” a quantity that is important and expected to change. For instance, water freezes and transitions to a solid phase (ice) when its temperature drops below 0 degrees Celsius. In this case, an appropriate order parameter could be defined in terms of the proportion of water molecules that are part of the crystalline lattice versus those that remain in a disordered state.

In the past, researchers have relied on physics expertise to build phase diagrams manually, drawing on theoretical understanding to know which order parameters are important. Not only is this tedious for complex systems, and perhaps impossible for unknown systems with new behaviors, but it also introduces human bias into the solution.

More recently, researchers have begun using machine learning to build discriminative classifiers that can solve this task by learning to classify a measurement statistic as coming from a particular phase of the physical system, the same way such models classify an image as a cat or dog.

The MIT researchers demonstrated how generative models can be used to solve this classification task much more efficiently, and in a physics-informed manner.

The Julia Programming Language, a popular language for scientific computing that is also used in MIT’s introductory linear algebra classes, offers many tools that make it invaluable for constructing such generative models, Schäfer adds.

Generative models, like those that underlie ChatGPT and Dall-E, typically work by estimating the probability distribution of some data, which they use to generate new data points that fit the distribution (such as new cat images that are similar to existing cat images).

However, when simulations of a physical system using tried-and-true scientific techniques are available, researchers get a model of its probability distribution for free. This distribution describes the measurement statistics of the physical system.

A more knowledgeable model

The MIT team’s insight is that this probability distribution also defines a generative model upon which a classifier can be constructed. They plug the generative model into standard statistical formulas to directly construct a classifier instead of learning it from samples, as was done with discriminative approaches.

“This is a really nice way of incorporating something you know about your physical system deep inside your machine-learning scheme. It goes far beyond just performing feature engineering on your data samples or simple inductive biases,” Schäfer says.

This generative classifier can determine what phase the system is in given some parameter, like temperature or pressure. And because the researchers directly approximate the probability distributions underlying measurements from the physical system, the classifier has system knowledge.

This enables their method to perform better than other machine-learning techniques. And because it can work automatically without the need for extensive training, their approach significantly enhances the computational efficiency of identifying phase transitions.

At the end of the day, similar to how one might ask ChatGPT to solve a math problem, the researchers can ask the generative classifier questions like “does this sample belong to phase I or phase II?” or “was this sample generated at high temperature or low temperature?”

Scientists could also use this approach to solve different binary classification tasks in physical systems, possibly to detect entanglement in quantum systems (Is the state entangled or not?) or determine whether theory A or B is best suited to solve a particular problem. They could also use this approach to better understand and improve large language models like ChatGPT by identifying how certain parameters should be tuned so the chatbot gives the best outputs.

In the future, the researchers also want to study theoretical guarantees regarding how many measurements they would need to effectively detect phase transitions and estimate the amount of computation that would require.

This work was funded, in part, by the Swiss National Science Foundation, the MIT-Switzerland Lockheed Martin Seed Fund, and MIT International Science and Technology Initiatives.

GRN Roundup, Technology

Animal brain inspired AI game changer for autonomous robots

May 15, 2024

Advertisements

A team of researchers has developed a drone that flies autonomously using neuromorphic image processing and control based on the workings of animal brains. Animal brains use less data and energy compared to current deep neural networks running on GPUs (graphic chips). Neuromorphic processors are therefore very suitable for small drones because they don’t need heavy and large hardware and batteries. The results are extraordinary: during flight the drone’s deep neural network processes data up to 64 times faster and consumes three times less energy than when running on a GPU. Further developments of this technology may enable the leap for drones to become as small, agile, and smart as flying insects or birds.

GRN Roundup, Technology

Robots’ and prosthetic hands’ sense of touch could be as fast as humans

May 15, 2024

Advertisements

Research could pave the way for a prosthetic hand and robot to be able to feel touch like a human hand. The technology could also be used to help restore lost functionality to patients after a stroke.

GRN Roundup, Technology

Research Focus: Week of May 13, 2024

May 15, 2024

Advertisements

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning

Large language models (LLMs) have shown remarkable performance in generating text similar to that created by people, proving to be a valuable asset across various applications. However, adapting these models to incorporate new, out-of-domain knowledge remains a challenge, particularly for facts and events that occur after the model’s training knowledge cutoff date.

In a recent paper: Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning, researchers from Microsoft investigate the effectiveness of supervised fine-tuning (SFT) as a method for knowledge injection in LLMs, specifically focusing on recent sporting events. They compare different dataset generation strategies—token-based and fact-based scaling—to create training data that helps the model learn new information. Their experiments on GPT-4 demonstrate that while token-based scaling can lead to improvements in Q&A accuracy, it may not provide uniform coverage of new knowledge. Fact-based scaling, on the other hand, offers a more systematic approach to ensure even coverage across all facts. The researchers present a novel dataset generation process that leads to more effective knowledge ingestion through SFT, and results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge.

Read the paper

NEW RESEARCH

A Reflection on Human-Notebook Experiences in the Era of AI

Computational notebooks provide an interactive way to work with data. They have been widely used by data professionals to write code, explore data, and generate visualizations, all in one document. Previous research has revealed unique pain points around the user experience in computational notebooks. However, as AI tools like ChatGPT or Copilot have emerged, it is unclear whether these pain points have been reduced or changed, or whether new pain points have arisen. Due to the fast pace of advances in AI technology, most of the development of new AI tools has been primarily driven by technology and not by user experience.

In a recent paper: A Reflection on Human-Notebook Experiences in the Era of AI, researchers from Microsoft summarize literature on how new AI technology has impacted human-notebook interaction and human-computer interaction (HCI) paradigms, new challenges and user behavior around using AI assistants, and recent research on AI assistants in computational notebook scenarios. They outline gaps in existing literature and suggest a future focus on improving macro human-notebook experiences throughout a user’s workflow, measuring and quantifying the value of AI systems, and establishing a set of standards and best practices for AI tools.

Read the paper

NEW RESEARCH

Jacdac: Service-Based Prototyping of Embedded Systems

The traditional approach to programming embedded systems is monolithic: firmware on a microcontroller contains both application code and the drivers needed to communicate with sensors and actuators, using low-level protocols such as I2C, SPI, and RS232. In comparison, software development for the cloud has moved to a service-based development and operation paradigm: a service provides a discrete unit of functionality that can be accessed remotely by an application, or other service, but is independently managed and updated.

In a recent paper: Jacdac: Service-Based Prototyping of Embedded Systems (opens in new tab), researchers from Microsoft propose, design, implement, and evaluate a service-based approach to prototyping embedded systems called Jacdac (opens in new tab). Jacdac defines a service specification language, designed especially for embedded systems, along with a host of specifications for a variety of sensors and actuators. With Jacdac, each sensor/actuator in a system is paired with a low-cost microcontroller that advertises the services that represent the functionality of the underlying hardware over an efficient and low-cost single-wire bus protocol. A separate microcontroller executes the user’s application program, which is a client of the Jacdac services on the bus.

Three Jacdac kits, comprising over twenty modules, have been produced by third-party manufacturers: KittenBot (opens in new tab) and Forward Education (opens in new tab).

Read the paper

NEW RESEARCH

PARIKSHA: A Scalable, Democratic, Transparent Evaluation Platform for Assessing Indic Large Language Models

Evaluation of multilingual LLMs is challenging due to a variety of factors – the lack of benchmarks with sufficient linguistic diversity, contamination of popular benchmarks into LLM pre-training data, and the lack of local, cultural nuances in translated benchmarks. Hence, it is difficult to extensively evaluate LLMs in a multilingual setting, leading to lack of fair comparisons between models and difficulties in replicating the evaluation setup used by some models. Recently, several Indic (Indian language) LLMs have been created to help build more locally and culturally relevant LLMs.

In a recent paper: PARIKSHA: A Scalable, Democratic, Transparent Evaluation Platform for Assessing Indic Large Language Models, researchers from Microsoft present an evaluation framework, which is the first comprehensive evaluation of Indic LLMs using a combination of human and LLM-based evaluation. The researchers conduct a total of 90,000 human evaluations and 50,000 LLM-based evaluations of 29 models to present leaderboards for 10 Indic languages. Pariksha provides inclusive evaluation by engaging a community of workers that represent India’s large and diverse workforce and also serves as a research platform for improving the process of evaluation. For transparency on the process, the evaluation artifacts will be released. Conducting Pariksha at regular intervals, the researchers aim to enable models to improve over time with insights and artifacts from their evaluations.

Read the paper

NEW RESEARCH

Tinker, Tailor, Configure, Customize: The Articulation Work of Customizing AI Fairness Checklists

Many responsible AI resources, such as toolkits, playbooks, and checklists, have been developed to support AI practitioners in identifying, measuring, and mitigating potential fairness-related harms. These resources are often designed to be general purpose, in order to address a variety of use cases, domains, and deployment contexts. However, this can lead to decontextualization, where such resources lack the level of relevance or specificity needed to use them.

To understand how AI practitioners might contextualize one such resource, an AI fairness checklist, for their particular use cases, domains, and deployment contexts, researchers from Microsoft conducted a retrospective contextual inquiry with 13 AI practitioners from seven organizations. In a recent paper: Tinker, Tailor, Configure, Customize: The Articulation Work of Customizing AI Fairness Checklists, they identify how contextualizing this checklist introduces new forms of work for AI practitioners and other stakeholders, while opening up new sites for negotiation and contestation of values in AI. The researchers also identify how the contextualization process may help AI practitioners develop a shared language around AI fairness. They also identify dynamics related to ownership over this process that suggest larger issues of accountability in responsible AI work.

Read the paper

NEW RESEARCH

MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels

LLMs are becoming indispensable tools for many creative and information related tasks, but they still come with limitations, including a tendency to fabricate content. State-of-the-art algorithms pair the LLM with an external, dynamically updated knowledge base to ground the LLM’s answers and provide up-to-date information. However, these techniques require large amounts of relevant, labeled training data that have not previously been publicly available.

In a recent paper: MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels presented at the 2024 ACM Web Conference, researchers from Microsoft introduce a novel dataset that closely mimics real-world web document and query distribution. MS MARCO Web Search contains 10 million unique queries across 93 languages with millions of relevant labeled query-document pairs. It uses ClueWeb22’s 10 billion high-quality web pages as the document corpus and provides rich information for various kinds of downstream tasks.

This dataset unlocks several new research directions that previous datasets cannot well support, including generic end-to-end neural indexer models, generic embedding models, and next generation information access systems with LLMs. MS MARCO Web Search offers a retrieval benchmark with three web scale retrieval challenge tasks, each with automatic evaluation and leaderboard. These tasks demand innovation in both machine learning and information retrieval systems. The researchers intend for MS MARCO Web Search to lay the groundwork for future advancements in AI and systems research.

View dataset

Read the paper

VIDEO

AI Case Studies for Natural Science Research with Bonnie Kruft

Among the stunning changes and disruptions driven by AI, one of the most significant is the impact on scientific discovery. In her presentation at EmTech Digital 2024 (opens in new tab), Bonnie Kruft, partner deputy director at Microsoft Research AI for Science, outlined some examples of how generative AI enables groundbreaking research in the natural sciences. Recent breakthroughs aided by AI include small molecular inhibitors for treating infectious disease, the discovery of new materials for energy storage, and new drug development.

Catch a replay of the presentation, including a follow-up Q&A with the audience, and hear how researchers are reducing discovery times from years to months. The discussion explores safe and responsible AI practices, how large language models can work with science-based models, and what lies ahead for AI in science.

Watch the video

Microsoft Research in the news

The tiny glass blocks that can preserve your data for centuries

The Times UK | April 27, 2024

Microsoft’s Project Silica is an innovative form of long-term storage – potentially revolutionizing how important data can be preserved for future generations.

These Recyclable Circuit Boards Could Stem E-Waste

IEEE Spectrum | May 2, 2024

New research from the University of Washington and Microsoft show that vitrimer-based PCBs can be broken down into a gel for repeated reuse. The research stems from the Microsoft Research Climate Initiative.

Today’s AI models are impressive. Teams of them will be formidable

The Economist | May 13, 2024

Teams of LLMs are more capable and intelligent than solitary agents because a single job can be split into many smaller, more specialized tasks, says Chi Wang, a principal researcher at Microsoft Research in Redmond, Washington.

You Only Cache Once: Decoder-Decoder Architectures for Language Models

Microsoft Research LinkedIn | May 11, 2024

YOCO is a novel decoder-decoder architecture for LLMs, enhancing memory efficiency by caching key-value pairs only once. It slashes KV cache memory and prefilling time and makes 1M-length LLMs practical.

Peter Lee discusses new technologies that will drive the future of drug discovery

AAPS | May 10, 2024

The president of Microsoft Research explores how new advances in technologies, such as AI and machine learning, are transforming biotechnology, in the closing plenary of the AAPS National Biotechnology Conference (NBC) on Thursday, May 16.

PKSHA develops advanced LLMs in collaboration with Microsoft Japan

Business Wire | April 29, 2024

PKSHA Technology has developed one of the first Japanese-English LLMs in collaboration with Microsoft Japan. This development primarily focuses on boosting productivity within contact centers and corporate help desks.

BRAID fellowships include three collaborations with Microsoft Research

Bridging Responsible AI Divides | May 2024

BRAID fellowships support individual researchers in partnership with public and private organizations to address challenges in the field of responsible AI. Among the latest fellowships are three supported by Microsoft Research.

View more news and awards

Opens in a new tab

The post Research Focus: Week of May 13, 2024 appeared first on Microsoft Research.

GRN Roundup, Technology

Learn how Amazon Ads created a generative AI-powered image generation capability using Amazon SageMaker

May 15, 2024

Advertisements

Amazon Ads helps advertisers and brands achieve their business goals by developing innovative solutions that reach millions of Amazon customers at every stage of their journey. At Amazon Ads, we believe that what makes advertising effective is delivering relevant ads in the right context and at the right moment within the consumer buying journey. With that goal, Amazon Ads has used artificial intelligence (AI), applied science, and analytics to help its customers drive desired business outcomes for nearly two decades.

In a March 2023 survey, Amazon Ads found that among advertisers who were unable to build successful campaigns, nearly 75 percent cited building the creative content as one of their biggest challenges. To help advertisers more seamlessly address this challenge, Amazon Ads rolled out an image generation capability that quickly and easily develops lifestyle imagery, which helps advertisers bring their brand stories to life. This blog post shares more about how generative AI solutions from Amazon Ads help brands create more visually rich consumer experiences.

In this blog post, we describe the architectural and operational details of how Amazon Ads implemented its generative AI-powered image creation solution on AWS. Before diving deeper into the solution, we start by highlighting the creative experience of an advertiser enabled by generative AI. Next, we present the solution architecture and process flows for machine learning (ML) model building, deployment, and inferencing. We end with lessons learned.

Advertiser creative experience

When building ad creative, advertisers prefer to customize the creative in a way that makes it relevant to their desired audiences. For example, an advertiser might have static images of their product against a white background. From an advertiser point of view, the process is handled in three steps:

Image generation converts product-only images into rich, contextually relevant images using generative AI. The approach preserves the original product features, requiring no technical expertise.
Anyone with access to the Amazon Ads console can create custom brand images without needing technical or design expertise.
Advertisers can create multiple contextually relevant and engaging product images with no additional cost.

A benefit of the image-generation solution is the automatic creation of relevant product images based on product selection only, with no additional input required from the advertisers. While there are options to enhance background imagery such as prompts, themes, and custom product images, they are not necessary to generate compelling creative. If advertisers do not supply this information, the model will infer it based on information from their product listing on amazon.com.

Figure 1. An example from the image generation solution showing a hydro flask with various backgrounds.

Solution overview

Figure 2 shows a simplified solution architecture for inferencing and model deployment. The steps for the model development and deployment are shown in blue circles and depicted by roman-numerals (i,ii, … iv.) whereas inferencing steps are in orange with Hindu-Arabic numbers (1,2,… 8.).

Figure 2. Solution architecture for inferencing and model deployment.

Amazon SageMaker is at the center of model development and deployment. The team used Amazon SageMaker JumpStart to rapidly prototype and iterate under their desired conditions (step i). Acting as a model hub, JumpStart provided a large selection of foundation models and the team quickly ran their benchmarks on candidate models. After selecting candidate large language models (LLMs), the science teams can proceed with the remaining steps by adding more customization. Amazon Ads applied scientists use SageMaker Studio as the web-based interface to work with SageMaker (step ii). SageMaker has the appropriate access policies to view some intermediary model results, which can be used for further experimentation (step iii).

The Amazon Ads team manually reviewed images at scale through a human-in-the-loop process where the team ensured that the application provides high quality and responsible images. To do that, the team deployed testing endpoints using SageMaker and generated a large number of images spanning various scenarios and conditions (step iv). Here, Amazon SageMaker Ground Truth allowed ML engineers to easily build the human-in-the-loop workflow (step v). The workflow allowed the Amazon Ads team to experiment with different foundation models and configurations through blind A/B testing to ensure that feedback to the generated images is unbiased. After the chosen model is ready to be moved into production, the model is deployed (step vi) using the team’s own in-house Model Lifecycle Manager tool. Under the hood, this tool uses artifacts generated by SageMaker (step vii) which is then deployed into the production AWS account (step viii), using SageMaker SDKs .

Regarding the inference, customers using Amazon Ads now have a new API to receive these generated images. The Amazon API Gateway receives the PUT request (step 1). The request is then processed by AWS Lambda, which uses AWS Step Functions to orchestrate the process (step 2). The product image is fetched from an image repository, which is a part of an existing solution predating this creative feature. The next step is to process customer text prompts and customize the image through content ingestion guardrails. Amazon Comprehend is used to detect undesired context in the text prompt, whereas Amazon Rekognition processes images for content moderation purposes (step 3). If the inputs pass the inspection, then the text continues as a prompt, while the image is processed by removing the background (step 4). Then, the deployed text-to-image model is used for image generation using the prompt and the processed image (step 5). The image is then uploaded into an Amazon Simple Storage Services (Amazon S3) bucket for images and the metadata about the image is stored in an Amazon DynamoDB table (step 6). This whole process starting from step 2 is orchestrated by AWS Step Functions. Finally, the Lambda function receives the image and meta-data (step 7) which are then sent to the Amazon Ads client service through the API Gateway (step 8).

Conclusion

This post presented the technical solution for the Amazon Ads generative AI-powered image generation solution, which advertisers can use to create customized brand images without needing a dedicated design team. Advertisers have a series of features to generate and customize images such as writing text prompts, selecting different themes, swapping the featured product, or uploading a new image of the product from their device or asset library allowing them to create impactful images for advertising their products.

The architecture uses modular microservices with separate components for model development, registry, model lifecycle management (which is an orchestration and step function-based solution to process advertiser inputs), select the appropriate model, and track the job throughout the service, and a customer facing API. Here, Amazon SageMaker is at the center of the solution, starting from JumpStart to final SageMaker deployment.

If you plan to build your generative AI application on Amazon SageMaker, the fastest way is with SageMaker JumpStart. Watch this presentation to learn how you can start your project with JumpStart.

About the Authors

Anita Lacea is the Single-Threaded Leader of generative AI image ads at Amazon, enabling advertisers to create visually stunning ads with the click of a button. Anita pairs her broad expertise across the hardware and software industry with the latest innovations in generative AI to develop performant and cost-optimized solutions for her customers, revolutionizing the way businesses connect with their audiences. She is passionate about traditional visual arts and is an exhibiting printmaker.

Burak Gozluklu is a Principal AI/ML Specialist Solutions Architect located in Boston, MA. He helps strategic customers adopt AWS technologies and specifically Generative AI solutions to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, an MS in Systems Engineering, and a post-doc in system dynamics from MIT in Cambridge, MA. Burak is still a research affiliate in MIT. Burak is passionate about yoga and meditation.

Christopher de Beer is a senior software development engineer at Amazon located in Edinburgh, UK. With a background in visual design. He works on creative building products for advertising, focusing on video generation, helping advertisers to reach their customers through visual communication. Building products that automate creative production, using traditional as well as generative techniques, to reduce friction and delight customers. Outside of his work as an engineer Christopher is passionate about Human-Computer Interaction (HCI) and interface design.

Yashal Shakti Kanungo is an Applied Scientist III at Amazon Ads. His focus is on generative foundational models that take a variety of user inputs and generate text, images, and videos. It’s a blend of research and applied science, constantly pushing the boundaries of what’s possible in generative AI. Over the years, he has researched and deployed a variety of these models in production across the online advertising spectrum ranging from ad sourcing, click-prediction, headline generation, image generation, and more.

Sravan Sripada is a Senior Applied Scientist at Amazon located in Seattle, WA. His primary focus lies in developing generative AI models that enable advertisers to create engaging ad creatives (images, video, etc.) with minimal effort. Previously, he worked on utilizing machine learning for preventing fraud and abuse on the Amazon store platform. When not at work, He is passionate about engaging in outdoor activities and dedicating time to meditation.

Cathy Willcock is a Principal Technical Business Development Manager located in Seattle, WA. Cathy leads the AWS technical account team supporting Amazon Ads adoption of AWS cloud technologies. Her team works across Amazon Ads enabling discovery, testing, design, analysis, and deployments of AWS services at scale, with a particular focus on innovation to shape the landscape across the AdTech and MarTech industry. Cathy has led engineering, product, and marketing teams and is an inventor of ground-to-air calling (1-800-RINGSKY).

GRN Roundup, Technology

Accelerate NLP inference with ONNX Runtime on AWS Graviton processors

May 15, 2024

Advertisements

ONNX is an open source machine learning (ML) framework that provides interoperability across a wide range of frameworks, operating systems, and hardware platforms. ONNX Runtime is the runtime engine used for model inference and training with ONNX.

AWS Graviton3 processors are optimized for ML workloads, including support for bfloat16, Scalable Vector Extension (SVE), and Matrix Multiplication (MMLA) instructions. Bfloat16 accelerated SGEMM kernels and int8 MMLA accelerated Quantized GEMM (QGEMM) kernels in ONNX have improved inference performance by up to 65% for fp32 inference and up to 30% for int8 quantized inference for several natural language processing (NLP) models on AWS Graviton3-based Amazon Elastic Compute Cloud (Amazon EC2) instances. Starting version v1.17.0, the ONNX Runtime supports these optimized kernels.

In this post, we show how to run ONNX Runtime inference on AWS Graviton3-based EC2 instances and how to configure them to use optimized GEMM kernels. We also demonstrate the resulting speedup through benchmarking.

Optimized GEMM kernels

ONNX Runtime supports the Microsoft Linear Algebra Subroutine (MLAS) backend as the default Execution Provider (EP) for deep learning operators. AWS Graviton3-based EC2 instances (c7g, m7g, r7g, c7gn, and Hpc7g instances) support bfloat16 format and MMLA instructions for the deep learning operator acceleration. These instructions improve the SIMD hardware utilization and reduce the end-to-end inference latency by up to 1.65 times compared to the armv8 DOT product instruction-based kernels.

The AWS team implemented MLAS kernels for bfloat16 fast math and int8 quantized General Matrix Multiply (GEMM) using BFMMLA, SMMLA, and UMMLA instructions, which have higher matrix multiplication throughput compared to DOT instructions. The bfloat16 support allows efficient deployment of models trained using bfloat16, fp32, and automatic mixed precision (AMP) without the need for quantization. As shown in the following diagrams, the optimized GEMM kernels are integrated into the ONNX Runtime CPU EP as MLAS kernels.

The first figure illustrates the ONNX software stack, highlighting (in orange) the components optimized for inference performance improvement on the AWS Graviton3 platform.

The following diagram illustrates the ONNX Runtime EP flow, highlighting (in orange) the components optimized for inference performance improvement on the AWS Graviton3 platform.

Enable the optimizations

The optimizations are part of the ONNX Runtime 1.17.0 release, and are available starting with onnxruntime-1.17.0 python wheels and conda-1.17.0 packages. Optimized int8 kernels are enabled by default, and will be picked up automatically for AWS Graviton3 Processors. Bfloat16 fast math kernels, on the other hand, are not enabled by default and need the following session options in ONNX Runtime to enable them:

# For C++ applications

SessionOptions so;
so.config_options.AddConfigEntry( kOrtSessionOptionsMlasGemmFastMathArm64Bfloat16, “1”);

# For Python applications

sess_options = onnxruntime.SessionOptions()
sess_options.add_session_config_entry(“mlas.enable_gemm_fastmath_arm64_bfloat16”, “1”)

Benchmark results

We started with measuring the inference throughput, in queries per second, for the fp32 model without any of our optimizations (using ONNX Runtime 1.16.0), which is marked at 1.0 with the red dotted line in the following graph. Then we compared the improvements from bfloat16 fast math kernels from ONNX Runtime 1.17.1 for the same fp32 model inference. The normalized results are plotted in the graph. You can see that for the BERT, RoBERTa, and GPT2 models, the throughput improvement is up to 65%. Similar improvements are observed for the inference latency.

Similar to the preceding fp32 inference comparison graph, we started with measuring the inference throughput, in queries per second, for the int8 quantized model without any of our optimizations (using ONNX Runtime 1.16.0), which is marked at 1.0 with the red dotted line in the following graph. Then we compared the improvements from the optimized MMLA kernels from ONNX Runtime 1.17.1 for the same model inference. The normalized results are plotted in the graph. You can see that for the BERT, RoBERTa, and GPT2 models, the throughput improvement is up to 30%. Similar improvements are observed for the inference latency.

Benchmark setup

We used an AWS Graviton3-based c7g.4xl EC2 instance with Ubuntu 22.04 based AMI to demonstrate the performance improvements with the optimized GEMM kernels from ONNX Runtime. The instance and the AMI details are mentioned in the following snippet:

Instance: c7g.4xl instance
Region: us-west-2
AMI: ami-0a24e6e101933d294 (Ubuntu 22.04/Jammy with 6.5.0-1014-aws kernel)

The ONNX Runtime repo provides inference benchmarking scripts for transformers-based language models. The scripts support a wide range of models, frameworks, and formats. We picked PyTorch-based BERT, RoBERTa, and GPT models to cover the common language tasks like text classification, sentiment analysis, and predicting the masked word. The models cover both encoder and decoder transformers architecture.

The following code lists the steps to run inference for the fp32 model with bfloat16 fast math mode and int8 quantized mode using the ONNX Runtime benchmarking script. The script downloads the models, exports them to ONNX format, quantizes them into int8 for int8 inference, and runs inference for different sequence lengths and batch sizes. Upon successful completion of the script, it will print the inference throughput in queries/sec (QPS) and latency in msec along with the system configuration. Refer to the ONNX Runtime Benchmarking script for more details.

# Install Python
sudo apt-get update
sudo apt-get install -y python3 python3-pip

# Upgrade pip3 to the latest version
python3 -m pip install –upgrade pip

# Install onnx and onnx runtime
# NOTE: We used 1.17.1 instead of 1.17.0 as it was the latest
# version available while collecting data for this post
python3 -m pip install onnx==1.15.0 onnxruntime==1.17.1

# Install the dependencies
python3 -m pip install transformers==4.38.1 torch==2.2.1 psutil==5.9.8

# Clone onnxruntime repo to get the benchmarking scripts
git clone –recursive https://github.com/microsoft/onnxruntime.git
cd onnxruntime
git checkout 430a086f22684ad0020819dc3e7712f36fe9f016
cd onnxruntime/python/tools/transformers

# To run bert-large fp32 inference with bfloat16 fast math mode
python3 benchmark.py -m bert-large-uncased -p fp32 –enable_arm64_bfloat16_fastmath_mlas_gemm

# To run bert-base fp32 inference with bfloat16 fast math mode
python3 benchmark.py -m bert-base-cased -p fp32 –enable_arm64_bfloat16_fastmath_mlas_gemm

# To run roberta-base fp32 inference with bfloat16 fast math mode
python3 benchmark.py -m roberta-base -p fp32 –enable_arm64_bfloat16_fastmath_mlas_gemm

# To run gpt2 fp32 inference with bfloat16 fast math mode
python3 benchmark.py -m gpt2 -p fp32 –enable_arm64_bfloat16_fastmath_mlas_gemm

# To run bert-large int8 quantized inference
python3 benchmark.py -m bert-large-uncased -p int8

# To run bert-base int8 quantized inference
python3 benchmark.py -m bert-base-cased -p int8

# To run roberta-base int8 quantized inference
python3 benchmark.py -m roberta-base -p int8

# To run gpt2 int8 quantized inference
python3 benchmark.py -m gpt2 -p int8

Conclusion

In this post, we discussed how to run ONNX Runtime inference on an AWS Graviton3-based EC2 instance and how to configure the instance to use optimized GEMM kernels. We also demonstrated the resulting speedups. We hope that you will give it a try!

If you find use cases where similar performance gains are not observed on AWS Graviton, please open an issue on the AWS Graviton Technical Guide GitHub to let us know about it.

About the Author

Sunita Nadampalli is a Software Development Manager at AWS. She leads Graviton software performance optimizations for Machine Learning and HPC workloads. She is passionate about open source software development and delivering high-performance and sustainable software solutions with Arm SoCs.

GRN Roundup, Technology

Microsoft at CHI 2024: Innovations in human-centered design

May 15, 2024

Advertisements

The ways people engage with technology, through its design and functionality, determine its utility and acceptance in everyday use, setting the stage for widespread adoption. When computing tools and services respect the diversity of people’s experiences and abilities, technology is not only functional but also universally accessible. Human-computer interaction (HCI) plays a crucial role in this process, examining how technology integrates into our daily lives and exploring ways digital tools can be shaped to meet individual needs and enhance our interactions with the world.

The ACM CHI Conference on Human Factors in Computing Systems is a premier forum that brings together researchers and experts in the field, and Microsoft is honored to support CHI 2024 as a returning sponsor. We’re pleased to announce that 33 papers by Microsoft researchers and their collaborators have been accepted this year, with four winning the Best Paper Award and seven receiving honorable mentions.

This research aims to redefine how people work, collaborate, and play using technology, with a focus on design innovation to create more personalized, engaging, and effective interactions. Several projects emphasize customizing the user experience to better meet individual needs, such as exploring the potential of large language models (LLMs) to help reduce procrastination. Others investigate ways to boost realism in virtual and mixed reality environments, using touch to create a more immersive experience. There are also studies that address the challenges of understanding how people interact with technology. These include applying psychology and cognitive science to examine the use of generative AI and social media, with the goal of using the insights to guide future research and design directions. This post highlights these projects.

Best Paper Award recipients

DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing
Priyan Vaithilingam, Elena L. Glassman, Jeevana Priya Inala, Chenglong Wang
GUIs used for editing visualizations can overwhelm users or limit their interactions. To address this, the authors introduce DynaVis, which combines natural language interfaces with dynamically synthesized UI widgets, enabling people to initiate and refine edits using natural language.

Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking
Nikhil Sharma, Q. Vera Liao, Ziang Xiao
Conversational search systems powered by LLMs potentially improve on traditional search methods, yet their influence on increasing selective exposure and fostering echo chambers remains underexplored. This research suggests that LLM-driven conversational search may enhance biased information querying, particularly when the LLM’s outputs reinforce user views, emphasizing significant implications for the development and regulation of these technologies.

Piet: Facilitating Color Authoring for Motion Graphics Video
Xinyu Shi, Yinghou Wang, Yun Wang, Jian Zhao
Motion graphic (MG) videos use animated visuals and color to effectively communicate complex ideas, yet existing color authoring tools are lacking. This work introduces Piet, a tool prototype that offers an interactive palette and support for quick theme changes and controlled focus, significantly streamlining the color design process.

The Metacognitive Demands and Opportunities of Generative AI
Lev Tankelevitch, Viktor Kewenig, Auste Simkute, Ava Elizabeth Scott, Advait Sarkar, Abigail Sellen, Sean Rintel
Generative AI systems offer unprecedented opportunities for transforming professional and personal work, yet they present challenges around prompting, evaluating and relying on outputs, and optimizing workflows. This paper shows that metacognition—the psychological ability to monitor and control one’s thoughts and behavior—offers a valuable lens through which to understand and design for these usability challenges.

Honorable Mentions

Big or Small, It’s All in Your Head: Visuo-Haptic Illusion of Size-Change Using Finger-Repositioning
Myung Jin Kim, Eyal Ofek, Michel Pahud, Mike J. Sinclair, Andrea Bianchi
This research introduces a fixed-sized VR controller that uses finger repositioning to create a visuo-haptic illusion of dynamic size changes in handheld virtual objects, allowing users to perceive virtual objects as significantly smaller or larger than the actual device.

LLMR: Real-time Prompting of Interactive Worlds Using Large Language Models
Fernanda De La Torre, Cathy Mengying Fang, Han Huang, Andrzej Banburski-Fahey, Judith Amores, Jaron Lanier
Large Language Model for Mixed Reality (LLMR) is a framework for the real-time creation and modification of interactive mixed reality experiences using LLMs. It uses novel strategies to tackle difficult cases where ideal training data is scarce or where the design goal requires the synthesis of internal dynamics, intuitive analysis, or advanced interactivity.

Observer Effect in Social Media Use
Koustuv Saha, Pranshu Gupta, Gloria Mark, Emre Kiciman, Munmun De Choudhury
This work investigates the observer effect in behavioral assessments on social media use. The observer effect is a phenomenon in which individuals alter their behavior due to awareness of being monitored. Conducted over an average of 82 months (about 7 years) retrospectively and five months prospectively using Facebook data, the study found that deviations in expected behavior and language post-enrollment in the study reflected individual psychological traits. The authors recommend ways to mitigate the observer effect in these scenarios.

Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming
Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz
By investigating how developers use GitHub Copilot, the authors created CUPS, a taxonomy of programmer activities during system interaction. This approach not only elucidates interaction patterns and inefficiencies but can also drive more effective metrics and UI design for code-recommendation systems with the goal of improving programmer productivity.

SharedNeRF: Leveraging Photorealistic and View-dependent Rendering for Real-time and Remote Collaboration
Mose Sakashita, Bala Kumaravel, Nicolai Marquardt, Andrew D. Wilson
SharedNeRF, a system for synchronous remote collaboration, utilizes neural radiance field (NeRF) technology to provide photorealistic, viewpoint-specific renderings that are seamlessly integrated with point clouds to capture dynamic movements and changes in a shared space. A preliminary study demonstrated its effectiveness, as participants used this high-fidelity, multi-perspective visualization to successfully complete a flower arrangement task.

Understanding the Role of Large Language Models in Personalizing and Scaffolding Strategies to Combat Academic Procrastination
Ananya Bhattacharjee, Yuchen Zeng, Sarah Yi Xu, Dana Kulzhabayeva, Minyi Ma, Rachel Kornfield, Syed Ishtiaque Ahmed, Alex Mariakakis, Mary P. Czerwinski, Anastasia Kuzminykh, Michael Liut, Joseph Jay Williams
In this study, the authors explore the potential of LLMs for customizing academic procrastination interventions, employing a technology probe to generate personalized advice. Their findings emphasize the need for LLMs to offer structured, deadline-oriented advice and adaptive questioning techniques, providing key design insights for LLM-based tools while highlighting cautions against their use for therapeutic guidance.

Where Are We So Far? Understanding Data Storytelling Tools from the Perspective of Human-AI Collaboration
Haotian Li, Yun Wang, Huamin Qu
This paper evaluates data storytelling tools using a dual framework to analyze the stages of the storytelling workflow—analysis, planning, implementation, communication—and the roles of humans and AI in each stage, such as creators, assistants, optimizers, and reviewers. The study identifies common collaboration patterns in existing tools, summarizes lessons from these patterns, and highlights future research opportunities for human-AI collaboration in data storytelling.

Learn more about our work and contributions to CHI 2024, including our full list of publications, on our conference webpage.

Opens in a new tab

The post Microsoft at CHI 2024: Innovations in human-centered design appeared first on Microsoft Research.

GRN Roundup, Technology

Build a serverless exam generator application from your own lecture content using Amazon Bedrock

May 15, 2024

Advertisements

Crafting new questions for exams and quizzes can be tedious and time-consuming for educators. The time required varies based on factors like subject matter, question types, experience level, and class level. Multiple-choice questions require substantial time to generate quality distractors and ensure a single unambiguous answer, and composing effective true-false questions demands careful effort to avoid vagueness and assess deeper understanding. Creating high-quality assessment questions of any format necessitates meticulous attention to detail from educators in order to produce fair and valid student evaluations. To streamline this cumbersome process, we propose an automated exam generation solution based on Amazon Bedrock.

In this post, we explore how to build an application that generates tests tailored to your own lecture content. We cover the technical implementation using the Anthropic Claude large language model (LLM) on Amazon Bedrock and AWS Lambda deployed with the AWS Serverless Application Model (AWS SAM). This solution enables educators to instantly create curriculum-aligned assessments with minimal effort. Students can take personalized quizzes and get immediate feedback on their performance. This solution simplifies the exam creation process while benefiting both teachers and learners.

Amazon Bedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon using a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. In this post, we focus on a text generation use case, and can choose from Amazon Titan Text G1 and other models on Amazon Bedrock, including Anthropic Claude, AI21 Labs Jurassic, Meta Llama 2, and Cohere Command.

With the ability to scale up to 200,000-token context windows, Anthropic Claude v2.1 on Amazon Bedrock is our preferred choice for this post. It is typically helpful when working with lengthy documents such as entire books. When we talk about tokens, we refer to the smallest individual “atoms” of a language model, and can varyingly correspond to words, subwords, characters, or even bytes (in the case of Unicode). For Anthropic Claude on Amazon Bedrock, the average token is about 3.5 English characters. The 200,000 tokens supported by Anthropic Claude v2.1 on Amazon Bedrock would be equivalent to roughly 150,000 words or over 500 pages of documents.

This post demonstrates how to use advanced prompt engineering to control an LLM’s behavior and responses. It shows how to randomly generate questions and answers from lecture files, implemented as a simple serverless application.

Solution overview

The following diagram illustrates the application architecture. We distinguish two paths: the educator path (1) and the learner path (2).

As first-time users, both educator and learner need to complete the sign-up process, which is done by two separate Amazon Cognito user pools. For the educator, when the sign-up is complete, Amazon Cognito invokes the Lambda function called CognitoPostSignupFn to subscribe the educator to an Amazon Simple Notification Service (Amazon SNS) topic. The educator must approve the subscription to this topic in order to be notified by email with the scorecard of each learner who will be taking the generated exam.

Figure 1: Architectural diagram of the exam generator application

The workflow includes the following steps:

The educator opens the landing page for generating an exam under the domain gen-exam.<your-domain-name> through Amazon Route 53, which redirects the request to the Application Load Balancer (ALB).

1.1 The ALB communicates with Amazon Cognito to authenticate the educator on the educator user pool.

1.2 The educator uploads a lecture as a PDF file into the exam generation front-end.

1.3 The Amazon Elastic Container Service (Amazon ECS) container running on AWS Fargate uploads the file to Amazon Simple Storage Service (Amazon S3) in the Examgen bucket under the prefix exams.

1.4 The S3 bucket is configured using event notification. Whenever a new file is uploaded, a PutObject is activated to send the file to the ExamGenFn Lambda function.

1.5 The Lambda function ExamGenFn invokes the Anthropic Claude v2.1 model on Amazon Bedrock to generate exam questions and answers as a JSON file.

1.6 The Amazon Bedrock API returns the output Q&A JSON file to the Lambda function.

1.7 The ExamGenFn Lambda function saves the output file to the same S3 bucket under the prefix Questions-bank. (You can choose to save it to a different S3 bucket.)

1.8 The ExamGenFn Lambda function sends an email notification to the educator through the SNS topic to notify that the exam has been generated.

The learner opens the landing page to take the exam under the domain take-exam.<your-domain-name> through Route 53, which redirects the request to the ALB.

2.1 The ALB communicates with Amazon Cognito to authenticate the learner on the learner user pool.

2.2 The learner accesses the frontend and selects a test to take.

2.3 The container image sends the REST API request to Amazon API Gateway (using the GET method).

2.4 API Gateway communicates with the TakeExamFn Lambda function as a proxy.

2.5 The Lambda TakeExamFn function retrieves from S3 bucket under the prefix Questions-bank the available exam in JSON format.

2.6 The JSON file is returned to API Gateway.

2.7 API Gateway transmits the JSON file to the ECS container in the front-end.

2.8 The container presents the exam as a UI using the Streamlit framework. The learner then takes the exams. When the learner is finished and submits their answers, the ECS container performs a comparison between the answers provided and the correct answers, and then shows the score results to the learner.

2.9 The ECS container stores the scorecard in an Amazon DynamoDB table.

2.10 The Lambda DynamoDBTriggerFn function detects the new scorecard record on the DynamoDB table and sends an email notification to the educator with the learner’s scorecard.

This is an event-driven architecture made up of individual AWS services that are loosely integrated with each other, with each service handling a specific function. It uses AWS serverless technologies, allowing you build and run your application without having to manage your own servers. All server management is done by AWS, providing many benefits such as automatic scaling and built-in high availability, letting you take your idea to production quickly.

Prerequisites

In this section, we go through the prerequisite steps to complete before you can set up this solution.

Enable model access through Amazon Bedrock

You can add access to a model from the Amazon Bedrock console. For this walkthrough, you need to request access to the Anthropic Claude model on Amazon Bedrock. For more information, see Model access.

Install the necessary packages

You need to install the following:

The AWS Command Line Interface (AWS CLI), an open source tool that enables you to interact with AWS services using commands in your command line shell. For instructions, see Install or update to the latest version of the AWS CLI.
The AWS SAM CLI, which is your toolkit for building and running your serverless application on AWS.
Streamlit, an open source Python framework for building the front-end.
The Docker engine.
Python.
Git.

Register a DNS domain and create certificates

If you don’t already have a DNS domain registered, you need to create one in order to not expose the DNS of your ALB. For instructions, refer to Registering a new domain.

You also need to request two public certificates, one for each front-end: gen-exam.<your-domain-name> and take-exam.<your-domain-name>. Refer to Requesting a public certificate to request a public certificate on AWS Certificate Manager.

Save the values for genCertificateArn and takeCertificateArn.

If you want to build the app in a development environment without using your own domain, you can uncomment the following section in the sam template:

# un-comment if you need to test with HTTP traffic and no certifcate
# ExamGenALBHTTPListener:
# Type: AWS::ElasticLoadBalancingV2::Listener
# Properties:
# LoadBalancerArn: !Ref ExamGenALB
# Protocol: HTTP
# Port: 80
# DefaultActions:
# – Type: forward
# TargetGroupArn: !Ref ExamGenTG

Chain-of-Thought (CoT) Prompting

Before we embark on constructing the app, let’s delve into prompt engineering. We use Chain-of-Thought (CoT) Prompting, which allows the model to break down complex reasoning into smaller, more manageable steps. By providing the AI with intermediate prompts that guide its reasoning process step by step, CoT prompting enables the model to tackle sophisticated reasoning tasks. Guiding the AI through an analytical chain of thought in this way allows it to develop complex reasoning capabilities that would otherwise be beyond its unaided abilities.

In the ExamGenFn Lambda function, we use the following prompt to guide the model through reasoning steps. You can change the prompt and give it different personas and instructions, and see how it behaves.

template_instruction = f”””Human:
You are a teacher during examination time and you are responsible for creating exam questions from the student study book.
Before creating the questions
– Analyze the book found between <exam_book> </exam_book> tags, to identify distinct chapters, sections, or themes for question generation.
– For true/false questions, select statements that can be clearly identified as true or false based on the book’s content.
– For MCQs, develop questions that challenge the understanding of the material, ensuring one correct answer and {n_mcq_options-1} distractors that are relevant but incorrect.
– Randomize the selection of pages or topics for each run to generate a new set of questions, ensuring no two sets are identical.
Please provide the questions in this format exactly for MCQ:
– The output should be like
“question”: “What is the colour of the car in the book?”,
“options”: [“Blue”, “Green”, “Yellow”, “Grey”],
“correct_answer”: “Yellow”
For True/False:
– the output should be like
“question”: “is the sky Blue?”,
“options”: [“True”, “False”],
“correct_answer”: “True”

Generate {n_tfq} true/false and {n_mcq} multiple-choice questions (MCQs) ensuring each question pertains to different pages or topics within the book. For MCQs, provide [n_mcq_options] options for each question. Focus on creating unique questions that cover a broad spectrum of the book’s content, avoiding repetition and ensuring a diverse examination of the material. Use the following guidelines:

1. True/False Questions:
– Craft each true/false question based on factual statements or key concepts from the book.
– Ensure each question spans a wide range of topics to cover the book comprehensively.

2. Multiple-Choice Questions (MCQs):
– Formulate each MCQ to assess understanding of significant themes, events, or facts.
– Include {n_mcq_options} options per MCQ, making sure one is correct and the others are plausible but incorrect.
– Diversify the content areas and pages/topics for each MCQ to avoid overlap and repetition.
“””

Build the exam generator application

The application presented in this post is available in the following GitHub repo with the building blocks code. Let’s start with a git pull on the repo.

We recommend using temporary credentials with the AWS CLI to make programmatic requests for AWS resources using the AWS CLI.

Build the front-end using Streamlit and Docker

You build two containers, one for generating exams and one for taking exams. Let’s start with building the generating exam Docker image:

Go to the following path in the repo and build your Docker image:

user@exam-gen ~ % cd exam-gen-ai-blog/frontend/generate-exam-fe

user@exam-gen generate-exam-fe % docker build -t <your-image-name>:tag .

Authenticate the Docker CLI to Amazon Elastic Container Registry (Amazon ECR):

aws ecr get-login-password –region <your-region> | docker login –username AWS –password-stdin <your-account-id>.dkr.ecr.<your-region>.amazonaws.com

Create a new repository in Amazon ECR:

aws ecr create-repository –repository-name <your-repository-name>

Tag your Docker image with the ECR repository URI:

docker tag <your-image-name>:tag your-account-id.dkr.ecr.<your-region>.amazonaws.com/<your-ecr-repository>:tag

Push your tagged Docker image to your ECR repository:

docker push <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-ecr-repository>:tag

Navigate to this path in the repo to build your Docker image for taking the exam:

user@exam-gen ~ % cd exam-gen-ai-blog/frontend/take-exam-fe

Because the authentication and the ECR repo are already done, run directly the following command:

user@exam-gen take-exam-fe % docker build -t <your-image-name>:tag .

docker tag <your-image-name>:tag your-account-id.dkr.ecr.<your-region>.amazonaws.com/<your-ecr-repository>:tag

docker push <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-ecr-repository>:tag

Copy the values for GenExamImageUri and TakeExamImageUri.

Now that you have both containers ready to run, let’s build the rest of the components using AWS SAM.

Build solution components with AWS SAM

AWS SAM consists of two parts:

AWS SAM template specification – An open source framework that you can use to define your serverless application infrastructure on AWS
AWS SAM CLI – A command line tool that you can use with AWS SAM templates and supported third-party integrations to build and run your serverless applications

For further information, refer to Using the AWS Serverless Application Model (AWS SAM).

Go to the home directory user@exam-gen ~ % cd exam-gen-ai-blog and run the sam build command.

Before you run sam deploy, be aware of the following:

The ECS containers are deployed on Fargate, which needs a VPC with two subnets in different Availability Zones. We use the default VPC for simplicity. You can create your own VPC or use an existing one in your AWS account and update the sam template. To list your VPC IDs and subnets within a selected VPC ID, run the following commands to extract your VpcId and your two SubnetId:

aws ec2 describe-vpcs
aws ec2 describe-subnets

GenExamCallbackURL (for generating exam) and TakeExamCallbackURL (for taking exam) are used by Amazon Cognito. They are URLs where the user is redirected to after a successful sign-in.

Now let’s deploy the sam template:

sam deploy –stack-name <your-stack-name> –guided
–parameter-overrides
DefaultVPCID=”your-default-vpc-id”
SubnetIdOne=”your-subnet-one-id”
SubnetIdTwo=”your-subnet-two-id”
genCertificateArn=”arn:aws:acm:<your-region>:<your-account-id>:certificate/<your-certificate-id>”
takeCertificateArn=”arn:aws:acm:<your-region>:<your-account-id>:certificate/<your-certificate-id>”
GenExamImageUri=”<your-gen-image-uri>”
TakeExamImageUri=”<your-take-image-uri>”
GenExamCallbackURL=”gen-exam.<your-domain-name>”
TakeExamCallbackURL=”take-exam.<your-domain-name>”
NotificationEmail=”your-email-address@example.com”
–capabilities CAPABILITY_NAMED_IAM

#Shows you resources changes to be deployed and require a ‘Y’ to initiate deploy
Confirm changes before deploy [Y/n]: n
#SAM needs permission to be able to create roles to connect to the resources in your template
Allow SAM CLI IAM role creation [Y/n]: y
#Preserves the state of previously provisioned resources when an operation fails
Disable rollback [Y/n]: n
Save arguments to configuration file [Y/n]: n

Looking for resources needed for deployment:
Creating the required resources…

Successfully created!

You can follow the creation on the AWS CloudFormation console.

This following video demonstrates running the sam build and sam deploy commands.

Figure 2: SAM build and SAM deploy execution

The final step is to get the DNS names for the deployed ALB, map them to the certificate domains names in Route 53, and add them as a CNAME record.

Test the solution

You can use your browser to test the solution.

Navigate to gen-exam.<your-domain-name>.

You’ll receive an email with a confirmation code.

Enter the verification code and choose Confirm account.

Once verified, you will land on a page to generate your quiz.

Choose the amount of multiple choice and true/false questions you want to generate, then choose Browse files to upload an input file.

For this example, we use the whitepaper AWS Cloud Adoption Framework: Security Perspective as our input file. We generate four multiple-choice questions and one true/false question.

Confirm your subscription to the SNS topic (you’ll receive an email).

Then you’ll receive an email confirming the exam has been generated.

Switch to take-exam.<your-domain-name>, and you’ll find the exam on the dropdown menu.

Choose the exam, then choose Load quiz.

Then you can take the exam and choose Submit to display the results.

The educator will receive an email with the scorecard of the learner.

You have just built a simple application that randomly generates questions and answers from uploaded documents. Learners can take the generated exams and educators can receive scorecards via email when tests are complete. The integration with the DynamoDB table allows you to store the responses on a long-term basis.

Expanding the solution

There are many possibilities to build on top of this and create a fully featured learning and testing application. One area of expansion is uploading multiple documents at once. As of this writing, users can only upload one document at a time, but support for bulk uploads would improve efficiency and make it easier to work with large sets of source materials. Educators could be empowered to gather and upload content from various documents and websites as source material for questions. This provides greater flexibility compared to using a single document. Moreover, with a data store, they could view and analyze learner answers via a scorecard interface to track progress over time.

Clean up

It’s important to clean up your resources in the following order:

On the Amazon S3 console, empty the bucket by deleting any files and folders.

On the AWS CloudFormation console, delete the stack.

Conclusion

In this post, we showed how to build a generative AI application powered by Amazon Bedrock that creates exam questions using lecture documents as input to support educators with an automated tool to continuously modernize quiz material and improve learners’ skills. Learners will be able to take the freshly generated exam and get the score results. With the capabilities of Amazon Bedrock and the AWS SAM, you can increase educators’ productivity and foster student success.

For more information on working with generative AI on AWS for education use cases, refer to Generative AI in education: Building AI solutions using course lecture content.

About the Authors

Merieme Ezzaouia is a Solutions Architect at AWS dedicated to the public sector. She helps customers in education and sports turn their concepts into tangible solutions, develop new services, and foster innovation. Beyond work, Merieme’s passions include gardening, traveling the world, and reading.

Mohammed Reda is a Solutions Architect at Amazon Web Services. He helps UK schools, universities, and EdTech companies adopt cloud technologies, improve their educational offerings, and innovate on AWS. Outside of work, Mohammed enjoys running and watching cooking shows.

GRN Roundup, Technology

Energy and Utility Companies are Ready for AI — Let’s Explore the Benefits.

May 15, 2024

Advertisements

Energy and Utility Companies are Ready for AI — Let’s Explore the Benefits.

Explore byteLAKE’s Data Insights: Fueling Efficiency, Sustainability, and Cost Reductions in Energy and Utility Sectors through AI Integration.

According to a study in the article, Utilities say they’re ready for AI. Where should they start? (power-grid.com, close to 75% of energy and utility companies have either adopted AI or are actively considering its integration into their operations. It’s no surprise to me, given that Artificial Intelligence (AI) plays a crucial role in transforming energy utilities and powering smart cities. Here are some ways AI is making an impact:

Smart City Initiatives: AI helps optimize utility operations in smart cities. By analyzing vast amounts of data, it contributes to sustainability and efficiency. For example, geospatial analysis tools assist cities in preemptively managing road maintenance costs.Real-Time Insights: Energy firms and grid operators integrate AI, machine learning (ML), and the Internet of Things (IoT) to capture real-time data. This enables optimal product and service delivery.Enhancing Security: AI technologies analyze extensive data to identify patterns indicating cyber threats within power grids, bolstering security.Energy Projections: AI assists in discovering new energy projections and optimizing production from existing infrastructures in the energy and utilities industry.

Just looking at these examples, AI is a driving force behind sustainable energy practices and smarter cities.

But let’s start from the beginning. What is AI? Artificial intelligence (AI) comprises a collection of algorithms that have the remarkable ability to transform various types of data, including images, sounds, videos, and sensor data, into valuable insights and actionable information. By integrating AI with online forecasts, such as weather predictions, and real-time inputs from operators, AI systems can analyze vast amounts of data to identify patterns, optimize maintenance tasks, suggest optimal machinery settings, and support decision-making processes aimed at reducing overall energy consumption and improving efficiency.

Utilities companies are under increasing pressure to optimize their operations, reduce costs, and minimize their environmental impact. With AI, utility companies can automate operations, optimize costs, reduce energy consumption, and lower their carbon footprint, all while enhancing overall efficiency and performance.

One of the key advantages of AI in utilities is its ability to leverage vast amounts of data from various sources, including IoT devices, historical data, online data, and weather forecasts. By harnessing this data, AI can dynamically adjust energy pricing to synchronize with demand fluctuations, helping utility companies maximize revenue while ensuring cost-effectiveness for consumers. Additionally, AI can analyze this data to suggest strategies for lowering energy costs, optimizing consumption, and reducing waste, ultimately leading to a more sustainable and efficient energy ecosystem.

In the context of smart cities, AI plays a pivotal role in orchestrating various interconnected systems to enhance overall functionality and livability. By integrating AI into smart city infrastructure, companies can optimize energy management, improve resource allocation, and enhance overall sustainability.

Take a moment to explore the illustration below, depicting the typical deployment of byteLAKE’s Data Insights in both Smart City and Smart Factory contexts. In the Smart City scenario, AI integrated into Data Insights harnesses data from SCADA systems (Supervisory Control and Data Acquisition), empowering utility companies to optimize their operations effectively. By analyzing this data, the AI suggests the ideal supply temperature required to deliver the necessary heat level to all infrastructure nodes while simultaneously reducing overall costs. Moreover, byteLAKE’s Data Insights utilizes AI to optimize costs and minimize energy losses in district heating networks and factories. For instance, even a slight reduction in flow temperature, say by 1–2 degrees, can translate into substantial savings, amounting to millions of euros annually. Furthermore, Data Insights offers additional benefits, such as forecasting and optimization, predictive maintenance, and robust monitoring and management capabilities. Through AI-driven insights, utility companies can proactively address challenges, streamline operations, and enhance overall efficiency, thereby fostering a sustainable and resilient energy ecosystem.

byteLAKE’s Data Insights: AI for Energy and Utility Companies.

In smart factory settings, AI is revolutionizing energy management by optimizing the utilization of different energy sources. By analyzing data from SCADA systems, sensors, weather forecasts, and other sources, AI can forecast energy demand and optimize the operation of heating plants, reducing costs and ensuring efficient heat distribution. Additionally, AI can predict equipment failures and maintenance needs, enabling proactive maintenance to avoid unexpected downtime and optimize overall operational efficiency. Furthermore, AI algorithms can monitor production lines, detect errors, and improve remote management of individual elements, ensuring smooth and uninterrupted operation.

For example, AI can optimize the management of different energy sources, based on contracts with Utility companies, but also renewable energy, and other sources, by combining data from weather forecasts, schedules, shifts, and planned activities. This enables Smart Factories to develop proactive strategies to optimize energy consumption, reduce waste, and lower carbon emissions, ultimately contributing to a greener and more sustainable future.

byteLAKE’s Data Insights uses AI to optimize costs and reduce energy losses in district heating networks and factories. For example, reducing the flow temperature by just 1–2 degrees can save millions of euros per year.

In summary, AI is transforming Utilities, Smart Cities, and Smart Factories by revolutionizing energy management, optimizing costs, and reducing environmental impact. By harnessing the power of AI, Utility companies can automate operations, optimize energy consumption, and enhance overall sustainability, paving the way for a more efficient and greener future. With AI-driven solutions, companies can stay ahead of the curve, driving innovation and delivering value to both consumers and the environment.

If you want to learn more, you might find these articles interesting as they are related to the industrial AI subject:

A Comprehensive Guide to Deploying AI in Industries: From Idea to Deployment | by Marcin Rojek | Apr, 2024 | Medium Harnessing the Power of AI and IoT: Revolutionizing Data Insights for Industry 4.0 | by Marcin Rojek | Apr, 2024 | Becoming Human: Artificial Intelligence Magazine https://medium.com/media/a4b6a14acf115882e6876ab485c50b86/href

Energy and Utility Companies are Ready for AI — Let’s Explore the Benefits. was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

GRN Roundup, Technology

RAG architecture with Voyage AI embedding models on Amazon SageMaker JumpStart and Anthropic Claude 3 models

May 14, 2024

Advertisements

This post is a guest post co-written with Tengyu Ma and Wen Phan from Voyage AI.

Organizations today have access to vast amounts of data, much of it proprietary, which holds the potential to unlock valuable insights when used effectively in generative artificial intelligence (AI) applications. Retrieval Augmented Generation (RAG) is a powerful technique designed to tap into this reservoir of information. By dynamically pulling relevant data from these extensive databases during the response generation process, RAG enables AI models to produce more accurate, relevant, and contextually rich outputs.

Embedding models are crucial components in the RAG architecture, serving as the foundation for effectively identifying and retrieving the most relevant information from a large dataset. These models convert large volumes of text into compact, numerical representations, allowing the system to quickly sift through and match query-related data with unprecedented precision. By facilitating a more efficient and accurate retrieval process, embedding models make sure that the generative component of RAG is fed with the most pertinent information.

In this post, we provide an overview of the state-of-the-art embedding models by Voyage AI and show a RAG implementation with Voyage AI’s text embedding model on Amazon SageMaker Jumpstart, Anthropic’s Claude 3 model on Amazon Bedrock, and Amazon OpenSearch Service. Voyage AI’s embedding models are the preferred embedding models for Anthropic. In addition to general-purpose embedding models, Voyage AI offers domain-specific embedding models that are tuned to a particular domain.

RAG architecture and embedding models

RAG is the predominant design pattern for enterprise chatbots where a retrieval system fetches validated sources and documents that are pertinent to the query and inputs them to a large language model (LLM) to generate a response. It combines the generative capabilities of models with the informational breadth found in vast databases, enabling the model to pull relevant external documents to enhance its responses. This results in outputs that are not only contextually rich but also factually accurate, significantly boosting the reliability and utility of LLMs across diverse applications.

Let’s briefly review RAG using the following figure.

RAG systems are empowered by semantic search using dense-vector representations of the documents called embeddings. These vectors are stored in a vector store, where they can be efficiently retrieved later. At query time, a query is also converted into a vector and then used to find and retrieve similar documents stored in the vector store via a k-nearest neighbor (k-NN) search against the document vector representations. Finally, the retrieved documents along with the query are used to prompt the generative model, often resulting in higher-quality responses and fewer hallucinations.

Embedding models are neural network models that transform queries and documents into embeddings. The retrieval quality is solely decided by how the data is represented as vectors, and the effectiveness of embedding models is evaluated based on their accuracy in retrieving relevant information. Therefore, the retrieval quality of the embedding models is highly correlated with the quality of the RAG system responses—to make your RAG more successful, you should consider improving your embeddings. Check out this blog for a detailed explanation.

Voyage AI’s general-purpose and domain-specific embedding models

Voyage AI develops cutting-edge embedding models with state-of-the-art retrieval accuracy. voyage-large-2 is Voyage’s most powerful generalist embedding model, outperforming popular competing models. Voyage also offers voyage-2, a base generalist embedding model optimized for latency and quality. The following table summarizes the Voyage embedding models currently available on SageMaker JumpStart.

Voyage AI Model
SageMaker JumpStart Model ID
Description

voyage-2
voyage-2-embedding
General-purpose embedding model optimized for a balance between cost, latency, and retrieval quality

voyage-large-2
voyage-large-2-embedding
General-purpose embedding model optimized for retrieval quality

voyage-code-2
voyage-code-2-embedding
Domain-specific embedding model optimized for code retrieval (17% better than alternatives)

In addition to general-purpose embedding models, Voyage AI offers domain-specific ones that are tuned to a particular domain. These domain-specific embedding models are trained on massive domain-specific datasets, allowing them to deeply understand and excel in that domain. For example, Voyage’s code embedding model (voyage-code-2) outperforms general-purpose embedding models on code-related data documents, achieving about a 15% improvement over the next best model. This performance gap over the next best general-purpose embedding improves even more for datasets requiring deeper code understanding. See voyage-code-2: Elevate Your Code Retrieval for voyage-code-2 details. More recently, Voyage released a legal embedding model (voyage-law-2) that is optimized for legal retrieval and tops the MTEB leaderboard for legal retrieval. See Domain-Specific Embeddings and Retrieval: Legal Edition (voyage-law-2) for voyage-law-2 details. Voyage AI plans to continue releasing additional domain-specific embedding models in the near future, including finance, healthcare, and multi-language. For a list of all available Voyage AI embedding models, see Embeddings.

Voyage AI offers API endpoints for embedding models, making it seamless to integrate with other components of your RAG stack. The Voyage AI embedding models are available on AWS Marketplace and deployable as Amazon SageMaker endpoints within your account and VPC, eliminating security and compliance concerns. As part of SageMaker JumpStart, you can deploy Voyage AI embedding models with a few clicks and start running your RAG stack on AWS.

Solution overview

In this RAG solution, we use Voyage AI embedding models deployed with SageMaker JumpStart to demonstrate an example using the Apple 2022 annual report (SEC Form 10-K) as the corpus to retrieve from. Specifically, we deploy the SageMaker model package of the voyage-large-2 model. For the LLM, we use the Anthropic Claude 3 Sonnet model on Amazon Bedrock. We use OpenSearch Service as the vector store. You can also follow along with the notebook. The following diagram illustrates the solution architecture.

SageMaker JumpStart is the machine learning (ML) hub of SageMaker that offers one-click access to over 350 open source and third-party models. These models can be discovered and deployed through the Amazon SageMaker Studio UI or using the SageMaker Python SDK. SageMaker JumpStart provides notebooks to customize and deploy foundation models into your VPC.

Anthropic’s Claude 3 models are the next generation of state-of-the-art models from Anthropic. For the vast majority of workloads, Sonnet is faster on inputs and outputs than Anthropic’s Claude 2 and 2.1 models, with higher levels of intelligence. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like Anthropic through an API, making it straightforward to build generative AI applications. To follow along, be sure to request model access to Anthropic Claude 3 Sonnet on Amazon Bedrock.

Amazon OpenSearch Service is a managed service that makes it straightforward to deploy, operate, and scale OpenSearch, a popular open source, distributed search analytics suite derived from Elasticsearch. OpenSearch provides the ability to do vector search via the k-NN search.

Prerequisites

To follow along, you need to create an OpenSearch Service domain. For the purposes of this walkthrough, the Easy create option is fine. Keep the Enable fine-grained access control option selected. Select Create master user and provide a user name and password. After the domain has been created, the domain details will have the domain endpoint, which you’ll need—along with the user name and password—to access your OpenSearch instance. You don’t need to worry about creating an index or inserting data. We use the OpenSearch Python client to work with our vector store in the walkthrough.

Deploy Embedding model endpoint

To use voyage-large-2, you need to subscribe to the SageMaker model package in AWS Marketplace. For instructions, see Subscribe to the model package. Choosing the model card in the SageMaker JumpStart UI will also bring you to the model listing page on AWS Marketplace.

After you’re subscribed, you can initialize and deploy the embedding model as a SageMaker endpoint as follows:

# Set embedding endpoint configuration
(embedding_model_id, embedding_model_version, embedding_instance_type) = (
“voyage-large-2-embedding”,
“*”,
“ml.g5.xlarge”, # See AWS Marketplace model package for supported instance types
)

# Instantiate embedding model from JumpStart
from sagemaker.jumpstart.model import JumpStartModel

embedding_model = JumpStartModel(
model_id=embedding_model_id,
model_version=embedding_model_version,
instance_type=embedding_instance_type,
)

# Deploy model as inference endpoint. This can take several minutes to deploy (5 to 10 minutes)
embedding_endpoint = embedding_model.deploy()

Vectorize Documents

With the embedding endpoint deployed, you can index your documents for retrieval.

Transform and chunk documents

You need a list of strings to invoke the deployed voyage-large-2 model. For many documents, like our example annual report, each string is a semantically meaningful chunk of text. There are several ways you can load and chunk documents for vectorization. The code in this section is just one example; feel free to use what suits your data source and files.

In this walkthrough, we load and chunk the source PDF file with the LangChain PyPDFLoader (which uses pypdf) and recursive character text splitter:

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = PyPDFLoader(“apple-10k-2022.pdf”)
document_chunks = loader.load_and_split(
RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=100,
length_function=len,
is_separator_regex=False,
)
)

In practice, selecting the text splitting chunk size and overlap requires some experimentation. The are many techniques for appropriately chunking documents for high-quality retrieval, but that is beyond the scope of this post.

Generate document embeddings

You can now vectorize your documents—or more precisely, your document chunks. See the following code:

# Set batch size
BATCH_SIZE = 45
In [ ]:
# Vectorize chunks in batches
index_list = []
for i in range(0, len(chunk_list), BATCH_SIZE):
docs_playload = {
“input”: chunk_list[i:i + BATCH_SIZE],
“input_type”: “document”,
“truncation”: “true”,
}

embed_docs_response = embedding_endpoint.predict(json.dumps(docs_playload))

doc_embeddings_list = [d[“embedding”] for d in embed_docs_response[“data”]]
index_list += [
{“document”: document, “embedding”: embedding}
for document, embedding in zip(chunk_list[i:i + BATCH_SIZE], doc_embeddings_list)
]

Create a vector store index

The next step is to populate your OpenSearch vector search index with the document embeddings using the OpenSearch Python client:

# Populate index with document, embedding, and ID
for id, i in zip(range(0, len(index_list)), index_list):
index_response = opensearch_client.index(
index=INDEX_NAME_OPENSEARCH,
body={
“document”: i[“document”],
“embedding”: i[“embedding”],
},
id=id,
refresh=True,
)

Retrieve relevant documents

With your indexed vector store, you can now use embeddings to find relevant documents to your query:

# Set number of documents to retrieve
TOP_K = 3
In [ ]:
# Set vector search payload
vector_search_payload = {
“size”: TOP_K,
“query”: {“knn”: {“embedding”: {“vector”: query_embedding, “k”: TOP_K}}},
}
In [ ]:
vector_search_response = opensearch_client.search(
index=INDEX_NAME_OPENSEARCH,
body=vector_search_payload,
)

The following is a formatted semantic search result of the top three most-relevant document chunks, indicating the index ID, similarity score, and the first several characters of the chunk:

ID: 4
Score: 0.7956404
Document: under Section 404(b) of the Sarbanes-Oxley Act (15 U.S.C. 7262(b)) by the registered public accounting firm that prepared or issued its audit report. ☒
Indicate by check mark whether the Registrant is a shell company (as defined in Rule 12b-2 of the Act).
Yes ☐ No ☒
The aggregate market value of the voting and non-voting stock held by non-affiliates of the Registrant, as of March 25, 2022, the last business day of the Registrant’s most recently completed second fiscal quarter, was approximately $2,830,067,000,000. Solely for purposes of this disclosure, shares of common stock held by executive officers and directors of the Registrant as of such date have been excluded because such persons may be deemed to be affiliates. This determination of executive officers and directors as affiliates is not necessarily a conclusive determination for any other purposes. 15,908,118,000 shares of common stock were issued and outstanding as of October 14, 2022.

ID: 5
Score: 0.7367379
Document: 15,908,118,000 shares of common stock were issued and outstanding as of October 14, 2022.
DOCUMENTS INCORPORATED BY REFERENCE
Portions of the Registrant’s definitive proxy statement relating to its 2023 annual meeting of shareholders are incorporated by reference into Part III of this Annual Report on Form 10-K where indicated. The Registrant’s definitive proxy statement will be filed with the U.S. Securities and Exchange Commission within 120 days after the end of the fiscal year to which this report relates.

ID: 178
Score: 0.7263324
Document: Note 3 – Financial Instruments
Cash, Cash Equivalents and Marketable Securities
The following tables show the Company’ s cash, cash equivalents and marketable securities by significant investment category as of September 24, 2022 and September 25, 2021 (in millions):
2022
Adjusted Cost
Unrealized Gains
Unrealized Losses
Fair Value
Cash and Cash Equivalents
Current Marketable Securities
Non-Current Marketable Securities
Cash $ 18,546 $ — $ — $ 18,546 $ 18,546 $ — $ —
Level 1 :
Money market funds 2,929 — — 2,929 2,929 — —
Mutual funds 274 — (47) 227 — 227 —
Subtotal 3,203 — (47) 3,156 2,929 227 —
Level 2 :
U.S. Treasury securities 25,134 — (1,725) 23,409 338 5,091 17,980
U.S. agency securities 5,823 — (655) 5,168 — 240 4,928
Non-U.S. government securities 16,948 2 (1,201) 15,749 — 8,806 6,943 Certificates of deposit and time deposits 2,067 — — 2,067 1,805 262 —
Commercial paper 718 — — 718 28 690 —
Corporate debt securities 87,148 9 (7,707) 79,450 — 9,023 70,427

The top retrieved document chunk (ID 4 with a score of 0.7956404) contains a statement that provides a direct answer to our query:

The aggregate market value of the voting and non-voting stock held by non-affiliates of the Registrant, as of March 25, 2022, the last business day of the Registrant’s most recently completed second fiscal quarter, was approximately $2,830,067,000,000.

This additional context will enable Claude to provide a response that answers your query.

Generate a retrieval augmented response

You can now prompt Claude to use the retrieved documents to answer your query:

# Create retrieval-augmented prompt
rag_prompt = f”””Human:

INSTRUCTIONS:
Answer the QUERY using the CONTEXT text provided below. Keep your answer
grounded in the facts of the CONTEXT. If the CONTEXT doesn’t contain the
facts to answer the QUERY just respond with “I do not have enough context
to respond to this query.”.

QUERY: {query}

CONTEXT: {context}

Assistant:
“””

Next initialize the Amazon Bedrock client to invoke Anthropic’s Claude3 Sonnet model in us-east-1.

# List available LLMs on Amazon Bedrock
bedrock_client = boto3.client(‘bedrock’, region_name=’us-east-1′)
bedrock_fm = bedrock_client.list_foundation_models()
print([(m[“modelId”], m[“modelName”]) for m in bedrock_fm[“modelSummaries”]])
In [ ]:
# Set Bedrock LLM model ID
LLM_MODEL_ID = “anthropic.claude-3-sonnet-20240229-v1:0”
In [ ]:
bedrock_runtime_client = boto3.client(‘bedrock-runtime’, region_name=’us-east-1′)
In [ ]:
# Invoke LLM inference
try:
llm_rag_response = bedrock_runtime_client.invoke_model(
modelId=LLM_MODEL_ID,
body=json.dumps(rag_payload),
accept=”application/json”,
contentType=”application/json”,
)
except botocore.exceptions.ClientError as error:
print(“Client error”)
raise error

Claude produces a correct response that answers the query using the context provided:

According to the context provided, “The aggregate market value of the voting and non-voting stock held by non-affiliates of the Registrant, as of March 25, 2022, the last business day of the Registrant’s most recently completed second fiscal quarter, was approximately $2,830,067,000,000.”

Generate a response without retrieved documents

For comparison, let’s see how Claude will respond to a query directly—without any additional context. Specifically, we ask “What is the aggregate market value of the voting and non-voting stock held by non-affiliates of the Registrant, as of March 25, 2022?”

The following is Claude’s response:

Unfortunately, without access to the specific financial filings or reports for the company you are asking about, I do not have enough information to determine the aggregate market value of the voting and non-voting stock held by non-affiliates as of March 25, 2022.

Public companies are required to disclose this information in their annual reports and other SEC filings, but I do not have access to those documents for the particular company or “Registrant” you are referring to. The aggregate market value calculation takes into account things like the number of outstanding shares, recent stock prices, and which shareholders are considered affiliates versus non-affiliates.

Unless you are able to provide the company’s name and the specific report containing this data point, I cannot lookup or calculate that aggregate market value figure. My role is to provide helpful information to you, but I do not have a way to access each public company’s proprietary financial data directly. Please let me know if you can share more details about the company in question.

Claude indicates it does not have enough context, especially because the query didn’t explicitly indicate the registrant (the company).

Clean up

If you’ve been following along, don’t forget to delete the SageMaker endpoint and delete the OpenSearch Service domain when you’re done so you don’t incur unnecessary costs:

embedding_endpoint.delete_model()
embedding_endpoint.delete_endpoint()

Conclusion

Embeddings are at the heart of a RAG system, and Voyage AI offers the best general-purpose and domain-specific embedding models today. Get started or level up your existing RAG stack on AWS today with Voyage AI embedding models—it’s seamless with SageMaker JumpStart. You can try the notebook in this post yourself. Learn more about Voyage AI and follow them on X (Twitter) or LinkedIn for updates!

About the Authors

Tengyu Ma is CEO and Co-Founder of Voyage AI and an assistant professor of computer science at Stanford University. His research interests broadly include topics in machine learning, algorithms and their theory, such as deep learning, (deep) reinforcement learning, pre-training / foundation models, robustness, non-convex optimization, distributed optimization, and high-dimensional statistics. Tengyu earned his PhD from Princeton University and has worked at Facebook and Google as visiting scientists.

Wen Phan is Head of Product at Voyage AI and has spent the last decade developing and commercializing AI and data products for enterprises. He has worked with hundreds of users and organizations around the world to apply AI and data to their use cases in financial services, healthcare, defense, and technology, to name a few. Wen holds a B.S. in electrical engineering and M.S. in analytics and decision sciences. Personally, he enjoys spinning hip-hop records, dining out, and spending time with his wife and two kids — oh, and guzzling cookies and cream milkshakes, too!

Vivek Gangasani is an AI/ML Solutions Architect working with Generative AI startups on AWS. He helps world leading AI startups train, host and operationalize LLMs to build innovative Generative AI solutions. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance at scale for LLMs. In his free time, Vivek enjoys hiking, watching movies and trying different cuisines.

GRN Roundup, Technology

Incorporate offline and online human – machine workflows into your generative AI applications on AWS

May 14, 2024

Advertisements

Recent advances in artificial intelligence have led to the emergence of generative AI that can produce human-like novel content such as images, text, and audio. These models are pre-trained on massive datasets and, to sometimes fine-tuned with smaller sets of more task specific data. An important aspect of developing effective generative AI application is Reinforcement Learning from Human Feedback (RLHF). RLHF is a technique that combines rewards and comparisons, with human feedback to pre-train or fine-tune a machine learning (ML) model. Using evaluations and critiques of its outputs, a generative model can continue to refine and improve its performance. The interplay between Generative AI and human input paves the way for more accurate and responsible applications. You can learn how to improve your LLMs with RLHF on Amazon SageMaker, see Improving your LLMs with RLHF on Amazon SageMaker.

Athough RLHF is the predominant technique for incorporating human involvement, it is not the only available human in the loop technique. RLHF is an offline, asynchronous technique, where humans provide feedback on the generated outputs, based on input prompts. Humans can also add value by intervening into an existing communication happening between generative AI and users. For instance, as decided by AI or desired by the user, a human can be called into an existing conversation and take over the discussion.

In this post, we introduce a solution for integrating a “near-real-time human workflow” where humans are prompted by the generative AI system to take action when a situation or issue arises. This can also be a ruled-based method that can determine where, when and how your expert teams can be part of generative AI – user conversations. The entire conversation in this use case, starting with generative AI and then bringing in human agents who take over, is logged so that the interaction can be used as part of the knowledge base. Together with RLHF, near-real-time human-in-the-loop methods enable the development of responsible and effective generative AI applications.

This blog post uses RLHF as an offline human-in-the-loop approach and the near-real-time human intervention as an online approach. We present the solution and provide an example by simulating a case where the tier one AWS experts are notified to help customers using a chat-bot. We use an Amazon Titan model on Amazon Bedrock to find the sentiment of the customer using a Q&A bot and then notifying about negative sentiment to a human to take the appropriate actions. We also have another expert group providing feedback using Amazon SageMaker GroundTruth on completion quality for the RLHF based training. We used this feedback to finetune the model deployed on Amazon Bedrock to power the chat-bot. We provide LangChain and AWS SDK code-snippets, architecture and discussions to guide you on this important topic.

SageMaker GroudTruth

SageMaker Ground Truth offers the most comprehensive set of human-in-the-loop capabilities, allowing you to harness the power of human feedback across the ML lifecycle to improve the accuracy and relevancy of models. You can complete a variety of human-in-the-loop tasks with SageMaker Ground Truth, from data generation and annotation to model review, customization, and evaluation, through either a self-service or an AWS-managed offering.

Amazon Bedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon with a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. With Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources. Because Amazon Bedrock is serverless, you don’t have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

Example use-case

In this use case, we work with a generative AI powered Q&A bot, which answers questions about SageMaker. We built the RAG solution as detailed in the following GitHub repo and used SageMaker documentation as the knowledge base. You can build such chatbots following the same process. The interface of the Q&A looks like the following screenshot. Amazon SageMaker Sample and used Amazon SageMaker documentation as the knowledge base. You can easily build such chatbots following the same process. Eventually, the interface of the Q&A looks like in Figure 1.

Figure 1. UI and the Chatbot example application to test human-workflow scenario.

In this scenario, we incorporate two human workflows to increase customer satisfaction. The first is to send the interactions to human experts to assess and provide scores. This is an offline process that is part of the RLHF. A second real-time human workflow is initiated as decided by the LLM. We use a simple notification workflow in this post, but you can use any real-time human workflow to take over the AI-human conversation.

Solution overview

The solution consists of three main modules:

Near real-time human engagement workflow
Offline human feedback workflow for RLHF
Fine-tuning and deployment for RLHF

The RLHF and real-time human engagement workflows are independent. Therefore, you can use either or both based on your needs. In both scenarios, fine-tuning is a common final step to incorporate these learnings into LLMs. In the following sections, we provide the details about incorporating these steps one by one and divide the solution into related sections for you to choose and deploy.

The following diagram illustrates the solution architecture and workflow.

Figure 2. Solutions architecture for human-machine workflow modules

Implementation

Prerequisites

Our solution is an add-on to an existing Generative AI application. In our example, we used a Q&A chatbot for SageMaker as explained in the previous section. However, you can also bring your own application. The blog post assumes that you have expert teams or workforce who performs reviews or join workflows.

Build a near real-time human engagement workflow workflow

This section presents how an LLM can invoke a human workflow to perform a predefined activity. We use AWS Step Functions which is a serverless workflow orchestration service that you can use for human-machine workflows. In our case, we call the human experts into action, in real time, but you can build any workflow following the tutorial Deploying an Example Human Approval Project.

Decision workflow to trigger real time human engagement

In this scenario, the customer interacts with the Q&A bot (Step-1 in the previous architecture diagram), and if the interaction shows strong negative sentiment, it will invoke a pre-existing human workflow (Step-2 in Figure 2). In our case, it is a simple email notification (Step-3 in Figure 2) but you can extend this interaction such as including the experts into the chat-zone to take over the conversation and more (Step-4 in Figure 2).

Before we dive deep into the solution, it is important to discuss the workflow logic. The following figure shows the details of the decision workflow. The interaction starts with a customer communication. Here, before the LLM provides an answer to the customer request, the prompt-chain starts with an internal prompt asking the LLM to go over the customer response and look for clear negative sentiment. This prompt and internal sentiment analysis are not visible to customer. This is an internal chain before proceeding with the next steps of which responses may be reflected to the customer based on your preference. If the sentiment is negative, the next step is to trigger a pre-built engagement human-workflow while the chatbot informs the customer about the extra support coming to help. Otherwise, if the sentiment is neutral or positive, the normal response to the customer request will be provided.

This workflow is a demonstrative example and you can add to or modify it as you prefer. For example, you can make any other decision check, not limited to sentiment. You can also prepare your own response to the customer with the right prompting the chain so that you can implement your designed customer experience. Here, our simple example demonstrates how you can easily build such prompt in chains and engage external existing workflows, in our case, it is a human-workflow using Amazon Bedrock. We also use the same LLM to respond to this internal sentiment prompt check for simplicity. However, you can include different LLMs, which might have been fine-tuned for specific tasks, such as sentiment analysis, so that you rely on a different LLM for the Q&A chatbot experience. Adding more serial steps into chains increases the latency because now the customer query or request is being processed more than once.

Figure 3. Real-time (online) human workflow triggered by LLM.

Implementing the decision workflow with Amazon Bedrock

To implement the decision workflow, we used Amazon Bedrock and its LangChain integrations. The prompt chain is run through SequentialChain from LangChain. Because our human workflow is orchestrated with Step Functions, we also use LangChain’s StepFunction library.

First, define the LLM and prompt template:

prompt = PromptTemplate(
input_variables=[“text”],
template=”{text}”,)
llm = Bedrock(model_id=”amazon.titan-tg1-large”)
llmchain_toxic = LLMChain(llm=llm, prompt=prompt,output_key=”response”)

Then you feed the response from the first LLM to the next LLM through an LLM chain, where the second instruct is to find the sentiment of the response. We also instruct the LLM to provide 0 as positive and 1 as negative response.

templateResponseSentiment=”””Find the sentiment of below sentence, respond 0 if positive and respond 1 if negative
{response} “””

prompt_sentiment= PromptTemplate( input_variables=[“response”], template = templateResponseSentiment)
llmchain_sentiment= LLMChain(llm=llm, prompt=prompt_sentiment,output_key=”sentiment”)

from langchain.chains import SequentialChain
overall_chain = SequentialChain(chains=[llmchain_toxic, llmchain_sentiment], input_variables=[“text”],output_variables=[“response”, “sentiment”],verbose=True)

Run a sequential chain to find the sentiment:

response= overall_chain({ “text”: “Can you code for me for SageMaker” })
print(“response payload ” + str(response))
print(“n response sentiment: ” + response[‘sentiment’])

If the sentiment is negative, the model doesn’t provide the response back to customer, instead it invokes a workflow that will notify a human in loop:

if “1” in response_sentiment[‘sentiment’] : # 1 represents negative sentiment
print(‘triggered workflow, check email of the human on notification and add to workflow anything else you may want’)
lambda_client = boto3.client(‘lambda’)
#create input – send the response from LLM and detected sentiment
lambda_payload1=”{“response”: “” + response[‘text’] +””,”response_sentiment”: ” + “”1″}”
lambda_client.invoke(FunctionName=’triggerWorkflow’, InvocationType=’Event’, Payload=lambda_payload1)

If you choose to have your human experts join a chat with the users, you can add these interactions of your expert teams to your knowledge base. This way, when the same or similar issue is raised, the chatbot can use these in their answers. In this post, we did not show this method, but you can create a knowledge base in Amazon Bedrock to use these human-to-human interactions for future conversations in your chatbot.

Build an offline human feedback workflow

In this scenario, we assume that the chat transcripts are stored in an Amazon Simple Storage Service (Amazon S3) bucket in JSON format, a typical chat transcript format, for the human experts to provide annotations and labels on each LLM response. The transcripts are sent for a labeling task performed by a labeling workforce using Amazon SageMaker Ground Truth. However, in some cases, it’s impossible to label all the transcripts due to resource limitations. In these cases, you may want to randomly sample the transcripts or use a pattern that can be sent to the labeling workforce based on your business case.

Pre-annotation Lambda function
The process starts with an AWS Lambda function. The pre-annotation Lambda function is invoked based on chron job or based on an event or on-demand. Here, we use the on-demand option. SageMaker Ground Truth sends the Lambda function a JSON-formatted request to provide details about the labeling job and the data object. More information can be found here. Following is the code snippet for the pre-processing Lambda function:

import json
def lambda_handler(event, context):
return {
“taskInput”: event[‘dataObject’]
}

# JSON formatted request

{
“version”: “2018-10-16”,
“labelingJobArn”: <labelingJobArn>
“dataObject” : {
“source-ref”: <s3Uri where dataset containing the chabot responses are stored>
}
}

Custom workflow for SageMaker Ground Truth
The remaining part of sending the examples, UI, and storing the results of the feedback are performed by SageMaker Ground Truth and invoked by the pre-annotation Lambda function. We use the labeling job with the custom template option in SageMaker Ground Truth. The workflow allows labelers to rate the relevance of an answer to a question from 1–5, with 5 being the most relevant. Here, we assumed a conventional RLHF workflow where the labeling workforce provides the score based on their expectation from the LLM in this situation. The following code shows an example:

https://assets.crowd.aws/crowd-html-elements.js
<crowd-form>
<crowd-classifier
name=”relevance”
categories=”[‘1’, ‘2’, ‘3’, ‘4’, ‘5’]”
header=”How relevant is the below answer to the question: {{ task.input.source }}”
>
<classification-target>
{{ task.input.source }}
</classification-target>
<full-instructions header=”Conversation Relevance Instructions”>
<h2>How relevant is the below answer to the given question?</h2>
</full-instructions>
<short-instructions>
How relevant is the below answer to the question: {{ task.input.source }}
</short-instructions>
</crowd-classifier>
</crowd-form>

In our scenario, we used the following UI for our labeling workers to score the complete response given for the prompt. This provides feedback on the answer to a question given by the chatbot, marking it as 1–5, with 5 being most the relevant answer to the question.

Figure 4. Two examples from RLHF feedback UI.

Post annotation Lambda function
When all workers complete the labeling task, SageMaker Ground Truth invokes the post-annotation Lambda function with a pointer to the dataset object and the workers’ annotations. This post-processing Lambda function is generally used for annotation consolidation, which has SageMaker Ground Truth create a manifest file and uploads it to an S3 bucket for persistently storing consolidated annotations. The following code shows the postprocessing Lambda function:

import json
import boto3
from urllib.parse import urlparse

def lambda_handler(event, context):
consolidated_labels = []

parsed_url = urlparse(event[‘payload’][‘s3Uri’]);
s3 = boto3.client(‘s3’)
textFile = s3.get_object(Bucket = parsed_url.netloc, Key = parsed_url.path[1:])
filecont = textFile[‘Body’].read()
annotations = json.loads(filecont);

for dataset in annotations:
for annotation in dataset[‘annotations’]:
new_annotation = json.loads(annotation[‘annotationData’][‘content’])
label = {
‘datasetObjectId’: dataset[‘datasetObjectId’],
‘consolidatedAnnotation’ : {
‘content’: {
event[‘labelAttributeName’]: {
‘workerId’: annotation[‘workerId’],
‘result’: new_annotation,
‘labeledContent’: dataset[‘dataObject’]
}
}
}
}
consolidated_labels.append(label)

return consolidated_labels

You can use the output manifest file to further fine-tune your LLM model, as detailed in the next section. The following code is a snippet of the created manifest file:

JSON:

{“source”:”what is amazon SageMaker?,AWS SageMaker is a machine learning service that allows you to train and deploy machine learning models in the cloud.”,”RHLF-custom-feedback”:{“workerId”:”private.us-east-1.8c185c045aed3bef”,”result”:{“relevance”:{“label”:”5 – Highly Relevant”}},”labeledContent”:{“content”:”what is amazon SageMaker?,AWS SageMaker is a machine learning service that allows you to train and deploy machine learning models in the cloud.”}},”RHLF-custom-feedback-metadata”:{“type”:”groundtruth/custom”,”job-name”:”rhlf-custom-feedback”,”human-annotated”:”yes”,”creation-date”:”2023-08-09T02:46:05.852000″}}

Fine-tune the LLM using RLHF

To demonstrate RLHF in both near real-time and offline workflows, we collected 50 human-annotated samples using SageMaker Ground Truth. The data is used for RLHF training on a Flan-T5 XL model by PEFT/LoRA with 8-bit quantization:

from peft import LoraConfig

lora_config = LoraConfig(
r=32,
lora_alpha=32,
lora_dropout=0.05,
target_modules=[“q”, “v”],
bias=”none”,
task_type=”SEQ_2_SEQ_LM”,
)

The training uses the learning rate 1e-5 for 10 epochs, and the batch size = 1 to use one sample at a time.

learning_rate=1e-5
max_ppo_epochs=10
mini_batch_size=1
batch_size=1

config = PPOConfig(
model_name=model,
learning_rate=learning_rate,
ppo_epochs=max_ppo_epochs,
mini_batch_size=mini_batch_size,
batch_size=batch_size,
)

ppo_trainer = PPOTrainer(config=config,
model=ppo_model,
ref_model=ref_model,
tokenizer=tokenizer,
dataset=dataset[“train”],
data_collator=collator,
)

Because there are only 50 human-annotated samples collected from SageMaker Ground Truth, it is not sufficient to train a reward model for reinforcement learning. Therefore, we decided to take the annotated evaluation score for each sample and use them as the reward value in the reinforcement learning process. This should be close enough to the reward value generated from a reward model. Our experiment showed that this method is effective for a small training set. You can see the curve of the training process in the following chart.

Figure 5. Reward/mean chart

After the training, we replaced the Flan-T5 foundation model in the AWS support chatbot with the RLHF trained model. In the following examples, you can observe that the response quality after RLHF is improved and the answers are more comprehensive and contain more useful information:

Question: How does SageMaker protect my data?
Response before RLHF: SageMaker stores code in ML storage volumes
Response after RLHF: SageMaker stores code in ML storage volumes, secured by security groups and optionally encrypted at rest.
Question: What is Amazon SageMaker?
Response before RLHF: AWS SageMaker is a machine learning service that allows you to train and deploy machine learning models in the cloud.
Response after RLHF: A fully managed service to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.

Clean up

To clean up your resources, first start by stopping and deactivating any active human workflow or fine-tuning jobs. Removing the prompt chaining is a good start for de-coupling the workflows from your existing application. Then, continue by deleting the resources for the real-time human workflow manually. Finally, delete the RLHF resources. If you created a new Q&A chatbot application, then first stop and then delete the resources used for the Q&A chatbot part of the blogpost.

Conclusion

This post presented solutions for incorporating both offline and online human workflows into generative AI applications on AWS. The offline human feedback workflow uses SageMaker Ground Truth to collect human evaluations on chatbot responses. These evaluations are used to provide reward signals for fine-tuning the chatbot’s underlying language model with RLHF. The online human workflow uses LangChain and Step Functions to invoke real-time human intervention based on sentiment analysis of the chatbot responses. This allows human experts to seamlessly take over or step into conversations when the AI reaches its limits. This capability is important for implementations that require using your existing expert teams in critical, sensitive, or determined topics and themes. Together, these human-in-the-loop techniques, offline RLHF workflows, and online real-time workflows enable you to develop responsible and robust generative AI applications.

The provided solutions integrate multiple AWS services, like Amazon Bedrock, SageMaker, SageMaker Ground Truth, Lambda, Amazon S3, and Step Functions. By following the architectures, code snippets, and examples discussed in this post, you can start incorporating human oversight into your own generative AI applications on AWS. This paves the way towards higher-quality completions and building trustworthy AI solutions that complement and collaborate with human intelligence.

Building generative AI applications is effortless with Amazon Bedrock. We recommend starting your experiments following this Quick Start with Bedrock.

About the Authors

Tulip Gupta is a Senior Solutions Architect at Amazon Web Services. She works with Amazon media and entertainment (M&E) customers to design, build, and deploy technology solutions on AWS, and has a particular interest in Gen AI and machine learning focussed on M&E. She assists customers in adopting best practices while deploying solutions in AWS. Linkedin

Burak Gozluku is a Principal AI/ML Specialist Solutions Architect located in Boston, MA. He helps strategic customers adopt AWS technologies and specifically Generative AI solutions to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, an MS in Systems Engineering, and a post-doc in system dynamics from MIT in Cambridge, MA. Burak is still a research affiliate in MIT. Burak is passionate about yoga and meditation.

Yunfei bai is a Senior Solutions Architect at AWS. With a background in AI/ML, data science, and analytics, Yunfei helps customers adopt AWS services to deliver business results. He designs AI/ML and data analytics solutions that overcome complex technical challenges and drive strategic objectives. Yunfei has a PhD in Electronic and Electrical Engineering. Outside of work, Yunfei enjoys reading and music.

Rachna Chadha is a Principal Solution Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in future and bring economical and social prosperity. In her spare time, Rachna likes spending time with her family, hiking and listening to music.

GRN Roundup, Technology

Build generative AI applications with Amazon Titan Text Premier, Amazon Bedrock, and AWS CDK

May 14, 2024

Advertisements

Amazon Titan Text Premier, the latest addition to the Amazon Titan family of large language models (LLMs), is now generally available in Amazon Bedrock. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Amazon Titan Text Premier is an advanced, high-performance, and cost-effective LLM engineered to deliver superior performance for enterprise-grade text generation applications, including optimized performance for Retrieval Augmented Generation (RAG) and agents. The model is built from the ground up following safe, secure, and trustworthy responsible AI practices, and excels in delivering exceptional generative AI text capabilities at scale.

Exclusive to Amazon Bedrock, Amazon Titan Text models support a wide range of text-related tasks, including summarization, text generation, classiﬁcation, question-answering, and information extraction. With Amazon Titan Text Premier, you can unlock new levels of efficiency and productivity for your text generation needs.

In this post, we explore building and deploying two sample applications powered by Amazon Titan Text Premier. To accelerate development and deployment, we use the open source AWS Generative AI CDK Constructs (launched by Werner Vogels at AWS re:Invent 2023). AWS Cloud Development Kit (AWS CDK) constructs accelerate application development by providing developers with reusable infrastructure patterns you can seamlessly incorporate into your applications, freeing you to focus on what differentiates your application.

Document Explorer sample application

The Document Explorer sample generative AI application can help you quickly understand how to build end-to-end generative AI applications on AWS. It includes examples of key components needed in generative AI applications, such as:

Data ingestion pipeline – Ingests documents, converts them to text, and stores them in a knowledge base for retrieval. This enables use cases like RAG to tailor generative AI applications to your data.
Document summarization – Summarizes PDF documents using Amazon Titan Premier through Amazon Bedrock.
Question answering – Answers natural language questions by retrieving relevant documents from the knowledge base and using LLMs like Amazon Titan Premier through Amazon Bedrock.

Follow the steps in the README to clone and deploy the application in your account. The application deploys all the required infrastructure, as shown in the following architecture diagram.

After you deploy the application, upload a sample PDF file to the input Amazon Simple Storage Service (Amazon S3) bucket by choosing Select Document in the navigation pane. For example, you can download Amazon’s Annual Letters to Shareholders from 1997–2023 and upload using the web interface. On the Amazon S3 console, you can see that the files you uploaded are now found in the S3 bucket whose name begins with persistencestack-inputassets.

After you have uploaded a file, open a document to see it rendered in the browser.

Choose Q&A in the navigation pane, and choose your preferred model (for this example, Amazon Titan Premier). You can now ask a question against the document you uploaded.

The following diagram illustrates a sample workflow in Document Explorer.

Don’t forget to delete the AWS CloudFormation stacks to avoid unexpected charges. First make sure to remove all data from the S3 buckets, specifically anything in the buckets whose names begin with persistencestack. Then run the following command from a terminal:

cdk destroy -all

Amazon Bedrock Agent and Custom Knowledge Base sample application

The Amazon Bedrock Agent and Custom Knowledge Base sample generative AI application is a chat assistant designed to answer questions about literature using RAG from a selection of books from Project Gutenberg.

This app deploys an Amazon Bedrock agent that can consult an Amazon Bedrock knowledge base backed by Amazon OpenSearch Serverless as a vector store. An S3 bucket is created to store the books for the knowledge base.

Follow the steps in the README to clone the sample application in your account. The following diagram illustrates the deployed solution architecture.

Update the file defining which foundation model to use when creating the agent:

const agent = new bedrock.Agent(this, ‘Agent’, {
foundationModel: bedrock.BedrockFoundationModel.AMAZON_TITAN_PREMIER_V1_0
,
instruction: ‘You are a helpful and friendly agent that answers questions about literature.’,
knowledgeBases: [kb],
});

Follow the steps in the README to deploy the code sample in your account and ingest the example documents.

Navigate to the Agents page on the Amazon Bedrock console in your AWS Region and find your newly created agent. The AgentId can be found in the CloudFormation stack outputs section.

Now you can ask some questions. You may need to tell the agent what book you want to ask about or refresh the session when asking about different books. The following are some examples of questions you may ask:

What are the most popular books in the library?
Who is Mr. Bingley quite taken with at the ball in Meryton?

The following screenshot shows an example of the workflow.

Don’t forget to delete the CloudFormation stack to avoid unexpected charges. Remove all the data from the S3 buckets, then run the following command from a terminal:

cdk destroy

Conclusion

Amazon Titan Text Premier is available today in the US East (N. Virginia) Region. Custom fine-tuning for Amazon Titan Text Premier is also available today in preview in the US East (N. Virginia) Region. Check the full Region list for future updates.

To learn more about the Amazon Titan family of models, visit the Amazon Titan product page. For pricing details, review Amazon Bedrock Pricing. Visit the AWS Generative AI CDK Constructs GitHub repository for more details on available constructs and additional documentation. For practical examples to get started, check out the AWS samples repository.

About the authors

Alain Krok is a Senior Solutions Architect with a passion for emerging technologies. His past experience includes designing and implementing IIoT solutions for the oil and gas industry and working on robotics projects. He enjoys pushing the limits and indulging in extreme sports when he is not designing software.

Laith Al-Saadoon is a Principal Prototyping Architect on the Prototyping and Cloud Engineering (PACE) team. He builds prototypes and solutions using generative AI, machine learning, data analytics, IoT & edge computing, and full-stack development to solve real-world customer challenges. In his personal time, Laith enjoys the outdoors–fishing, photography, drone flights, and hiking.

Justin Lewis leads the Emerging Technology Accelerator at AWS. Justin and his team help customers build with emerging technologies like generative AI by providing open source software examples to inspire their own innovation. He lives in the San Francisco Bay Area with his wife and son.

Anupam Dewan is a Senior Solutions Architect with a passion for Generative AI and its applications in real life. He and his team enable Amazon Builders who build customer facing application using generative AI. He lives in Seattle area, and outside of work loves to go on hiking and enjoy nature.

GRN Roundup, Technology

RASCAL: Novel robotics for scalable and highly available automated storage and retrieval

May 14, 2024

Advertisements

This research paper was presented at the
41st IEEE International Conference on Robotics and Automation (opens in new tab) (ICRA 2024), the premier international forum for robotics research.

Over the past decade, robotics has revolutionized numerous industries that rely on storage systems, such as manufacturing and warehousing. In these contexts, robotics streamlines operations and increase efficiency, and automated storage and retrieval systems (ASRS) are at the heart of this technological shift, exemplifying the transition to smarter, computer-controlled logistics solutions. These systems quickly move items from storage to fulfilment stations, helping to increase speed and accuracy in the overall process. Yet despite these advances, current ASRS—whether rail-based, fixed, or free-roaming—continue to face challenges, often sacrificing scalability and availability for higher throughput capacity. For instance, the use of fixed robots in traditional tape storage libraries, typically used for archival storage, can lead to availability limitations, as the robots cannot pass each other, and a single robot failure can restrict access to a significant portion of the library.

Our paper, published at ICRA 2024, introduces RASCAL: A Scalable, High-redundancy Robot for Automated Storage and Retrieval Systems, which addresses these concerns. RASCAL is an untethered robot that improves the efficiency of vertical storage systems by operating across evenly spaced, parallel shelves and horizontal rails. Designed to maximize scalability and redundancy, it handles the storage and retrieval of small objects. RASCAL was inspired by the challenges of managing archival storage media in datacenters, and it’s the key component of Project Silica’s storage and retrieval system. However, RASCAL’s modularity enables it to be used in other scenarios as well.

An innovative approach to archival storage

RASCAL’s design is based on four key principles:

Addressability: This allows any robot to access any item being stored on the shelves.

Scalability: The system can adjust retrieval capacity and storage space by adding or removing robots and shelving with negligible downtime.

Availability: A single robot failure minimally impacts access to items and routing, and it does not obstruct the operation of other robots.

Serviceability: Robots can easily be added or removed from the rails without the need for special training.

RASCAL’s motion system supports horizontal and vertical movement along storage panels assembled from contiguous storage racks. The parallel rail system enables independent and flexible movement. These rails are designed to be passive—functioning without the need for active power or energy sources, relying instead on their physical structure and positioning to guide and support the robot’s movement along the storage panels. The robot can travel along and between these rails using various pathways to reach a given item. Video 1 shows how RASCAL operates multiple robots on a single storage panel.

Video 1. Multiple robots in action

RASCAL utilizes a special rail geometry, allowing the robot to passively latch onto the rails with opposing wheels mounted on each end, as illustrated in Figure 1. This design ensures that the robot is securely held in place by gravity alone. The passive nature of this latching mechanism simplifies the process of adding or removing robots from the rails, as it does not require any tools or power.

Figure 1. The RASCAL prototype in a Silica library.

The robot features two rotating assemblies known as wings, each equipped with wheels that allow it to move horizontally. The wings rotate in a choreographed sequence to enable ascent and descent. RASCAL climbs by unlatching one wing from its current rail while remaining attached to the other. It then rotates and secures its free wing to a new rail either two levels up or down. This is shown in Video 2.

Video 2. RASCAL’s novel climbing maneuver.

Video 3. RASCAL performing a pick operation.

Video 3 demonstrates RASCAL’s item-selection system, or picker interface, which is designed to handle various robotic tool attachments for precise pick-and-place operations. This interface can rotate in alternating directions during climbs, ensuring that the robotic tool attachment, or end effector, remains oriented towards the shelving while stationary, preventing the cables from tangling.

Advancing robotics and automation

As digital economies grow, the need for efficient storage and retrieval systems becomes increasingly urgent. Breakthroughs in robotics technology are poised to drive productivity, efficiency, and innovation across numerous industries. Developments like RASCAL, with its flexible design and advanced capabilities, are leading the way for the next generation of robotics and automation.

Opens in a new tab

The post RASCAL: Novel robotics for scalable and highly available automated storage and retrieval appeared first on Microsoft Research.

Independent Non-Profit News

Exit mobile version

%%footer%%

Problem definition

Solution architecture

Code preparation and data preprocessing

Training pipeline and model deployment

Real-time recommendation inference

Recommendation model using NCF

MLOps component 1: Data preprocessing

MLOps component 2: Automated training and deployment of models

Event-based pipeline automation

SageMaker pipeline for training

Train the model

Create a model package group

Add a trained model to a model package group

Create a SageMaker model

Create a SageMaker endpoint

Create a SageMaker pipeline

MLOps component 3: Real-time inference with model serving

MLOps component 4: CI/CD structure

Conclusion

About the Authors

Thank you for sharing!

Learn more:

Subscribe to the Microsoft Research Podcast:

Transcript

Thank you for sharing!

Thank you for sharing!

Thank you for sharing!

Thank you for sharing!

Thank you for sharing!

NEW RESEARCH

Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning

NEW RESEARCH

A Reflection on Human-Notebook Experiences in the Era of AI

AI Frontiers: Models and Systems with Ece Kamar

NEW RESEARCH

Jacdac: Service-Based Prototyping of Embedded Systems

NEW RESEARCH

PARIKSHA: A Scalable, Democratic, Transparent Evaluation Platform for Assessing Indic Large Language Models

NEW RESEARCH

Tinker, Tailor, Configure, Customize: The Articulation Work of Customizing AI Fairness Checklists

NEW RESEARCH

MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels

VIDEO

AI Case Studies for Natural Science Research with Bonnie Kruft

Microsoft Research in the news

Thank you for sharing!

Advertiser creative experience

Solution overview

Conclusion

About the Authors

Thank you for sharing!

Optimized GEMM kernels

Enable the optimizations

Benchmark results

Benchmark setup

Conclusion

About the Author

Thank you for sharing!

Microsoft Research Forum

Best Paper Award recipients

Honorable Mentions

Thank you for sharing!

Amazon Bedrock

Solution overview

Prerequisites

Enable model access through Amazon Bedrock

Install the necessary packages

Register a DNS domain and create certificates

Chain-of-Thought (CoT) Prompting

Build the exam generator application

Build the front-end using Streamlit and Docker

Build solution components with AWS SAM

Test the solution

Expanding the solution

Clean up

Conclusion

About the Authors

Thank you for sharing!

Energy and Utility Companies are Ready for AI — Let’s Explore the Benefits.

Explore byteLAKE’s Data Insights: Fueling Efficiency, Sustainability, and Cost Reductions in Energy and Utility Sectors through AI Integration.

AI Explainer: Foundation models and the next era of AI