Speaker diarization, an essential process in audio analysis, segments an audio file based on speaker identity. This post takes an in-depth look at integrating Hugging Face's PyAnnote for speaker diarization with Amazon SageMaker asynchronous endpoints.
We provide a comprehensive guide on how to deploy speaker segmentation and clustering solutions using SageMaker on the AWS Cloud. You can use this solution for applications that deal with multi-speaker (over 100) audio recordings.
Solution overview
Amazon Transcribe is the go-to service for speaker diarization in AWS. However, for unsupported languages, you can use another model (in our case, PyAnnote) that will be deployed in SageMaker for inference. For short audio files where the inference takes up to 60 seconds, you can use real-time inference. For longer than 60 seconds, asynchronous inference should be used. An added benefit of asynchronous inference is the cost savings from automatically scaling the instance count to zero when there are no requests to process.
Hugging Face is a popular open source hub for machine learning (ML) models. AWS and Hugging Face have a partnership that allows seamless integration through SageMaker with a set of AWS Deep Learning Containers (DLCs) for training and inference in PyTorch or TensorFlow, as well as Hugging Face estimators and predictors for the SageMaker Python SDK. SageMaker features and capabilities help developers and data scientists get started with natural language processing (NLP) on AWS with ease.
The integration for this solution involves using Hugging Face's pretrained speaker diarization model with the PyAnnote library. PyAnnote is an open source toolkit written in Python for speaker diarization. This model, trained on a sample audio dataset, enables efficient speaker partitioning in audio files. The model is deployed on SageMaker as an asynchronous endpoint setup, providing efficient and scalable processing of diarization tasks.
The following diagram illustrates the solution architecture.
For this post, we use the following audio file.
Stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels. Audio files sampled at a different rate are resampled to 16kHz automatically when loaded.
Prerequisites
Complete the following prerequisites:
Create a SageMaker domain.
Make sure your AWS Identity and Access Management (IAM) user has the necessary access permissions for creating a SageMaker role.
Make sure the AWS account has a service quota for hosting a SageMaker endpoint for an ml.g5.2xlarge instance.
Create a model function for accessing PyAnnote speaker diarization from Hugging Face
You can use the Hugging Face Hub to access the desired pretrained PyAnnote speaker diarization model. You use the same script for downloading the model file when creating the SageMaker endpoint.
See the following code:
from pyannote.audio import Pipeline

def model_fn(model_dir):
    # Load the model from the specified model directory
    model = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token="Replace-with-the-Hugging-Face-auth-token")
    return model
Package the model code
Prepare the necessary files, such as inference.py, which contains the inference code:
%%writefile model/code/inference.py
from pyannote.audio import Pipeline
import subprocess
import boto3
from urllib.parse import urlparse
import pandas as pd
from io import StringIO
import os
import torch

def model_fn(model_dir):
    # Load the model from the specified model directory
    model = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token="hf_oBxxxxxxxxxxxx")
    return model

def diarization_from_s3(model, s3_file, language=None):
    # Download the audio file from S3 to local storage
    s3 = boto3.client("s3")
    o = urlparse(s3_file, allow_fragments=False)
    bucket = o.netloc
    key = o.path.lstrip("/")
    s3.download_file(bucket, key, "tmp.wav")
    # Run the diarization pipeline on the downloaded file
    result = model("tmp.wav")
    data = {}
    for turn, _, speaker in result.itertracks(yield_label=True):
        data[turn] = (turn.start, turn.end, speaker)
    data_df = pd.DataFrame(data.values(), columns=["start", "end", "speaker"])
    print(data_df.shape)
    result = data_df.to_json(orient="split")
    return result

def predict_fn(data, model):
    s3_file = data.pop("s3_file")
    language = data.pop("language", None)
    result = diarization_from_s3(model, s3_file, language)
    return {
        "diarization_from_s3": result
    }
Prepare a requirements.txt file containing the Python libraries required to run the inference:
with open("mannequin/code/necessities.txt", "w") as f:
f.write("transformers==4.25.1n")
f.write("boto3n")
f.write("PyAnnote.audion")
f.write("soundfilen")
f.write("librosan")
f.write("onnxruntimen")
f.write("wgetn")
f.write("pandas")
Finally, compress the inference.py and requirements.txt files and save the archive as model.tar.gz:
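The compression command itself isn't shown in this post, so the following is a minimal sketch using Python's tarfile module, assuming the files live under model/code/ as written above. SageMaker's Hugging Face inference containers look for the inference script under code/ inside the archive.

import tarfile

# Package inference.py and requirements.txt into model.tar.gz
# (the inference script goes under code/ in the archive)
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model/code/inference.py", arcname="code/inference.py")
    tar.add("model/code/requirements.txt", arcname="code/requirements.txt")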
Configure the SageMaker model
Define a SageMaker model resource by specifying the image URI, the location of the model data in Amazon Simple Storage Service (Amazon S3), and the SageMaker role:
import sagemaker
import boto3

sess = sagemaker.Session()

sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # Fall back to the default bucket if no bucket name is given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")
Upload the model to Amazon S3
Upload the compressed PyAnnote Hugging Face model file to an S3 bucket:
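The upload snippet isn't included here, so the following is a minimal sketch using the SageMaker SDK's S3Uploader; the S3 prefix is an assumption. It produces the s3_location referenced in the next step.

from sagemaker.s3 import S3Uploader

# Upload model.tar.gz to the session bucket (the prefix is illustrative)
s3_location = S3Uploader.upload(
    local_path="model.tar.gz",
    desired_s3_uri=f"s3://{sagemaker_session_bucket}/pyannote/model",
)
print(f"model uploaded to: {s3_location}")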
Configure the asynchronous endpoint for deploying the model on SageMaker using the provided asynchronous inference configuration:
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.s3 import s3_path_join
from sagemaker.utils import name_from_base

async_endpoint_name = name_from_base("custom-asyc")

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=s3_location,       # path to your model and script
    role=role,                    # IAM role with permissions to create an endpoint
    transformers_version="4.17",  # transformers version used
    pytorch_version="1.10",       # pytorch version used
    py_version="py38",            # python version used
)

# create async endpoint configuration
async_config = AsyncInferenceConfig(
    output_path=s3_path_join(
        "s3://", sagemaker_session_bucket, "async_inference/output"
    ),  # Where our results will be stored
    # Add notification SNS if needed
    notification_config={
        # "SuccessTopic": "PUT YOUR SUCCESS SNS TOPIC ARN",
        # "ErrorTopic": "PUT YOUR ERROR SNS TOPIC ARN",
    },  # Notification configuration
)

env = {"MODEL_SERVER_WORKERS": "2"}

# deploy the endpoint
async_predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.xx",  # replace with your instance type, such as ml.g5.2xlarge
    async_inference_config=async_config,
    endpoint_name=async_endpoint_name,
    env=env,
)
Test the endpoint
Evaluate the endpoint functionality by sending an audio file for diarization and retrieving the JSON output stored in the specified S3 output path:
from sagemaker.async_inference import WaiterConfig

# Replace with a path to an audio object in S3 (the path below is illustrative)
data = {"s3_file": "s3://<your-bucket>/audio-files/sample.wav"}

res = async_predictor.predict_async(data=data)
print(f"Response output path: {res.output_path}")
print("Start Polling to get response:")

config = WaiterConfig(
    max_attempts=10,  # number of attempts
    delay=10,         # time in seconds to wait between attempts
)
res.get_result(config)
To deploy this solution at scale, we recommend using AWS Lambda, Amazon Simple Notification Service (Amazon SNS), or Amazon Simple Queue Service (Amazon SQS). These services are designed for scalability, event-driven architectures, and efficient resource utilization. They help decouple the asynchronous inference process from result processing, allowing you to scale each component independently and handle bursts of inference requests more effectively, as the sketch following this paragraph illustrates.
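The following is a minimal sketch (not part of the original solution) of a Lambda handler subscribed to the endpoint's SNS success topic; it reads the output location from the notification and fetches the diarization result from S3 for downstream processing.

import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Each record carries a SageMaker async inference notification
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
        # The success notification includes the S3 location of the result
        output_location = message["responseParameters"]["outputLocation"]
        bucket, _, key = output_location.removeprefix("s3://").partition("/")
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        result = json.loads(body)
        print(result)  # hand off to downstream processing here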
Results
The model output is stored at s3://sagemaker-xxxx/async_inference/output/. The output shows that the audio recording has been segmented into three columns:
Start (start time in seconds)
End (end time in seconds)
Speaker (speaker label)
The following code shows an example of our results:
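The exact timestamps and labels depend on your recording; the values below are illustrative, but the structure is what predict_fn returns (the inner JSON string produced by to_json(orient="split") is shown parsed for readability):

{
    "diarization_from_s3": {
        "columns": ["start", "end", "speaker"],
        "index": [0, 1, 2],
        "data": [
            [0.9, 3.6, "SPEAKER_01"],
            [3.9, 10.2, "SPEAKER_00"],
            [10.5, 12.3, "SPEAKER_01"]
        ]
    }
}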
You can set MinCapacity to 0 to configure the scaling policy to scale to zero; asynchronous inference lets you automatically scale in to zero when there are no requests. You don't need to delete the endpoint; it scales out from zero when it's needed again, reducing costs when not in use. See the following code:
# Common class representing application autoscaling for SageMaker
client = boto3.client('application-autoscaling')

# This is the format in which application autoscaling references the endpoint
resource_id = 'endpoint/<endpoint_name>/variant/<variant1>'

# Define and register your endpoint variant
response = client.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',  # The number of EC2 instances for your Amazon SageMaker model endpoint variant
    MinCapacity=0,
    MaxCapacity=5
)
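Registering the scalable target alone doesn't trigger scaling; a scaling policy is still required. The following is a sketch (the policy name, target value, and cooldowns are assumptions) of a target tracking policy on the ApproximateBacklogSizePerInstance metric that SageMaker emits for asynchronous endpoints:

response = client.put_scaling_policy(
    PolicyName='async-backlog-scaling',  # assumed policy name
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 5.0,  # target backlog of queued requests per instance
        'CustomizedMetricSpecification': {
            'MetricName': 'ApproximateBacklogSizePerInstance',
            'Namespace': 'AWS/SageMaker',
            'Dimensions': [{'Name': 'EndpointName', 'Value': async_endpoint_name}],
            'Statistic': 'Average',
        },
        'ScaleInCooldown': 600,   # seconds to wait before scaling in
        'ScaleOutCooldown': 300,  # seconds to wait before scaling out
    },
)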
If you want to delete the endpoint, use the following code:
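The deletion snippet isn't shown in this post; a minimal sketch using the predictor returned by huggingface_model.deploy() follows.

# Remove the model and the endpoint to stop incurring charges
async_predictor.delete_model()
async_predictor.delete_endpoint()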
The solution can efficiently handle multiple or large audio files.
This example uses a single instance for demonstration purposes. If you want to use this solution for hundreds or thousands of audio files and process them across multiple instances with an asynchronous endpoint, you can use the auto scaling policy shown above, which is designed for a large number of source files. Auto scaling dynamically adjusts the number of instances provisioned for a model in response to changes in your workload.
The solution optimizes resources and reduces system load by separating long-running tasks from real-time inference.
Conclusion
In this post, we provided a simple approach for deploying Hugging Face's speaker diarization model on SageMaker using a Python script. Using an asynchronous endpoint provides an efficient and scalable means of delivering diarization predictions as a service, accommodating concurrent requests seamlessly.
Get started with asynchronous speaker diarization for your audio projects today. If you have any questions about getting your own asynchronous diarization endpoint up and running, reach out in the comments.
About the authors
Sanjay Tiwari is a specialist AI/ML Solutions Architect who spends his time working with strategic customers to define business requirements, deliver L300 sessions around specific use cases, and design AI/ML applications and services that are scalable, reliable, and performant. He has helped launch and scale the AI/ML-powered Amazon SageMaker service and has implemented several proofs of concept using Amazon AI services. He has also developed an advanced analytics platform as part of the digital transformation journey.
Kiran Chalapalli is a deep tech business developer with AWS Public Sector. He has more than 8 years of experience in AI/ML and 23 years of overall software development and sales experience. Kiran helps public sector businesses across India explore and co-create cloud-based solutions that use AI, ML, and generative AI (including large language models) technologies.