Combining SageMaker Pipelines with SageMaker Projects

In the previous blog, we discussed SageMaker pipelines and how they can be implemented using notebooks in SageMaker Studio.

In this blog, we will see how pipelines can be combined with SageMaker projects and what benefits this approach provides. As before, the blog centres on the deployment of the customer churn prediction model, but this time the implementation is done as part of a project. To revisit the model and connect with Studio, please refer to ‘The Customer Churn Use Case’ and ‘Getting Connected with AWS SageMaker Studio’ sections from the previous blog.

The blog is organized as follows:

  1. SageMaker Projects
  2. Customer Churn Model Deployment with Projects
  3. How do Projects add Value?
  4. Conclusion

1. SageMaker Projects

SageMaker projects are provisioned through AWS Service Catalog and can be used to create end-to-end ML solutions with CI/CD.

Each project comes with:

  - a SageMaker-provided template,
  - one or more repositories for building and deploying ML solutions,
  - a SageMaker pipeline with steps for data preprocessing, training, evaluation and deployment,
  - a CodePipeline pipeline that runs the SageMaker pipeline every time a new version of the code is checked in, and
  - a model group that contains the model versions.
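Although this blog creates the project from the Studio UI (next section), a project can also be created programmatically. Below is a minimal sketch using the boto3 CreateProject API; the ProductId and ProvisioningArtifactId identify the chosen MLOps template in AWS Service Catalog, and the values shown are placeholders:

```python
import boto3

sm_client = boto3.client("sagemaker")

# ProductId / ProvisioningArtifactId identify the MLOps template product
# in AWS Service Catalog; the values below are placeholders.
response = sm_client.create_project(
    ProjectName="CustomerChurn",
    ProjectDescription="Customer churn prediction with CI/CD",
    ServiceCatalogProvisioningDetails={
        "ProductId": "prod-xxxxxxxxxxxxx",
        "ProvisioningArtifactId": "pa-xxxxxxxxxxxxx",
    },
)
print(response["ProjectArn"])
```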

In the next section, we illustrate how these components are used together by taking up the deployment of the Customer Churn prediction model.

2. Customer Churn Model Deployment with Projects

The steps to create and deploy the customer churn prediction model as part of a SageMaker project are delineated below:

1. Connect to Amazon SageMaker Studio. As mentioned above, please refer to the previous blog for the detailed steps.

2. From the Components and registries menu (bottom-most option on the left bar), click on Projects and then ‘Create Project.’


This opens the Create Project page, which lists a number of project templates with their description. 

The template we will be using for this project is ‘MLOps template for model building, training, and deployment’, as highlighted below.

[Screenshot: the ‘MLOps template for model building, training, and deployment’ template highlighted in the list]

3. Choose ‘Select project template’ and enter a name for the project. In this case, the project is named ‘CustomerChurn’.


This kicks off the creation of the project, which completes in a short while.

4. The launched template consists of two repositories of sample code; to update the code, the repositories need to be cloned to the local SageMaker Studio instance by clicking the corresponding hyperlinks, as shown below for the two repositories.

[Screenshots: cloning the two repositories]

The following architecture [1] is deployed once the project is created using the MLOps template chosen above.

[Architecture diagram from [1]]

In this process, these two repositories are added to AWS CodeCommit:

(a) Model Build

This contains the code for data processing, model training, model evaluation, and accuracy-based conditional model registration. A build specification file is also included, which AWS CodePipeline and AWS CodeBuild use to run the pipeline automatically.

(b) Model Deploy 

This contains the configuration files for deploying the model, which CodePipeline and CodeBuild use to create endpoints in staging and production.

Two CodePipeline pipelines are associated with these two repositories:

(a) Model Build

(b) Model Deploy

CodePipeline, in general, helps automate the build, test, and deploy phases of a release process whenever there is a code change, based on the release model specified.

In this case, the two pipelines are triggered automatically in the following scenarios:

(a) Model Build 

Whenever a new commit is made to the ModelBuild CodeCommit repository, the ModelBuild pipeline is triggered automatically and the SageMaker pipeline runs end to end.

(b) Model Deploy

When a new model version is added to the model registry and its status is marked ‘Approved’, the ModelDeploy pipeline is invoked and the model is deployed to a staging endpoint; after another manual approval step, it is deployed to a production endpoint.

5. Download the Customer Churn dataset available with the SageMaker sample files and push it to an S3 bucket for further use in the project. The code for this step is taken from [1]; a sketch follows below.
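A minimal sketch of that step, following [1] (the public sample-file location and the DEMO-xgboost-churn prefix come from the AWS samples; the printed URL is what Step 9 uses for InputDataUrl):

```python
import boto3
import sagemaker

# Copy the synthetic churn dataset shipped with the public SageMaker
# sample files into this account's default SageMaker bucket.
s3 = boto3.client("s3")
s3.download_file(
    "sagemaker-sample-files", "datasets/tabular/synthetic/churn.txt", "churn.txt"
)

default_bucket = sagemaker.session.Session().default_bucket()
key = "sagemaker/DEMO-xgboost-churn/data/RawData.csv"
s3.upload_file("churn.txt", default_bucket, key)

# This URL is the value used for the InputDataUrl parameter in Step 9.
print(f"s3://{default_bucket}/{key}")
```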

6. Since this is the implementation of the customer churn use case, the abalone directory inside the Model Build Repo should be renamed to customer_churn.


7. The pipeline module path inside codebuild-buildspec.yml should be updated to point to customer_churn, so that pipeline.py under the renamed folder can be located.
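The change likely amounts to a single line in the build phase; a sketch based on the sample buildspec (flags abbreviated, and the module name pipelines.customer_churn.pipeline assumes the directory rename from Step 6):

```yaml
# codebuild-buildspec.yml, build phase (abbreviated).
# After the rename, pipelines.abalone.pipeline becomes
# pipelines.customer_churn.pipeline:
build:
  commands:
    - run-pipeline --module-name pipelines.customer_churn.pipeline --role-arn $SAGEMAKER_PIPELINE_ROLE_ARN ...
```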

8. The preprocess.py under customer_churn should be replaced with the customer churn preprocessing script available here. This script handles the feature engineering of the customer churn dataset; a condensed sketch follows below.
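For flavour, the feature engineering looks roughly like the following. This is a condensed sketch, not the exact linked script: the column names (Phone, Area Code, Churn?), the "True." target encoding, and the 70/20/10 split are taken from the synthetic churn dataset used in the AWS samples, and the /opt/ml/processing paths are the standard SageMaker Processing input/output locations:

```python
import numpy as np
import pandas as pd

# Read the raw dataset from the standard SageMaker Processing input path.
df = pd.read_csv("/opt/ml/processing/input/RawData.csv")

# Drop an uninformative identifier column and encode the target.
df = df.drop(["Phone"], axis=1)
df["Area Code"] = df["Area Code"].astype(object)
df["Churn?"] = (df["Churn?"] == "True.").astype(int)

# One-hot encode categorical features and move the target to column 0,
# the layout SageMaker's XGBoost expects.
model_data = pd.get_dummies(df)
model_data = pd.concat(
    [model_data["Churn?"], model_data.drop(["Churn?"], axis=1)], axis=1
)

# Split into train / validation / test and write to the standard
# SageMaker Processing output paths.
train, validation, test = np.split(
    model_data.sample(frac=1, random_state=1729),
    [int(0.7 * len(model_data)), int(0.9 * len(model_data))],
)
train.to_csv("/opt/ml/processing/train/train.csv", header=False, index=False)
validation.to_csv(
    "/opt/ml/processing/validation/validation.csv", header=False, index=False
)
test.to_csv("/opt/ml/processing/test/test.csv", header=False, index=False)
```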

9. The pipeline steps are defined in pipeline.py, and the existing script under customer_churn should now be replaced with the script available here.

The “InputDataUrl” parameter in the script should be updated with the Amazon S3 URL generated in Step 5. For example, in the author’s case the URL was s3://sagemaker-us-east-2-XXXXXXXXXXXXX/sagemaker/DEMO-xgboost-churn/data/RawData.csv.

This will point the script to the churn dataset in the available S3 bucket.
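In pipeline.py this is typically a pipeline parameter, so the change amounts to updating its default value. A sketch, assuming the parameter is defined with ParameterString as in the sample code (the XXXX bucket is the author’s placeholder):

```python
from sagemaker.workflow.parameters import ParameterString

# Default S3 location of the raw churn data; replace the placeholder
# account-specific bucket with the URL printed in Step 5.
input_data = ParameterString(
    name="InputDataUrl",
    default_value=(
        "s3://sagemaker-us-east-2-XXXXXXXXXXXXX"
        "/sagemaker/DEMO-xgboost-churn/data/RawData.csv"
    ),
)
```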

10. Similarly, the evaluation.py code should also be updated with the evaluation script here.

11. Stage all the changes, commit and push them to CodeCommit as shown below. You will be required to enter your name and email address while committing. 

[Screenshots: staging the changes, committing with name and email, and pushing to CodeCommit]

The ModelBuild repository is updated accordingly.

The implementation includes an Amazon EventBridge rule that monitors for commits. Once the changes are committed and pushed to the CodeCommit repository, the build process for the ModelBuild pipeline starts.

12. Go to SageMaker Resources – Pipelines to see the pipeline being created and executed.

At the end of the build process, if the condition specified in the pipeline holds true, the model is pushed to the model registry.

13. To check whether the model is pushed to the Model registry, go to ‘Model registry’ under SageMaker Resources.


The model is pushed to the registry because its accuracy is greater than 0.8, the condition specified in pipeline.py; a sketch of that condition is shown below. This can be verified by clicking on the model version (or anywhere in the first row) and checking the Model quality tab. The accuracy value is 0.948.
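As a sketch of what that condition looks like in pipeline.py, assuming, as in the sample code, that step_eval is the evaluation ProcessingStep, evaluation_report is its PropertyFile, step_register is the RegisterModel step, and that evaluation.py writes the accuracy under binary_classification_metrics.accuracy.value (the step name below is hypothetical):

```python
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThan
from sagemaker.workflow.functions import JsonGet

# Register the model only if the accuracy reported by the evaluation
# step is greater than the 0.8 threshold.
cond_gt = ConditionGreaterThan(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=evaluation_report,
        json_path="binary_classification_metrics.accuracy.value",
    ),
    right=0.8,
)

# Hypothetical step name; if the condition holds, the RegisterModel
# step runs and the model lands in the registry.
step_cond = ConditionStep(
    name="CheckAccuracyCustomerChurnEvaluation",
    conditions=[cond_gt],
    if_steps=[step_register],
    else_steps=[],
)
```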


14. Since the ModelApprovalStatus is set to PendingManualApproval, the model has to be manually approved first before it makes its way to deployment.

As the model satisfies our requirement, we can push it to Staging by clicking on ‘Update Status’. 


The status will be updated from Pending to Approved.
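The same approval can also be done programmatically with the boto3 UpdateModelPackage API (a sketch; the model package ARN below is a placeholder for the version shown in the registry):

```python
import boto3

sm_client = boto3.client("sagemaker")

# Approving the model package version is what triggers the
# ModelDeploy pipeline; the ARN below is a placeholder.
sm_client.update_model_package(
    ModelPackageArn="arn:aws:sagemaker:us-east-2:XXXXXXXXXXXX:model-package/CustomerChurn/1",
    ModelApprovalStatus="Approved",
)
```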


Once the model is approved, the deployment pipeline is triggered and the model is deployed to staging.


This creates an endpoint for the model in Staging.

The endpoint can be located under ‘SageMaker Resources’ – ‘Endpoints’.

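Once the staging endpoint is InService, it can be tested with a sample request. A sketch: the endpoint name below is hypothetical (the template typically names endpoints after the project and stage), and the payload must be a comma-separated feature row matching what preprocess.py produces:

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint name; the actual name appears under
# 'SageMaker Resources' - 'Endpoints'.
csv_row = "..."  # one comma-separated feature row, e.g. from test.csv

response = runtime.invoke_endpoint(
    EndpointName="CustomerChurn-staging",
    ContentType="text/csv",
    Body=csv_row,
)
print(response["Body"].read())
```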

15. The model needs to be approved once again before deployment to Production. This can be achieved by clicking on ‘Review’ under ApproveDeployment. 


This kicks off the DeployProd process, and the model gets deployed to Production.


This concludes the deployment of the model in Production and creation of the associated endpoint.


3. How do Projects add Value?

SageMaker pipelines can be used on their own to create ML workflows in notebooks, but that approach makes it difficult to collaborate within a team of data scientists and ML engineers.

SageMaker projects, on the other hand, enable version control, code consistency, and efficient collaboration across teams.

Projects also provide MLOps templates that provision the underlying resources needed for CI/CD. They further allow a trigger-based approach to model generation (for example, triggering on a code change), unlike standalone pipelines, where notebook cells have to be executed individually for the steps to run.

4. Conclusion

SageMaker projects provide an easy and effective way of creating an end-to-end ML solution, and allow for version control, code consistency and efficient collaboration between different teams.  

Taken together, this facilitates creation, automation and end-to-end management of ML workflows at scale, leading to faster productionization of ML models.

If you’d like to learn more about these services or need assistance, let us know and we’ll reach out to you.

References:

  1. Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines
  2. Amazon SageMaker Samples

When you visit our website, it may store information through your browser from specific services, usually in form of cookies. Here you can change your privacy preferences. Please note that blocking some types of cookies may impact your experience on our website and the services we offer.