Reduce Amazon SageMaker costs by shutting down notebook instances automatically

Amazon SageMaker is a powerful yet costly service. I’ve had projects where it accounted for two-thirds of the AWS bill. A less-than-obvious issue is that a SageMaker notebook instance incurs costs the whole time it is running, and it does not shut down automatically when you stop working with it (at night, for example, though your schedule may vary).

It’s good practice to always shut down your Amazon SageMaker notebook instance when you are done, but if you want an additional safety net, set up a scheduled shutdown. SageMaker does not provide this out of the box, but you can achieve it with the following steps:

  1. Write a simple Lambda function that stops the notebook instance via the AWS SDK.
  2. Set up a Step Function state machine that filters incoming events down to those triggered by your notebook instance with the instance status “InService”, waits for a specified amount of time (say, 8–10 hours) and then triggers your Lambda function.
  3. Set up a CloudWatch event rule that triggers an event any time a SageMaker notebook instance changes its state. Set your newly created state machine as its target.

You need the Step Function for two reasons: to delay the execution of your Lambda function by a specific interval, and to filter the events, because the CloudWatch event rule as set up here triggers on all notebook instances and all state changes.

Before we look at each of the steps, let’s set up the necessary IAM role.

The IAM part

You could create a separate role for each of the three steps, but I prefer to combine them into one. It needs the following trust policy (assume-role policy):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "events.amazonaws.com",
          "states.amazonaws.com"
        ]
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}

The role will need just three permissions:

  • sagemaker:StopNotebookInstance for the Lambda function
  • lambda:InvokeFunction for your Step Function state machine
  • states:StartExecution for your CloudWatch event rule

That’s it, you don’t need any permissions to list or describe anything.
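
As a rough sketch, the three permissions could be collected in a single policy document like the one built below. The helper name buildShutdownPolicy, the region, the account ID and the resource names are all hypothetical placeholders, and scoping each statement to one resource is my own assumption rather than a requirement.

```javascript
// Hypothetical helper: builds a permissions policy with the three actions listed above.
// All ARNs passed in are placeholders that you would replace with your own.
function buildShutdownPolicy({ notebookArn, lambdaArn, stateMachineArn }) {
  return {
    Version: "2012-10-17",
    Statement: [
      // Used by the Lambda function
      { Effect: "Allow", Action: "sagemaker:StopNotebookInstance", Resource: notebookArn },
      // Used by the Step Function state machine
      { Effect: "Allow", Action: "lambda:InvokeFunction", Resource: lambdaArn },
      // Used by the CloudWatch event rule
      { Effect: "Allow", Action: "states:StartExecution", Resource: stateMachineArn },
    ],
  };
}

const policy = buildShutdownPolicy({
  notebookArn: "arn:aws:sagemaker:eu-central-1:123456789012:notebook-instance/your-notebook-name",
  lambdaArn: "arn:aws:lambda:eu-central-1:123456789012:function:your-function-name",
  stateMachineArn: "arn:aws:states:eu-central-1:123456789012:stateMachine:your-state-machine-name",
});

// Paste the printed JSON into the role's permissions policy.
console.log(JSON.stringify(policy, null, 2));
```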

Killer Lambda function

Create a Lambda function that uses the AWS SDK to stop your notebook instance. For Node.js, its code looks like this:

const AWS = require("aws-sdk");
const sagemaker = new AWS.SageMaker({ apiVersion: "2017-07-24" });

// Stops the notebook instance; replace the name with your own.
module.exports.handler = (event, context, callback) => {
  sagemaker.stopNotebookInstance(
    { NotebookInstanceName: "your-instance-name" },
    (err, data) => {
      // Pass failures as the error argument so the invocation is marked as failed.
      if (err) return callback(err);
      return callback(null, data);
    }
  );
};
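
If you prefer promises over callbacks, a variant along the following lines might work. This is only a sketch: makeHandler is a hypothetical factory of mine, introduced so that the SageMaker client can be replaced with a stub and the handler tried out locally without touching AWS. It assumes version 2 of the aws-sdk package, where request objects expose a promise() method.

```javascript
// Hypothetical factory: takes a SageMaker client (real or stubbed) and the notebook name.
function makeHandler(sagemaker, notebookName) {
  return async () => {
    // A rejected promise makes the Lambda invocation fail, which is what we want on errors.
    await sagemaker.stopNotebookInstance({ NotebookInstanceName: notebookName }).promise();
    return { stopped: notebookName };
  };
}

// In Lambda you would wire it up roughly like this (assumes aws-sdk v2 is available):
// const AWS = require("aws-sdk");
// module.exports.handler = makeHandler(
//   new AWS.SageMaker({ apiVersion: "2017-07-24" }),
//   "your-instance-name"
// );

// Local dry run: the stub records the calls instead of contacting AWS.
const calls = [];
const stubClient = {
  stopNotebookInstance: (params) => ({
    promise: async () => {
      calls.push(params);
      return {};
    },
  }),
};

const resultPromise = makeHandler(stubClient, "your-instance-name")();
resultPromise.then((result) => console.log(result, calls));
```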

Step Function state machine

Create a state machine with the following definition:

{
  "Comment": "A state machine that shuts down notebook your-notebook-name after 10 hours",
  "StartAt": "Filter",
  "States": {
    "Filter": {
      "Type": "Choice",
      "Choices": [
        {
          "And": [
            {
              "Variable": "$.NotebookInstanceName",
              "StringEquals": "your-notebook-name"
            },
            {
              "Variable": "$.NotebookInstanceStatus",
              "StringEquals": "InService"
            }
          ],
          "Next": "Wait"
        }
      ],
      "Default": "DoNothing"
    },
    "Wait": {
      "Type": "Wait",
      "Seconds": 36000,
      "Next": "Invoke Shutdown"
    },
    "Invoke Shutdown": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "your-function-arn"
      },
      "End": true
    },
    "DoNothing": {
      "Type": "Pass",
      "End": true
    }
  }
}

The parameters that you will need to adjust are the notebook name (your-notebook-name), the wait time in seconds (36000) and the function ARN (your-function-arn). Note that the step invoking the Lambda function requires a parameter called FunctionName; despite the name, provide the ARN of the Lambda function there. The DoNothing step is necessary because the Choice state needs a next step to go to even if the condition is not fulfilled.
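
To avoid editing the JSON by hand, the definition could also be generated from the adjustable values. The following is a minimal sketch; buildStateMachine and its parameter names (notebookName, waitHours, functionArn) are my own hypothetical choices, while the resulting structure mirrors the definition above.

```javascript
// Hypothetical helper: builds the state machine definition from the values you need to adjust.
function buildStateMachine({ notebookName, waitHours, functionArn }) {
  return {
    Comment: `A state machine that shuts down notebook ${notebookName} after ${waitHours} hours`,
    StartAt: "Filter",
    States: {
      // Only continue for our notebook, and only when it has just come into service.
      Filter: {
        Type: "Choice",
        Choices: [
          {
            And: [
              { Variable: "$.NotebookInstanceName", StringEquals: notebookName },
              { Variable: "$.NotebookInstanceStatus", StringEquals: "InService" },
            ],
            Next: "Wait",
          },
        ],
        Default: "DoNothing",
      },
      // The Wait state expects seconds, so convert from hours.
      Wait: { Type: "Wait", Seconds: waitHours * 3600, Next: "Invoke Shutdown" },
      "Invoke Shutdown": {
        Type: "Task",
        Resource: "arn:aws:states:::lambda:invoke",
        Parameters: { FunctionName: functionArn },
        End: true,
      },
      // Fallback target for all events that do not pass the filter.
      DoNothing: { Type: "Pass", End: true },
    },
  };
}

const definition = buildStateMachine({
  notebookName: "your-notebook-name",
  waitHours: 10,
  functionArn: "your-function-arn",
});

// Paste the printed JSON into the state machine definition.
console.log(JSON.stringify(definition, null, 2));
```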

CloudWatch event rule

Create an event rule with “SageMaker Notebook Instance State Change” as the event type and your Step Function state machine as the target. For input configuration, choose “Part of the matched event” and enter $.detail to narrow down the input to the relevant part. You may choose “Matched event” instead, but then you would need to adjust the two Variable attributes in the step named Filter to $.detail.NotebookInstanceName and $.detail.NotebookInstanceStatus.
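
To illustrate what travels from the rule to the state machine, here is a small local simulation. Treat it as a sketch under assumptions: the event pattern reflects what I would expect the console to generate for this event type, the sample event is a heavily trimmed, hypothetical example (real events carry more fields), and matches and passesFilter are simplified helpers of mine, not AWS APIs.

```javascript
// Event pattern for the rule: all SageMaker notebook instance state changes.
const eventPattern = {
  source: ["aws.sagemaker"],
  "detail-type": ["SageMaker Notebook Instance State Change"],
};

// Hypothetical, trimmed sample event; only the fields used in this article are shown.
const sampleEvent = {
  source: "aws.sagemaker",
  "detail-type": "SageMaker Notebook Instance State Change",
  detail: {
    NotebookInstanceName: "your-notebook-name",
    NotebookInstanceStatus: "InService",
  },
};

// Simplified matcher: every key in the pattern must list the event's value.
function matches(pattern, event) {
  return Object.entries(pattern).every(([key, allowed]) => allowed.includes(event[key]));
}

// "Part of the matched event" with $.detail forwards only this object to the state machine.
const stateMachineInput = sampleEvent.detail;

// Mirrors the condition in the Filter step of the state machine.
function passesFilter(input, notebookName) {
  return (
    input.NotebookInstanceName === notebookName &&
    input.NotebookInstanceStatus === "InService"
  );
}

console.log(matches(eventPattern, sampleEvent));
console.log(passesFilter(stateMachineInput, "your-notebook-name"));
```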

That’s it, you’re all set.

Image courtesy: Free-Photos from Pixabay

Consultant at Deloitte, doing Cloud, Data Science and Web Dev