Machine Learning and Predictive Scaling

Machine Learning is a branch of artificial intelligence (AI) that uses data and algorithms to provide predictive analytics, gradually improving its predictive accuracy as it learns. Machine Learning has been applied to a fast-growing variety of use cases in recent years, and auto scaling in cloud computing is one of them.

Auto scaling, in cloud computing, refers to the capability of scaling compute resources to match demand at the time: when more compute is needed than the system is currently running, the system scales out (adds more compute resources) to better share the load; when less compute is needed, the system scales in (reduces capacity), which brings both financial and environmental benefits.

Certain metrics are used to indicate the level of compute needed; CPU utilisation is one of the most common. Take the Amazon EC2 compute service as an example. Say there are currently three EC2 instances running in an auto scaling group, with the overall load distributed across them. We set up a configuration so that when the instances' CPU utilisation goes above 80% under increased load, the system is triggered to add instances to the auto scaling group. The load is then distributed among more instances, so each instance settles at a lower load and, accordingly, lower CPU utilisation. This lowers the stress level on each instance and leaves room to accommodate further increases in load. Conversely, when the overall load lightens and CPU utilisation drops below 60%, the system reduces the number of running instances. Such auto scaling is called dynamic scaling. Dynamic scaling passively reacts to changes in load.
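
As a rough sketch of how such thresholds can be wired up (the group name, policy name and exact values here are assumptions for illustration), the scale-out rule can be built from a step scaling policy plus a CloudWatch alarm:

aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "example-asg" \
    --policy-name "scale-out-on-high-cpu" \
    --policy-type "StepScaling" \
    --adjustment-type "ChangeInCapacity" \
    --step-adjustments "MetricIntervalLowerBound=0,ScalingAdjustment=1"

# Fire the policy when average CPU utilisation stays above 80%;
# pass the PolicyARN returned by the previous command as the alarm action.
aws cloudwatch put-metric-alarm \
    --alarm-name "cpu-above-80" \
    --namespace "AWS/EC2" \
    --metric-name "CPUUtilization" \
    --statistic "Average" \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 80 \
    --comparison-operator "GreaterThanThreshold" \
    --dimensions "Name=AutoScalingGroupName,Value=example-asg" \
    --alarm-actions "<PolicyARN-from-previous-command>"

A mirror-image policy and alarm (ScalingAdjustment=-1, triggered below 60%) would handle the scale-in side.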

Scheduled scaling is another type of scaling. You manually schedule the times at which additional instances are fired up (added to the auto scaling group) and the times at which the number of instances is reduced, typically because you (a human) have found a way to predict the load of a known workload. For example, based on observations over the past few weeks, you know that the load usually starts increasing at 7am on Mondays and only starts decreasing at 7pm, and that Tuesdays and Wednesdays follow the same pattern. On Thursdays and Fridays the pattern continues, but at higher load levels. You therefore manually schedule the auto scaling so that three additional EC2 instances are launched at 7am on Mondays, Tuesdays and Wednesdays, and the same number of instances is removed at 7pm on those days. For Thursdays and Fridays, you configure five additional instances instead of three, to handle the even higher load on those days of the week.
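
As a hedged sketch of such a schedule (assuming a baseline of three instances, so three extra instances means a desired capacity of six; the group and action names are made up for the example), scheduled actions can be created with the AWS CLI:

# Mondays to Wednesdays (cron days 1-3): three extra instances from 7am,
# back to the baseline of three at 7pm (times are UTC by default)
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "example-asg" \
    --scheduled-action-name "early-week-scale-out" \
    --recurrence "0 7 * * 1-3" \
    --desired-capacity 6

aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "example-asg" \
    --scheduled-action-name "early-week-scale-in" \
    --recurrence "0 19 * * 1-3" \
    --desired-capacity 3

# Thursdays and Fridays (cron days 4-5): five extra instances instead
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "example-asg" \
    --scheduled-action-name "late-week-scale-out" \
    --recurrence "0 7 * * 4-5" \
    --desired-capacity 8

The matching 7pm scale-in actions for Thursdays and Fridays follow the same pattern.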

Using Machine Learning in auto scaling adds a (machine-based) intelligent predictive dimension. The Machine Learning algorithm observes historical system data, applies analysis and makes predictive decisions on when to scale the Amazon EC2 instances out and when to scale them in. Humans no longer need to configure the scheduled scaling described above (which is proactive but not intelligent, as it is pre-defined and static). AWS calls this type of Machine Learning based auto scaling Predictive Scaling.

The benefits of Machine Learning enabled Predictive Scaling typically lie in solving the following issues:

Quite often, the applications running on EC2 instances behave differently while powering up and powering down. For example, some applications take minutes, or tens of minutes, to initialise. So when the dynamic scaling algorithm (which, as noted above, is reactive in nature) is triggered by the defined event (e.g., CPU utilisation reaching 80%), the additional EC2 instances may still be ten or twenty minutes away from being in service. During that window the load may continue to increase, and some users would experience an outage. Such issues can be mitigated by configuring the metric triggers more conservatively (say, using 70% instead of 80% utilisation as the scale-out trigger, and 50% instead of 60% as the scale-in trigger), but that greatly reduces the efficiency of auto scaling, because the approach remains static and reactive. (It is actually quite ironic that dynamic scaling can be so static.) Predictive Scaling handles these scenarios much better because it takes a proactive approach.
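
If slow-starting applications are the concern, it also helps to tell the Auto Scaling group how long a new instance takes to become useful, so that scaling decisions account for the initialisation delay. A minimal sketch, assuming a ten-minute start-up time and an illustrative group name:

# Treat new instances as "warming up" for 600 seconds before their
# metrics count towards the group's aggregated metrics
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "example-asg" \
    --default-instance-warmup 600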

Traditional scheduled scaling can also mitigate the delay issues of dynamic scaling, but Predictive Scaling is more automatic and more intelligent. You do not need to manually configure the schedules, which can differ for every day of the week, and the Machine Learning algorithm keeps learning from the evolving system data, adjusting the mechanism as the load patterns change over time.

It is also worth mentioning that the benefits of Predictive Scaling will continue to expand with further progress in the Machine Learning space. 

At the same time, the following aspects and limitations of Predictive Scaling need to be kept in mind at this stage:


The metric data in CloudWatch is used for the machine learning. Predictive Scaling uses data from the previous 14 days to create an hourly forecast for the next 48 hours. 


Forecast data is updated every six hours, using the most recent CloudWatch metric data.


A minimum of 24 hours' worth of historical data is required before any prediction can be given – if the CloudWatch metric has only just been enabled, it takes 24 hours before Predictive Scaling starts producing predictions.


Cyclical load patterns – e.g., periods of highs and lows within a day or a week – are particularly suitable for Predictive Scaling. The cycles can have their own dynamics, such as different peak levels on different days of the week. Totally random load patterns, on the other hand, do not suit Predictive Scaling.


Workloads that recur once (or several times) a week or fortnight, such as batch processing or testing jobs, are also particularly suitable for Predictive Scaling. But if an event does not repeat frequently enough, it will not be learned as a pattern.


The predicted capacity is actioned at the beginning of the hour; it is currently not granular to the minute. The proactive action of adding or removing instances can happen at 7am, 8am, or whatever the hour, but not at 7:38 or 18:55. So even if the analysis of the CloudWatch data suggests that 7:38 would be the perfect time to add instances, the Predictive Scaling mechanism will act at the beginning of the hour – 7am in this example.


Use EC2 instances of the same capacity. Predictive Scaling relies on the metrics recorded in CloudWatch to analyse and learn, and mixing different compute capacities confuses and skews the learning. For example, m7g.large is an EC2 instance type that uses the powerful AWS Graviton3 processors; a CPU utilisation of 70% on an m7g.large instance means something totally different from 70% utilisation on a t3.large instance, which uses a less powerful CPU.


It is a good idea to start Predictive Scaling in Forecast Only mode – that is, a Predictive Scaling policy is defined, but it does not actually take scaling actions; it only generates forecasts for review. You (a human) can then adjust the parameters.

AWS offers the following configuration file as an example:


cat <<EoF > predictive-scaling-policy-cpu.json
{
    "MetricSpecifications": [
        {
            "TargetValue": 25,
            "PredefinedMetricPairSpecification": {
                "PredefinedMetricType": "ASGCPUUtilization"
            }
        }
    ],
    "Mode": "ForecastOnly"
}
EoF



This JSON configuration file will produce forecasts based on CPU utilisation, with a target average CPU utilisation of 25% for the instances in the Auto Scaling group.

The following command can then be used to add the predictive scaling policy to the Auto Scaling group:


aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "Example Application Auto Scaling Group" \
    --policy-name "CPUUtilizationpolicy" \
    --policy-type "PredictiveScaling" \
    --predictive-scaling-configuration file://predictive-scaling-policy-cpu.json
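
While the policy runs in Forecast Only mode, the generated forecasts can be retrieved for review. A minimal sketch, reusing the names from the command above (the time window here is a placeholder):

aws autoscaling get-predictive-scaling-forecast \
    --auto-scaling-group-name "Example Application Auto Scaling Group" \
    --policy-name "CPUUtilizationpolicy" \
    --start-time "2024-01-01T00:00:00Z" \
    --end-time "2024-01-03T00:00:00Z"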


Say that after a period of running in Forecast Only mode you are happy with how Predictive Scaling would behave, and would like to adjust the CPU utilisation parameter to 20% and put Predictive Scaling into action. You can edit the JSON file to be:


cat <<EoF > predictive-scaling-policy-cpu.json
{
    "MetricSpecifications": [
        {
            "TargetValue": 20,
            "PredefinedMetricPairSpecification": {
                "PredefinedMetricType": "ASGCPUUtilization"
            }
        }
    ],
    "Mode": "ForecastAndScale"
}
EoF


(Note that the mode specified is now ForecastAndScale.)

Rerun the ‘aws autoscaling put-scaling-policy’ command to put this Predictive Scaling policy into action.


AWS has also introduced the capability of giving recommendations on enabling the Forecast and Scale mode. Instead of a human manually assessing the accuracy and potential impact of predictive scaling after observing its behaviour in Forecast Only mode, you get a recommendation from the Machine Learning algorithm on whether Predictive Scaling would do a better job of optimising the EC2 capacity than the existing scaling configuration, based on up to eight weeks of past data. The recommendation rationale is quite logical:


If the Predictive Scaling policy would increase availability (or keep it at the same level) AND reduce cost (or keep it at the same level), the recommendation is to switch on the Forecast and Scale mode.

If availability would be reduced, the recommendation is to disable Predictive Scaling.

If the Predictive Scaling policy would increase both availability and cost, then the humans should decide based on their own situation and considerations.


There is no doubt that AWS Predictive Scaling will continue to evolve: new features will be added, limitations lifted, and use cases broadened.


                                                                                                                            -- Simon Wang    

 
