Machine learning requires huge data sets, along with processing power and other computing resources, to train models, perform tasks, and solve problems. Those data and resource requirements grow as the ML application becomes smarter and more sophisticated. Because scaling usually means expensive hardware upgrades and additions, organizations often struggle to host their machine learning models on traditional infrastructure.
In addition, as the machine learning field continues to advance, more and more computing power will be required to stay current and competitive. Even if a business currently meets its ML infrastructure requirements, it may soon fall behind the curve if it doesn’t invest in more processing power or adopt a cloud native approach.
Cloud native architectures address these challenges by providing scalable infrastructure on which to deploy machine learning models. Cloud native machine learning applications run in a serverless, elastic environment where resources are available on demand as processing and data needs fluctuate. You can develop, train, and deploy your ML models with fewer limitations.
Another benefit of adopting a cloud native approach is that it makes it easier to apply DevOps practices and principles – such as CI/CD – to your ML projects. CI/CD (continuous integration/continuous delivery) uses automation to streamline development and allow for constant and simultaneous collaboration. This can help you deploy more sophisticated machine learning models faster. It also brings you a big step closer to creating a fully integrated team of developers, engineers, testers, and data scientists, all working together to accomplish business goals.
In this blog, we’ll describe how to use CI/CD tools and practices to streamline the deployment of cloud native machine learning models.
Continuous integration/continuous delivery is a methodology used by DevOps organizations to streamline the various steps involved in code releases. The goal of CI/CD is to frequently merge code changes, constantly test and validate code, automatically move code through environments, and continuously learn and improve your processes. A cloud native CI/CD pipeline is a collection of tools used to continuously integrate and deliver microservices applications in a containerized, cloud-based environment. Let’s examine how you can use these tools to develop, train, and deploy data science models for cloud native machine learning applications.
Cloud native continuous integration (CI) tools cover a variety of functions, including source code version control, integration testing, and build automation.
Cloud native source code repositories allow team members to collaborate on machine learning infrastructure and code from anywhere in the world. CI version control allows engineers to frequently merge changes without stepping on anyone’s toes and ensures that mistakes can be quickly undone as needed. The repository acts as a single source of truth for an ML project, reducing the risk of miscommunication and misunderstandings.
Automated integration tests run on all new code that’s checked into the repository. This helps ensure that updates and changes won’t introduce bugs that break the existing code base. It is also what makes integration truly continuous: you can merge changes as frequently as you want without waiting for manual testing or reducing the quality of your machine learning model.
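As a rough illustration of an automated integration test, the sketch below checks that a preprocessing step and a model step work together. It uses Python’s standard `unittest` module; `scale_features` and `predict` are hypothetical stand-ins invented for this example, not part of any real ML library.

```python
import unittest


def scale_features(values):
    """Hypothetical preprocessing step: min-max scale values to [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero on constant input
    return [(v - lo) / span for v in values]


def predict(scaled):
    """Hypothetical model stand-in: flag values in the upper half."""
    return [1 if x >= 0.5 else 0 for x in scaled]


class PipelineIntegrationTest(unittest.TestCase):
    def test_preprocessing_feeds_model(self):
        # The two stages are exercised together, not in isolation.
        self.assertEqual(predict(scale_features([10.0, 20.0, 30.0])), [0, 1, 1])

    def test_constant_input_does_not_crash(self):
        self.assertEqual(predict(scale_features([5.0, 5.0])), [0, 0])
```

A CI job could run a suite like this on every commit, for example with `python -m unittest`, and block the merge if any test fails.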
Build automation tools for cloud native machine learning applications will package up the source code into model artifacts to be delivered to the next stage in the pipeline.
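What a model artifact contains varies by tool, so the following is only a sketch under assumptions: the build step bundles the serialized model with a small manifest (version, content hash, metrics) so that later pipeline stages can confirm they received exactly what was built. The function names and manifest fields are hypothetical.

```python
import hashlib
import json


def build_artifact(model_bytes, version, metrics):
    """Bundle a serialized model with a manifest describing it."""
    manifest = {
        "version": version,
        "sha256": hashlib.sha256(model_bytes).hexdigest(),
        "size_bytes": len(model_bytes),
        "metrics": metrics,
    }
    return {"model": model_bytes, "manifest": json.dumps(manifest, sort_keys=True)}


def verify_artifact(artifact):
    """Recompute the hash to confirm the model bytes match the manifest."""
    manifest = json.loads(artifact["manifest"])
    return hashlib.sha256(artifact["model"]).hexdigest() == manifest["sha256"]
```

A downstream stage might call `verify_artifact` before testing or deploying, and refuse to continue if it returns `False`.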
Cloud native continuous delivery (CD) tools automatically move ML artifacts to the testing, staging, and production environments in your CI/CD pipeline. CD tools allow you to create programmatic gates or quality thresholds that artifacts must meet before they move to the next step of the pipeline, ensuring that continuous code delivery doesn’t reduce overall quality or functionality.
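The programmatic gates described above can be sketched as a simple threshold check. This is a minimal illustration rather than any specific CD tool’s API: the function name `passes_gate`, the metric names, and the threshold values are all hypothetical.

```python
from typing import Dict, List, Tuple


def passes_gate(metrics: Dict[str, float],
                thresholds: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Return (passed, failures) for an artifact's evaluation metrics."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A metric that was never reported counts as a failure.
            failures.append(f"{name}: missing")
        elif value < minimum:
            failures.append(f"{name}: {value} < {minimum}")
    return (not failures, failures)


# Example: the artifact is promoted only if every threshold is met.
ok, reasons = passes_gate({"accuracy": 0.93, "recall": 0.81},
                          {"accuracy": 0.90, "recall": 0.80})
```

Returning the list of failures alongside the pass/fail result lets the pipeline report why an artifact was held back.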
After development, ML artifacts are delivered to a cloud native testing or QA environment. For cloud native machine learning applications, pull requests should go through code quality checks and smoke tests using production-like runs in the test environment. That means running a small chunk of real data through your model to ensure that the ML application produces the expected results without anything breaking. CI/CD test automation allows you to use the same testing methods on all ML code, ensuring consistent quality. It also removes bottlenecks, so you can perform comprehensive testing without slowing down releases.
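A smoke test of this kind might look like the sketch below. It assumes the model is exposed as a callable that takes a list of rows and returns one label per row; `smoke_test`, `toy_predict`, and the field names are hypothetical and exist only for illustration.

```python
def smoke_test(predict, sample_rows, valid_labels):
    """Run a small sample through the model; return a list of problems."""
    try:
        predictions = list(predict(sample_rows))
    except Exception as exc:  # any crash is a smoke-test failure
        return [f"predict raised {type(exc).__name__}: {exc}"]

    problems = []
    if len(predictions) != len(sample_rows):
        problems.append(
            f"expected {len(sample_rows)} predictions, got {len(predictions)}"
        )
    for i, label in enumerate(predictions):
        if label not in valid_labels:
            problems.append(f"row {i}: unexpected label {label!r}")
    return problems


def toy_predict(rows):
    """Hypothetical stand-in for a real model."""
    return ["spam" if row["links"] > 3 else "ham" for row in rows]


sample = [{"links": 5}, {"links": 0}]
issues = smoke_test(toy_predict, sample, {"spam", "ham"})
```

An empty list means the sample ran cleanly. The check is deliberately shallow: it confirms that the model runs and returns plausible output, not that the output is accurate.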
Once machine learning code is integrated, packaged, delivered, and tested, it’s time to deploy it to your production infrastructure. In many pipelines, a human validator needs to review test results and approve the finished code before it’s manually deployed. Other CI/CD pipelines include an extra stage, called continuous deployment, which automates this process. Continuous deployment tools typically provide a final quality test or validation step that runs automatically before the model is deployed to production. This is particularly effective for small, urgent updates like hot fixes.
In DevOps and CI/CD, the development cycle doesn’t end on release day. After a machine learning model is deployed to production, it must be monitored for things like data drift and concept drift, as well as bugs and performance issues.
Data drift and concept drift are risks in any machine learning project. Ideally, you’ll train your model on the highest-quality data available. However, once the model is deployed, there’s a risk of model decay, where the prediction power drops and model performance decreases. This mostly occurs because we live in an ever-changing world, and data measures those changes.
More specifically, the cause could be data drift (when new data is very different from the data used to train and build the model) or concept drift (when the relationship between the input data and the prediction target changes). Monitoring enables you to detect changes in the data and make adjustments to the ML model to maintain a high level of accuracy.
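One way to put a number on data drift for a single numeric feature is the population stability index (PSI), which compares how the training data and the new data are distributed across a set of bins. The sketch below is one possible approach rather than a prescribed one; the function name, the bin count, and the small floor used for empty bins are assumptions made for this example.

```python
import math


def population_stability_index(reference, current, bins=10):
    """Compare two samples of one numeric feature; higher means more drift."""
    lo, hi = min(reference), max(reference)
    # Bin edges come from the training (reference) data; guard constant input.
    width = ((hi - lo) / bins) or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but that cutoff is a convention and should be tuned for your data. Note that a check like this only sees changes in the input distribution; detecting concept drift generally requires comparing predictions against actual outcomes.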
Automatic issue detection and alerts help ensure that problems are addressed by the appropriate team member as soon as possible. Automated issue remediation can also be used to resolve common problems, further shortening feedback loops. When implemented effectively, potential problems are found and resolved before they affect the business. For example, outlier data and unusual values that might impact a model can be detected early with an anomaly detection algorithm.
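As one simple example of such an algorithm, the sketch below flags outliers using the modified z-score, which is based on the median and the median absolute deviation (MAD) and is therefore less distorted by the outliers themselves than a mean-based score. The technique is chosen here for illustration only; the function name and the sample readings are hypothetical, and the 3.5 cutoff is a conventional default rather than a requirement.

```python
import statistics


def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # More than half the values are identical; the score is undefined.
        return []
    # 0.6745 scales the MAD so it is comparable to a standard deviation
    # when the data is roughly normal.
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


readings = [10, 11, 9, 10, 12, 10, 11, 95]
suspicious = find_outliers(readings)
```

In a monitoring setup, a non-empty result could trigger an alert or hold the affected records back from the model until someone reviews them.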
In addition, you should monitor the CI/CD pipeline itself to ensure it’s functioning optimally. This will allow you to spot broken or inefficient tools and processes and identify opportunities for training and advancement among team members. You can apply what you’ve learned in the monitoring stage to the next iteration, continuously improving your cloud native machine learning application and CI/CD pipeline.
A CI/CD pipeline allows you to create a streamlined and collaborative development cycle that delivers high-quality machine learning models in cloud native environments. That, in turn, lets you take advantage of the flexibility and scalability of serverless architectures to run more sophisticated applications.