
Essential Kubeflow
Engineering ML Workflows on Kubernetes
- 1st Edition - May 1, 2026
- Latest edition
- Authors: Prashanth Josyula, Sonika Arora, Anant Kumar, Jivitesh Poojary
- Language: English
- Paperback ISBN: 978-0-443-45254-3
- eBook ISBN: 978-0-443-45255-0
Essential Kubeflow: Engineering ML Workflows on Kubernetes equips readers with the tools to transform ML workflows from experimental notebooks to production-ready platforms with t…

- Provides readers with a comprehensive step-by-step guide to building reliable ML pipelines with automated workflows, testing, and deployment using Kubeflow's pipeline components (a minimal illustrative sketch follows this list).
- Includes clear strategies for monitoring ML workloads, managing resources, handling multi-user environments, and maintaining production platforms at scale.
- Presents proven solutions and architectural patterns drawn from actual production deployments, showing readers how to avoid common pitfalls and accelerate ML initiatives.
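To give a concrete flavor of the pipeline components referenced above, here is a minimal, hypothetical sketch using the Kubeflow Pipelines v2 SDK (kfp); the component and pipeline names are illustrative and are not taken from the book.

```python
# Minimal, illustrative Kubeflow Pipelines v2 sketch (names are hypothetical).
from kfp import dsl, compiler

@dsl.component(base_image="python:3.11")
def preprocess(message: str) -> str:
    """Toy preprocessing component: returns an upper-cased copy of the input."""
    return message.upper()

@dsl.component(base_image="python:3.11")
def train(data: str) -> str:
    """Toy training component: stands in for a real training step."""
    return f"model trained on: {data}"

@dsl.pipeline(name="hello-kubeflow")
def hello_pipeline(message: str = "hello kubeflow"):
    # Chain the two components; the output of one feeds the next.
    prep_task = preprocess(message=message)
    train(data=prep_task.output)

if __name__ == "__main__":
    # Compile to the IR YAML that a Kubeflow Pipelines instance can run.
    compiler.Compiler().compile(hello_pipeline, "hello_pipeline.yaml")
```

Chapters 3 and 4 expand patterns like this into reusable components, parameterized runs, and CI/CD-integrated pipelines.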
1. Kubernetes Essentials for ML Engineers
1.2. Container Fundamentals and Docker Basics
1.3. Kubernetes Architecture Overview
1.4. Key Concepts: Pods, Deployments, Services
1.5. Resource Management and Scheduling
1.6. StatefulSets and Persistent Storage
1.7. Networking and Service Discovery
2. Getting Started with Kubeflow
2.1. Understanding ML Platforms and MLOps
2.2. Kubeflow Architecture and Components
2.3. Installation and Environment Setup
2.4. Multi-user Management Basics
2.5. Platform Security Fundamentals
Part II: Building ML Workflows
3. Understanding Kubeflow Pipelines
3.1. Pipeline Architecture Fundamentals
3.2. The Pipeline SDK and DSL
3.3. Building Your First Pipeline
3.4. Pipeline Components and Artifacts
3.5. Pipeline Execution and Debugging
4. Advanced Pipeline Development
4.1. Designing Reusable Components
4.2. Managing Pipeline Parameters
4.3. Error Handling Strategies
4.4. Pipeline Versioning and Storage
4.5. CI/CD Integration Patterns
5. Experimentation with Notebooks
5.1. JupyterHub in Kubeflow
5.2. Managing Notebook Servers
5.3. Resource Allocation and Quotas
5.4. Persistent Storage Configuration
5.5. From Notebooks to Pipelines
Part III: Model Development and Training
6. Training at Scale
6.1. Understanding Training Operators
6.2. Distributed Training Basics
6.3. TensorFlow Training on Kubeflow
6.4. PyTorch Training on Kubeflow
6.5. Resource Management for Training
7. Hyperparameter Tuning with Katib
7.1. Experiment Configuration
7.2. Defining Search Spaces
7.3. Understanding Search Algorithms
7.4. Managing Training Trials
7.5. Analyzing Experiment Results
Part IV: Model Deployment
8. Serving Models with KServe
8.1. KServe Architecture Overview
8.2. Model Server Deployment
8.3. Inference Service Configuration
8.4. Model Updates and Versioning
8.5. Performance Monitoring
9. Production Operations
9.1. Monitoring ML Workloads
9.2. Resource Management
9.3. Security Best Practices
9.4. Platform Maintenance
9.5. Troubleshooting Guide
Part V: Enterprise Implementation
10. Production Best Practices
10.1. Building Enterprise ML Platforms
10.2. Multi-tenant Architecture Design
10.3. Scaling Strategies and Patterns
10.4. Cost Optimization Techniques
10.5. Team Collaboration Models
11. Platform Integration and Ecosystem
11.1. Integrating with Data Lakes
11.2. CI/CD Pipeline Integration
11.3. Monitoring Stack Integration
11.4. External Model Registry Systems
11.5. Cloud Provider Integrations
Prashanth Josyula
Prashanth Josyula is a seasoned IT professional based in San Francisco, USA, with over 16 years of industry experience spanning enterprise software engineering, artificial intelligence, and cloud-native infrastructure. He specializes in AI/ML systems, Kubernetes, MLOps, and service mesh technologies, and has consistently contributed to building intelligent, scalable, and resilient platforms that power next-generation applications.
In his current role as a Principal Member of Technical Staff (PMTS) at Salesforce, Prashanth is at the forefront of architecting cloud-native solutions that seamlessly integrate AI-driven automation, real-time data processing, and large-scale distributed systems. His work spans platform services, ML infrastructure, and enterprise-grade deployments, enabling cross-functional teams to build, deploy, and manage intelligent applications with speed and reliability.
Prashanth is also an active thought leader and speaker, regularly presenting at industry-leading conferences. His talks focus on advanced topics such as ML/AI Ops, Retrieval-Augmented Generation (RAG), AI Agents, Responsible AI, and Time-Series Forecasting, where he shares practical insights drawn from real-world enterprise experience. With a strong passion for both innovation and knowledge-sharing, he combines deep technical expertise with a commitment to advancing the field through mentorship, public speaking, authorship, and contributions to research and open-source communities.
Sonika Arora
Sonika Arora is a seasoned software engineer with over a decade of experience building scalable, resilient, and intelligent distributed systems. She currently serves as a Lead Member of Technical Staff at Salesforce, where she architects and delivers complex microservice-based platforms that power machine learning workflows at scale. At Salesforce, Sonika has played a pivotal role in designing orchestration platforms that seamlessly integrate ML compute services such as training, prediction, and modeling jobs. By leveraging technologies like AWS Lambda, DynamoDB Streams, Kubernetes, and Terraform, she has led initiatives that ensure concurrency, reliability, and observability across distributed architectures.
Prior to Salesforce, she made significant contributions at PayPal, where she helped build real-time monitoring systems and QR code payment infrastructure, delivering solutions optimized for scale, fault tolerance, and performance. Sonika's strength lies in fusing backend engineering with system-level thinking to create cloud-native systems enriched with automation, monitoring, and intelligent orchestration. She remains passionate about advancing AI-powered platforms, stream processing, and high-throughput infrastructure.
Anant Kumar
Anant Kumar is a seasoned technology leader at Salesforce, where he leads the Data Lake team within the Einstein AI Platform. With over 20 years of experience in distributed systems, AI/ML infrastructure, and cloud-native architectures, he architects enterprise-scale Apache Spark services and data lake solutions that power Salesforce’s predictive and generative AI.
His technical expertise includes building scalable Spark services on Kubernetes, developing cloud-native data pipelines processing billions of events, and designing secure, high-performance infrastructure for AI/ML workloads. He holds multiple U.S. patents in network visibility and security, and his innovations have earned him industry recognition.
Anant is a passionate advocate for responsible AI, contributing to IEEE conferences, peer-reviewed journals, and academic reviews. He mentors emerging researchers and students through nonprofit organizations and serves as a technical reviewer for publishers such as O'Reilly, Packt, and Manning, as well as for the journal PLOS ONE.
He is an alumnus of the Stanford Graduate School of Business Ignite Program and actively supports interdisciplinary collaboration across AI, cloud infrastructure, and data science. Recognized for his leadership, mentorship, and commitment to ethical innovation, Anant continues to shape the future of enterprise AI platforms.
Jivitesh Poojary
Jivitesh Poojary works as a Lead Machine Learning Engineer at a leading Fortune 100 telecom organization. He has over 11 years of experience building large-scale AI/ML systems for enterprises, and his cross-functional skills in data science, data engineering, and DevOps enable him to look at AI problems holistically. Beyond technical skills, he collaborates across departments to align ML strategies with business goals, advocate for data-driven decision-making, and establish robust MLOps practices. He holds a Master's in Data Science from Indiana University Bloomington and is active in the AI/ML community, writing research papers, giving conference talks, and contributing to open-source projects.