You will be instrumental in implementing platform and application monitoring requirements, auditing requirements, and sound deployment strategies that are highly available.
Project work and analysis will include implementation of best practices across IBM Watson Health Kubernetes environments including IBM Cloud and Blockchain platforms across several applications.
Take ownership of and accountability for the product, and provide technical guidance to team members. Champion DevSecTestOps (DTSO) and analyze code, components, infrastructure, and the overall system for reliability issues.
Define a Blue-Green deployment approach to enable zero-downtime deployments. Define strategies, patterns, and solutions that improve system reliability by reducing or eliminating points of failure.
Analyze errors and exceptions, and establish a feedback loop to prevent recurrence. Define, develop, and implement automatic healing and recovery strategies in collaboration with the development team.
Establish best practices for system logging, monitoring, health checks, and recovery. Work with the QA, Engineering, and Architecture teams to ensure that test automation and security testing are integrated into the DTSO pipeline.
Define key operational metrics and apply tools to gauge product health in the development, test, and production environments.
Share in the collective team vision and promote the why and how to all teams.
On-Call Process Optimization: Get involved in implementing strategies that increase system reliability and performance through on-call rotation and process optimization. Add automation to improve real-time collaborative response, and update documentation, runbook tools, and modules to prepare teams for incidents.
Documenting Knowledge: Take part in on-call duties, IT operations, software development, and support to ensure a seamless flow of information between teams.
Drive the adoption of Site Reliability and Agile principles across the organization.
Demonstrated sense of ownership and drive to improve production operations.
Proven ability to understand and troubleshoot complex problems under pressure.
Ability to zoom in and zoom out to understand the holistic architecture and design of the product / site.
The role requires working in shifts under a 24x7 support model.
Work Location: Bangalore and Hyderabad (preferred)