, you will be instrumental in ensuring the reliability, scalability, and performance of our hybrid cloud infrastructure across Azure and AWS. You will collaborate with engineering and cloud platform teams to build resilient, observable, and automated systems that support rapid delivery and high availability of services.
Key Responsibilities:
Lead SRE initiatives to improve the availability, reliability, and performance of cloud-native and hybrid applications.
Design and implement observability frameworks across Azure and AWS using tools such as CloudWatch, Azure Monitor, Prometheus, and Grafana.
Drive automation and infrastructure-as-code practices to reduce operational toil and streamline deployments.
Collaborate with application teams to define and implement SLIs, SLOs, and error budgets for cloud-hosted services.
Champion chaos engineering and resilience testing across Azure and AWS environments.
Work with enterprise teams to deploy and scale SRE enablers such as service mesh, auto-scaling, and CI/CD pipelines.
Establish and enforce cloud infrastructure deployment standards, including blue-green and canary deployments.
Support cloud migration strategies, cutover planning, and testing for applications transitioning between Azure and AWS.
Requirements:
Minimum of 10 years of experience in SRE or cloud engineering, preferably within the banking or financial services sector.
Deep expertise in Azure and AWS cloud platforms, including compute, networking, storage, and security services.
Strong understanding of ITIL and SRE frameworks, with the ability to integrate traditional operations with modern cloud practices.
Proven leadership in coordinating with application teams and vendors for cloud deployment and migration planning.
Hands-on experience with infrastructure-as-code tools (e.g., Terraform, Bicep, CloudFormation) and scripting languages (Bash, Python).
Certifications in