Department description

The Group Infrastructure and Cloud – Public (GIC-Public) division is a diverse team focused on modernizing DBS’ technology delivery methods and platforms to enable the bank to compete in the Digital Economy on Cloud. Our vision is to enable business teams to explore and implement solutions in record time, securely, with no customer disruption and no operations. Inspired by the GAFA and Fintech companies, as well as other emerging technologies, we develop enterprise-grade solutions for consumption by business and technology units.

The GIC-Public department is responsible for defining and ensuring the execution of the roadmaps related to the firm’s global cloud platforms.

Mission

The GIC-Public team owns, develops, and supports the public cloud strategy of the bank. This strategy is enabled by a truly cloud-native public cloud adoption tool called Evolve, which offers a unique way to onboard to public cloud through a directive approach to cloud architecture. The platform covers directive, preventive and detective controls and achieves agility for the users of the platform.

As an Ops & SRE Engineer, you are responsible for serving the DBS technology community by caring deeply about their needs, frustrations, and overall satisfaction with their DBS Cloud experience.
You are passionate about improving the public cloud experience by constantly learning and engaging our customers, with a strong desire to help them and enhance the cloud platform experience.

Responsibilities

· Take ownership of reported incidents and coordinate with various teams for resolution
· Partner with DBS development teams to help reproduce and resolve public cloud platform issues
· Solve incidents with advanced troubleshooting techniques, provide tailored solutions for our development teams, and work with them thoughtfully to dive deep into the root cause of each issue
· Coach and mentor new hires, develop and present training, partner with development teams on complex issues, participate in hiring, write tools and scripts to help the team, or work with leadership on process improvement and strategic initiatives
· Respond to and investigate system-generated security incidents
· Manage high-severity and high-customer-impact incidents, focusing on fast recovery
· Champion production resilience and availability, focusing on superior client experience, by working with the operations team and technology development teams
· Drive the implementation of Site Reliability Engineering (SRE) for all strategic systems
· Drive effective communication between business and technology with regard to production service reliability and performance
· Drive continuous improvements in processes and systems, leveraging Site Reliability Engineering methods
· Respond to, evaluate and analyse production incidents to minimise their impact, and devise innovative solutions to prevent them in the future
· Improve the reliability and availability of systems by gathering hard data and designing systems for increased service reliability and performance
· Provide expert advice and training to our engineers on which technology solutions and advanced reliability techniques to use in each situation

Requirements

· Experience driving major production incidents and organising incident retrospective meetings
· The role is an SRE role and thus includes rotation in a 24/7 on-call roster that supports the cloud delivery platform pipeline

Key Skills and Experience

· 8+ years of working experience
· 5+ years of Production Operations and SRE experience supporting enterprise public cloud environments (AWS and Google Cloud)
· Knowledge of CI/CD tools (Bitbucket, Jenkins, AWS CLI, etc.)
· Must have a helping-hand attitude
· General knowledge of enterprise security and public cloud security
· Knowledge of Incident Management and Change Management processes
· Knowledge of software Release Management
· Highly proficient in written and spoken business English
· Well organized and adaptable, and makes clear and effective decisions
· Knowledge of and experience with observability and SRE tooling and philosophy
· General knowledge of infrastructure components: Firewalls, TCP/IP, DNS, ICMP, Networking, Switching, PKI, TLS, Load Balancing
· General knowledge of web technology fundamentals: HTTP, WebSocket, Content Distribution, WAF, REST, JSON, YAML, CORS
· Strong experience with any flavour of Linux
· General knowledge of and experience with Python
· Awareness of continuous delivery and continuous integration development environments (git)
· General experience with at least one of the following: K8S, Ansible, Terraform, CloudFormation
· AWS certification (SysOps, DevOps, Security) is a plus