Manage application and security incidents, conduct problem determination, work with internal teams and vendors to resolve issues promptly and within SLA, and report and escalate to higher management or the incident committee when necessary.
Develop an operations and process guide to ensure every aspect of operations is documented and complies with audit requirements.
Manage day-to-day operational activities, analyse statistics, write status and progress reports, and present findings to stakeholders and higher management.
Manage an operations team consisting of staff and vendors, ensuring support is available on a 24/7 basis.
Proven experience as an Operations Engineer or in a similar role in an IT setting.
Implement change management and incident management workflows. Experience using ITSM tools (e.g. Remedy, Zendesk, ServiceDesk) to automate workflows is advantageous.
Implement security and access control measures to control privileged access to test and production environments.
Implement full-stack monitoring (i.e. application and infrastructure) using Application Performance Management (APM) tools. Familiarity with cloud-native monitoring options (e.g. CloudWatch, Stackdriver) and the OpenAPM stack is preferred.
Identify and implement process automation to minimise downtime and human error. Familiarity with automation tools (e.g. Terraform, Ansible) is preferred.
Experienced in agile methodologies, DevOps pipelines, test-driven development, and information security practices.
Able to work collaboratively in a high-performance team and influence others with positive energy.
Resourceful and able to work out solutions with innovative thinking and new tech.
Experience managing cloud infrastructure and services. Certification with GPC, GCC (i.e. AWS, Azure, Google Cloud) or equivalent cloud platforms is preferred.
Excellent problem-solving skills.
Strong communication skills, with the ability to explain complex technical issues to non-technical teams.