We are seeking a highly skilled and experienced Cloud Engineer Lead (Level 3) to support cloud infrastructure for commercial and Singapore Government-appointed agencies operating across commercial cloud platforms. This role requires experience managing multi-cloud environments, predominantly on Amazon Web Services (AWS), with working knowledge of Microsoft Azure and Google Cloud Platform (GCP). The ideal candidate will demonstrate strong Infrastructure-as-Code (IaC) capabilities, comprehensive OS lifecycle and patching operations, application deployment and troubleshooting expertise, and proactive operational leadership. This role emphasizes hands-on technical proficiency, security awareness, automation-driven practices, mentorship capabilities, and familiarity with strict uptime, compliance, and audit requirements in network-separated environments.
Key Responsibilities:
Multi-Cloud Infrastructure Operations
Operate and maintain cloud-native services in production across AWS, Microsoft Azure, and Google Cloud Platform:
Hands-on experience with cloud services including AWS (Lambda, ECS/EKS, FSx, Glue, SES, GuardDuty, WAF, Shield Advanced, Security Hub, KMS, Secrets Manager, SNS, SQS, EventBridge, API Gateway, EC2, S3, CloudWatch, Systems Manager), Azure (Virtual Machines, Azure Kubernetes Service (AKS), Azure Functions, Azure Storage, Azure Monitor), and GCP (Compute Engine, Google Kubernetes Engine (GKE), Cloud Functions, Cloud Storage, Cloud Monitoring)
Monitor and troubleshoot infrastructure performance, uptime, and scalability across all platforms
Support production and staging environments with 24/7 reliability objectives
Participate in a 24/7 shift rotation to provide round-the-clock operational support and assist a team of L2 engineers with hands-on troubleshooting of technical issues
Infrastructure as Code (IaC)
Maintain infrastructure deployment pipelines with working knowledge of at least one of the following: Terraform, Ansible, or Azure Resource Manager (ARM) templates
Troubleshoot environment drift and pipeline failures across multi-cloud environments
Promote and drive automation in cloud operations and continuous improvement initiatives
Implement and maintain GitOps practices for infrastructure deployment
Operating System Lifecycle & Patch Management
Lead OS patching operations across RHEL (v8 to v10) and Windows Server (2016 to 2025) using AWS Patch Manager, Azure Update Management, WSUS, SCCM, and YUM/DNF
Maintain basic knowledge of Linux administration with deep expertise in Wintel Operating System patching and management
Schedule, automate, and track patches across all environments
Coordinate patch approvals and ensure compliance with organizational policies
Execute monthly and quarterly patch cycles with minimal disruption
Perform post-patch validation and remediation activities
Application Deployment & Troubleshooting
Deploy and troubleshoot applications across Windows and Linux operating systems
Support application teams with OS-level diagnostics and performance optimization
Collaborate with development teams to resolve infrastructure and OS-related application issues
Implement and maintain application monitoring and alerting frameworks
Security & Compliance
Execute CIS (Center for Internet Security) security remediations across cloud platforms
Perform security hardening based on CIS Benchmarks and government security baselines
Conduct vulnerability remediation using tools such as Trend Micro Vision One, Qualys, Tenable, and AWS Config
Track SSL certificate renewals across all environments
Identify and remediate End-of-Life (EOL) components including OS versions and Lambda runtimes
Support compliance with government-level security, audit, and regulatory requirements
Container & DevSecOps
Demonstrate knowledge of container technologies (Docker, Kubernetes, ECS, EKS, AKS, GKE)
Familiarity with DevSecOps practices using SHIP-HATS (Secure Hybrid Integration Pipeline - Hive Agile Testing Solutions) under the Singapore Government technology stack
Support CI/CD pipeline operations and integration with security scanning tools
ITIL & Service Management
Adhere to ITIL processes including Incident, Problem, Change, and Request Management
Manage and resolve ITSM tickets via ServiceNow, Jira, or similar platforms
Drive ITSM ticket escalation between engineering teams and stakeholders
Coordinate change management activities and participate in Change Advisory Board (CAB) reviews with junior engineers
Maintain service level agreements (SLAs) and operational level agreements (OLAs)
Tool Integration & Observability
Integrate third-party tools including NGINX, monitoring dashboards, and observability stacks
Configure and maintain observability tools for metrics, logs, and alerts across multi-cloud environments
Implement log aggregation and analysis using CloudWatch, Azure Monitor, and GCP Cloud Logging
Documentation & Knowledge Management
Create and maintain comprehensive infrastructure runbooks, system documentation, change tracking logs, and infrastructure architecture designs for assigned applications
Develop standard operating procedures (SOPs) and knowledge base articles
Ensure audit-readiness through meticulous documentation discipline
Maintain configuration management databases (CMDB) and asset inventories
Leadership & Mentorship
Provide technical guidance and mentorship to Level 2 and junior engineers
Lead technical discussions and architecture reviews
Facilitate knowledge transfer sessions and training programs
Act as escalation point for complex technical issues
Drive continuous improvement initiatives and best practice adoption
Soft Skills & Competencies
Problem Solving - Advanced troubleshooting of complex multi-cloud systems
Communication - Clear and effective communication with technical and non-technical teams, stakeholders, and management
Leadership - Ability to guide teams and drive technical initiatives
Collaboration - Cross-functional teamwork across engineering, security, and business teams
Adaptability - Responsive and effective in rapidly changing environments
Accountability / Attention to Detail - Takes ownership of outcomes and service delivery, ensures accurate and secure implementations
Customer Focus - Supportive, service-oriented approach with stakeholder management
Continuous Learning - Stays current with evolving cloud and security practices
Resilience - Performs effectively under pressure and during incident response
Mentorship - Develops and supports junior engineers
SME Expectations - Role Behavior
This Subject Matter Expert (SME) role requires:
Proficiency across Amazon Web Services with working knowledge of Azure and GCP
Proven experience in uptime-critical and compliance-driven environments
Strong mentorship and leadership capabilities for junior and mid-level engineers
Proactive initiative in incident prevention and operational excellence
Calm, structured, and methodical approach to incident handling with strict adherence to change management and incident response processes
Audit-readiness mindset with comprehensive documentation practices
Ability to drive escalations and manage stakeholder communications effectively
Experience working within Singapore Government technology frameworks
Required Qualifications:
Bachelor's degree in Computer Science, Information Systems, or related field
Minimum 3 years of experience in Commercial Cloud Engineering roles
At least 2 years of experience in public sector or regulated cloud environments
Minimum 3 years of hands-on experience with AWS, Microsoft Azure, or Google Cloud Platform
Experience in 24/7 operational support environments with shift rotation
Demonstrated experience in mentoring and leading junior engineers
Strong background in ITIL processes and ITSM platforms, with experience in CIS security hardening and remediation
Familiarity with Singapore Government technology standards and frameworks (e.g., SHIP-HATS, IM8 Policy)
Preferred Certifications:
AWS Certified Solutions Architect - Associate / Professional
AWS Certified SysOps Administrator - Associate (preferred)
Microsoft Certified: Azure Administrator Associate or Azure Solutions Architect Expert
Microsoft Certified: Windows Server Hybrid Administrator Associate
RHCE or Linux Professional Institute Certification (LPIC)
ITIL v3/v4 Foundation