Technical Support Specialist Job in SEDHA CONSULTING PTE. LTD.

Technical Support Specialist

SG, Singapore


Job Description

About the Role

We are looking for a skilled and driven Technical Software/Support Engineer (Operations) to join our team. In this role, you will drive our operations and incident management initiatives, ensuring our systems remain robust, scalable, and resilient. You will work closely with cross-functional teams to identify operational gaps and implement solutions that enable seamless deployment, observability, and maintenance of our systems.

Key Responsibilities

Incident Management & Response (60%)

* Lead or contribute to incident response efforts during critical system outages and performance degradations
* Develop and maintain incident response procedures, runbooks, and escalation protocols
* Conduct thorough post-incident reviews and drive implementation of preventive measures
* Coordinate cross-functional teams during high-severity incidents
* Build and maintain incident management tooling and automation
* Manage stakeholders' expectations

System Operations & Reliability (20%)

* Design, implement, and maintain monitoring, alerting, and observability across our system
* Develop automation tools to reduce manual operational overhead
* Ensure system SLAs and SLOs are met consistently

Software Development (10%)

* Build internal tools, APIs, and platforms to improve operational efficiency
* Create dashboards and reporting systems for operational metrics

Collaboration & Process Improvement (10%)

* Partner with development teams to improve system reliability and operability
* Establish and refine operational processes and best practices
* Mentor team members on incident response and operational procedures
* Participate in the on-call rotation and provide operational leadership during incidents
* Drive continuous improvement initiatives based on operational data and feedback

Required Qualifications

Technical Skills

* 5+ years of software engineering experience with a focus on operations
* Proficiency in at least one programming language (Python, Java/Kotlin, TypeScript, or similar)
* Experience with modern web application technologies and tools such as PostgreSQL, Kotlin, and AWS
* Knowledge of CI/CD pipelines and deployment automation
* Experience with AWS and container technologies (Docker, Kubernetes)
* Understanding of monitoring and observability tools (Prometheus, Grafana, ELK stack, or similar)
* Experience with APM tools (New Relic, Datadog, AppDynamics)
* Experience with infrastructure-as-code tools (Terraform, Ansible, CloudFormation)
* Background in DevOps or Site Reliability Engineering practices
* Experience with log aggregation and analysis tools
* Understanding of security operations and compliance requirements
* Ability to contribute to system architecture decisions with operational considerations in mind

Operational Experience

* Proven experience in incident management and response procedures
* Experience with on-call responsibilities and escalation processes
* Understanding of system reliability concepts (SLAs, SLOs)
* Knowledge of networking, security, and database administration concepts
* Experience with configuration management and deployment strategies

Soft Skills

* Excellent problem-solving and analytical thinking abilities
* Strong communication skills for technical and non-technical audiences
* Ability to work effectively under pressure during incident situations
* Collaborative mindset when working with cross-functional teams
* Detail-oriented approach to documentation and process improvement

Beware of fraud agents! Do not pay money to get a job.

MNCJobz.com will not be responsible for any payment made to a third-party. All Terms of Use are applicable.