Reliability Engineer

Singapore, Singapore

Job Description


COMPANY DESCRIPTION
NE Digital is the digital, data and technology organization that serves as a center of excellence to drive digital transformation for our group of NTUC Social Enterprises and meet the critical social needs of Singapore's community. By delivering innovative products and solutions, we empower our people to lead better and more meaningful lives through digital services in the areas of daily essentials, health and community care, childcare and education, and financial services.
THE TEAM
We believe that diversity is key to driving an innovative, cohesive, productive and fun workplace! Hence, at NE Digital our people join us from all around the world. Expect to be immersed in an environment where people of different ethnic backgrounds drive innovation and inject creative energy as one! Contributing to a social purpose through technology, our team of passionate and dedicated folks is spread across different social enterprises such as NTUC Fairprice Group, NTUC First Campus and NTUC Health, among others! Creating technology that makes an impact!
RESPONSIBILITIES
NE Digital is currently hiring a Reliability Engineer to join the Digital Product Development organization. The team combines software and systems engineering to architect and run large-scale, distributed, fault-tolerant systems. The team's primary goal is to achieve product reliability sustainably through software engineering practices, architecture patterns, culture building, process standardization, automation frameworks, education, and sharing. The team practices industry reliability frameworks such as Service Level Objectives (SLOs) and Service Level Indicators (SLIs), release engineering, Infrastructure as Code (IaC), and operations automation. The team empowers product developers throughout the Product Development Life Cycle to ensure product reliability. This includes, but is not limited to, building self-serve tools and processes and an infrastructure foundation that allow product teams to consistently deliver highly reliable systems.
The ideal Reliability Engineer candidate is either a software engineer with a good DevOps mindset or a highly skilled system administrator with knowledge of programming and operations automation. You like to solve complex problems with simplicity in mind, are willing to work around the clock to ensure system reliability, and enjoy collaborating with other teams to help them embrace reliability disciplines and frameworks.
As a Reliability Engineer at NE Digital, you will have the opportunity to manage the complex challenges of the Social Enterprise System that are unique to NE Digital, while using your expertise in coding, algorithms, complexity analysis, and large-scale system design. You will report to the Architecture & Reliability Lead.

  • Work with product developers to ensure that the software delivery pipeline is as reliable as possible.
  • Drive practices that ensure the reliability of the product.
  • Collaborate closely with product developers to ensure that the designed solution responds to non-functional requirements such as availability, performance, security, and maintainability.
  • Take responsibility for availability, latency, performance, efficiency, monitoring, emergency response, and system capacity planning.
  • Improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, system capacity planning and post-mortems.
  • Maintain services once they are launched by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems.
  • Document “tribal” knowledge.
QUALIFICATIONS
  • Experience in analyzing and troubleshooting systems.
  • Understanding of infrastructure monitoring, logging, alerting, release, and configuration management.
  • Understanding of networking (e.g. TCP/IP, routing, network topology, load balancers, DNS, NTP).
  • Experience in one of the following: Python, Java, Go, Perl, Ruby, or shell scripting.
  • Experience with public cloud platforms (AWS and/or GCP).
  • Experience maintaining Internet-facing production-grade applications.
  • Experience with software deployment and/or orchestration technologies, e.g., Puppet, Chef, Salt, Ansible, Docker, Kubernetes, Terraform.
  • Experience in CI/CD (e.g., JIRA, Git, Jenkins, Nexus, ...)
  • Experience in standard IT security practices (e.g., encryption, certificates, key management)
  • Excellent communication and problem-solving skills, with strong attention to detail.
  • Flexibility to work non-business hours that may include weekends and/or holidays
  • Self-starter who is able to identify and perform tasks with minimal supervision
Please note that your application will be sent to and reviewed by the direct employer - NE Digital


Job Detail

  • Job Id
    JD1084451
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Singapore, Singapore
  • Education
    Not mentioned