Systems Reliability Operations (SRO) Engineer

Singapore, Singapore

Job Description

The SRO Engineer provides operational oversight and technical leadership, and is responsible for monitoring and identifying issues and for coordinating with technologists across segments to fine-tune system operations and resolve service interruptions. This role is responsible for the end-to-end reliability and operations of IT services and for providing consultation and training to other clients and segments within TWDC. The SRO Engineer will examine IT systems for defects and communicate maintenance schedules and critical events across the company.

Working with Engineers and Analysts at all levels, the SRO Engineer will interact with computer and software engineers, quality control specialists, infrastructure service leads, segment technologists, and others to ensure service availability, increase efficiency, and establish best practices for the execution and continuous improvement of the Event, Incident, Major Incident, Crisis Management, Hypercare execution, and Problem Management processes within the DTOC. Additionally, this position will drive service improvement initiatives through proactive monitoring and through enhancement actions that address gaps identified by analytics and problem management.

The SRO Engineer is an active member of the DTOC service team. The role is focused on Operations, but it also ensures the sustainability of operations by contributing to the development, testing, and evaluation of supported services. The SRO Engineer leverages partnerships with the Business, the customer base, and suppliers to deliver services that meet agreed-upon expectations.

The SRO Engineer provides a 24x7x365 first point of contact for centralized incident response and recovery: consistently and reliably triaging reported or automated incidents, applying recovery procedures, and engaging domain experts to restore steady-state operations. The role provides all core services on a priority basis, with dedicated support to ensure the success of critical events.

Job Responsibilities
Technology Focus
  • Carries and maintains a relevant and up to date skill set in the areas of x86 hardware technology, Windows, Linux, RISC operating systems, P-Series hardware, SAN, NAS and data protection technologies.
  • Must have a working knowledge of relevant WAN/LAN technologies, wireless infrastructure, DNS/DHCP, Load-Balancers, WAN Accelerators, Telephony and other network technologies.
  • Experience working with external telecommunications providers, including MSOs, LECs, and MNOs, as well as the technologies leveraged for service integration.
  • Seasoned technologist who identifies technology and execution challenges in solutions and products offered by Architecture and Engineering teams as well as outside vendors and OEMs.
  • Consistently reviews the effectiveness of the outsourced supplier's technology approaches and their implications for the technology delivery function.
  • In partnership and cooperation with the architecture and design teams, ensures that products currently in ideation and development are being engineered with long-term sustainment goals in mind.
  • Must have a solid understanding of Internet technologies and availability strategies for digital platforms.
  • Must be familiar with complex network topics and availability approaches in an effort to drive performance from all network operations center functions.
Responsibilities
  • Drive the efficiency and effectiveness of the Event, Incident, Major Incident, Request Fulfillment and Problem Management processes
  • Partner with suppliers to ensure third parties fulfill their contractual obligations with regard to response, diagnosis, resolution and providing RCA-related information and data
  • Identify service improvement opportunities through trend analysis, proactive techniques, and after-action reviews
  • Analyze and publish DTOC utilization and service performance metrics regularly
  • Identify and drive service availability improvement opportunities by executing leading practices
  • Ensure that all DTOC services are designed to deliver the levels of availability required by the business, and validate that the final design meets the minimum levels of availability agreed with the business for IT services
  • Elevate any service gaps proactively with leadership
  • Participate in creating, maintaining, and regularly reviewing department procedures, operational readiness plans and posture, aimed at improving the overall availability of IT services and infrastructure components, to ensure that existing and future business availability requirements can be met. This includes compiling daily operational reports and facilitation of operational readiness calls.
  • Ensure the DTOC is effectively monitoring available tools and systems for high availability and swift response to potential and actual outage situations
  • Perform as the incident commander on service outage calls, orchestrating recovery activities of DTOC and other technology teams to drive fast restoration of service without added risk to the organization, providing command and control of the call
  • Effectively apply Incident Analysis and Problem Analysis techniques during and after an incident, and ensure staff apply the same
  • During outages, consistently provide timely Situation Reports, ensure work streams toward resolution are clearly articulated following department procedures, and ensure business impacts are obtained and communicated
  • Manage and provide the technical direction of the team to ensure 100% on-site coverage required to effectively support incidents, service requests, proactive health checks and HyperCare services
  • Perform DR/BCP activities for critical events and emergency onsite response.
Strategy
  • Responsible for influencing and socializing DTOC solutions, practices, roles, responsibilities, and processes
  • Responsible for influencing and socializing Operational service gaps to Engineering for capability enhancements.
  • Participate in creating, maintaining, and regularly reviewing plans targeting the overall readiness of services for existing and future business needs, including Operational Readiness Reviews (ORR)
  • Contribute to the development and sustainment of an enterprise level incident, event, and availability management strategy
  • Participate in the development and governance of service level agreements
Job Requirements:
Education:
  • BA/BS in Computer Science, Engineering or related field. Equivalent work experience would be considered in lieu of degree
Experience:
  • 4+ years of experience supporting converged infrastructure stacks, including: application, hypervisor, compute, storage and networking
  • 4+ years leading incident recovery with multi-disciplined geographically dispersed teams in a Fortune 500 organization
  • + years of experience in either a large IT shared services organization or outsourced environment
  • Experience leading technical recovery of major incidents for a Fortune 500 organization
  • Experience with hands-on support of cloud operations with one or more: AWS, Google Cloud or Azure
  • Experience supporting diverse portfolios, multiple business applications and IT services
  • Experience working in a 24x7 IT operations environment.
  • Demonstrated experience with Service and Event Management tools.
  • 5+ years of experience supporting converged infrastructure stacks, including: application, hypervisor, compute, storage and networking
  • 5+ years leading incident recovery with multi-disciplined geographically dispersed teams in a Fortune 500 organization
  • 3+ years of experience in either a large IT shared services organization or outsourced environment
  • Demonstrated experience in systems integration, application infrastructure support and middleware operations.
  • Demonstrates management skills, both from a resource management perspective and from the overall control of a process
  • Proven experience and understanding of root cause analysis techniques
  • Proven ability to be detail, deadline, and results-oriented
  • Strong leadership skills with the ability to motivate and encourage others
  • Ability to manage competing priorities and workflow
  • Solid interpersonal skills for written, oral, and face to face communications
  • Practical experience with influence and negotiation methods and techniques
  • Ability to serve as mentor and coach
  • Strong customer service orientation, seeking opportunities to serve clients
Skills:
  • Experience with ITIL frameworks and processes
  • Experience working within large, complex production teams
  • Experience working within an outsourced environment
  • Vendor relationship management experience
  • Comfortable working within a highly matrixed organization
  • Strong technology driven process experience
  • Ability to work under pressure, meet internal and external work schedules and/or deadlines, and show effective time and crisis management skills
The Walt Disney Company is an Equal Opportunity Employer. We strive to be a diverse workforce that is representative of our audiences, and where all can thrive and belong. We are committed to building a team that includes and respects a variety of voices, identities, backgrounds, experiences and perspectives.
We are taking a responsible approach to creating environments that allow us to do what we do best: entertain and inform millions around the world. As part of our commitment to health and safety, COVID-19 vaccines are required for all newly hired employees in Singapore.

Job Detail

  • Job Id
    JD1307265
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Singapore, Singapore
  • Education
    Not mentioned