Cloud Reliability Engineer - Core Technology Infrastructure

Richardson, Texas;

Job Description:

Our Cloud Site Reliability Engineers (cSREs) ensure that our cloud services meet the reliability and uptime requirements of our demanding enterprise customers. This is achieved through sound engineering practices, resilient design, and a well-defined, effective global on-call rotation that runs 24x7.

The role provides an opportunity to work with a wide range of technologies and offers a unique perspective on how various services (on-prem/external) interact with each other. You will work with colleagues who are as smart, hardworking, and driven as you. You will join a team that keeps growing and innovating, and that gives you room to be proactive and creative.

A successful candidate must have 6+ years of hands-on experience with private cloud, be able to provide on-call support, and demonstrate the ability to debug and optimize code and automate routine tasks. They should be able to work both independently and with a distributed team.

Are you ready for the next step in your career? Then we’d love to hear from you!

Responsibilities:

  • Be responsible for the reliability and support of Internal Cloud, Public Cloud (Azure/IBM), and OpenShift container (Docker) services.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Troubleshoot issues across the entire stack: hardware, software, application, and network.
  • Perform deep dives into both systemic and latent reliability issues; perform blameless RCA and partner with engineering and operations teams across the organization to roll out fixes.
  • Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization.
  • Identify and drive opportunities to improve automation for the cloud services.
  • Provide on-call coverage as per the rotation.
  • Be a key stakeholder in the design of cloud services so that they are resilient from day 0, and identify and fix resiliency problems by collaborating with product teams.

Required Skills:

  • BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience.
  • 6+ years of hands-on experience maintaining infrastructure services.
  • Excellent understanding of Linux/Windows operating system administration.
  • Experience with VMware, Azure cloud, OpenShift, Docker, and Kubernetes.
  • Experience with automation in one or more of the following: Python, Java, Ansible, and shell scripting, as well as source control (git).
  • Experience with SQL/NoSQL databases such as MySQL and MongoDB, and with CI/CD tools such as git and Jenkins.
  • Systematic problem-solving approach, sense of ownership and drive
  • Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
  • Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.

Desired Skills:

  • Experience with Ansible Tower and Red Hat Satellite/Foreman, including knowledge of capsule architecture, is a plus.
  • Experience with HashiCorp Vault/Terraform/Consul/Nomad is a plus.

Job Band:

H5

Shift: 

1st shift (United States of America)

Hours Per Week:

40

Weekly Schedule:

Referral Bonus Amount:

0

Full time

JR-21067799

Band: H5

Manages People: No

Travel: No

Manager:

Talent Acquisition Contact:

Angela Kathmann

Referral Bonus:

0