Cloud Reliability Engineer - Core Technology Infrastructure

Jersey City, New Jersey

Job Description:

Cloud Reliability Engineer (SRE)

    Responsibilities:

    • Responsible for the reliability and support of internal cloud, public cloud (Azure/IBM), and OpenShift container (Docker) services.
    • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
    • Troubleshoot issues across the entire stack: hardware, software, application, and network.
    • Perform deep dives into both systemic and latent reliability issues; conduct blameless root cause analysis (RCA) and partner with engineering and operations teams across the organization to roll out fixes.
    • Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization.
    • Identify and drive opportunities to improve automation for the cloud services.
    • Provide on-call coverage as per rotation.
    • Be a key stakeholder in the design of cloud services so that they are resilient from day 0, and identify and fix resiliency problems in collaboration with product teams.

    Required Skills:

    • BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience.
    • Minimum of 6 years of hands-on experience maintaining infrastructure services.
    • Excellent understanding of Linux/Windows operating system administration.
    • Experience with VMware, Azure cloud, OpenShift, Docker, and Kubernetes.
    • Experience with automation in one or more of the following: Python, Java, Ansible, and shell scripting, plus source control (Git).
    • Experience with SQL/NoSQL databases such as MySQL and MongoDB, and with CI/CD tools such as Git and Jenkins.
    • Systematic problem-solving approach, sense of ownership, and drive.
    • Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
    • Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.

    Desired Skills:

    • Experience with Ansible Tower and Red Hat Satellite/Foreman is a plus, as is knowledge of capsule architecture.
    • Experience with HashiCorp Vault/Terraform/Consul/Nomad is a plus.

Job Band:

H5

Shift: 

1st shift (United States of America)

Hours Per Week:

40

Weekly Schedule:

Referral Bonus Amount:

0

Full time

JR-21067790

Manages People: No

Travel: No

Manager:

Talent Acquisition Contact:

Angela Kathmann
