Site Reliability Engineer II
Job Description:
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.
Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to being an inclusive workplace, attracting and developing exceptional talent, supporting our teammates’ physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.
Bank of America is committed to an in-office culture with specific requirements for office-based attendance, while allowing an appropriate level of flexibility for our teammates and businesses based on role-specific considerations.
At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!
This job is responsible for partnering with engineering and technology teams to implement measures as prescribed by lead/senior SREs. Key responsibilities include ensuring appropriate instrumentation, tooling, ticketing, alerting, and on-call routines are in place for key services, identifying root causes of issues through production triage efforts, and suggesting code enhancements to technology teams to automate services and improve reliability and efficiency. Job expectations include using software development skills to improve efficiency and to address gaps in reliability.
Responsibilities:
Develops and maintains reliability scripts, tools, and libraries, and uses them both for common instrumentation, automation, and operational needs and when mentoring Site Reliability Engineer (SRE) teammates on reliability practices and established tools and capabilities
Collaborates with Development and Infrastructure teams to understand technical solutions and implement monitoring capabilities outlined in the application and system monitoring designs put forward by the SRE Lead
Partners to implement code changes to make use of common reliability libraries and tools and helps Application Production Services and Application Development teammates understand how to use them
Identifies vulnerabilities and opportunities for reliability improvement, such as investigating low-level error rates and 'noise' in monitoring, and defines solutions to reduce manual support effort and/or improve system reliability
Engages as a subject matter expert in major incident triage efforts and failure scenario modelling, and works with the Problem Manager to diagnose root causes for major incident and problem management investigations
Participates regularly in an on-call rotation with Production Support teammates to learn more about reliability issues affecting their portfolio
Provides day-to-day operations and project support for infrastructure, specifically focused on the firm's GRID environment, but also across all cloud product offerings
Provides shift and weekend coverage support
Develops, tests, and deploys automated workflows in support of the IT business
Researches and implements process or technology improvements
Provides technical guidance and expert-level consultation for complex automation workflows, and delivers solutions that integrate technologies to carry out the desired function
Delivers technical documentation for all completed projects
Required Qualifications:
Minimum of 6 years of experience with Python / Ansible
Minimum of 6 years of experience with shell/Bash/Ksh
Candidate must have exposure to large enterprise grid deployment and/or Cloud integration experience
Good experience in developing projects using Go, XML, XSLT, SOAP, REST, and SQL.
Good experience in command line interfaces (CLI), third party APIs and integration.
Good knowledge of Linux, Windows, virtualization technologies
Experience building CI/CD pipelines, with expert-level knowledge of tools such as Git and Jenkins
Experience designing, developing, and testing new and existing configuration management and infrastructure as code (Terraform)
Experience with Azure and AWS technologies, including broad knowledge of how applications and services are constructed on these platforms, and experience helping drive automation and integration around them
Experience creating and maintaining complex data-driven automations and queries using SQL and NoSQL databases.
Good proficiency in system, network, security and database operations, protocols and industry standard technologies.
Good experience in developing secure technologies, with knowledge of ACLs and role-based entitlements.
Experience in systems analysis, modular design, and creating APIs that support XML, JSON, or other well-known interfaces.
Application development skills and experience in integrating automation with existing back-end IT systems and databases.
Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities
Experience with IT core applications like DNS, Active Directory, Kerberos, SMTP, Transactional DBs, Apache, etc.
Ability to juggle competing priorities and adapt to changes in project scope.
Ability to communicate and collaborate effectively with teammates.
Effective verbal and written communication.
Good understanding of developing fault-tolerant solutions and knowledge of horizontal scaling and resiliency/high availability (HA)
8-10 years of infrastructure or software engineering/development experience
Desired Qualifications:
Minimum of a 4-year degree in computer science or equivalent experience
Skills:
Analytical Thinking
Automation
Collaboration
Production Support
Result Orientation
Application Development
Architecture
Influence
Project Management
Solution Design
Adaptability
DevOps Practices
Risk Management
Solution Delivery Process
Stakeholder Management
Shift:
1st shift (United States of America)
Hours Per Week:
40