Company name
Humana Inc.
Location
Chicago, IL, United States
Employment Type
Full-Time
Industry
Engineering, Work At Home
Posted on
Mar 04, 2022
Profile
Description
The senior network site reliability engineer is responsible for the big picture of how our network and applications relate to each other. We use a breadth of tools and approaches to tackle a broad spectrum of problems. Site Reliability Engineering is an engineering discipline that designs, builds and maintains large-scale production systems with high efficiency and availability, using a combination of software and systems engineering practices. SREs will help drive iterative improvement by reducing manual processes and shortening the problem resolution cycle.
Responsibilities
Humana is looking for a senior site reliability engineer (network) who will be responsible for building and deploying network automation, improving network reliability, and driving tools/service development to maintain and improve our service SLOs. This person is expected to have key knowledge in the following areas:
Proven experience managing various large-scale enterprise network topologies including LAN, WAN, Wireless, Network Security and Services. Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
Able to identify manual operational tasks and develop automation to solve problems in a modern SRE support model
Drive the collection of performance metrics that will support automation, reduce network downtime and enhance our decision-making capability
The senior network site reliability engineer installs, supports and/or maintains network monitoring and management tools. They will perform technical analysis of software, hardware and transmission facilities using various diagnostic tools in support of efficient network operations. They will also help drive the performance, reliability, and scalability of the enterprise network to support the growing and changing needs of the business. This person will also help influence the department's strategy in the areas of automation, telemetry and predictive analytics utilizing current and emerging technologies in the market.
The position will have the following responsibilities:
Analyze data to diagnose and identify root causes of network-specific events
Develop tools and services to automate the mitigation and remediation of network-specific events
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
Lead significant production improvements in tooling, automation, and process
Maintain services once they are live by measuring and monitoring availability, latency and overall system health
Extract key performance metrics from SRE-related tools (Dynatrace, Splunk, BigPanda, ThousandEyes, etc.) and build associated dashboards
Diagnose performance issues in complex distributed applications, leveraging infrastructure and application telemetry
Identify manual operational tasks and develop automation to solve problems in a modern SRE support model
Required Qualifications:
Bachelor's degree
Five or more years of technical experience.
2 years working with a scripting language to develop automation processes.
3 years working with an enterprise application and network performance management solution (Dynatrace, Datadog, New Relic, AppDynamics, ThousandEyes).
3 years working with an enterprise log aggregation solution (e.g., Splunk).
Experience with configuring, operating, and troubleshooting network protocols such as DNS, HTTP, SSL, and routing protocols (OSPF, EIGRP, BGP).
Familiarity with cloud-native software platforms and tools in Azure, AWS and Google.
Ability to isolate network failures causing impact to services across LAN/WAN topologies and understand how these failures surface in the application layer.
Excellent written and verbal communication, presentation, social, and analytical skills.
Preferred Qualifications:
Strong networking background along with a strong familiarity with major routing/switching protocols and equipment is a bonus.
Familiarity with monitoring tools such as Splunk, Dynatrace, ThousandEyes, StealthWatch, BigPanda is a plus.
Hands-on experience with cloud service providers Microsoft Azure, GCP and AWS.
Strong scripting experience: Ansible, Python, Perl, Bash, Windows scripts (VBS/PowerShell).
Experience with Terraform, Istio, Kubernetes, and similar tools is a bonus.
Additional Information
Associates are required to be fully COVID vaccinated, including booster, or undergo weekly COVID testing and wear a face covering while at work. The weekly testing will need to be done through an approved Humana vendor, and unvaccinated associates should follow all social distancing and masking protocols if they are required to come into a Humana facility or work outside of their home.
If progressed to offer, candidates will be required to:
Provide proof of full vaccination, including booster OR
Provide proof of applicable exemption including any required supporting documentation
Medical, religious, and state exemptions will be available.
Scheduled Weekly Hours
40
Company info
Humana Inc.
Website : http://www.humana.com