Company name
J.Crew
Location
New York City, NY, United States
Employment Type
Full-Time
Industry
IT, Engineering, Sciences
Posted on
Sep 27, 2021
Profile
The Lead DevOps Engineer is responsible for the availability, reliability, performance, capacity, scalability, maintainability, and survivability of the ecommerce websites, applications, services, and the hosting infrastructure. This individual will contribute his or her knowledge and expertise on dependable systems to the design, development, instrumentation, deployment, operation, administration, maintenance, security, and documentation of the frameworks, tools, applications, services, and infrastructure needed to support the SDLC of ecommerce websites, applications, and services, ensuring on-budget, on-time, and high-quality delivery of projects. The SRE focuses on the automation and observability of SDLC processes, and of privately hosted and cloud-based ecommerce websites, applications, services, and infrastructure; addresses ecommerce website, application, service, and infrastructure issues; contributes to ecommerce R&D processes and innovation; and engages in continuous service improvement.
ENGINEERING RESPONSIBILITIES
The Lead DevOps Engineer:
*Promotion of the DevOps and SRE cultures within the DevOps Team, the Ecommerce department, the IT organization, and the enterprise at large.
*Automation and instrumentation of SDLC processes, ecommerce websites, applications, services, and hosting infrastructure.
*Definition of SLAs, SLOs, and Error Budgets for the SDLC processes, ecommerce websites, applications, services, and hosting infrastructure.
*Formulation of SLAs, SLOs, and Error Budgets for DevOps services.
*Sharing project and task information within the DevOps Team, the Ecommerce department, the IT organization, and the enterprise at large.
*Sharing engineering knowledge and expertise within the DevOps Team, the Ecommerce department, the IT organization, and the enterprise at large.
*Quality assurance of all engineering outputs.
*Leanness (continuous flow, continuous improvement, elimination of waste, fast feedback, systemic view, simplicity).
*Assist in the capacity of stakeholder, and/or team member (SME), and/or lead engineer, and/or PM in all phases of the software development and infrastructure life cycles. Tasks may include agile project management and workflow management; the definition of needs, benefits analysis, and technical strategy; research & development within the project life cycle; technical analysis and design; and the support of the implementation, testing, and rolling-out of the solutions.
*Create and/or review documents for projects and tasks (as needed) including charters, plans, software requirements specifications, use case specifications, scope of work, schedule, technical design, configuration management, quality management, risk management, provisioning, transition to operations, OAM, security, trouble tickets, change and configuration management events, RCAs, premortems, postmortems, etc.
*Review the business and technical documentation of projects and tasks with stakeholders, the Architecture Review Board, InfoSec, and others.
*Develop business and technical architectures, tools, applications, services, systems, instrumentation, etc., utilizing established management, analysis, design, development, implementation, and operation frameworks, methodologies, tools, and practices.
*Monitoring of ecommerce websites, applications, services, and the hosting infrastructure.
*Measuring and reporting on SLIs and KPIs.
*Attend DevOps and ecommerce development team meetings including planning sessions, standups, reviews, retrospectives, etc.
*Interface with the IT organization, the business, and the enterprise.
*Join the Incident Response Management Team.
*Disaster recovery (DR).
SPECIFIC ECOMMERCE WEBSITES, APPLICATIONS, SERVICES, AND INFRASTRUCTURE RESPONSIBILITIES
*Program/write tools, applications, services, scripts, instrumentation, etc., to automate SDLC processes, to deploy infrastructure as code, to handle complex and/or repetitive configuration and OAM tasks, and as needed in the course of Sprints.
*Programmatic approach to problem-solving with good modeling and development practices such as object-oriented, functional, and pattern programming, use of APIs, TDD, unit testing, integration testing, etc.
*Design, provision, configuration, operation, and maintenance of systems and related infrastructure for ecommerce websites on privately hosted and cloud-based environments.
*Creation, deployment, configuration, operation, and maintenance of machine images for computing systems.
*Provision, configuration, operation, and maintenance of computing systems.
*Design, development, deployment, configuration, operation, and maintenance of serverless applications.
*Creation/provision, deployment, configuration, operation, and maintenance of containerized applications, services, microservices, etc.
*Design, deployment, configuration, operation, and maintenance of container orchestrators for containerized applications.
*Write Helm charts, Prometheus Operators, PromQL queries, cluster abstractions, etc., for EKS.
*Design, deployment, configuration, operation, and maintenance of storage and backup systems.
*Design, deployment, configuration, operation, and maintenance of databases.
*Design, deployment, configuration, operation, and maintenance of network systems and network infrastructure.
*Design, deployment, configuration, operation, and maintenance of IT and security controls.
*Design, deployment, configuration, operation, and maintenance of websites, applications, services, and hosting infrastructure instrumentation.
*Design, deployment, configuration, operation, and maintenance of the monitoring (metrics and events) infrastructure.
*Create microservices that interact with ecommerce distributed applications and systems to increase visibility and reliability of hosted applications, hosted services, and cloud infrastructure.
*Ongoing review of metrics and events for performance and capacity planning, and to minimize operational risk.
*Investigate and troubleshoot issues to enhance ecommerce website, application, service, and infrastructure performance by reviewing APM metrics, errors, transactions, and traces, BRUM metrics, website metrics, cache hit ratios, VM/system performance metrics, WAF metrics, events and alerts, and by performing thread dump analysis, core dump analysis, etc.
*Management of runtime environments (JRE, Node.js, etc.) for ecommerce applications and services.
*Upgrade of frameworks, tools, applications, services, infrastructure, etc., for reasons of compliance, to address bugs, vulnerabilities and exposures, for support, etc., in accordance with established IT policies and procedures.
*Conduct and assist with hardware and software audits of ecommerce sites, applications, services, and infrastructure to ensure compliance with established IT standards, policies, and procedures.
*Support security assessments including audits of IT and cybersecurity controls, static and dynamic code analysis, vulnerability tests, penetration tests, etc.
*Assist in the creation, deployment, configuration, operation, and maintenance of version control systems.
*Assist in the definition, configuration, and implementation of development workflows, branching strategies, CI/CD pipelines, and environments to support the SDLC.
*Assist with release management and deployment of ecommerce applications and services in various configurations (standalone processes, batch, containerized, clustered, microservice, serverless, etc.), to different environments (development, integration testing, QA, staging, load and performance testing, production, etc.), for different application types (MPA, SPA, PWA, native), and in different availability models (HA, active-standby, active-active (global load balancing), etc.).
*Assist in the creation, evaluation and selection, deployment/implementation, operation, and maintenance of ecommerce ancillary systems and tools including A/B testing tools, chat, CDNs, caches, contact centers, content management, data lakes, data warehouses, DNS servers, email services, file transfer services, key management services, image management services, message queues, NoSQL DBs, OMS, personalization, product inventory management, proxies, secrets and parameters management, trending, etc.
*Research, development, and implementation of technologies, including open source technologies, to support continuing innovation in the business, the SDLC, and the infrastructure supporting them.
*Cyber incident exercises and response.
*DR planning and exercises.
*24/7 on-call support.
*Provide coverage as needed for special ecommerce events, periods, and dates like Holidays and Peak Season.
*Address ad hoc requests and issues concerning the organization, people, the SDLC, websites, applications, services, and infrastructure.
*Regularly provide advice and recommend actions involving complex issues.
REQUIREMENTS:
Must Have
*Degree in Computer Science, Engineering, or related field
*Experience as an SRE, DevOps Engineer, or equivalent software-engineering role
*5 years of experience in an IT engineering/administrator role
*Proficient in UNIX/Linux systems administration, networking, security, and scripting of OAM tasks
*Expert knowledge of configuration management tools (Ansible), infrastructure provisioning tools (Terraform), CI/CD tools (Jenkins), and version control with Git
*Expert knowledge of container technology and tools (Docker, Kubernetes)
*Programming proficiency in at least one general purpose programming language (Python, Java, C++, Go)
*Expert knowledge of AWS
*Working knowledge of SQL and NoSQL databases
*Proficient working with open source technologies
*Expert knowledge of website, application, service, and hosting infrastructure instrumentation
*Expert knowledge of website, application, service, and hosting infrastructure monitoring and event management with AppDynamics, Splunk, Prometheus, etc.
*Expertise in software development methodologies
*Knowledge of best practices and IT operations in a 24x7 operation
*Engineering leadership experience
*Creative engineering and problem-solving mindset
*Ability to break down tasks into stories and deliver incrementally
*Ability to work independently and as part of a team
*Strong communication skills
Great to Have
*Ecommerce background
*Ecommerce InfoSec experience
*Experience with Redis, Varnish, and/or Akamai
*Solid understanding of modern software architectures
*RH, Ansible, AWS, and Kubernetes Certifications
We are committed to affirmatively providing equal opportunity to all associates and qualified applicants without regard to race, color, ancestry, national origin, religion, sex, marital status, age, sexual orientation, gender identity or expression, legally protected physical or mental disability or any other basis protected under applicable law.
Company info
J.Crew
Website : https://www.jcrew.com/