Platform SRE DevOps Engineer

Job Description

We require Site Reliability Engineers (SREs) to join the new eCommerce platform team. You will focus on ensuring resilience, observability, and release automation.

The role will entail operational work such as handling escalations, being on call to respond to production issues, and fixing problems. Your second focus will be automation, which we expect you to deliver hands-on by writing code. As an SRE you will build tooling and automation across the various parts of the eCommerce platform to ensure it maintains its service level objectives.

We are a progressive team: we encourage flexible and remote working and are fully embracing new agile ways of working. We want to operate with fully automated CI-CD setup and tooling, releasing regularly during business hours (no unsociable working hours), with squads working in a DevOps and SRE mindset and approach. We are already part of the way there and need your help to progress and mature towards this goal.

As an SRE you will be expected to drive greater operational maturity and automation while working closely with development squads to provide feedback on operational improvements.

Your Role:

The Platform SRE (Site Reliability Engineer) role is part of the eCommerce Platform function and is integral to Asda's 'Future' programme, which is transforming the organisation. eCommerce is building the future-state websites, core platform and backend fulfilment capabilities. The platform team helps provide capabilities and services that are used by multiple teams within the organisation to ensure the performance and stability of all our customer-facing websites.

  • Debug production issues across services and technology stack.

  • Consult on new cloud patterns to improve system resilience, performance and stability.

  • Support production deployments, pipeline engineering and maintenance, and build failures (squads are responsible for releasing their own code/packages)

  • Platform Ops - L2 quick fixes (restart environments/jobs, check monitoring dashboards, reset access, etc.)

  • Monitoring & Observability - Configure & extend monitoring, egress/ingress data lake (consolidation of Azure Integration Services, 3rd-party events), Business Monitoring

  • Automation & Self-healing (incl. event based triggers)

  • Ensuring consistency of technology usage across the programme by continuously reviewing existing toolsets and code and suggesting re-use of components.

  • Ensuring system SLOs/SLIs and performance are monitored and alerted on.

Candidate Description

  • Software engineering background

  • Hands-on experience designing, building, delivering and operating production-grade software at scale

  • Experience with troubleshooting distributed systems

  • Strong opinions informed by experience of continuous delivery, distributed architectures, testing, everything-as-code, containerisation, orchestration, cloud services and incident response

  • Comfortable having in-depth discussions, troubleshooting and debugging systems and reading/writing code

  • Experience working within an Agile environment

  • Experience with enterprise APM monitoring tools

  • Working knowledge of system architectures and networking


  • Salesforce experience

  • Azure cloud experience

  • Experience with CI-CD tooling, e.g. code quality, security, accessibility, testing framework integration

  • Experience working as a DevOps/SRE engineer

Application Description

If you have any questions, then please email,

Asda House - Leeds
Asda House, Great Wilson Street, Leeds, West Yorkshire, United Kingdom, LS11 5AD