5.2 million online transactions every day

80% improvement in mean time to resolution

25% improvement in resolving P1 incidents within 60 minutes

Every day, William Hill publishes 5.1 million price changes, all updated in real time.

Founded in the UK in 1934, William Hill is one of the world's leading betting and gaming companies and one of the most trusted brands in the industry. In 2019, the business took some 5.2 million online transactions every day. That’s 74% more than Amazon UK on its highest-ever trading day and three-and-a-half times more than the highest-ever FTSE trading day. 

The bookmaker is undergoing a profound digital transformation toward becoming a scalable, digitally-led global business. Three years ago there was a major shift to DevOps to drive the speed and innovation that the competitive market requires. And with hundreds of releases to production happening every day, DevOps is critical. William Hill is also busy migrating to Amazon Web Services (AWS), using Terraform and Lambda for further business agility and scalability.

"We have a data center burndown strategy to evacuate our data centers by a certain time and migrate to the cloud. It was all about rapid speed of agility, adoption, capacity-on-demand," says Andrew Longmuir, Engineering Manager Observability, Automation & AIOps at William Hill. "We also made it a lot more agile to deploy our products and apps and pipelines. We did a lot of work around CI/CD pipelines to enable that as well. For businesses like ours, it was key we did that. It's all about first-to-market."

While William Hill is clearly a top player in the bookmaker market, issues can occur rapidly in its stack, given its real-time nature and complexity. Having an observability strategy in place is critical to the business model. The company needed to consolidate monitoring tools into a self-service observability platform that would enable teams to be self-sufficient in building and supporting systems and fix issues quickly whenever they occurred. 

New Relic provided technology that fit William Hill’s modern cloud architecture. The real-time insights it needed allowed the company to deliver better customer experiences and save money by the minute.

Stephen Wild, engineering manager for observability and automation at William Hill, discusses how New Relic helped them improve MTTR by 80%.

Always on the edge of technology 

Understanding the revenue impact of technical outages across all production business services is a key objective of William Hill's observability strategy. To help teams gain the real-time observability needed to achieve this, Longmuir's team built the Impact Listener on top of New Relic capabilities to track P1 incidents. The tool can be mapped onto any business service and any metric in real time to provide context and insights into service-impacting incidents throughout the incident lifecycle.

New Relic is the primary trigger to launch the Impact Listener workflow: Alerts for critical incidents are sent to PagerDuty. At the same time, Impact Listener correlates the issue to the revenue being lost, and dashboards are created in New Relic in real-time. 
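As a rough illustration of the correlation step described above, revenue lost during an incident can be estimated as the per-minute revenue shortfall multiplied by the minutes elapsed. The sketch below is a minimal example under stated assumptions, not William Hill's Impact Listener or any New Relic or PagerDuty API: the `Incident` fields, the `estimated_revenue_lost` function, and the baseline and observed revenue figures are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    """Hypothetical record of a service-impacting incident."""
    service: str
    started_at: datetime
    baseline_revenue_per_minute: float  # assumed: expected revenue in normal operation
    observed_revenue_per_minute: float  # assumed: revenue measured during the outage


def estimated_revenue_lost(incident: Incident, now: datetime) -> float:
    """Estimate revenue lost so far: per-minute shortfall times minutes elapsed."""
    minutes_elapsed = max((now - incident.started_at).total_seconds() / 60.0, 0.0)
    shortfall = max(
        incident.baseline_revenue_per_minute - incident.observed_revenue_per_minute,
        0.0,
    )
    return shortfall * minutes_elapsed


# Illustrative figures only: a 30-minute outage with a shortfall of 60 per minute.
incident = Incident(
    service="example-service",
    started_at=datetime(2021, 1, 1, 12, 0),
    baseline_revenue_per_minute=100.0,
    observed_revenue_per_minute=40.0,
)
lost = estimated_revenue_lost(incident, now=datetime(2021, 1, 1, 12, 30))
```

Recomputing the estimate while the incident is open is one way a dashboard could show the running cost of an outage.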

"It enables us to look at measurable improvements across revenue streams and how we respond and improve our response to service impacting incidents. If there is an issue, New Relic helps us spot it very quickly, and you can see how much it costs us while the outage is ongoing. And then guide the team to fix priority issues at speed," says Longmuir. 

"Stakeholders span from executives to product owners to technical teams. We want everyone to consume these metrics in the Impact Listener and dashboards. Providing costs insights is a chaos engineering principle, and we've been working on these capabilities for some time via our custom downtime calculators. By using New Relic, we are limiting that barrier to entry to open monitoring, to non-monitoring experts, so they can consume and adopt that technology as quickly as possible," explains Longmuir. 

With New Relic, we don't have to build dashboards and create alerts for everybody. We've federated that ownership to our teams. We don't have to feed and water a monitoring platform. We don't have to support it. We don't have to maintain it. We don't have to patch it. We don't have to scale it for big events. We can focus on the real value-add around things like integrations and making sure we get those observability standards and practices in place and driving a monitoring-as-code strategy, whilst also being the SMEs.

Using real-time investigation and correlation to quickly identify issues

For incident retrospectives, William Hill plans to use the Impact Listener to create post-mortem reports that help operational support teams, SREs, and development teams evaluate how to triage similar incidents in the future. Together with real-time analytics, this allows the teams to start driving KPIs and continuous improvement. The KPIs are published, tracked, and made accessible to all employees via New Relic dashboards for each business service.

William Hill also uses New Relic dashboards for proactive alerting, that is, to spot trends and flag where teams need to improve. "If, for example, we get a particular business technical service that generates more minutes of outages per month than it had previously, we want to focus on that to improve it and ask the right questions," Longmuir explains. "By having higher availability in our systems, we are giving our customers a great and safe experience. I think this shows that we are leveraging our monitoring ecosystem to demonstrate cohesive value and using the strengths of each of our tools through custom integration. I think this also shows New Relic's value and how effective and innovative monitoring measures improve our services."
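The trend check Longmuir describes, a service generating more minutes of outage in a month than it had previously, can be sketched as a simple month-over-month comparison. This is an illustrative example only: the function name, the service names, and the outage figures are hypothetical and do not come from William Hill's dashboards.

```python
def services_with_rising_outage(previous: dict, current: dict) -> list:
    """Return services whose outage minutes rose month over month, largest rise first."""
    increases = {
        name: minutes - previous.get(name, 0)
        for name, minutes in current.items()
    }
    rising = [name for name, delta in increases.items() if delta > 0]
    return sorted(rising, key=lambda name: increases[name], reverse=True)


# Hypothetical outage minutes per business service for two consecutive months.
last_month = {"service-a": 10, "service-b": 20}
this_month = {"service-a": 25, "service-b": 15, "service-c": 5}
flagged = services_with_rising_outage(last_month, this_month)
```

A service that is new this month is compared against zero, so any outage minutes it records will flag it.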

William Hill aims to use New Relic's platform to make monitoring costs transparent across business units and move to an accurate showback and chargeback model with its DevOps teams.

Pushing the boundaries 

With an improved ability to correlate technical problems to business outcomes, Longmuir and his team see significant improvements in their troubleshooting efforts. "We've set ourselves a goal of resolving every P1 within 60 minutes. Just before we started deploying New Relic's technology, I think we were at 60%. This has increased to 75%. For 2021, we are looking to achieve 90% of P1s resolved within 60 minutes with New Relic's help and implement our Impact Listener across all business services and for less critical issues. We have been working closely with our production engineering team to fully automate insights into our incident lifecycles and provide true on-demand transparency to the business and our critical services."
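The KPI in the quote above is the share of P1 incidents resolved within 60 minutes. The short sketch below shows how such a share could be computed from resolution times, and how a move from 60% to 75% works out as a 25% relative improvement, which is one reading consistent with the headline figure. The function names and sample durations are hypothetical.

```python
def share_resolved_within(durations_minutes: list, target_minutes: float = 60) -> float:
    """Fraction of incidents whose resolution time is at or under the target."""
    if not durations_minutes:
        return 0.0
    hits = sum(1 for d in durations_minutes if d <= target_minutes)
    return hits / len(durations_minutes)


def relative_improvement(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Hypothetical resolution times in minutes for four P1 incidents.
sample_share = share_resolved_within([30, 45, 60, 90])

# The figures quoted in the text: 60% before, 75% after.
improvement = relative_improvement(60, 75)
```

Note that the same change can also be stated as a 15 percentage point gain; the relative and absolute figures answer different questions.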

William Hill is committed to being at the forefront of the latest technology developments and using technology to create a better customer experience. No matter how the technology stack evolves going forward, New Relic gives the company the observability it needs to understand its complex systems, maintain performance, and embrace change with confidence. The platform empowers William Hill's teams with insights and greater control while maximizing business benefits.
