Pantheon is a website operations and hosting platform designed to deliver improved speed, scalability, and reliability for customers that run their websites on Drupal and WordPress. Powering more than 200,000 sites and supporting billions of monthly pageviews, Pantheon is used by organizations across the globe, including Stitch Fix, DataStax, and the ACLU.

Success at Pantheon comes from a consistent focus on engineering excellence and adoption of modern development methodologies and technologies. From the beginning, the platform was developed to leverage public cloud services in order to maximize performance and keep the company’s developers focused on its core technology. Written using a combination of Go, PHP, Python, and Node.js, the Pantheon platform runs on both open source technologies (Cassandra, Linux, and Kubernetes) and licensed technologies (the Fastly content delivery network, or CDN) to ensure performance, agility, and scalability for customers. The combination of a DevOps culture with a cloud-native approach to application development and delivery enables rapid iteration, resulting in a steady flow of innovative features for customers.

Optimizing cloud delivery

Pantheon was originally built on a first-generation cloud services provider. While that platform was satisfactory at first, Pantheon’s pace of innovation soon outstripped that of its provider.

After evaluating several solutions, Pantheon chose to replace its existing provider with Google Cloud Platform (GCP), including Google Compute Engine, Google BigQuery, Google Kubernetes Engine (GKE), and Cloud SQL. With GCP, Pantheon has access to capabilities supporting its current and future roadmap, including machine learning, integration with G Suite, and support for Kubernetes, which is critical to the future of Pantheon’s core container orchestration layer.

Migrating with confidence

After choosing GCP, Pantheon went to work mapping out its migration strategy. With over 200,000 websites on the platform, all with custom application code and many with multiple active development environments and any number of technical variations, the migration was anything but straightforward. However, in a manner consistent with its DevOps culture, Pantheon planned and tested over the course of three months to ensure the migration was a success. “Before we moved a single workload over to GCP, our engineering team mapped out a detailed plan for how we would successfully migrate, including establishing baseline metrics and acceptance criteria,” says Josh Koenig, co-founder and head of product at Pantheon.

New Relic APM was critical to this process. By using APM in both its legacy and GCP environments, Pantheon gained insight into baseline metrics such as response time, error rates, and throughput. (For examples of cloud migration best practices, read the New Relic Tutorial: Measure Twice, Cut Once.) “Using New Relic, if an error occurred with a workload migrated from our old cloud platform to GCP, we could see down to the line of code what was causing the issue and resolve it quickly,” says Koenig. “Because we had clear insight into our data, we never had to question if an error or performance degradation was caused by the app or the cloud platform.”
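
As an illustration only, checking a migrated workload against baseline metrics and acceptance criteria can be sketched in Python. The `Metrics` class, the `check_acceptance` function, and every threshold below are hypothetical: the article does not describe Pantheon’s actual criteria or tooling, and the sketch does not call any New Relic API. It assumes the three metrics named above have already been collected for each environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Aggregate metrics for one environment (hypothetical shape)."""
    response_time_ms: float  # average response time in milliseconds
    error_rate: float        # fraction of failed requests, 0.0 to 1.0
    throughput_rpm: float    # requests per minute


def check_acceptance(
    baseline: Metrics,
    candidate: Metrics,
    max_latency_regression: float = 0.05,    # assumed: allow 5% slower
    max_error_rate_increase: float = 0.001,  # assumed: allow +0.1 points
    min_throughput_ratio: float = 0.95,      # assumed: keep 95% of throughput
) -> list[str]:
    """Compare the migrated environment with the baseline.

    Returns a list of failure descriptions; an empty list means the
    candidate meets every (assumed) acceptance criterion.
    """
    failures = []
    if candidate.response_time_ms > baseline.response_time_ms * (1 + max_latency_regression):
        failures.append("response time regressed beyond the allowed margin")
    if candidate.error_rate > baseline.error_rate + max_error_rate_increase:
        failures.append("error rate increased beyond the allowed margin")
    if candidate.throughput_rpm < baseline.throughput_rpm * min_throughput_ratio:
        failures.append("throughput dropped below the allowed ratio")
    return failures


if __name__ == "__main__":
    legacy = Metrics(response_time_ms=200.0, error_rate=0.002, throughput_rpm=1000.0)
    migrated = Metrics(response_time_ms=190.0, error_rate=0.002, throughput_rpm=1000.0)
    problems = check_acceptance(legacy, migrated)
    print("PASS" if not problems else "FAIL: " + "; ".join(problems))
```

Recording the baseline before any workload moves is what makes a check like this meaningful: a regression can then be attributed to the migration rather than to pre-existing behavior in the application.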

After testing, migration began. “Using New Relic, we migrated 50,000 customers and over 200,000 total websites in just two weeks with no major incidents,” says Koenig. “We didn’t receive a single customer support call due to the migration and didn’t open a single ticket with GCP during the process. We were amazed.”

Additionally, overall customer site performance has improved, both because of features available in GCP and because of the diligent troubleshooting Pantheon conducted using New Relic APM. “During migration, we identified a few edge cases that caused some performance regression,” says Koenig. “Using New Relic, we were able to do a quick root cause analysis and fix the problems before any of our customers noticed. Not only were we able to eliminate any major incidents from occurring, we were able to deliver a 5% performance improvement to customers.”

By moving to GCP, Pantheon has reduced its cloud services spend significantly. “We were able to reduce our cloud spend by about 40% by migrating to GCP, and using New Relic helped accelerate realization of those savings,” says Koenig. Pantheon can now focus less on maintaining infrastructure and more on innovating its product.

Zero failure performance, availability, and reliability

Pantheon was able to use New Relic effectively for its cloud migration because it already had extensive experience with the product: New Relic APM is included with every Pantheon site.

With over 200,000 websites hosted on Pantheon’s platform, customer requirements range from simple to business-critical, but with its container-based approach, Pantheon can handle a wide variety of use cases. With such a diverse set of use cases across so many customers, site reliability can be difficult to ensure, which led Pantheon to partner with New Relic. By including New Relic APM at no additional cost for development, test, and production environments, Pantheon enables customers running business-critical sites to maintain the highest standard of site performance, availability, and reliability.

Because New Relic is a SaaS-based platform, it meshes well with Pantheon’s cloud-first strategy and customer expectations. “New Relic APM has unmatched capabilities for monitoring cloud-based applications, providing a rock-solid website safety net for our customers,” says Koenig.

Including New Relic APM with every site makes Pantheon’s service offering stronger. Troubleshooting complex sites that use custom logic and connect to external services can be difficult. Without a tool that provides visibility into the source of issues, minor frustrations can turn into major incidents, even site outages. Granular visibility enables Pantheon to isolate incidents and resolve them quickly for customers. “With New Relic APM, we can help customers identify and diagnose latency issues, path errors, and performance bottlenecks across components and services, down to specific lines of code. If an external service is causing the problem, we have visibility into that as well,” says Koenig.

Looking to the future

Moving ahead, Pantheon will leverage features of both New Relic and GCP to provide a better product and overall experience for customers. Explains Koenig, “Kubernetes will play a big role in how we evolve our container orchestration layer moving forward. Features in both GCP and New Relic have the potential to take our adoption of Kubernetes to the next level. This translates into innovations our customers will benefit from now and into the future.”
