Founded in 1999, CafePress calls itself “the world’s best gift shop,” but that label doesn’t begin to describe the breadth or scope of what the e-commerce company actually does. Offering on-demand printing from its Louisville, Kentucky, manufacturing facility, CafePress is a pioneer in customization, its website serving as a hub for designers and consumers alike.
Indeed, with artists and writers making their designs and slogans available to be printed on any of the site’s vast array of merchandise, consumers coming to the site to create individualized wares, and major entertainment companies offering up their own products for personalization, CafePress.com has become the go-to place for customized products.
Today that means offering more than one billion items from a global community of more than two million designers through the CafePress.com e-commerce website. To power it all, CafePress now relies on a hybrid cloud infrastructure, primarily employing Amazon Web Services (AWS) to help deliver its IT infrastructure and New Relic to support scalable, reliable application delivery on top of that infrastructure.
Quickly resolving hidden performance issues
Like many businesses today, CafePress is a technology company at heart, depending on a host of home-grown and third-party applications to keep its website, manufacturing and fulfillment facilities, and back office humming.
In the early days, however, a true picture of application performance was elusive. And when applications did experience hiccups, the problems were slow to be detected and even slower to be diagnosed and fixed. Describing that time, CafePress Vice President of Engineering Bryan Downs says, “In those days, when an issue arose, every engineer on the team would offer a theory as to the root cause, but it was all just guesswork. Meanwhile, our manufacturing plants would be experiencing issues that lasted for weeks.”
And even when things seemed to be going well on the e-commerce site, the team didn’t have much to go on besides their own gut feelings. Says Downs, “We’d browse the site without problems and figure that performance must be the same for the other 10,000 concurrent users, which of course wasn’t necessarily true.”
What Downs and team needed were the metrics that would prove their theories correct—or incorrect—and the visibility to spot trouble before real issues arose. That’s why CafePress became one of New Relic’s earliest customers, deploying its SaaS-based application performance monitoring solution, New Relic APM.
Achieving deep visibility
Nearly a decade later, much has changed within CafePress’ IT environment, which is now largely cloud-based through AWS, but one thing that has remained constant is the use of New Relic monitoring. CafePress Manager of Business Technology Cody Martinho, whose team manages the network operations center (including site scope and site availability), can’t imagine a time when this wasn’t the case.
“Today, New Relic monitoring is fully integrated within our websites and throughout the majority of our internal applications,” says Martinho. “My team is responsible for configuration and incident response within New Relic, so we work with the other development teams to define the application metrics that will establish the thresholds for alerting.”
The resulting deep visibility, across a hybrid cloud environment that includes .NET Core for new applications and .NET Framework for older applications, enables Martinho and team not only to quickly address any issues that arise but also to identify (and correct) the trends that could lead to performance problems down the road. “New Relic’s traces provide a level of detail that we aren’t able to get with any of our other tools,” he says. “As a result, we’re able to really get in and troubleshoot an issue to completion.”
Downs agrees. “We have a lot of services talking to one another,” he says. “With New Relic we’re able to see the dependencies, both internal and external. So, for instance, if website performance is suffering because our external search service has slowed, we’re able to jump into New Relic and see the reason for this, whether it’s that our databases have slowed down, or that our Elasticsearch is underperforming, or whatever else is causing the problem.”
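To make the idea of metric thresholds for alerting more concrete, here is a minimal sketch in Python of how a team might express one. It is purely illustrative: the `Threshold` class, the `evaluate` function, the metric name, and the numeric limits are all hypothetical and are not New Relic's API or CafePress' actual configuration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Threshold:
    """Hypothetical alert rule; not a New Relic API object."""
    metric: str       # name of the metric being watched (assumed)
    warning: float    # average at or above this raises a warning
    critical: float   # average at or above this opens an incident


def evaluate(threshold: Threshold, samples: List[float]) -> Optional[str]:
    """Return "critical", "warning", or None for the average of recent samples."""
    if not samples:
        return None  # no data, so nothing to alert on
    average = sum(samples) / len(samples)
    if average >= threshold.critical:
        return "critical"
    if average >= threshold.warning:
        return "warning"
    return None


# Assumed example values: warn at 500 ms, open an incident at 1000 ms.
checkout_latency = Threshold(metric="response_time_ms", warning=500.0, critical=1000.0)
print(evaluate(checkout_latency, [420.0, 610.0, 590.0]))  # average is 540.0 -> warning
```

In a setup like the one Martinho describes, the development teams would likely supply the metric and its limits, while the operations team would own the response when a threshold is crossed.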