Site Reliability Engineer
Why be a Toaster?
“I work here because Toast values the importance of a platform that performs and scales well. Toast is growing quickly: every week we achieve a new peak traffic mark. The Performance Engineering team is working alongside development teams, building load testing and monitoring frameworks that identify the limitations and capabilities of our platform to ensure customer happiness.” - Chris W., Performance Engineering Team Lead
We are a rapidly growing company that’s revolutionizing the way the restaurant industry does business by pairing technology with an extraordinary commitment to customer success. We help restaurants streamline operations, increase revenue, and deliver amazing guest experiences through our platform that combines restaurant point of sale, guest-facing technology, and award-winning customer support. As a Toaster, you will be challenged to take on meaningful projects that will help craft the future of the company. Join us as we empower the restaurant community to delight guests, do what they love, and thrive.
The Performance team is looking for a self-motivated individual who loves improving application performance. Toast engineering teams are pushing the boundaries of Android performance and building a highly reliable and scalable AWS-hosted platform that supports our fast-growing customer base. The team’s mission is to drive architectural decisions through observation, measurement, and validation. We build performance testing and observability frameworks that make it easy for engineers to get quick, self-service performance and scalability feedback on their proposed code and infrastructure changes. Join the Performance Engineering team to champion performance, deliver fast applications, and drive our platform to architectural excellence.
Recent projects include:
- Building out an observability framework that monitors the health and performance of our fleet of tens of thousands of devices in production.
- Building a simulation of a high-volume customer with Espresso, JMeter, and the ELK stack, which we use to run a variety of experiments.
- Deploying a synthetic monitoring solution in production that tracks, trends, and alerts on the performance of our critical transactions.
As a performance engineer, you will be:
- Researching and developing tools and frameworks to evaluate performance of our platform: from our mobile app on Android devices to our distributed systems hosted in AWS.
- Developing fully automated systems for accurate, consistent, and reliable measurement of performance metrics.
- Partnering with product management and development teams to build the right set of performance tests and performance monitors that identify our platform capabilities and limitations.
- Executing performance tests, analyzing results, and documenting and presenting those results.
- Working in a fast-paced agile environment with multiple development teams and giving those teams direction on performance concerns and improvements.
- Creating and evangelizing performance standards across software applications and teams.
- Integrating performance tests into the CI pipeline.
- Ensuring safe use of the product and the platform by defining appropriate guardrails and communicating them through weekly meetings with Engineering and Product stakeholders, as well as ‘Lunch and Learn’ presentations.
- Determining how to meet growing capacity requirements while ensuring reliability within the platform.
Do you have the right ingredients?
- A minimum of 2 years’ experience in software development, DevOps, site reliability, software performance engineering, or QA automation
- Strong knowledge of the Linux OS
- Java programming and data analysis skills
- Eagerness to design and create tools that exercise our platform and measure platform performance
- The ability to deal with unexpected issues and try new approaches to get yourself unstuck
- Clear, well-developed communication skills and the ability to influence others
Preferred ingredients include:
- Experience in performance testing of complex systems
- Knowledge of the performance of Java-based applications and AWS-hosted applications
- Scripting ability (shell, Python)
- Understanding of other technologies such as networking, virtualization, storage, monitoring, etc.
- Experience with monitoring and analysis tools such as New Relic, Dynatrace, Datadog, Splunk, Sumo Logic, Sentry, and the ELK stack
- A high level of comfort and the know-how to work with senior developers, product managers, and engineering directors across the organization