Site Reliability Engineer
Toast Overview:
We are a rapidly growing company that’s revolutionizing the way the restaurant industry does business by pairing technology with an unrivaled commitment to customer success. We help restaurants streamline operations, increase revenue, and deliver amazing guest experiences through our platform that combines restaurant point of sale, guest-facing technology, and award-winning customer support. As a Toaster, you will be challenged to take on meaningful projects that will help shape the future of the company. Join us as we empower the restaurant community to delight guests, do what they love, and thrive.
Job Overview:
The Performance team is looking for a self-motivated person who loves improving application performance. Toast engineering teams are pushing the boundaries of Android performance and building a highly reliable and scalable AWS-hosted platform that supports our fast-growing customer base. The team’s mission is to drive architectural decisions through observation, measurement, and validation. We build performance testing and observability frameworks that give engineers quick, self-service feedback on the performance and scalability of their proposed code and infrastructure changes. Join the Performance Engineering team to champion performance, deliver fast applications, and drive our platform to architectural excellence.
Recent projects include:
- Building out an observability framework that monitors the health and performance of our fleet of tens of thousands of devices in production.
- Building a simulation of a high-volume customer with Espresso, JMeter, and the ELK stack, which we use to run a variety of experiments.
- Deploying a synthetic monitoring solution in production that tracks, trends, and alerts on the performance of our critical transactions.
As a site reliability engineer, you will be:
- Researching and developing tools and frameworks to evaluate the performance of our platform, from our mobile app on Android devices to our distributed systems hosted in AWS.
- Developing fully automated systems for accurate, consistent, and reliable measurement of performance metrics.
- Partnering with product management and development teams to build the right set of performance tests and performance monitors that identify our platform capabilities and limitations.
- Executing performance tests, analyzing results, and documenting and presenting those results.
- Working in a fast-paced agile environment with multiple development teams and providing direction to the teams on performance concerns and improvements.
- Creating and evangelizing performance standards across software applications and teams.
- Integrating performance tests into the CI pipeline.
- Ensuring safe use of the product and the platform by defining and communicating appropriate guardrails with Engineering and Product stakeholders.
- Determining how to meet growing capacity requirements while ensuring reliability within the platform.
Do you have the right ingredients?
- A minimum of 2 years’ experience in software development, DevOps, site reliability, software performance engineering, or software engineering
- Strong knowledge of the Linux OS
- Programming and data analysis skills (we use Java and Python but experience in any language is fine)
- Eagerness to design and create tools that exercise our platform and measure platform performance
- The ability to deal with unexpected issues and try new approaches to get yourself unstuck
- The desire to work in a highly autonomous, fast-paced environment where priorities may be ambiguous
Preferred ingredients include:
- Well-developed, clear communication skills and the ability to influence others
- Experience in performance testing of complex systems
- Knowledge of the performance of Java-based applications and AWS-hosted applications
- Scripting ability (shell, Python)
- Understanding of other technologies such as networking, virtualization, storage, monitoring, etc.
- Experience with monitoring and analysis tools such as New Relic, Dynatrace, Datadog, Splunk, Sumo Logic, Sentry, and the ELK stack
- A high level of comfort working with senior developers, product managers, and engineering directors across the organization