Senior Site Reliability Engineer
Who We Are
Stavvy is transforming how business is conducted remotely by making complex legal and financial transactions easier, safer, and more accessible to all. Whether we are working to enable title companies to facilitate remote closings safely, better connecting lenders with the businesses they use during the home buying process, or building the next set of tools for the platform, Stavviators (our employees) are disruptors at heart. Our team is constantly iterating, solving problems, and working together to simplify life's defining moments. If you want to help power the paperless revolution, join us at Stavvy!
Who You Are
You are a curious and skilled Site Reliability Engineer with a proven track record of managing and scaling a live production environment. You have hands-on experience with centralized logging or Elasticsearch, standing up new services, scripting, and diagnosing server and application issues with monitoring tools such as Prometheus or Grafana. You’re fluent in Python and Bash scripting and have strong knowledge of cloud providers and Infrastructure as Code (AWS and Terraform). You also have DevSecOps experience with tools such as CircleCI CI/CD pipelines, React, Cypress, Docker, ECS, Kubernetes, or HashiCorp Vault. You’re collaborative and team-oriented, with at least 4 years of experience as a Site Reliability Engineer. We’d be super impressed if you also have a background in Security or Corporate IT!
What You’ll Do
In this Senior Site Reliability Engineer role, you will be integral to helping Stavvy achieve operational excellence in engineering. You’ll be responsible for building out the infrastructure required to support a Public API and accompanying developer documentation for our existing products, as well as a framework for deploying and rolling back services. You’ll work closely with the Product team, backend engineering, and tech leadership to increase observability of all production errors and establish a stable deployment and rollback/recovery process. You’ll have the unique opportunity to make things scalable and do it right early on.
In this role you will:
- Own the build-out of the infrastructure required to support a Public API
- Help build a self-service platform for developers to trigger their own deployments and respond to failures
- Improve monitoring and alerting to detect outages and problems
- Increase visibility into our system with observability tools
- Participate in design reviews and take ownership of projects
- Triage issues during deployment and after release
- Expand CI/CD pipelines to support more automated testing
- Identify opportunities for, and develop, automated remediation of system and security events and vulnerabilities
If this sounds like a company you would like to join and a role you would thrive in, please don’t hold back from applying! Stavvy is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations.