Veritone

Site Reliability Engineer II

Reposted 20 Days Ago

In-Office or Remote

52 Locations

125K-135K Annually

Senior level

The Site Reliability Engineer II will manage and ensure the reliability and efficiency of SaaS application platforms, leveraging tools for automation, monitoring, and incident response while collaborating with various teams.

The summary above was generated by AI

POSITION SUMMARY

The ideal candidate will have 5+ years of experience in Linux systems and software management, expertise with Terraform, Ansible, and cloud platforms like AWS, Azure, and GCP. Experience with large-scale distributed systems, monitoring/alerting systems (Prometheus, Grafana), CI/CD pipelines, container orchestration (Docker, Kubernetes), and programming languages (Go, Java, Python) is essential. A background in implementing security controls, automating deployments, and troubleshooting complex systems is also required.

WHAT YOU'LL DO

Deploy and maintain a resilient, secure, and efficient SaaS application platform to meet established SLAs.
Automate monitoring, management, and incident response to achieve an auto-remediation system.
Monitor site stability and performance and troubleshoot site issues.
Participate in on-call rotation to ensure stability and uptime for our platforms.
Scale infrastructure to meet rapidly increasing demand.
Collaborate with cross-functional teams, including Engineering, Product, Services, and other departments.
Collaborate with developers to bring new features and services into production.
Independently design and develop tools to aid in operations and automation as well as work jointly with other team members to deliver innovative solutions to complex business and technical challenges.
Provide deployment and operations support for multi-tiered distributed software applications.
Estimate engineering effort, plan implementation, and roll out system changes that meet requirements for functionality, performance, scalability, reliability, and adherence to development goals and principles.
Collaborate with multiple teams (software development, release management, build and release, etc.) in a fast-paced, dynamic, entrepreneurial organization.
Define the desired behavior of large-scale systems and how to achieve it.
Measure and achieve reliability through engineering and operations work.
Develop, document, and manage monitoring and alerting with the goal of creating an auto-remediation system.
Adapt security controls for products where those controls are not typically native to GA releases.
Develop automation methods that extend standard deployment pipelines for bespoke implementations.
Patch, enforce policy on, and audit production systems.

WHAT YOU'LL NEED

Expertise with Infrastructure-as-Code tools such as Terraform.
5+ years of professional Linux systems and software management experience.
Knowledge of programming languages including Python, Go, Node.js, and Java.
Experience managing infrastructure within Azure, GCP, and AWS.
Expertise in Kubernetes management and upgrades.
Strong scripting skills for systems and data-driven solutions.
Strong GitOps and CI/CD experience with tools such as Jenkins, ArgoCD, and Helm.
Extensive experience troubleshooting large-scale distributed systems.
Comprehensive background in monitoring and alerting systems, such as Prometheus and Grafana, including their use in auto-remediation systems.
Proven examples of standardizing security controls across large-scale systems.
Comfort working within project/task management platforms.

Systems and Tools

Cloud platforms including: AWS, Azure, and GCP.
Infrastructure-as-code and automation tools: Terraform, CloudFormation, Ansible, Puppet, Python
CI/CD: experience working with and supporting build and deploy pipelines and tools such as Jenkins, ArgoCD, GitHub Actions, Rundeck
Datastore management and query skills: Postgres, MySQL, MongoDB, MSSQL, Elasticsearch, Solr
Containers and orchestration platforms: Docker, Kubernetes, EKS, AKS
Familiarity with coding languages including: Go, Node.js, Java, Python
Monitoring/Alerting Tools: Prometheus, Grafana, VividCortex, Runscope, CloudWatch, Monitor, VictorOps
OS and Container Hardening: STIG, CIS, SELinux, iptables, FIPS 140-2, FIPS 140-3
JSON data structures and database schemas
API styles and query languages: REST, GQL

Bonus Points If

Bachelor’s degree in Computer Science or related field
Have worked in regulated or public-sector environments developing and assessing cloud-based solutions
Have worked with, developed, or supported continuous integration/continuous deployment systems
Have concrete examples ready to present for creating auto-remediation systems

DISCLOSURE

Our company provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability or genetics.

(Colorado & California Only*): The annual salary listed for the position is a range of $125,000-$135,000. This base pay is for illustrative purposes only and will be determined based on skills and experience comparable to the job requirements. This position may be eligible for additional compensation and benefits including but not limited to: incentive compensation; health benefits; retirement benefits; life insurance; paid time off; parental leave and benefits; and other employee perks and benefits.

*Note: This disclosure of the minimum salary compensation for this role when hired in Colorado & California is required by SB19-085 (8-5-20).

Top Skills

Ansible

ArgoCD

AWS

Azure

CIS

Docker

Elasticsearch

FIPS 140-2

FIPS 140-3

GCP

Grafana

Helm

iptables

Java

Jenkins

Kubernetes

Linux

MongoDB

MSSQL

MySQL

Postgres

Prometheus

Python

SELinux

Solr

STIG

Terraform
