
Veracode

Manager, Site Reliability Engineering

Posted 5 Days Ago
In-Office
Burlington, MA
Senior level
The Manager, Site Reliability Engineering will lead a team, manage production systems' reliability, enforce standards, and collaborate with various teams to enhance system reliability and developer velocity.
The summary above was generated by AI

Looking for an innovative, high-growth, multi-award-winning company in one of the hottest segments of the security market? Look no further than Veracode!

Veracode is a global leader in Application Risk Management for the AI era. Powered by trillions of lines of code scans and a proprietary AI-generated remediation engine, the Veracode platform is trusted by organizations worldwide to build and maintain secure software from code creation to cloud deployment.

Learn more at www.veracode.com, on the Veracode blog, and on LinkedIn and Twitter.


We are seeking a skilled Manager, Site Reliability Engineering to lead the reliability, availability, and operational excellence of Veracode’s production systems. This role focuses on defining and enforcing reliability standards, managing risk in production, and ensuring services meet agreed-upon service levels under real-world load and failure conditions.

The ideal candidate has experience operating large-scale distributed systems in production, driving and implementing SLO-based reliability practices, and partnering with engineering, security, DevOps, and product teams to improve both system reliability and developer velocity.

Key Aspects of the Role

  • Lead a 9-member global Site Reliability Engineering team
  • Set objectives and key results (OKRs) and KPIs, and manage team performance
  • Act as the primary point of accountability for reliability concerns that span multiple teams, including DevOps, Security, Database, and Product Engineering, driving alignment and resolution.
  • Manage team on-call schedule and act as point of escalation for alerts and production incidents
  • Create tickets, groom backlog and prioritize work in sprints
  • Utilize AWS services to design scalable cloud solutions that support critical systems.
  • Partner with software engineering teams to ensure monitoring and alerting are in place, enabling consistent, scalable, and automated service delivery.
  • Own the design and enforcement of the organization’s observability strategy, driving continuous improvement in reliability, standardization, and observability coverage across the organization.
  • Drive alert hygiene, standardization, and reduction of alert fatigue across the organization. 
  • Lead efforts to automate infrastructure deployment and management using Terraform, Kubernetes, and other cloud-native tools.
  • Create automated incident response workflows to handle common infrastructure and application issues.
  • Collaborate with security teams to ensure systems adhere to industry-standard security practices and policies.
  • Document and train engineering teams on best practices in reliability, scalability, and operational excellence.
  • Design, operate, and continuously improve on-call and incident response processes to ensure sustainability, appropriate escalation, and reduction of operational toil.
  • Contribute to incident and process post-mortems.
  • Ensure uptime, SLAs, and availability of critical platform components through process improvements and automation.
  • Monitor existing applications and infrastructure while continuing to improve current monitoring.
  • Communicate effectively with project stakeholders and management.
  • Develop and support processes to maintain uptime, SLAs and availability of critical platform components.
  • Troubleshoot and resolve production issues related to systems, networks, and applications.
  • Ensure that our systems and processes adhere to industry-standard security practices and policies. 

Required Skills/Experience: 

  • Bachelor's degree in Computer Science, Information Science, Engineering, or a related field, or equivalent experience.
  • 2+ years working as a manager or team lead with direct reports
  • 5+ years working in an SRE, DevOps, Cloud Engineering, or similar role.
  • Experience with AWS and automation tools like Terraform, CloudFormation, or Ansible.
  • Hands-on experience deploying, managing, and troubleshooting Kubernetes clusters.
  • Hands-on proficiency with observability, monitoring, and alerting tools (Datadog, Sumo Logic, Prometheus, Grafana, etc.).
  • Familiarity with CI/CD pipelines and repository management tools (e.g., GitLab, Jenkins, GitHub).
  • Strong programming skills for automation (Python, Go, or similar languages).
  • Solid understanding of infrastructure as code (IaC) and GitOps methodologies.
  • Strong communication skills with the ability to collaborate effectively across different teams.
  • Ability to work in an Agile environment.
  • Proven experience in troubleshooting production environments and improving system reliability.
  • Experience with on-call/incident management systems such as PagerDuty, VictorOps or OpsGenie. 

Desired Experience: 

  • Experience with service meshes (e.g., Istio) to enhance application observability and security.
  • Familiarity with advanced Kubernetes features (e.g., StatefulSets, Helm, Operators).
  • Knowledge of database management and migration processes, including RDS and DMS. 

Compensation Transparency 

In accordance with U.S. pay transparency laws, Veracode provides compensation transparency for roles based in the United States. Click here to view our compensation ranges by grade. Please note that specific compensation may be influenced by various factors, including a candidate's experience, education, and work location.

Job Grade: Manager 

Employment opportunities are available to all applicants without regard to race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.  

Fraudulent Recruitment Alert - Be Aware and Stay Informed

At Veracode, we prioritize a secure recruitment process. Unfortunately, fake recruitment and job offer scams are on the rise. They aim to deceive candidates through emails and calls to obtain sensitive information.

Here’s our recruitment promise to you:

  • Comprehensive Interview Process: We never extend job offers without a comprehensive interview process involving our recruitment team and hiring managers.
  • Offer Communications: Our job offers are not sent solely through email, and we will never ask you to pay for your own hardware.
  • Email Verification: Recruiting emails from Veracode will always originate from an “@veracode.com” email address.

If you have any doubts about the authenticity of an email, letter, or telephone communication claiming to be from Veracode, please reach out to us at [email protected] before taking any further action.


Top Skills

Ansible
AWS
CloudFormation
Datadog
Git
Gitlab
Go
Grafana
Jenkins
Kubernetes
Prometheus
Python
Sumologic
Terraform
HQ

Veracode Burlington, Massachusetts, USA Office

65 Blue Sky Dr, 3rd Floor, Burlington, Massachusetts, United States, 01803

What you need to know about the Boston Tech Scene

Boston is a powerhouse for technology innovation thanks to world-class research universities like MIT and Harvard and a robust pipeline of venture capital investment. Host to the first telephone call and one of the first general-purpose computers ever put into use, Boston is now a hub for biotechnology, robotics and artificial intelligence — though it’s also home to several B2B software giants. So it’s no surprise that the city consistently ranks among the greatest startup ecosystems in the world.

Key Facts About Boston Tech

  • Number of Tech Workers: 269,000; 9.4% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Thermo Fisher Scientific, Toast, Klaviyo, HubSpot, DraftKings
  • Key Industries: Artificial intelligence, biotechnology, robotics, software, aerospace
  • Funding Landscape: $15.7 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Summit Partners, Volition Capital, Bain Capital Ventures, MassVentures, Highland Capital Partners
  • Research Centers and Universities: MIT, Harvard University, Boston College, Tufts University, Boston University, Northeastern University, Smithsonian Astrophysical Observatory, National Bureau of Economic Research, Broad Institute, Lowell Center for Space Science & Technology, National Emerging Infectious Diseases Laboratories
