Akamai Technologies

Senior Site Reliability Engineer

Reposted 14 Hours Ago
In-Office or Remote
2 Locations
121K-219K Annually
Senior level
As a Senior Site Reliability Engineer, you will manage reliability for AI platforms, automate tooling, integrate workloads, and collaborate with product teams on infrastructure decisions.
The summary above was generated by AI

Do you enjoy solving complex reliability challenges for cutting-edge technology?

Do you have a passion for automation and building systems that scale?

Join the Akamai Inference Cloud Team!

The Akamai Inference Cloud team is part of Akamai's Cloud Technology Group. We design, implement, deploy and operate AI platforms that enable customers to run inference models and developers to create AI applications with unmatched performance, compliance, and economics.

Partner with the best

In this role, you'll own reliability workstreams for Akamai's serverless inference platform, build automation and tooling, and contribute to architecture and operational decisions. Opportunities exist to take ownership of critical reliability problems end-to-end, partner with product engineering teams, and develop expertise in GPU infrastructure, Kubernetes at scale, and AI inference workloads.

As a Senior Site Reliability Engineer, you will be responsible for:

  • Building and maintaining observability for AI workloads, including telemetry, dashboards, alerts, and SLO/SLI tracking, and driving improvements when targets are missed
  • Writing automation and tooling to reduce operational toil, improve deployment safety, and accelerate incident response
  • Integrating AI workloads into Akamai's existing incident management processes, building runbooks, participating in on-call rotations, and conducting blameless post-mortems
  • Building and maintaining CI/CD integrations, deployment safety checks, and rollback automation
  • Collaborating with product engineering teams to improve reliability, contribute to architecture decisions, and ensure operational readiness for product releases
  • Contributing to capacity planning, autoscaling configuration, and workload scheduling for AI compute infrastructure

Do what you love

To be successful in this role you will:

  • Have 5+ years of experience in SRE, infrastructure engineering, or platform engineering, working with large-scale distributed systems
  • Have extensive experience with Kubernetes and containerization at scale
  • Have experience defining SLOs and working with observability tools such as Prometheus, Grafana, and distributed tracing
  • Possess coding ability in Python or Go for automation and tooling, with experience in CI/CD pipelines, deployment safety, and infrastructure-as-code
  • Have an interest in or experience with AI/ML infrastructure, model serving, or GPU workloads
  • Possess the ability to take ownership of problems and drive them to resolution independently

Work in a way that works for you

FlexBase, Akamai's Global Flexible Working Program, is based on the principles that are helping us create the best workplace in the world. When our colleagues said that flexible working was important to them, we listened. We also know flexible working is important to many of the incredible people considering joining Akamai. FlexBase gives 95% of employees the choice to work from their home, their office, or both (in the country advertised). This permanent workplace flexibility program is consistent and fair globally, to help us find incredible talent, virtually anywhere. We are happy to discuss working options for this role and encourage you to speak with your recruiter in more detail when you apply.
Learn what makes Akamai a great place to work

Connect with us on social and see what life at Akamai is like!

We power and protect life online, by solving the toughest challenges, together.

At Akamai, we're curious, innovative, collaborative and tenacious. We celebrate diversity of thought and we hold an unwavering belief that we can make a meaningful difference. Our teams use their global perspectives to put customers at the forefront of everything they do, so if you are people-centric, you'll thrive here.

Working for you

At Akamai, we will provide you with opportunities to grow, flourish, and achieve great things. Our benefit options are designed to meet your individual needs for today and in the future. We provide benefits surrounding all aspects of your life:

  • Your health
  • Your finances
  • Your family
  • Your time at work
  • Your time pursuing other endeavors

Our benefit plan options are designed to meet your individual needs and budget, both today and in the future.

About us

Akamai powers and protects life online. Leading companies worldwide choose Akamai to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world's most distributed compute platform, from cloud to edge, we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away.

Join us

Are you seeking an opportunity to make a real difference in a company with a global reach and exciting services and clients? Come join us and grow with a team of people who will energize and inspire you!
Akamai Technologies is an Affirmative Action, Equal Opportunity Employer that values the strength that diversity brings to the workplace. All qualified applicants will receive consideration for employment and will not be discriminated against on the basis of gender, gender identity, sexual orientation, race/ethnicity, protected veteran status, disability, or other protected group status.
If no date is displayed, applications are being accepted on an ongoing basis until the job is filled.

Compensation

Akamai is committed to fair and equitable compensation practices. For US-based candidates only, the base salary for this position ranges from $121,400 to $218,600 per year; a candidate’s salary is determined by various factors including, but not limited to, relevant work experience, skills, certifications, and location. Compensation for candidates outside the US will vary. The compensation package may also include incentive compensation opportunities in the form of annual bonus or incentives, equity awards, and an Employee Stock Purchase Plan (ESPP). Akamai provides industry-leading benefits including healthcare, 401K savings plan, company holidays, vacation (in the form of PTO), sick time, family-friendly benefits including parental leave, and an employee assistance program including a focus on mental and financial wellness. Eligibility requirements apply.

Top Skills

Go
Grafana
Kubernetes
Prometheus
Python

HQ

Akamai Technologies Cambridge, Massachusetts, USA Office

145 Broadway, Cambridge, MA, United States, 02142
