Aalyria

Site Reliability Engineer - Spacetime

Posted Yesterday
Remote
Hiring Remotely in United States
Mid level
As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.
The summary above was generated by AI
About Aalyria:

Aalyria is a leading technology company that supplies laser communications technology and temporospatial software-defined networking platforms to the aerospace industry. With technology acquired from Google, Aalyria is at the forefront of innovation in satellite and airborne mesh networks, as well as cislunar and deep-space communications. We are revolutionizing the orchestration and management of planetary mesh networks using any radio or optical spectrum, any orbit, and any hardware across land, sea, air, and space.

Role Overview:

This isn't a "keep the lights on" SRE role. This is a strategic, high-impact opportunity to build the nervous system for a platform that transforms how networks of satellites, ground stations, and fleets are interconnected and orchestrated. You will be building the core observability stack that ensures the reliability of systems critical to the operation of satellite megaconstellations and missions to deep space.

This is a greenfield/brownfield opportunity. You will be a trusted expert, helping to define and implement the strategy and building the tools that empower our engineers. You will support the roadmap to mature our observability stack, moving from cloud-native tools to a robust, scalable, and insightful platform built on best-in-class technologies (Prometheus, OpenTelemetry, etc.). If you are an SRE who thrives on platform-building challenges and wants to be relied upon to build a production-grade observability stack from the ground up, this role is for you.

Note: this role includes on-call responsibilities.

Key Responsibilities:
  • Help design and build Aalyria's centralized observability platform, integrating and scaling tools for metrics (e.g. Prometheus), logging (e.g. Loki), and distributed tracing (e.g. Tempo/OpenTelemetry).
  • Define, implement, and manage a robust framework of Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for our core products, ensuring we are launch-ready.
  • Partner with SWEs to implement observability best practices, develop standard templates and documentation, and configure tooling (e.g., OpenTelemetry libraries).
  • Automate the deployment, scaling, and management of the entire observability stack using Infrastructure as Code (e.g. Terraform) and GitOps principles (e.g. ArgoCD).
  • Partner closely with the core infrastructure team to ensure deep visibility into our Kubernetes clusters and underlying GCP and AWS environments.
  • Develop and lead the company's monitoring, alerting, and incident response strategy, driving a culture of proactive reliability and blameless post-mortems.
Required Qualifications:
  • 4+ years of experience in an SRE or platform engineering role, with a focus on observability for large-scale, distributed compute or network systems.
  • Deep, hands-on expertise building, scaling, and managing observability platforms (e.g., Prometheus, Grafana, Loki/ELK, OpenTelemetry, Tempo/Jaeger, Honeycomb, etc.). You have proven experience using these tools to support performance analysis and debugging of complex distributed systems.
  • Strong production-level experience with Google Cloud Platform (GCP) and Kubernetes.
  • Experience using Infrastructure as Code (IaC) and GitOps principles (e.g., ArgoCD).
  • Proficiency in a systems programming language, with a strong preference for Go and Python for debugging and writing tooling.
  • Demonstrable experience defining, implementing, and managing SLOs, SLIs, and error budgets for production services in high-availability distributed systems.
Preferred Qualifications:
  • Experience operating a multi-cloud environment, specifically GCP and AWS.
  • Hands-on experience with GitLab CI for CI/CD pipelines.
  • Working knowledge of service mesh technologies such as Istio or Linkerd.
  • Familiarity with instrumenting applications written in Go and C++.
  • An active Secret clearance, or higher, is preferred for this position.
  • Experience with JVM observability (tuning, monitoring) for Java-based applications.
What We Offer:
  • Innovative Environment: Work at a cutting-edge company shaping the future of aerospace communications.
  • Impactful Work: Directly contribute to critical national security programs and initiatives.
  • Growth Opportunities: Expand your career with opportunities for professional development and advancement.
  • Inclusive Culture: Be part of a collaborative, supportive, and inclusive workplace where your contributions matter.
  • Flexibility: Flexible working arrangements including hybrid remote/in-office schedules.
  • Compensation and Equity: Competitive salary, comprehensive benefits (401(k), dental, vision, health, life insurance), paid time off, and equity options.
ITAR/EAR Requirements:

This position involves access to export-controlled information. To comply with U.S. government export regulations, applicants must meet one of the following criteria:


(A) Qualify as a U.S. person, which includes:

  • U.S. citizen or national
  • U.S. lawful permanent resident (green card holder)
  • Refugee under 8 U.S.C. 1157
  • Asylee under 8 U.S.C. 1158

(B) Be eligible to access export-controlled information without requiring an export authorization.


(C) Be eligible and reasonably likely to obtain the necessary export authorization from the appropriate U.S. government agency.


The company reserves the right to decline pursuing an export licensing process for legitimate business-related reasons.

Equal Opportunity Employer Statement:

Aalyria is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate based on race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), national origin, age, disability status, genetic information, protected veteran status, or any other characteristic protected by law. Qualified applicants from all backgrounds are encouraged to apply.



Top Skills

ArgoCD
AWS
ELK
GCP
Go
Grafana
Istio
Jaeger
Kubernetes
Linkerd
Loki
OpenTelemetry
Prometheus
Python
Tempo
Terraform
