
Voltage Park

Infrastructure Operations Engineer

Posted Yesterday
Remote
Hiring Remotely in USA
140K-200K Annually
Senior level
The Infrastructure Operations Engineer is responsible for ensuring the stability and performance of AI compute infrastructure, collaborating with various teams, and deploying system updates while participating in an on-call rotation.

Voltage Park is your enterprise AI factory. We offer scalable compute power and on-demand and reserved bare metal AI infrastructure using NVIDIA GPUs, with world-class service, performance, and value. Founded with the mission of making AI computing accessible to all, we provide flexible, affordable GPU solutions that power everyone from builders to enterprises.

We are seeking a highly skilled and proactive Infrastructure Operations Engineer to be part of our 24/7 Infrastructure Operations team responsible for the stability, scalability, and performance of compute, storage, and platform infrastructure. This role plays a key part in delivering always-on, high-performance environments that support AI/ML training, inference, and HPC workloads at scale. The ideal candidate combines technical depth with strong interpersonal skills and a passion for operational excellence. 

This position offers full remote flexibility, although candidates must be based in the continental US and available to work during PST hours. Unfortunately, we are unable to provide sponsorship for this role.

Responsibilities

  • At the direction of the Manager of Infrastructure Operations, design, build, and roll out new platforms and patterns to minimize incidents and enable customer-facing and internal features.

  • Deploy updates and improvements to support both Voltage Park’s internal use cases and those of its end customers.

  • Collaborate with colleagues on the Infrastructure Engineering, Network Operations, Customer Success, and Software and Platform Development teams.

  • Participate in the on-call rotation, which is evenly distributed across all team members and follows a primary/secondary pattern: you serve as primary, then move to the secondary position.

Qualifications

  • 8+ years working with Linux as a server/hosting platform; extra points for Ubuntu experience.

  • 5+ years of experience with AWS.

  • 2+ years of experience with Kubernetes and strong container fundamentals.

  • 2+ years of experience with Terraform and Ansible.

  • 2+ years of experience with network-attached storage management (via NFS, Ceph, or other protocols). Extra points for experience with VAST storage systems.

  • Experience working in a Slack-first, asynchronous remote work environment.

  • Experience with monitoring systems (Prometheus, ELK stack).

  • Familiarity with the GitOps workflow.

  • Software development experience using Python, Go, Bash, or other languages to automate work and connect systems and APIs.

  • Deep networking fundamentals; extra points for experience with datacenter-level networks, 400Gb Ethernet, and InfiniBand.

  • Experience building and delivering complex systems.

  • Effective at navigating tradeoffs between design, risk, cost, and outcomes.

  • Comfortable with navigating ambiguity.

  • Strong written and oral communication.

Ideal Experiences

  • Experience with bare metal hardware troubleshooting and provisioning; extra points for working with Dell hardware.

  • Experience with GPU servers, either in bare metal form or under virtualization.

  • Deep experience with network switches, routers, and firewalls, particularly SONiC switches and Palo Alto and Juniper Networks as vendors.

  • Experience with VAST storage systems.

Culture

  • You enjoy working with a small group of friendly, highly motivated, execution-focused colleagues.

  • You’re comfortable with a high degree of autonomy. We expect you to independently prioritize your work and understand how it maps to the overall needs and goals of the company.

  • You’re knowledgeable in your domain but also enjoy wearing multiple hats and venturing outside of your comfort zone when the need arises.

  • You value the ability to write well and understand the importance of good documentation.

Voltage Park is an equal opportunity employer and makes employment decisions on the basis of merit. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected under federal, state, or local law. If you require an accommodation during the job application process, please notify your recruiter.

Compensation Range: $140K - $200K



Top Skills

Ansible
AWS
Bash
Ceph
ELK Stack
Go
GPU
Kubernetes
Linux
NFS
Prometheus
Python
Terraform
VAST Storage


