Metrum Research Group

Platform Architect HPC & AI

Posted 6 Days Ago
Remote
Hiring Remotely in United States
Expert/Leader
The Platform Architect will lead the architectural strategy for HPC and AI computing platforms, ensuring scalability, reliability, and regulatory compliance while collaborating across teams to meet business goals.
Join Us as a Platform Architect — Shape the Future of AI and High-Performance Computing

We’re on the hunt for a visionary and systems-minded Platform Architect to define and drive the architectural strategy behind our next-generation, high-performance, AI-powered computing platform. This is a pivotal role—one that sits at the heart of scaling a secure, fault-tolerant, and cloud-native infrastructure designed to power advanced modeling, AI/ML workflows, and data-intensive workloads across life sciences and other innovation-focused industries.

You’ll be the guiding force behind the technical foundation of our platform, translating complex requirements into scalable, resilient systems. With deep expertise in distributed systems, AI infrastructure, and HPC environments, you’ll align technology architecture with regulatory, operational, and business priorities.

Working closely with cross-functional teams—engineering, product, and compliance—you’ll help shape a platform that’s not only powerful and future-proof but also reliable, supportable, and elegantly designed for real-world use.

Responsibilities:
  • Develop and guide an architectural blueprint that supports modular, scalable, and secure delivery of HPC and AI capabilities.
  • Design for resilience, fault tolerance, and operational durability to ensure platform services are stable and supportable at scale.
  • Translate emerging scientific and business needs into infrastructure strategies that prioritize reliability, usability, and maintainability.
  • Collaborate with engineering, infrastructure, product, and compliance teams to ensure architectural alignment with implementation and operational goals.
  • Lead technical design reviews and act as an advisor on systems-level challenges, promoting clarity and coherence across teams.
  • Foster shared understanding of platform design tradeoffs, emphasizing outcomes that improve the experience of users and those who support the platform.
  • Define infrastructure requirements for reproducible, on-demand, and GxP-compliant compute environments.
  • Ensure that security, observability, and operational control are embedded into platform architecture from the outset.
  • Guide the use of containerization, orchestration, and service mesh technologies (e.g., Kubernetes, Istio, Argo) in collaboration with engineering teams.
  • Architect scalable infrastructure for the full AI/ML lifecycle, including model training, deployment, and real-time inference.
  • Evaluate and integrate emerging HPC and AI technologies (e.g., accelerators, AI agents, distributed frameworks) to enhance long-term platform capability.
  • Define workload orchestration strategies that balance performance, cost-efficiency, and operational resilience.
  • Perform feasibility and sustainability impact assessments for proposed architectures, including risk analysis, cost implications, and long-term maintainability.
  • Represent architectural perspectives in customer engagements and business development efforts where platform design is a key differentiator.
  • Collaborate with stakeholders to scope and shape technical solutions that align with product vision and customer requirements.
  • Identify systemic architectural or operational issues and drive improvements that benefit both internal teams and external users.
  • Please note that this job description is not meant to be all-inclusive. Other duties may be assigned.
Qualifications:
  • 10+ years of experience in software or platform architecture, including 5+ years in HPC, large-scale compute infrastructure, or AI platform development.
  • Strong understanding of cloud-native architecture (AWS, Azure, or GCP), container technologies, and orchestration frameworks.
  • Experience designing infrastructure that is resilient, fault-tolerant, and easy to operate, especially in regulated or high-stakes environments.
  • Background in supporting AI/ML workflows (e.g., TensorFlow, PyTorch) and integrating workflow orchestration tools (e.g., Airflow, Nextflow, Argo Workflows).
  • Familiarity with distributed systems and job scheduling (e.g., Slurm, HTCondor) in both research and production environments.
  • Technical fluency across multiple languages and systems (e.g., Python, Go, R, Linux-based infrastructure).
  • Strong communication and systems-thinking skills with a track record of collaborative problem solving.
Preferred Qualifications:
  • Familiarity with GxP compliance, 21 CFR Part 11, or regulated computing frameworks.
  • Background in scientific computing, pharma R&D, or life sciences infrastructure.
  • Exposure to AI agent orchestration frameworks (e.g., LangChain, NVIDIA NeMo, AutoGen).
  • Experience with semantic data platforms or data lakehouse architecture.
Education and Experience:
  • Bachelor’s degree in computer science, engineering, or a related field, or equivalent work experience with demonstrable expertise in platform-scale architecture.
  • Experience collaborating across disciplines including engineering, infrastructure, networking, and security.
  • Certifications in cloud, security, or systems architecture are preferred.
Physical Demands

The job frequently requires working at a computer terminal, whether standing or sitting, and operating the computer proficiently.

Work Environment

The work environment is quiet with no adverse conditions.

Metrum Research Group offers competitive salaries and an excellent benefits package.

Metrum Research Group is an Equal Opportunity Employer

Metrum Research Group EEO Statement

MetrumRG believes that innovation is cultivated when we challenge each other with new ideas and perspectives. MetrumRG is an equal opportunity employer that is committed to building a diverse and inclusive team. All employment decisions are based on qualifications, merit, and business needs, and we prohibit discrimination and harassment of any kind based on race, color, sex, religion, sexual orientation, gender identity, national origin, disability, genetic information, pregnancy, or any other protected characteristic as outlined by federal, state, or local laws.
MetrumRG is committed to providing equal employment opportunities and reasonable accommodations for candidates and employees with disabilities. We encourage all qualified candidates to apply for positions within our organization. If you require reasonable accommodation because of a medical condition for the application or interview process, please contact Scotti Rylands or our Talent and Culture Department, (860)735-7043 x-622, or message us and we will work with you to meet your needs.

Top Skills

Airflow
Argo
AWS
Azure
GCP
Go
Istio
Kubernetes
Linux
Nextflow
Python
PyTorch
R
TensorFlow
