Metrum Research Group

Platform Architect HPC & AI

Reposted 10 Days Ago
In-Office
Boston, MA
Senior level
The Platform Architect will lead the architectural strategy for an HPC and AI computing platform, focusing on scalable, secure, and resilient infrastructure. Responsibilities include developing architecture, collaborating with teams, and designing for AI/ML workflows.
The summary above was generated by AI
Join Us as a Platform Architect — Shape the Future of AI and High-Performance Computing

We are seeking a strategic and systems-oriented Platform Architect to lead the architectural vision and technical direction for our next-generation high-performance and AI computing platform. This role is critical to scaling a secure, fault-tolerant, and cloud-native infrastructure that supports advanced scientific modeling, AI/ML workflows, and data-intensive computing across the life sciences and other innovation-driven sectors, with a heavy emphasis on Model-Informed Drug Development (MiDD).

The ideal candidate brings deep expertise in distributed systems, AI infrastructure, and HPC environments, combined with the ability to align architecture with regulatory, operational, and business requirements. The candidate should also have a thorough understanding of HPC on AWS, including AWS ParallelCluster. You will collaborate across engineering, product, and compliance teams to ensure cohesive design, sustainable implementation paths, and a platform that is reliable and easy to support and use.

Responsibilities:
  • Develop and guide an architectural blueprint that supports modular, scalable, and secure delivery of productized HPC and AI capabilities in the domains of Model-Informed Drug Development (MiDD), pharmacometrics, biostatistics and quantitative systems pharmacology.
  • Design for resilience, fault tolerance, and operational durability to ensure platform services are stable and supportable at scale.
  • Translate emerging scientific and business needs into infrastructure strategies that prioritize reliability, usability, and maintainability.
  • Collaborate with engineering, infrastructure, product, and compliance teams to ensure architectural alignment with implementation and operational goals.
  • Lead technical design reviews and act as an advisor on systems-level challenges, promoting clarity and coherence across teams.
  • Foster shared understanding of platform design tradeoffs, emphasizing outcomes that improve the experience of users and those who support the platform.
  • Define infrastructure requirements for reproducible, on-demand, and GxP-compliant compute environments.
  • Ensure that security, observability, and operational control are embedded into platform architecture from the outset.
  • Guide the use of containerization, orchestration, and service mesh technologies (e.g., Kubernetes, Istio, Argo) in collaboration with engineering teams.
  • Architect scalable infrastructure for the full AI/ML lifecycle, including model training, deployment, and real-time inference.
  • Evaluate and integrate emerging HPC and AI technologies (e.g., accelerators, AI agents, distributed frameworks) to enhance long-term platform capability.
  • Define workload orchestration strategies that balance performance, cost-efficiency, and operational resilience.
  • Perform feasibility and sustainability impact assessments for proposed architectures, including risk analysis, cost implications, and long-term maintainability.
  • Represent architectural perspectives in customer engagements and business development efforts where platform design is a key differentiator.
  • Collaborate with stakeholders to scope and shape technical solutions that align with product vision and customer requirements.
  • Identify systemic architectural or operational issues and drive improvements that benefit both internal teams and external users.
  • Please note that this job description is not meant to be all-inclusive; other duties may be assigned.
Qualifications:
  • 10+ years of experience in software or platform architecture, including 5+ years in HPC, large-scale compute infrastructure, or AI platform development.
  • Strong understanding of cloud-native architecture (AWS, Azure, or GCP), container technologies, and orchestration frameworks.
  • Experience designing infrastructure that is resilient, fault-tolerant, and easy to operate, especially in regulated or high-stakes environments.
  • Background in supporting AI/ML workflows (e.g., TensorFlow, PyTorch) and integrating workflow orchestration tools (e.g., Airflow, Nextflow, Argo Workflows).
  • Familiarity with distributed systems and job scheduling (e.g., Slurm, HTCondor) in both research and production environments.
  • Technical fluency across multiple languages and systems (e.g., Python, Go, R, Linux-based infrastructure).
  • Strong communication and systems-thinking skills with a track record of collaborative problem solving.
Preferred Qualifications:
  • Familiarity with GxP compliance, 21 CFR Part 11, or regulated computing frameworks.
  • Background in scientific computing, pharma R&D, or life sciences infrastructure.
  • Exposure to AI agent orchestration frameworks (e.g., LangChain, NVIDIA NeMo, AutoGen).
  • Experience with semantic data platforms or data lakehouse architecture.
Education and Experience:
  • Bachelor’s degree in computer science, engineering, or a related field, or equivalent work experience with demonstrable expertise in platform-scale architecture.
  • Experience collaborating across disciplines including engineering, infrastructure, networking, and security.
  • Certifications in cloud, security, or systems architecture are preferred.
Physical Demands

The job frequently requires working at a computer terminal, whether standing or sitting, and the ability to operate a computer proficiently.

Work Environment

The work environment is quiet with no adverse conditions.

Metrum Research Group offers competitive salaries and an excellent benefits package.

Metrum Research Group is an Equal Opportunity Employer

Metrum Research Group EEO Statement

MetrumRG believes that innovation is cultivated when we challenge each other with new ideas and perspectives. MetrumRG is an equal opportunity employer that is committed to building a diverse and inclusive team. All employment decisions are based on qualifications, merit, and business needs, and we prohibit discrimination and harassment of any kind based on race, color, sex, religion, sexual orientation, gender identity, national origin, disability, genetic information, pregnancy, or any other protected characteristic as outlined by federal, state, or local laws.
MetrumRG is committed to providing equal employment opportunities and reasonable accommodations for candidates and employees with disabilities. We encourage all qualified candidates to apply for positions within our organization. If you require a reasonable accommodation because of a medical condition during the application or interview process, please contact Scotti Rylands or our Talent and Culture Department, (860)735-7043 x-622, or message us and we will work with you to meet your needs.

Top Skills

AI
Airflow
Argo
AWS
Azure
GCP
Go
HPC
Istio
Kubernetes
Linux
Nextflow
Python
PyTorch
R
TensorFlow

Similar Jobs

9 Hours Ago
Hybrid
Framingham, MA, USA
64K-90K Annually
Mid level
Artificial Intelligence • Hardware • Information Technology • Security • Software • Cybersecurity • Big Data Analytics
This role involves post-sales support including installation and troubleshooting of SaaS solutions, managing customer satisfaction, and training clients on Rave products.
Top Skills: C++, CSS, HTML, Java, JSON, MySQL, Oracle, Ruby, SQL Server, XML
9 Hours Ago
Hybrid
Waltham, MA, USA
45K-75K
Mid level
Artificial Intelligence • Hardware • Information Technology • Security • Software • Cybersecurity • Big Data Analytics
The Technical Support Engineer will troubleshoot complex issues, manage customer support tickets, and improve Rave's SaaS products, requiring strong technical abilities and communication skills.
Top Skills: DHCP, DNS, HTML, MySQL, Oracle, SQL Server, TCP/IP, XML, Zendesk
11 Hours Ago
Hybrid
5 Locations
99K-232K Annually
Senior level
Artificial Intelligence • Professional Services • Business Intelligence • Consulting • Cybersecurity • Generative AI
As an AML/Sanctions Data Scientist Manager, you'll lead teams in delivering AI/ML solutions for financial crime detection, mentor junior staff, and enhance client operations through advanced analytics.
Top Skills: Large Language Models, Machine Learning, MLOps, Natural Language Processing, Python, SQL

What you need to know about the Boston Tech Scene

Boston is a powerhouse for technology innovation thanks to world-class research universities like MIT and Harvard and a robust pipeline of venture capital investment. Host to the first telephone call and one of the first general-purpose computers ever put into use, Boston is now a hub for biotechnology, robotics and artificial intelligence — though it’s also home to several B2B software giants. So it’s no surprise that the city consistently ranks among the greatest startup ecosystems in the world.

Key Facts About Boston Tech

  • Number of Tech Workers: 269,000; 9.4% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Thermo Fisher Scientific, Toast, Klaviyo, HubSpot, DraftKings
  • Key Industries: Artificial intelligence, biotechnology, robotics, software, aerospace
  • Funding Landscape: $15.7 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Summit Partners, Volition Capital, Bain Capital Ventures, MassVentures, Highland Capital Partners
  • Research Centers and Universities: MIT, Harvard University, Boston College, Tufts University, Boston University, Northeastern University, Smithsonian Astrophysical Observatory, National Bureau of Economic Research, Broad Institute, Lowell Center for Space Science & Technology, National Emerging Infectious Diseases Laboratories
