Quantiphi

Infrastructure Architect (GCP)

Posted 25 Days Ago
Remote
Hiring Remotely in USA
Senior level
Design and implement hybrid infrastructure solutions supporting AI/GenAI workloads, collaborating with teams to optimize performance and cost across cloud and on-prem environments.
The summary above was generated by AI

While technology is the heart of our business, a global and diverse culture is the heart of our success. We love our people and take pride in offering them a culture built on transparency, diversity, integrity, learning and growth.
If working in an environment that encourages you to innovate and excel, not just in your professional life but in your personal life too, interests you, you would enjoy your career with Quantiphi!

About Quantiphi:
Quantiphi is an award-winning Applied AI and Big Data software and services company, driven by a deep desire to solve transformational problems at the heart of businesses. Our signature approach combines groundbreaking machine learning research with disciplined cloud and data-engineering practices to create breakthrough impact at unprecedented speed.

Company Highlights:
Quantiphi has seen 2.5x growth YoY since its inception in 2013. We don’t just innovate; we lead. We are headquartered in Boston, with 4,000+ Quantiphi professionals across the globe. As an Elite/Premier Partner for Google Cloud, AWS, NVIDIA, Snowflake, and others, we’ve been recognized with:

  • 17x Google Cloud Partner of the Year awards in the last 8 years.
  • 3x AWS AI/ML award wins.
  • 3x NVIDIA Partner of the Year titles.
  • 2x Snowflake Partner of the Year awards.
  • We have also garnered top analyst recognitions from Gartner, ISG, and Everest Group.
  • We offer first-in-class industry solutions across Healthcare, Financial Services, Consumer Goods, Manufacturing, and more, powered by cutting-edge Generative AI and Agentic AI accelerators.
  • We have been certified as a Great Place to Work for the third year in a row: 2021, 2022, and 2023.

Be part of a trailblazing team that’s shaping the future of AI, ML, and cloud innovation. Your next big opportunity starts here!

For more details, visit our website or LinkedIn page.

Work Location: Dallas (preferred), but anywhere in the US works.

Role Overview:

  • We are seeking a seasoned Infrastructure Architect with deep expertise in both cloud platforms and on-premise infrastructure to design, implement, and manage robust hybrid environments that can support high-compute AI and GenAI workloads.
  • You will work onsite with one of our key enterprise clients to assess existing infrastructure, define scalable architectures, and ensure optimal performance for AI/ML and GenAI solutions.
  • You’ll play a critical role in bridging infrastructure, DevOps, and AI solution delivery, ensuring our client has the right foundational stack to scale advanced AI workloads across their enterprise.

Key Responsibilities:

Hybrid Infrastructure Design & Deployment:

  • Architect and implement secure, scalable, and cost-effective infrastructure solutions across on-prem and cloud (GCP, AWS, Azure) environments.
  • Evaluate existing systems and define migration or integration strategies for deploying AI/GenAI workloads in hybrid setups.
  • Design infrastructure supporting GPU-intensive workloads, distributed training, inferencing, and vector database storage.

Cloud & On-Prem Operations:

  • Manage provisioning, automation, and orchestration across virtual machines, containers, and Kubernetes clusters.
  • Implement and monitor high-availability, low-latency, and disaster recovery strategies.
  • Optimize infrastructure for latency-sensitive applications, including real-time GenAI agentic workflows.

Collaboration & Enablement:

  • Work closely with AI/ML engineers, data scientists, solution architects, and DevOps to ensure smooth deployment and scaling of models and GenAI agents.
  • Recommend best practices on hybrid infrastructure for LLM fine-tuning, RAG architecture, and multi-agent orchestration platforms.
  • Guide teams on infrastructure security, IAM policies, and governance frameworks for GenAI applications.

Performance & Cost Optimization:

  • Continuously benchmark, profile, and optimize infrastructure for performance and efficiency.
  • Monitor resource utilization and propose capacity planning strategies for AI workload peaks.

Key Qualifications & Experience:

  • Bachelor’s or Master’s degree in Computer Science, Information Systems, or related field.
  • 8–15 years of experience in enterprise infrastructure architecture, with significant experience in both on-prem and cloud-native environments.
  • Proven track record in designing and deploying AI/ML or GenAI-supporting infrastructure (e.g., GPU clusters, Kubernetes for ML workloads, hybrid vector databases).
  • Deep knowledge of cloud services (GCP preferred; AWS or Azure acceptable), on-prem virtualization, storage, networking, and container orchestration.
  • Experience supporting multi-agentic GenAI frameworks, including task orchestration, distributed agents, and workflow automation.
  • Hands-on experience in DevOps and IaC tools (Terraform, Helm, Ansible, CI/CD).
  • Familiarity with AI governance, data security, and compliance in hybrid environments.

Required Skills:

GCP Infrastructure Design & Deployment
Deep hands-on expertise in architecting and managing solutions on Google Cloud Platform, including:

  • VPC design, subnetting, firewall rules, Private Service Connect, and Cloud Interconnect for secure hybrid networking.
  • Identity & Access Management (IAM), Workload Identity Federation, and service accounts for secure access control across services.
  • Cloud Load Balancing, Cloud NAT, and Cloud Armor for high-availability, secure ingress/egress management.
  • Resource hierarchy and organization policies to manage large-scale enterprise GCP environments.

AI/GenAI-Centric Compute & Storage Architecture
Strong understanding of compute services tailored to GenAI:

  • Compute Engine for custom VM/GPU provisioning (A100/H100, T4).
  • GKE (Google Kubernetes Engine) for containerized model deployments, including support for GPU workloads and node auto-provisioning.
  • Vertex AI and Vertex AI Workbench for managing ML pipelines, training, model registry, and deployments.

Storage architecture experience with:

  • Cloud Storage (standard, nearline, coldline) for unstructured datasets.
  • Filestore, Local SSDs, and Persistent Disks for high-throughput model training and inferencing.
  • Integration with BigQuery and Spanner for structured data workloads supporting GenAI applications.

Containerization, Orchestration & IaC on GCP:

Advanced experience with GKE:

  • Cluster autoscaling, workload identity, taints/tolerations for GPU scheduling.
  • Helm-based deployments and integration with Artifact Registry.

Proficient in Infrastructure as Code using:

  • Terraform (with GCP provider modules) for declarative infrastructure deployment.
  • Cloud Build, Cloud Deploy, or integration with GitHub Actions for CI/CD pipelines.
  • Ability to automate infrastructure provisioning, policy enforcement, and environment standardization.

Support for GenAI Architectures:

Experience deploying and optimizing infrastructure for:

  • LLM hosting using Triton Inference Server, vLLM, or Text Generation Inference on GKE or Compute Engine.
  • Vector database integrations (Weaviate, ChromaDB, FAISS) with GCS and BigQuery.
  • RAG pipeline infrastructure including document ingestion (e.g., via Pub/Sub, Cloud Functions) and scalable retrieval.
  • Multi-agent frameworks like LangGraph, CrewAI, or AutoGen, with secure multi-service orchestration across GCP services.

Observability, Security, and Governance
Monitoring & observability stack:

  • Cloud Monitoring, Cloud Logging, Cloud Trace, Profiler, and Error Reporting for full-stack visibility.
  • Experience setting up custom dashboards, alerts, and uptime checks.

Security and compliance capabilities:

  • VPC Service Controls, Shielded VMs, Confidential Computing, and data encryption strategies (at rest and in transit).
  • Experience with cloud security posture management (CSPM) and compliance frameworks (e.g., HIPAA, SOC 2, FedRAMP).

Governance:

  • Experience setting up Organization Policies, Folder Hierarchies, and Cloud Asset Inventory for enterprise governance.

Cost Optimization & Resource Efficiency:
Proven ability to:

  • Monitor and optimize spend using Billing Reports, Cost Table Reports, Budgets, and Recommendations Hub.
  • Implement rightsizing recommendations, sustained use discounts, and committed use discounts (CUDs) for GPU workloads.
  • Design cost-aware architecture balancing performance, latency, and throughput for GenAI use cases.

Soft Skills & Personality Traits:

  • Strong problem-solving and debugging skills.
  • Ability to communicate technical concepts clearly to non-technical stakeholders.
  • Collaborative mindset with ability to work cross-functionally across AI, DevOps, and business teams.
  • Detail-oriented, with a focus on reliability, scalability, and security.

Preferred:

  • GCP Professional Cloud Architect, AWS Solutions Architect, or similar certifications.
  • Familiarity with GPUs (NVIDIA A100, H100), inference acceleration, and edge deployments.
  • Familiarity with AI/ML governance, compliance, and ethical AI frameworks.

What is in it for you:

  • Be part of a team and company that has won NVIDIA's AI Services Partner of the Year three times in a row with an unparalleled track record of building production AI applications on DGX and Cloud GPUs.
  • Strong peer learning that will accelerate your growth across Applied AI, GPU Computing, and softer skills such as technical communication.
  • Exposure to working with highly experienced AI leaders at Fortune 500 companies and innovative market disruptors looking to transform their business with Generative AI.
  • Access to state-of-the-art GPU infrastructure on the cloud and on-premise.
  • Be part of the fastest-growing AI-first digital transformation and engineering company in the world.

If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!

Top Skills

Ansible
AWS
Azure
CI/CD
Cloud Services
GCP
GPU Clusters
Helm
Kubernetes
Terraform
