Lead SRE at JPMorgan Chase focusing on site reliability, defining requirements, managing incidents, mentoring, and developing AI/ML solutions.
Job Description
Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability.
As a Principal Site Reliability Engineer at JPMorgan Chase within the AI/ML & Data platform team, you work with your fellow stakeholders to define non-functional requirements (NFRs) and availability targets for the services in your application and product lines. You will ensure those NFRs are accounted for in your products' design and test phases, that your service level indicators are effectively measuring customer experience, and that service level objectives are defined with stakeholders and implemented in production.
Job responsibilities
- Demonstrate expertise in application development and support across multiple technologies, such as Databricks, Snowflake, AWS, and Kubernetes.
- Coordinate incident management coverage to ensure effective resolution of application issues.
- Collaborate with cross-functional teams to perform root cause analysis and implement production changes.
- Mentor and guide team members to foster innovation and strategic change.
- Develop and support AI/ML solutions for troubleshooting and incident resolution.
Required qualifications, capabilities, and skills
- Formal training or certification in SRE concepts and 5+ years of applied experience.
- Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform.
- Proficiency in running production incident calls and managing incident resolution.
- Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
- Strong understanding of SLIs, SLOs, SLAs, and error budgets.
- Proficiency in Python or PySpark for AI/ML modeling.
- Must be able to reduce toil by building new tools to automate repeated tasks.
- Hands-on experience in system design, resiliency, testing, operational stability, and disaster recovery.
- Understanding of network topologies, load balancing, and content delivery networks.
- Awareness of risk controls and compliance with departmental and company-wide standards.
- Ability to work collaboratively in teams and build meaningful relationships to achieve common goals.
Preferred qualifications, capabilities, and skills
- Experience in an SRE or production support role with AWS Cloud, Databricks, Snowflake, or similar technologies.
- AWS and Databricks certifications.
Top Skills
AWS
Databricks
Datadog
Dynatrace
Grafana
Kubernetes
Prometheus
PySpark
Python
Snowflake
Splunk

