GSK

Data Engineer II

Posted 8 Hours Ago
In-Office
2 Locations
116K-194K Annually
Mid level
The Data Engineer II builds and maintains automated data services and pipelines, applies best practices in software development, and ensures data quality and compliance, while also supporting and improving existing services.
The summary above was generated by AI

The Onyx Research Data Platform organization represents a major investment by GSK R&D and Digital & Tech, designed to deliver a step change in our ability to leverage data, knowledge, and prediction to find new medicines.  We are a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data / metadata / knowledge platforms, and AI/ML and analysis platforms, all geared toward:   

   

  • Building a unified, automated, next-generation data experience for GSK’s scientists, engineers, and decision-makers, increasing productivity, and reducing data friction  

  • Providing best-in-class AI/ML, GenAI and data analysis environments to accelerate our predictive capabilities and attract top-tier talent   

  • Aggressively engineering our data at scale to unlock the value of our combined data assets and predictions in real-time   

   

Data Engineering is responsible for the design, delivery, support, and maintenance of industrialised, automated, end-to-end data services and pipelines. They apply standardised data models and mappings to ensure data is accessible to end users in end-to-end user tools through the use of APIs. They define and embed best practices and ensure compliance with Quality Management practices and alignment to automated data governance. They also acquire and process internal and external, structured and unstructured data in line with Product requirements.

As a Data Engineer II, you are a technical contributor who can take a well-defined specification for a function, pipeline, service, or other sort of component, devise a technical solution, and deliver it at a high level. You are aware of, and adhere to, best practice for software development in general (and data engineering in particular), including code quality, documentation, DevOps practices, and testing. You ensure robustness of our services and serve as an escalation point in the operation of existing services, pipelines, and workflows. You will work across structured, unstructured, and scientific data domains, applying modern engineering and automation best practices to deliver reliable, scalable, and governed data products. You will also contribute to emerging GenAI-enabled data capabilities, such as embedding pipelines, vectorized data flows, and LLM-ready data products.  

You should be deeply familiar with the most common tools (languages, libraries, etc.) in the data space, such as Spark, Kafka, and Storm, and aware of the open-source communities that revolve around these tools. You have a strong focus on the operability of your tools and services, and you develop, measure, and monitor key metrics for your work, seeking opportunities to improve those metrics.

Key responsibilities include:
  • Build modular code, libraries, and services using modern data engineering tools (Python/Spark, Kafka, Storm, …) and orchestration tools (e.g. Google Workflow, Airflow Composer)

  • Produce well-engineered software, including appropriate automated test suites and technical documentation

  • Develop, measure, and monitor key metrics for all tools and services, and consistently seek to iterate on and improve them

  • Apply platform abstractions consistently to ensure quality and uniformity in logging and lineage

  • Be fully versed in coding best practices and ways of working, participate in code reviews, and partner with others to improve the team’s standards

  • Adhere to the QMS framework and CI/CD best practices

  • Provide L3 support to existing tools, pipelines, and services

Why you?

Basic Qualifications:

We are looking for professionals with these required skills to achieve our goals:

  • Bachelor’s degree in Data Engineering, Computer Science, Software Engineering, or a related discipline

  • 4+ years of data engineering experience

  • Software engineering experience 

  • Familiarity with orchestration tooling

  • Cloud experience (GCP, Azure or AWS)

  • Experience in automated testing and design

Preferred Qualifications:

If you have the following characteristics, it would be a plus:

  • A new PhD, or a Master’s degree with 2+ years of experience

  • Experience overcoming high-volume, high-compute challenges

  • Knowledge and use of at least one common programming language: e.g., Python, Scala, Java, including toolchains for documentation, testing, and operations / observability 

  • Strong experience with modern software development tools / ways of working (e.g. git/GitHub, DevOps tools, metrics / monitoring, …) 

  • Cloud experience (e.g., AWS, Google Cloud, Azure, Kubernetes) 

  • Applied experience with CI/CD implementations using git and a common CI/CD stack (e.g. Jenkins, CircleCI, GitLab, Azure DevOps)

  • Experience with agile software development environments using Jira and Confluence  

  • Demonstrated experience with common tools and techniques for data engineering (e.g. Spark, Kafka, Storm, …) 

  • Knowledge of data modelling, database concepts and SQL 

  • Exposure to GenAI or ML data workflows (vector stores, embeddings, feature pipelines, etc.)

If you are based in Cambridge, MA; Waltham, MA; Rockville, MD; or San Francisco, CA, the annual base salary for new hires in this position ranges from $116,325 to $193,875. The US salary ranges take into account a number of factors, including work location within the US market, the candidate’s skills, experience, and education level, and the market rate for the role. In addition, this position offers an annual bonus and eligibility to participate in our share-based long-term incentive program, which is dependent on the level of the role. Available benefits include health care and other insurance benefits (for employee and family), retirement benefits, paid holidays, vacation, and paid caregiver/parental and medical leave. If salary ranges are not displayed in the job posting for a specific country, the relevant compensation will be discussed during the recruitment process.

Please visit the GSK US Benefits Summary to learn more about the comprehensive benefits program GSK offers US employees.

Why GSK?
Uniting science, technology and talent to get ahead of disease together.

GSK is a global biopharma company with a purpose to unite science, technology and talent to get ahead of disease together. We aim to positively impact the health of 2.5 billion people by the end of the decade, as a successful, growing company where people can thrive. We get ahead of disease by preventing and treating it with innovation in specialty medicines and vaccines. We focus on four therapeutic areas: respiratory, immunology and inflammation; oncology; HIV; and infectious diseases – to impact health at scale.

People and patients around the world count on the medicines and vaccines we make, so we’re committed to creating an environment where our people can thrive and focus on what matters most. Our culture of being ambitious for patients, accountable for impact and doing the right thing is the foundation for how, together, we deliver for patients, shareholders and our people.

If you require an accommodation or other assistance to apply for a job at GSK, please contact the GSK Service Centre at 1-877-694-7547 (US Toll Free) or +1 801 567 5155 (outside US).

GSK is an Equal Opportunity Employer. This ensures that all qualified applicants will receive equal consideration for employment without regard to race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), parental status, national origin, age, disability, genetic information (including family medical history), military service or any basis prohibited under federal, state or local law.

Important notice to employment businesses/agencies

GSK does not accept referrals from employment businesses and/or employment agencies in respect of the vacancies posted on this site. All employment businesses/agencies are required to contact GSK's commercial and general procurement/human resources department to obtain prior written authorization before referring any candidates to GSK. The obtaining of prior written authorization is a condition precedent to any agreement (verbal or written) between the employment business/agency and GSK. In the absence of such written authorization being obtained, any actions undertaken by the employment business/agency shall be deemed to have been performed without the consent or contractual agreement of GSK. GSK shall therefore not be liable for any fees arising from such actions or any fees arising from any referrals by employment businesses/agencies in respect of the vacancies posted on this site.

Please note that if you are a US Licensed Healthcare Professional or Healthcare Professional as defined by the laws of the state issuing your license, GSK may be required to capture and report expenses GSK incurs, on your behalf, in the event you are afforded an interview for employment. This capture of applicable transfers of value is necessary to ensure GSK’s compliance with all federal and state US Transparency requirements. For more information, please visit the Centers for Medicare and Medicaid Services (CMS) website at https://openpaymentsdata.cms.gov/

Top Skills

Airflow
AWS
Azure
GCP
Google Workflow
Kafka
Python
Spark
Storm
