The Onyx Research Data Platform organization represents a major investment by GSK R&D and Digital & Tech, designed to deliver a step change in our ability to leverage data, knowledge, and prediction to find new medicines. We are a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data / metadata / knowledge platforms, and AI/ML and analysis platforms, all geared toward:
- Building a next-generation data experience for GSK’s scientists, engineers, and decision-makers, increasing productivity, and reducing time spent on “data mechanics”
- Providing best-in-class AI/ML and data analysis environments to accelerate our predictive capabilities and attract top-tier talent
- Aggressively engineering our data at scale to unlock the value of our combined data assets and predictions in real time
Data Engineering is responsible for the design, delivery, support, and maintenance of industrialised, automated, end-to-end data services and pipelines. They apply standardised data models and mappings to ensure data is accessible to end users in end-to-end user tools through the use of APIs. They define and embed best practices and ensure compliance with Quality Management practices and alignment to automated data governance. They also acquire and process internal and external, structured and unstructured data in line with Product requirements.
A Senior NLP Data Engineer is a leading technical contributor who can consistently take a poorly defined business or technical problem, work it to a well-defined data problem / specification, and execute on it at a high level. They have a strong focus on metrics, both for the impact of their work and for its inner workings / operations. They are a model for the team on best practice for software development in general (and data engineering in particular), including code quality, documentation, DevOps practices, and testing, and consistently mentor junior members of the team. They ensure robustness of our services and serve as an escalation point in the operation of existing services, pipelines, and workflows.
Key Responsibilities:
- Designs, builds, and operates data tools, services, and workflows that deliver high value for high-impact AI-driven products, leveraging modern data engineering tools (e.g. Spark, Kafka, Storm, …) and orchestration tools (e.g. Google Workflow, Airflow Composer)
- Partners with AI/ML and knowledge graph platform teams to build, test, and deploy NLP and GenAI pipelines, systems, and solutions
- Applies graph-based data modelling techniques for efficient organization, integration, and retrieval of data to ensure system flexibility and maintainability
- Produces well-engineered software, including appropriate automated test suites, technical documentation, and operational strategy
- Diverse problem solver who surfaces opportunities to reuse modular code and develop microservices to drive efficiencies
- Provides input into the roadmaps of upstream teams (e.g. Data Platforms, DataOps, DevOps) to help improve the overall program of work
- Applies platform abstractions consistently to ensure quality and consistency with respect to logging and lineage
- Fully versed in coding best practices and ways of working, participates in code reviews, and partners with others to improve the team’s standards
- Adheres to the QMS framework and CI/CD best practices and helps guide improvements to them
- Provides leadership to team members to help others get the job done right
Basic Qualifications:
We are looking for professionals with these required skills to achieve our goals:
- Bachelor’s degree in Data Engineering, Computer Science, Software Engineering, or related discipline
- 5+ years of data engineering experience in industry
- Knowledge of NLP and GenAI techniques, and experience processing unstructured data, using vector stores, and performing approximate retrieval
- Experience with building end-to-end systems based on machine learning or deep learning methods
- Experience overcoming high volume, high compute challenges
- Familiarity with orchestration tooling
- Cloud experience (e.g., AWS, Google Cloud, Azure)
- Experience in automated testing and design
- Experience with DevOps-forward ways of working
- Deep knowledge and use of at least one common programming language: e.g., Python, Scala, Java
- Deep experience with common big data tools (e.g., Spark, Kafka, Storm, …)
- Proven experience with machine learning algorithms and NLP frameworks like PyTorch, TensorFlow, spaCy, etc.
- Applied experience with CI/CD implementations using Git and a common CI/CD stack (e.g., Jenkins, CircleCI, GitLab, Azure DevOps)
- Experience with agile software development environments using tools like Jira and Confluence
- Experience with Infrastructure as Code and automation tools (e.g. Terraform)
If you have the following characteristics, it would be a plus:
- Master's or PhD in Data Engineering, Computer Science, Software Engineering, or related discipline
- Good understanding of ontologies and semantic harmonization of data across sources
- Experience implementing Generative AI solutions is a huge plus
- Proven track record of working with knowledge graphs and graph databases, and a good general understanding of database concepts
- Proficiency in semantic web technologies (SPARQL, RDF, OWL) and harmonization of data
- Experience working with complex biomedical datasets, including genomics, proteomics, and high-throughput screening
#LI-GSK
#GSKOnyx
Please visit GSK US Benefits Summary to learn more about the comprehensive benefits program GSK offers US employees.
Why GSK?
Uniting science, technology and talent to get ahead of disease together.
GSK is a global biopharma company with a special purpose – to unite science, technology and talent to get ahead of disease together – so we can positively impact the health of billions of people and deliver stronger, more sustainable shareholder returns – as an organisation where people can thrive. We prevent and treat disease with vaccines, specialty and general medicines. We focus on the science of the immune system and the use of new platform and data technologies, investing in four core therapeutic areas (infectious diseases, HIV, respiratory/immunology and oncology).
Our success absolutely depends on our people. While getting ahead of disease together is about our ambition for patients and shareholders, it’s also about making GSK a place where people can thrive. We want GSK to be a place where people feel inspired, encouraged and challenged to be the best they can be. A place where they can be themselves – feeling welcome, valued, and included. Where they can keep growing and look after their wellbeing. So, if you share our ambition, join us at this exciting moment in our journey to get Ahead Together.
If you require an accommodation or other assistance to apply for a job at GSK, please contact the GSK Service Centre at 1-877-694-7547 (US Toll Free) or +1 801 567 5155 (outside US).
GSK is an Equal Opportunity Employer. This ensures that all qualified applicants will receive equal consideration for employment without regard to race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), parental status, national origin, age, disability, genetic information (including family medical history), military service or any basis prohibited under federal, state or local law.
Important notice to Employment Businesses/Agencies
GSK does not accept referrals from employment businesses and/or employment agencies in respect of the vacancies posted on this site. All employment businesses/agencies are required to contact GSK's commercial and general procurement/human resources department to obtain prior written authorization before referring any candidates to GSK. The obtaining of prior written authorization is a condition precedent to any agreement (verbal or written) between the employment business/agency and GSK. In the absence of such written authorization being obtained, any actions undertaken by the employment business/agency shall be deemed to have been performed without the consent or contractual agreement of GSK. GSK shall therefore not be liable for any fees arising from such actions or any fees arising from any referrals by employment businesses/agencies in respect of the vacancies posted on this site.
Please note that if you are a US Licensed Healthcare Professional or Healthcare Professional as defined by the laws of the state issuing your license, GSK may be required to capture and report expenses GSK incurs, on your behalf, in the event you are afforded an interview for employment. This capture of applicable transfers of value is necessary to ensure GSK’s compliance with all federal and state US Transparency requirements. For more information, please visit the Centers for Medicare and Medicaid Services (CMS) website at https://openpaymentsdata.cms.gov/