Data Curator/ Data Engineer – Computational Pathology, Oncology
Thursday 24th June 2021
Email Address: email@example.com
Close Date: 15th July 2021
Do you have a passion for data? Would you like to apply your expertise to accelerate, improve and help automate our data flow, and provide our scientists with FAIR, ready-to-use data at their fingertips, in a company that follows the science and turns ideas into life-changing medicines? Then AstraZeneca might be the one for you!
AstraZeneca is a global, science-led, patient-focused biopharmaceutical company that focuses on the discovery, development and commercialisation of prescription medicines for some of the world’s most serious diseases. But we’re more than one of the world’s leading pharmaceutical companies. At AstraZeneca we’re dedicated to being a Great Place to Work, where you are empowered to push the boundaries of science and unleash your entrepreneurial spirit. There’s no better place to make a difference to medicine, patients and society.
Welcome to Computational Pathology Munich, one of over 400 sites here at AstraZeneca. Providing a collaborative environment where everyone feels comfortable and able to be themselves is at the core of AstraZeneca’s priorities; it’s important to us that you bring your full self to work every day. To help you maintain your best self, here’s a sneak peek into some of the things this site provides for you: after-work events, Lunch & Learns, a bright and spacious environment, a sustainable office working environment, networking events, family and childcare support and, of course, the Alps around the corner for hiking, biking and skiing fun.
Be part of fulfilling our ambition to be world leaders in Oncology. We are already the fastest growing team within AstraZeneca and across the industry, and there are countless new indications and targets in our game-changing pipeline. We deliver this value through launch excellence, commercial effectiveness and maximising the lifecycle. By leveraging our commercialised portfolio we are confident we can change the practice of medicine and redefine cancer treatment.
We’re brave disruptors – entrepreneurial, courageous and pioneering in our approach. Here you have the opportunity to step up, take personal accountability and lead changes in our ever-evolving environment.
With pace and drive comes trust that we will get it done. Embrace the freedom to create and expand your horizons. Always backed and supported by pioneering leaders, this is the place to build a world-class career that’s meaningful and rewarding. Here we’re on a journey to becoming digitally enabled, to discover new ways of offering better solutions to our patients. Join the team with a vision to use data as a tool to build a better, deeper, more personal understanding of the people we’re helping, and ultimately to deliver better outcomes for them through dynamic omnichannel content, personal relationships and experience.
What you’ll do
As a Data Curator/ Data Engineer you will help drive our mission to streamline and automate the TM data flow, breaking up data silos and establishing a FAIR data environment while supporting on-time data provisioning and efficient, high-quality data consumption. You will be responsible for (semantically) cleaning up and integrating large datasets, and you will use your understanding of current gaps within the data flow to improve and automate our data pipelines. Understanding how our scientists address key scientific questions, how they collect and analyse data, and how these processes can be formalized to improve data (re)use, data quality and decision-making will be critical to success in this role.
Main Duties and Responsibilities
Data capture, (semantic) cleaning and integration as a preparation step for bioinformatician/data scientist activities, while protecting privacy and adhering to compliance directives
Develop data standards, vocabularies and dictionaries as a baseline for a coherent data flow, together with our data governance team and scientific stakeholders
Support/drive development of ETL pipelines in order to automate our data flow through the system landscape
Work with bioinformaticians, data scientists and data management teams to develop best practices in data wrangling/curation and storage
Essential for the role
Degree (Master level) in bioinformatics, data science or a related field (e.g. Physics, Math) with a focus on Life Science/Pharma R&D
Expertise in data curation/management and integration, with a focus on genomics, transcriptomics and proteomics data
Good programming skills (experience in Python preferred, or R)
Understanding of the FAIR data principles and how to translate and implement requirements into a typical data flow
Experience working within cross-functional teams, including business data owners/stewards and technical product development staff
Desirable for the role
Experience with common biomedical vocabularies (Gene Ontology, NCI Thesaurus, ISA-Tab, MeSH, Human Disease Ontology, BAO, EFO, Human Phenotype Ontology, others), ontology repositories (NCBO BioPortal, EBI OLS) and common reference databases (e.g. UniProt, Ensembl, ChEMBL, Entrez Gene, ClinicalTrials.gov)
Experience with various data sources from different scientific domains, both structured and unstructured (e.g. HDFS, SQL, NoSQL)
Experience working across multiple scientific compute environments to create data workflows and pipelines (e.g. HPC, cloud, Unix/Linux systems)
Experience with various types of databases/datastores and semantic technologies such as Oracle, MySQL, Elasticsearch, MongoDB, RDF/triple stores, etc.