Bioinformatics Data Engineer
Storm3
Levin is the parent company of Storm3
We are looking for a Bioinformatics Data Engineer to design and build cutting-edge data and cloud infrastructure capable of handling the immense scale and complexity of our genomic datasets. This role is vital to ensuring that we can efficiently process, store, and analyze petabytes of data, unlocking the full potential of our models and driving discoveries across the life sciences.
You’ll play a pivotal role in building the foundations of our data and cloud architecture. You will have the autonomy to make critical decisions and directly influence the success of our mission to harness the power of data for machine learning.
This is an excellent opportunity to shape the future of AI-driven protein design and to work cross-functionally with a diverse team of experts across machine learning, protein engineering, cell biology, and gene editing.
Responsibilities
- Maintain and expand the world’s largest database of protein sequences
- Deploy cloud-based pipelines to process and search large-scale genomic datasets
- Build cloud databases for scalable storage and fast retrieval of terabases of genomic data, including genomes, genes, proteins, and structures
Qualifications
- BS, MS, or PhD in Bioinformatics, Genomics, Computer Science, or a related quantitative bioscience field
- 3+ years of industry or postdoc experience
- Experience working with Google Cloud Platform (GCP) or other cloud-based compute services (e.g., AWS)
- Experience building cloud pipelines with workflow managers (Snakemake, Nextflow) and containerized applications (Docker)
- Experience with highly parallelized cloud-based computing platforms (Batch or Kubernetes)
- Experience with scalable databases (BigQuery, Bigtable) and proficiency in database programming (SQL)
- Fluency in Python data analysis tools (NumPy, pandas, Jupyter notebooks, Biopython)
- Experience with Linux environments and version control (Git)
Preferences
- Experience with bioinformatics tools for sequence and structure analysis
- Experience working with next-generation sequencing data
- Familiarity with public repositories like UniProt, EBI, JGI, NCBI, and SRA
- Familiarity with concepts in molecular biology, biochemistry, and structural biology
- Knowledge of prokaryotic gene and genome structure
- Publications in major scientific journals or conferences