Programs and analyzes genetic and epidemiological data from various sources using several genetic analysis software packages. Prepares reports and summarizes data and analyses. Manipulates, manages, and archives datasets of different formats in a High Performance Computing (HPC) environment. Requires a Master's degree in a quantitative biomedical science field (such as, but not limited to, genetic epidemiology, quantitative epidemiology, biostatistics, statistical genetics, bioinformatics, computational biology, or systems biology) with 1-2 years of experience in high-throughput/high-dimensional data analyses. Appointment as Data Scientist I or II (advanced level) will depend on the applicant's experience, skill set, and relevant research background. Most of the time, the Programmer/Analyst works under the immediate supervision of faculty, but applicants should be capable of some independent programming work using statistical packages/tools.
The data scientist will provide programming and computing support related to genetic epidemiological studies and bioinformatics, such as next-generation sequencing, genome-wide association studies, pathway analysis, large-scale genome-wide meta-analysis, expression QTL analysis, and other relevant bioinformatics analyses, as well as simulation studies. S/he should be familiar with and have experience using at least one general statistical tool, such as R (preferred), SAS, or MATLAB; at least one data manipulation tool, such as C/C++, AWK, Perl, or Python; and at least one publicly available genome browser, such as the NCBI Genome Browser, UCSC Genome Browser, or Ensembl Genome Browser. Skills in creating and presenting graphical and numerical data should be well developed.
• Analysis and Reporting Functions: Provide analytic support for research projects and for abstract, manuscript, and grant proposal submissions under the direction of the investigators. Complete appropriate analyses and prepare publication-quality tables and graphics.
• Data manipulation: Prepare working datasets in the formats required by the analytical tools/packages.
• Data management: Create dataset documentation, including codebooks, written descriptions of samples, and data summaries. Manage datasets on file servers and cloud platforms, and ensure appropriate backups on the network and on storage networks within the high-performance computing cluster.
• Develop analysis pipelines for commonly performed analyses.
• Participate in and contribute to the planning or implementation of enhanced high-performance computing tools or platforms for the Institute.
Applicants with a degree relevant to quantitative/computational biology (such as statistical genetics, population genetics, genetic epidemiology, or bioinformatics) are preferred. Typically this will be a Master's degree, but in many cases it will be a more advanced degree such as a PhD. Consideration will be given to applicants with a Bachelor's degree who have all of the qualifications specified. Specialized training or experience with tools and languages such as R or Perl is required. Applicants should also have knowledge of C/C++, AWK, or related languages that can be used to manipulate and reformat large files. Familiarity with SAS is preferred. Experience using HPC systems running Linux or UNIX, and experience with cloud-computing platforms such as Amazon Web Services (AWS) or Google Cloud or with domain-specific products (DNANexus, FireCloud/TerraBio, SevenBridges, etc.), is strongly preferred.
Two to five years of programming experience in research studies is also preferred unless the candidate has superior programming skills. Applicants should be able to acquire the new knowledge and skills needed to perform analytical tasks using newly developed statistical methods/tools. Applicants should also be able to work on multiple projects simultaneously with strict attention to detail, work independently as well as part of a team, and communicate with superb oral and/or written English language skills.