Computational Science Analyst II

Cold Spring Harbor Laboratory
One Bungtown Road, Cold Spring Harbor, NY
Cold Spring Harbor Laboratory is seeking an individual to fill a position as a Computational Science Analyst I. The analyst will work under the supervision of a faculty member, postdoc, graduate student, or Scientific Informatics Manager to solve biological problems that require considerable independent thinking. The successful candidate will work with whole-slide digital pathology images as part of a team of investigators exploring artificial intelligence in digital pathology. Our focus is on automated annotation of image data and on markers of diagnostic accuracy. The team includes computational neuroscientists (Prof. Partha Mitra and colleagues at CSHL), pathologists (Dr. James Crawford and colleagues at Northwell Health), and collaborators at the US FDA and other academic centers. The analyst is expected to work with this group, but also to be self-motivated and able to work independently.

We are currently seeking a data engineer/analyst with expertise in big image data and a background in machine learning to work on a petabyte+ dataset of histological brain image volumes and digital whole-slide histopathology images acquired from medical practice. A successful candidate should be comfortable working in a Linux environment and with distributed/networked computation, and be able to participate in maintaining and growing a large storage and compute cluster.

-The individual is expected to be able to build efficient, flexible, extensible, and scalable solutions to system administration problems and big data handling.
-Develop image processing algorithms and translate them into working prototype code.
-Create algorithms/heuristics to extract information from large data sets and implement them in software/scripts.
-Maintain and enhance the data pipeline (image handling, cluster) for scalability and reliability.
-Mine and organize data sets of both structured and unstructured data.
-Design, implement, and support a platform to provide ad-hoc access to large image datasets.
-Develop interactive dashboards, reports, and analysis templates.

-Master’s degree or PhD in Computer or Data Science, Machine Vision, Artificial Intelligence, Machine Learning, or a related technical field (Mathematics/Statistics, physical science/engineering strongly desired).
-Linux and software development skills are required together with experience coding in C/C++, Python and associated languages. (MATLAB experience is desirable.)
-Database engineering and coding skills are required, including those for big data (MySQL/NoSQL, etc.).
-Experience with the software stack/framework relevant for distributed processing of big data is required (e.g., Spark, SGE).
-Experience in building or maintaining, and interacting with, large, scalable, or high-performance computer systems is required. (GPU coding and production backup processes are a plus.)
-Experience with clustering of OS/applications and an understanding of points of scalability
-Experience supporting Apache/NGINX and load balancers
-AWS / Google Cloud experience a plus

