Position: Big Data ETL Developer
Site: MaRS Centre, Toronto
Department: Informatics and Biocomputing
Reports To: Associate Director, Informatics & Biocomputing
Salary: Commensurate with level of experience
Hours: 35 Hrs/week
Status: Permanent, Full-time
The Ontario Institute for Cancer Research (OICR) is looking for a Big Data ETL Developer to contribute to the technical development of a world-class software platform in cancer genomics. As a successful candidate, you will join the OICR Bioinformatics Software Development team and will be responsible for designing, developing, and supporting the Big Data ETL pipeline that powers the next-generation databases and web portal of the International Cancer Genome Consortium (ICGC) hosted at OICR. You will assist with data analysis, construct technical designs, develop new ETL applications, and support existing ones. You will work closely with our team of software engineers and our software architect to implement solution requirements.
Our software infrastructure is built upon distributed and scalable frameworks such as Hadoop, Cascading, HBase, Elasticsearch and MongoDB. Our development/deployment environment uses Maven, Jenkins, Artifactory, OpenStack and Ansible. The current ICGC infrastructure consists of a cluster of 101 nodes, each with 64 to 128 GB of RAM and 3-12 TB of local storage.
You are someone who thrives in a fast-paced, data-centric, collaborative environment. You love to automate and you are innovative, creative, and analytical.
The responsibilities include:
- Ensure accuracy and integrity of data and applications through analysis, coding and writing clear documentation
- Implement execution framework, including reporting, auditing and alerting
- Perform data analysis, and propose and verify transformations that cover all data possibilities, including the handling of data anomalies
- Apply in-depth troubleshooting skills to help resolve errors and performance issues in both ETL and reporting databases
- Develop detailed test plans so that unit and integration testing is comprehensive and covers all possible data outcomes
- Develop and maintain data movement rules, so that the ETL process is clearly defined at every level and data lineage is clearly communicated to business users
- Remain current on integrated product developments that require environment or code modifications
- Lead the ongoing development of technical best practices for data movement, data quality, data cleansing and other ETL-related activities
- Develop ETL applications and guide ETL development activities of other developers
QUALIFICATIONS:
- Bachelor's degree in computer science, engineering, or a related field
- Excellent communication and documentation skills
- Experience with Java and related tooling (Maven, Jenkins, etc.)
- Automation (Bash, AWK, etc.)
- Well-versed in Linux
- Python
- Configuration management / provisioning
- Cloud computing (EC2, OpenStack)
- Experience with Ansible is desirable
- Experience with Hadoop workflows and MapReduce is desirable
- Experience in Cascading, Pig, Scala, Sqoop or Flume is desirable
- Experience with NoSQL databases such as HBase, Elasticsearch and MongoDB is desirable
OICR is an innovative cancer research institute located in the MaRS Centre in the Discovery District in downtown Toronto. OICR is addressing significant challenges in cancer research with multi-disciplinary, multi-institutional teams. New discoveries to prevent, detect and treat cancer will be moved from the bench to practical applications in patients. The OICR team is growing quickly. We are innovative, dedicated professionals who bring expertise to each of our roles. We are looking for individuals interested in being part of a culture of excellence that will result in Ontario being recognized internationally as a leading jurisdiction for cancer research.
Launched in December 2005, OICR is an independent institute funded by the Government of Ontario through the Ministry of Research and Innovation.
For more information about OICR, please visit the website at www.oicr.on.ca.
POSTED DATE: August 21, 2014
CLOSING DATE: Posted until filled
HOW TO APPLY:
Interested candidates can apply here: https://www.recruitingsite.com/csbsi...bNumber=737011
OICR has a diverse workforce and is an equal opportunity employer.
The Ontario Institute for Cancer Research thanks all applicants. However, only those under consideration will be contacted.
Resume Format: If you elect to apply, you will need a text or HTML version of your resume so that you can cut and paste it into the application box provided. Before you submit the completed application, you will be asked to attach one or two files to your application. Please attach your resume as a .doc file.