Description:
Site Reliability Engineers are responsible for designing and implementing infrastructure and systems automation for several HPC clusters. The role requires working closely with the Software and Hardware Engineering teams to define goals and solutions.
Responsibilities:
• Design and implement compute pipeline infrastructure (3 clusters and supporting services).
• Evaluate and recommend network software and hardware for the enterprise system, including capacity modeling.
• Share responsibility for maintaining a customized Linux kernel derived from CentOS in support of Production compute.
• Work with core production support personnel in IT and Engineering to automate deployment and operation of the infrastructure.
• Develop process automation or policy solutions in Perl or Python.
• Ability to work with and as a member of the Development Engineering group as required to refine our Production capabilities: testing, kernel issues, compatibility and deployment of new versions of custom software.
• Ability to work well with a team of highly motivated and skilled personnel - interaction and dialog are requisite in this dynamic environment.
Qualifications:
• Senior-level Linux/Unix operating system experience and SRE experience.
• Prior experience with proactive monitoring systems.
• Experience with High Availability NAS and SAN technologies. Isilon experience a plus.
• Strong networking and Windows/Linux interoperability experience.
• BS or MS in Computer Science or Electrical Engineering, or equivalent experience.
• Startup (industry) experience and good cultural fit for a venture-backed startup.
• Good collaboration & communication skills, and the ability to participate in an interdisciplinary team.
For immediate consideration, please email your CV to [email protected]
Please reference Job Code: PH-SRE-001