HPC Systems Administrator at DeepScale

Full-Time in Mountain View, CA - Engineering

DeepScale, Inc. is a fast-growing start-up in the advanced driving space, providing perception systems for Advanced Driver Assistance Systems and autonomous vehicles. DeepScale uses deep learning to build accurate and efficient perception systems that enable automated machines to “see”. Our software takes input from sensors and produces an environmental model of the real world. Our prior work has produced neural nets that maintain state-of-the-art accuracy but are up to 500x smaller than other nets designed for the same task. Our team includes thought leaders and experienced practitioners in computer vision, AI-powered 3D reconstruction, and the deployment of small neural nets in embedded applications.

Job description

As HPC Systems Administrator for DeepScale, you will help design and build clusters that use state-of-the-art hardware to train deep learning models for autonomous cars.

Responsibilities

  • Manage the local cluster in a high-performance computing setting
  • Develop usage policies for deep learning training
  • Develop tools and infrastructure to scale deep learning
  • Maintain network infrastructure for local and cloud compute
  • Manage data and backups


Important Qualifications

  • Excellent scripting abilities with fluency in Bash or Python, and appropriate use of built-in functions and libraries
  • Familiarity with environments including LDAP, NFS, bare-metal GPU servers, deployment and automation/configuration management, modular user shell environments, and networking
  • Hands-on server hardware configuration experience
  • Experience with cluster management software
  • Comfortable with GPU servers
  • Proficiency with source control (Git), continuous integration, and testing methods
  • Strong knowledge of Linux systems and internals (Debian preferred) with a good understanding of networking and related protocols, OS customization, and package management (APT)

Nice-to-have Qualifications

  • Hands-on InfiniBand experience
  • Experience using or developing usage metrics/analytics tools
  • Experience with Slurm or similar job systems


Education/Experience Required

  • Minimum BS in Computer Engineering
  • 4 years’ work experience in a related field


Location

  • Mountain View, CA