Parallel & Embedded Tools Computing Engineer (Deployment/Computing) at DeepScale


Full-Time in Mountain View, CA - Mid Level - Engineering

DeepScale, Inc. is a fast-growing start-up in the Advanced Driving space, providing perceptual systems for Advanced Driver Assist Systems and Autonomous Vehicles. DeepScale uses deep learning to build accurate and efficient perception systems that enable automated machines to “see”. Our software takes input from sensors and produces an environmental model of the real world. Our prior work has produced neural nets that maintain state-of-the-art accuracy but are up to 500x smaller than other nets designed for the same task. Our team includes thought leaders and experienced practitioners in computer vision, AI-powered 3D reconstruction, and the deployment of small neural nets in embedded applications.

Job Description

Find, customize, and/or implement systems for deploying deep-learning-based applications to modern embedded and parallel platforms. Depending on experience, skills, and inclination, the work may also extend to custom development and implementation of core NN computations (see Nice-to-have Qualifications).


  • Evaluating existing libraries, toolkits, and flows for performing deep learning on embedded Linux platforms, ranging from ~10W ARM-CPU-based SBCs to ~200+W NVIDIA Drive PX modules.
  • Profiling, optimizing, and/or implementing critical and/or custom NN operations for various target platforms.

Important Qualifications

  • Ability to use cross-compilation toolchains to target a variety of embedded platforms.
  • Experience with NN/Deep-learning toolkits and libraries.
  • Experience with the design of middleware for deploying NN applications.
  • Proficiency with C/C++ systems/performance programming and optimization.

Nice-to-have Qualifications

  • Experience with high performance GPU programming in CUDA, OpenCL, and/or other environments (DSP, custom accelerators).
  • Experience implementing high-performance numerical computation, such as matrix multiplication and/or CNN convolution.
  • Experience with vectorization/SIMD-optimization using ARM NEON or similar instruction sets.

Education/Experience Required

  • Minimum BS
  • 4 years’ work experience in a related field