As a Machine Learning Infrastructure Engineer, you will develop and deploy a robust full-stack pipeline that supports perception, planning, and prediction projects. You will build tools that make working with large data sets more efficient, including developing data stores, implementing data visualization utilities, and deriving quantifiable metrics to support the machine learning pipeline. Your work will directly contribute to the development of safer, more efficient machines for Caterpillar’s customers. Basic duties include building and supporting the tools, scripts, and processes for managing data and training pipelines for machine learning.

Responsibilities will include creating tools, scripts, and processes for:

-Extracting, processing, and organizing raw data (images, lidar)
-Submitting images to external vendors for annotation
-Retrieving image annotations from external vendors
-Normalizing and integrating these annotations into our infrastructure
-Providing basic analytics, such as total number of datasets, frames, and annotations
-Interactive ML dataset creation and management
-Support for data visualization tools
-ML model training, evaluation, and deployment

Basic Qualifications

-Bachelor’s Degree in Computer Science, Engineering, Mathematics, or an equivalent discipline

Top Candidates Will Also Have

-Master’s Degree in Computer Science, Engineering, Mathematics, or an equivalent discipline
-3+ years of experience in ML pipeline development/setup/maintenance
-AWS S3 and SageMaker
-Experience in Linux environments
-Scala and scripting languages
-Python and creating/managing Jupyter notebooks
-Database configuration and scripting
-Familiarity with ROS
-Experience with Node.js and React
-Exposure to ML/deep learning, computer vision, and perception algorithms
-Experience working with data annotation vendors
-Experience integrating with CI/regression testing frameworks

