Staff Machine Learning Operations Engineer (Secret) (4172)

Company: Aitopics
Location: Boulder
Posted on: March 12, 2025

Job Description:

SMX harnesses the transformative power of technology to help realize your digital future. Outside Analytics has recently become a proud subsidiary of SMX, marking an exciting collaboration that enhances our collective capabilities to deliver cutting-edge digital transformation solutions.

Are you interested in the next generation of Space Force Remote Sensing capabilities? At Outside Analytics, we're on the ground floor of supporting the future remote sensing ecosystem across all orbital regimes (LEO, MEO, HEO, and GEO)! We build, integrate, and operationally support our customer's emerging space-ground systems, including real-time data processing frameworks, sensor data processing, and data visualization.

We are seeking an experienced Machine Learning Operations (MLOps) Engineer to join and help shape our new MLOps team. This role focuses on deploying and optimizing machine learning models for always-on, high-availability systems in real-world, real-time unclassified and classified environments. As part of a new and growing team, you will have the unique opportunity to evangelize MLOps practices, contribute to the development of an on-premises development platform, and drive innovation in mission-critical applications.

Position location is on-site in Boulder, CO, 5 days per week.

Essential Duties & Responsibilities

  • Deploy and maintain high-performing ML models (e.g., ensembles of LSTMs and Random Forests) in real-time environments.
  • Monitor deployed models for drift or performance degradation and implement automated retraining pipelines.
  • Implement advanced deployment strategies (e.g., Blue-Green, Canary, Champion-Challenger).
  • Develop modular and flexible ML pipelines that ensure uptime and reliability.
  • Build and manage scalable infrastructure using Kubernetes, Docker, Terraform, and related tools.
  • Design and implement an on-premises development platform using Kubeflow to replicate cloud capabilities in classified environments.
  • Set up robust monitoring, logging, and alerting systems using Prometheus, Grafana, and Loki.
  • Optimize performance metrics like inference latency and system throughput while ensuring fault tolerance.
  • Work with cross-functional teams, including Data Engineering, Machine Learning, and DevOps, to integrate and enhance ML systems.
  • Define touchpoints and handoffs with DevOps and Data Engineering to ensure seamless integration of ML workflows with existing infrastructure and data pipelines.
  • Mentor junior team members and contribute to building a collaborative and innovative team culture.
  • Other duties as assigned.

Required Skills & Experience

  • Secret clearance.
  • 4+ years of experience, including deploying and/or maintaining at least one ML model or pipeline in a production environment.
  • Proficiency in writing clean, maintainable Python code for automation and basic scripting tasks.
  • Basic experience building and maintaining CI/CD pipelines for small-scale projects or systems.
  • Basic familiarity with distributed environments and frameworks like Protobufs or ZeroMQ.
  • Basic familiarity with MLflow, Kubeflow, or similar platforms for managing ML experiments and pipelines.
  • Basic familiarity with Kubernetes and Terraform for managing containerized environments and infrastructure.
  • Strong problem-solving and analytical skills.
  • Excellent communication and collaboration capabilities.
  • Ability to thrive in a dynamic, fast-paced environment.
  • Good written and verbal communication skills.
  • Detail oriented.

Desired Skills & Experience
  • Bachelor's, Master's, or PhD in Computer Science, Engineering, or a related technical field.
  • Relevant certifications (e.g., Certified Kubernetes Administrator, Certified Kubernetes Application Developer, Terraform Associate) are a plus.
  • Familiarity with C++ and/or Rust.
  • Experience with workflow orchestration tools such as Airflow or Prefect.
  • Experience with distributed data processing frameworks such as PySpark.
  • Familiarity with SQL and modern database technologies (e.g., MinIO, Yugabyte).
  • Experience with DVC, Ansible, Kustomize, Helm, Prometheus, and Grafana.
  • Understanding of secure software development practices and/or experience working in classified environments.

Application Deadline: April 14, 2025

The SMX salary determination process takes into account a number of factors, including but not limited to geographic location, Federal Government contract labor categories, relevant prior work experience, specific skills, education, and certifications. At SMX, one of our Core Values is to Invest in Our People so we offer a competitive mix of compensation, learning & development opportunities, and benefits. Some key components of our robust benefits include health insurance, paid leave, and retirement.

The proposed salary for this position is: $103,200-$172,000 USD.

All qualified candidates will receive consideration for employment without regard to disability status, protected veteran status, race, color, age, religion, national origin, citizenship, marital status, sex, sexual orientation, gender identity or expression, pregnancy, or genetic information.

Selected applicant may be subject to a background investigation and/or education verification.
