Staff Machine Learning Operations Engineer (Secret) (4172)
Company: Aitopics
Location: Boulder
Posted on: March 12, 2025
Job Description:
SMX
harnesses the transformative power of technology to help realize
your digital future.

Outside Analytics has recently become a proud
subsidiary of SMX, marking an exciting collaboration that enhances
our collective capabilities to deliver cutting-edge digital
transformation solutions.

Are you interested in the next generation
of Space Force Remote Sensing capabilities? At Outside Analytics,
we're on the ground floor of helping shape the future remote
sensing ecosystem across all orbital regimes (LEO, MEO, HEO, and
GEO)! We build, integrate, and operationally support our customer's
emerging space-ground systems, including real-time data processing
frameworks, sensor data processing, and data visualization.

We are
seeking an experienced Machine Learning Operations (MLOps) Engineer
to join and help shape our new MLOps team. This role focuses on
deploying and optimizing machine learning models for always-on,
high-availability systems in real-world, real-time unclassified and
classified environments. As part of a new and growing team, you
will have the unique opportunity to evangelize MLOps practices,
contribute to the development of an on-premises development
platform, and drive innovation in mission-critical
applications.

This position is on-site in Boulder, CO, five days per week.

Essential Duties & Responsibilities
- Deploy and maintain high-performing ML models (e.g., ensembles
of LSTMs and Random Forests) in real-time environments.
- Monitor deployed models for drift or performance degradation
and implement automated retraining pipelines.
- Implement advanced deployment strategies (e.g., Blue-Green,
Canary, Champion-Challenger).
- Develop modular and flexible ML pipelines that ensure uptime
and reliability.
- Build and manage scalable infrastructure using Kubernetes,
Docker, Terraform, and related tools.
- Design and implement an on-premises development platform using
Kubeflow to replicate cloud capabilities in classified
environments.
- Set up robust monitoring, logging, and alerting systems using
Prometheus, Grafana, and Loki.
- Optimize performance metrics like inference latency and system
throughput while ensuring fault tolerance.
- Work with cross-functional teams, including Data Engineering,
Machine Learning, and DevOps, to integrate and enhance ML
systems.
- Define touchpoints and handoffs with DevOps and Data
Engineering to ensure seamless integration of ML workflows with
existing infrastructure and data pipelines.
- Mentor junior team members and contribute to building a
collaborative and innovative team culture.
- Other duties as assigned.

Required Skills & Experience
- Secret clearance.
- 4+ years of experience, including deploying and/or maintaining at
least one ML model or pipeline in a production environment.
- Proficiency in writing clean, maintainable Python code for
automation and basic scripting tasks.
- Basic experience building and maintaining CI/CD pipelines for
small-scale projects or systems.
- Basic familiarity with distributed environments and related tools
such as Protocol Buffers (Protobuf) or ZeroMQ.
- Basic familiarity with MLflow, Kubeflow, or similar platforms
for managing ML experiments and pipelines.
- Basic familiarity with Kubernetes and Terraform for managing
containerized environments and infrastructure.
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration capabilities.
- Ability to thrive in a dynamic, fast-paced environment.
- Good written and verbal communication skills.
- Detail-oriented.

Desired Skills & Experience
- Bachelor's, Master's, or PhD in Computer Science, Engineering,
or a related technical field.
- Relevant certifications (e.g., Certified Kubernetes
Administrator, Certified Kubernetes Application Developer,
Terraform Associate) are a plus.
- Familiarity with C++ and/or Rust.
- Experience with workflow orchestration tools such as Airflow or
Prefect.
- Experience with distributed data processing frameworks such as
PySpark.
- Familiarity with SQL and modern data storage technologies (e.g.,
MinIO, Yugabyte).
- Experience with DVC, Ansible, Kustomize, Helm, Prometheus, and
Grafana.
- Understanding of secure software development practices and/or
experience working in classified environments.

Application Deadline: April 14, 2025

The SMX salary determination process takes into
account a number of factors, including but not limited to
geographic location, Federal Government contract labor categories,
relevant prior work experience, specific skills, education, and
certifications. At SMX, one of our Core Values is to Invest in Our
People, so we offer a competitive mix of compensation, learning &
development opportunities, and benefits. Key components of our
robust benefits package include health insurance, paid leave, and
retirement.

The proposed salary for this position is: $103,200-$172,000 USD.

All qualified candidates will receive
consideration for employment without regard to disability status,
protected veteran status, race, color, age, religion, national
origin, citizenship, marital status, sex, sexual orientation,
gender identity or expression, pregnancy, or genetic
information.

The selected applicant may be subject to a background
investigation and/or education verification.