Agile Robots AG

ML Platform Engineer

Job

  • Level
    Experienced
  • Job field
    Software, Data
  • Employment
    Full-time
  • Contract type
    Permanent
  • Location
    Munich
  • Work model
    Onsite
  • Job summary

    In this role, you will develop the infrastructure for the distributed training, deployment, and experimentation environment, using technologies such as Kubernetes and PyTorch to move ML models into production efficiently.

    Your role in the team

    • The AI Research Division of Agile Robots is looking for an ML Platform Engineer who will build and operate the distributed training, deployment, and experimentation infrastructure that research, data, and robotics teams rely on to move models from prototype to production.
    • Design and scale distributed training workflows for large models using tools such as PyTorch Distributed, DeepSpeed, and cluster schedulers like SLURM or Kubernetes.
    • Build and maintain containerised ML environments that support reproducible experimentation and benchmarking.
    • Develop and maintain CI/CD pipelines for machine learning systems to enable reliable testing, training, and deployment of models.
    • Implement experiment tracking, model versioning, and reproducibility workflows using tools such as ClearML or Weights & Biases.
    • Set up monitoring systems such as Prometheus and Grafana to track model performance and system health, and to detect drift in production.
    • Work with research, data, and robotics teams to connect new models to robust production systems.

    What we expect from you

    Education

    • Degree in Computer Science, Software Engineering, or a related field, with professional experience building and operating ML or software infrastructure in production.

    Qualifications

    • Familiarity with infrastructure-as-code tools such as Terraform.
    • Exposure to high-performance or distributed compute environments.

    Experience

    • Experience designing and operating distributed training systems on Kubernetes and Docker, using PyTorch Distributed, DeepSpeed, and schedulers such as SLURM.
    • Experience building CI/CD pipelines that support reliable model testing, training, and deployment.
    • Experience operating ML workloads on cloud infrastructure, preferably AWS.
    • Hands-on experience with experiment tracking and model versioning using tools such as MLflow or Weights & Biases.
    • Experience with monitoring and drift detection using tools such as Prometheus and Grafana.
    • Python and system design skills, with experience building and operating ML systems beyond the prototype stage.
    • Experience with large-scale or multimodal ML systems such as vision-language-action models.
    • Experience with ML pipeline and orchestration tools.

    What we offer

    • A dynamic high-tech company with financial stability and world-class investors.
    • Join an interdisciplinary, international team with 60+ different nationalities in a collaborative work environment.
    • Lots of development opportunities in the context of our continued growth.
    • Challenging tasks and impactful projects alongside experts that enable professional and personal growth.
    • Corporate benefits program worth €100 net per month, covering health, mobility, and learning.
    • Modern office facilities with a rooftop terrace overlooking Munich, free drinks and fruit, and regular company events contribute to a good working environment.

    Benefits

    Health, Fitness & Fun

    Work-life integration

    Job locations

    • Munich, Bavaria, Germany

    Your employer

    Agile Robots AG

    Agile Robots SE, founded by leading robotics researchers, focuses on developing AI-driven robots and has established itself as a pioneer in automation.

  • Company type
    Established company
  • Work model
    Onsite
  • Industry
    Electronics, Automation
  • Diversity
    Open to all applicants (m/f/d)
  • Language
    English only required