Job Description
<p><strong>Senior Machine Learning Engineer - Model Evaluations, Public Sector</strong></p>
<p>The Public Sector ML team at Scale deploys advanced AI systems—including LLMs, agentic models, and multimodal pipelines—into mission-critical government environments. We build evaluation frameworks that ensure these models operate reliably, safely, and effectively under real-world constraints. As an ML Engineer, you will design, implement, and scale automated evaluation pipelines that help customers trust and operationalize advanced AI systems across defense, intelligence, and federal missions.</p>
<p><strong>You will:</strong></p>
<ul>
<li>Develop and maintain automated evaluation pipelines for ML models across functional, performance, robustness, and safety metrics, including LLM-judge–based evaluations.</li>
<li>Design test datasets and benchmarks to measure generalization, bias, explainability, and failure modes.</li>
<li>Build evaluation frameworks for LLM agents, including infrastructure for scenario-based and environment-based testing.</li>
<li>Conduct comparative analyses of model architectures, training procedures, and evaluation outcomes.</li>
<li>Implement tools for continuous monitoring, regression testing, and quality assurance for ML systems.</li>
<li>Design and execute stress tests and red-teaming workflows to uncover vulnerabilities and edge cases.</li>
<li>Collaborate with operations teams and subject matter experts to produce high-quality evaluation datasets.</li>
<li>Travel occasionally (approximately 10%) for customer interaction and team needs.</li>
</ul>
<p><em>This role requires an active security clearance or the ability to obtain one.</em></p>
<p><strong>Ideally you’d have:</strong></p>
<ul>
<li>Experience in computer vision, deep learning, reinforcement learning, or NLP in production settings.</li>
<li>Strong programming skills in Python; experience with TensorFlow or PyTorch.</li>
<li>Background in algorithms, data
Required Skills
Python
PyTorch
TensorFlow
LLM
LoRA
AWS
GCP
NLP