Job Description
<h3><strong>About the Role</strong></h3>
<p>Together AI is seeking a Machine Learning Engineer to join our Inference Engine team, focusing on optimizing and enhancing the performance of our AI inference systems. This role involves working with state-of-the-art large language models and ensuring they run efficiently and effectively at scale. If you are passionate about AI inference, PyTorch, and developing high-performance systems, we want to hear from you. This position offers the chance to collaborate closely with AI researchers and engineers to create cutting-edge AI solutions. Join us in shaping the future at Together AI!</p>
<h3><strong>Responsibilities</strong></h3>
<ul>
<li>Design and build the production systems that power the Together AI inference engine, enabling reliability and performance at scale.</li>
<li>Develop and optimize runtime inference services for large-scale AI applications.</li>
<li>Collaborate with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.</li>
<li>Conduct design and code reviews to ensure high standards of quality.</li>
<li>Create services, tools, and developer documentation to support the inference engine.</li>
<li>Implement robust and fault-tolerant systems for data ingestion and processing.</li>
</ul>
<h3><strong>Requirements</strong></h3>
<ul>
<li>3+ years of experience writing high-performance, well-tested, production-quality code.</li>
<li>Proficiency with Python and PyTorch.</li>
<li>Demonstrated experience building high-performance libraries and tooling.</li>
<li>Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking, storage, performance, and scale.</li>
<li>Preferred: Knowledge of existing AI inference systems such as TGI, vLLM, TensorRT-LLM, and Optimum.</li>
<li>Preferred: Knowledge of AI inference techniques such as speculative decoding.</li>
<li>Preferred: Knowledge
Required Skills
Python
PyTorch
CUDA
LLM
Rust