Job Description
<h3><strong>About the Role</strong></h3>
<p>Our team focuses on enabling custom models and dedicated inference on Together. We are responsible for building a container platform, optimizing autoscaling, minimizing cold starts, achieving the best end-to-end model performance, and providing a best-in-class developer experience with great tooling. We often focus on video or audio generation across the stack: CUDA kernels, PyTorch optimization, inference engines, container orchestration, queueing theory, etc. An ideal candidate will either be great at profiling and optimization and at least know the word Kubernetes, or be intimately familiar with multi-cluster scheduling and have some sense of ML bottlenecks.</p>
<h3>Responsibilities</h3>
<ul>
<li>New hires may work on multi-cluster orchestration, portfolio optimization, predictive autoscaling, control planes, model bring-up, model optimization, APIs for managing deployments, inference worker SDKs, and CLI tools.</li>
<li>Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure</li>
<li>Partner with product teams to understand functional requirements and deliver solutions that meet business needs</li>
<li>Write clear, well-tested, and maintainable software and IaC for both new and existing systems</li>
<li>Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance</li>
</ul>
<h3>Requirements</h3>
<ul>
<li>5+ years of demonstrated experience in building large-scale, fault-tolerant distributed systems.</li>
<li>Experience running serverless inference platforms, doing model bring-up on short notice, being on call, or running a cloud provider is a very big plus</li>
<li>Good taste and ability to thoughtfully discuss how what you’ve built has failed over time</li>
<li>Experience designing, analyzing, and improving the efficiency, scalability, and stability of various system resources</li>
<li>Excellent understanding o
Required Skills
Python
PyTorch
CUDA
LLM
Kubernetes
Rust
Go
C++