Job Description
<h3>About the Role</h3>
<p>Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle, combining the fastest LLM inference engine with state-of-the-art AI cloud infrastructure.</p>
<p>As a Senior Backend Engineer, you will play a key role in building the next-generation AI cloud platform: a highly available, global, blazing-fast cloud infrastructure that virtualizes cutting-edge ML hardware (GB200s/GB300s, BlueField DPUs) and gives state-of-the-art ML practitioners self-serve AI cloud services, such as on-demand and managed Kubernetes and Slurm clusters. This platform serves both our internal SaaS products (inference, fine-tuning) and our external cloud customers, spanning dozens of data centers across the world.</p>
<p>Some of what you’ll work on:</p>
<ul>
<li>Work on a distributed GPU scheduling system for the on-demand clusters product, Instant Clusters.</li>
<li>Build out a global management plane for managing our data center compute, networking, and storage.</li>
<li>Design and build new customer-facing cloud platform services, delivering killer enterprise AI cloud features.</li>
</ul>
<h3><strong>Responsibilities</strong></h3>
<ul>
<li>Identify, design, and develop foundational backend services that power Together’s cloud platform</li>
<li>Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure</li>
<li>Partner with product teams to understand functional requirements and deliver solutions that meet business needs</li>
<li>Write clear, well-tested, and maintainable software and IaC for both new and existing systems</li>
<li>Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance</li>
<li>Participate in an on-call rotation to address critical incidents when necessary</li>
</ul>
<h3><strong>Requirements</strong></h3>
<ul>
<li>5+ years of demonstrated experience in buil
Required Skills
LLM
Kubernetes
AWS
GCP
Azure