Job Description
<h3><strong>About the Role</strong></h3>
<p>Together AI is building the best inference infrastructure for voice applications. Our Voice AI platform powers production-grade, real-time voice agents and applications — serving speech-to-text and text-to-speech models with best-in-class latency and reliability.</p>
<p>We're looking for a Staff ML Engineer to drive the model serving layer for voice workloads. You'll work hands-on with inference engines like TRT-LLM and SGLang to optimize how we serve models like Whisper, Parakeet, Orpheus, and Kokoro — pushing latency and throughput to the frontier. You'll profile GPU utilization, design batching strategies for streaming audio, and ensure new model architectures can go from research to production quickly.</p>
<p>This is a foundational hire on a small, high-impact team. Voice inference has unique challenges — streaming audio, tokenization, real-time latency budgets — that require dedicated ML engineering focus. You'll shape how Together serves voice models as the industry moves from pipeline architectures (ASR → LLM → TTS) toward end-to-end speech-to-speech.</p>
<ul>
<li>Own the model serving stack that powers Together's voice platform across STT, TTS, and speech-to-speech.</li>
<li>Work directly with state-of-the-art accelerators (H100s, H200s, B200s) to optimize voice model inference.</li>
<li>Collaborate with model partners (Cartesia, Deepgram, Rime, and others) to bring their models to production on Together's infrastructure.</li>
<li>Build quality evaluation frameworks that guide model selection for customers and inform the roadmap.</li>
<li>Join a small, early-stage team with outsized impact on a fast-growing product area.</li>
</ul>
<h3><strong>Responsibilities</strong></h3>
<ul class="[li_&]:mb-0 [li_&]:mt-1 [li_&]:gap-1 [&:not(:last-