Together AI

Official careers page

Machine Learning, Platform Engineer

Salary: ¥0 to ¥0

San Francisco

Full-time / Contract

Years of experience:

Job Description

<h3><strong>About the Role</strong></h3> <p>Our team focuses on enabling custom models and dedicated inference on Together. We are responsible for building a container platform, optimizing autoscaling, minimizing cold starts, achieving the best end-to-end model performance, and providing a best-in-class developer experience with great tooling. We often focus on video or audio generation across the stack: CUDA kernels, PyTorch optimization, inference engines, container orchestration, queueing theory, etc. An ideal candidate will be great at profiling/optimization but know the word Kubernetes, or be intimately familiar with multi-cluster scheduling and have some sense of ML bottlenecks.</p> <h3>Responsibilities</h3> <ul> <li>New hires may work on multi-cluster orchestration, portfolio optimization, predictive autoscaling, control planes, model bring-up, model optimization, APIs for managing deployments, inference worker SDKs, and CLI tools.</li> <li>Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure</li> <li>Partner with product teams to understand functional requirements and deliver solutions that meet business needs</li> <li>Write clear, well-tested, and maintainable software and IaC for both new and existing systems</li> <li>Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance</li> </ul> <h3>Requirements</h3> <ul> <li>5+ years of demonstrated experience in building large-scale, fault-tolerant, distributed systems.</li> <li>Experience running serverless inference platforms, doing model bring-up on short notice, being on call, or running a cloud provider is a very big plus</li> <li>Good taste and ability to thoughtfully discuss how what you’ve built has failed over time</li> <li>Experience designing, analyzing, and improving efficiency, scalability, and stability of various system resources</li> <li>Excellent understanding o

Requirements

Desired Skills

Python PyTorch CUDA LLM Kubernetes Rust Go C++

Working Conditions

Working hours
Employment type	Full-time / Contract
Location	San Francisco
Remote work	Not available

Listing from Together AI's official careers page


Posted 11 days ago

