Together AI

Official careers page

Staff Engineer, Distributed Storage and HPC & AI Infrastructure

San Francisco

Full-time employee / Contract employee

Years of experience:

Job description

About the Role

In this role, you will operate, scale, and optimize multi-petabyte storage systems purpose-built for the world’s largest AI training and inference workloads. You’ll manage and scale high-performance parallel filesystems and object stores, evaluate and integrate cutting-edge technologies such as Vast, Weka, Ceph, and Lustre, and solve the complex engineering challenges of operating at extreme throughput, low-latency data paths, and massive cluster-scale storage operations.

You will also build Kubernetes-native storage operators and self-service platforms that provide automated provisioning, strict multi-tenancy, performance isolation, and quota enforcement at cluster scale. Day-to-day, you’ll optimize end-to-end data paths for 10-50 GB/s per node, design multi-tier caching architectures, implement intelligent prefetching and model-weight distribution, and tune parallel filesystems for AI workloads.

Responsibilities

- Architect and implement the technical strategy and storage roadmap for Together AI, driving high-performance architectural decisions as we scale our GPU fleet.
- Engineer and scale multi-petabyte AI/ML storage systems by integrating Vast, Weka, and Ceph while executing deep cost optimization through automated tiering and lifecycle policies.
- Develop intelligent caching and tiered storage architectures to achieve extreme IOPS and cluster-wide throughput at GPU scale for training and inference workloads.
- Tune storage isolation at the L2/L3 network layers to ensure secure, production-grade multi-tenancy for storage clients.
- Code Kubernetes storage operators and controllers to enable automated provisioning, self-service abstractions, and quota enforcement.
- Engineer end-to-end data paths to achieve 10+ GB/s per GPU node; architect multi-tier caching for model weights and datasets; tune parallel filesystems using advance

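Requirements

Desired skills

Python, Kubernetes

Working conditions

Working hours
Employment type	Full-time employee / Contract employee
Location	San Francisco
Remote work	Not available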

Job listed on the Together AI official careers page

Posted 11 days ago
