Job Description
<p data-pm-slice="1 1 []">GAQ127R40 </p>
<p><strong>Team:</strong> IT Infrastructure and Operations</p>
<h3><strong>About the Role</strong></h3>
<p>At Databricks Information Technology, we are a product-led organization transforming how we work, from making our IT services easy to use to building applications that scale seamlessly during rapid growth.</p>
<p>As a <strong>Site Reliability Engineer (SRE)</strong>, you will bridge the gap between software engineering and systems architecture. You will be a core contributor to the IT Infrastructure team, owning the evolution of core infrastructure and observability platforms. This role requires a strong software engineering mindset and deep technical breadth to deliver high-quality, scalable solutions for "immature" system problems. Your focus will be on building resilient, automated infrastructure that empowers development teams and ensures our cloud environment is cost-optimized, secure, and highly available.</p>
<h3><strong>The Impact You Will Have</strong></h3>
<ul>
<li><strong>Architect and Automate:</strong> Design and deploy production-grade infrastructure on cloud platforms (AWS/Azure) using Infrastructure as Code (IaC) tools like Terraform or Pulumi.</li>
<li><strong>Reliability and Performance Engineering:</strong> Optimize system performance, architecture, and scaling to ensure maximum uptime and minimal latency for critical IT services.</li>
<li><strong>CI/CD Excellence:</strong> Architect robust deployment pipelines (e.g., GitHub Actions), managing both hosted and self-hosted runners for specialized build requirements.</li>
<li><strong>Observable by Default:</strong> Create the underlying infrastructure that ensures new internal applications are secure and have logging, metrics, and alerts enabled by default.</li>
<li><strong>Agentic Tooling:</strong> Build internal AI plugins and automation scripts to streamline developer workflows and enhance operational efficiency.</li>
<li><strong>Incident Response:</strong></li>
</ul>
Required Skills
Python
Kubernetes
Docker
AWS
GCP
Azure