Head of Observability
You'll own the observability strategy end-to-end: the internal platform our teams rely on for incident response and reliability, and the customer-facing product that gives…
Job Description
Join the VRChat Team! VRChat offers a first-of-its-kind, game-changing platform that provides an endless collection of social VR experiences and gives the power of creation to its robust community. With over 250,000 worlds and growing, VRChat’s vision is to allow users to bring their imaginations to life and help shape the metaverse anywhere in the world on any device. VRChat has raised $100M to date with the support of investors Makers Fund, Anthos Capital, and HTC. We have a great team that includes people from Netflix, Twitter, Meta, Microsoft, Roblox, Google, Amazon, Unity, Spotify, Discord, Uber, eBay, Robinhood, Twitch, Zynga, and TikTok. Come and join the mission!
We are looking for a Senior/Staff Platform Engineer to help improve the reliability, performance, and scalability of our production platform.
This role focuses on operating reliable infrastructure, improving observability, driving incident response, and using data-driven reliability practices such as SLIs, SLOs, SLAs, error budgets, and DORA metrics. Database experience with MongoDB, Elasticsearch, or Redis is a must.
Help us run and secure the platform that allows our users to connect and create their part of the VRChat universe. If you’re interested in keeping the machinery behind the scenes humming and finely tuned, this role could be right up your alley.
The role reports to the Head of Platform at VRChat. This engineer will work closely with the IT and Engineering teams, as well as the heads of various functions, to plan and deploy infrastructure.
Responsibilities
Operate and improve production infrastructure with a focus on reliability, security, performance, and cost efficiency.
Define, measure, and improve reliability using SLIs, SLOs, SLAs, error budgets, and DORA metrics.
Build and improve monitoring, alerting, dashboards, logging, and incident response processes.
Participate in incident management, root cause analysis, postmortems, and follow-up remediation.
Automate infrastructure and operational workflows using modern IaC and scripting tools.
Work closely with engineering teams to improve service reliability, deployment quality, and operational readiness.
Turn ambiguous infrastructure, reliability, and operational problems into clear, scalable, and measurable solutions.
Engage with backend codebases through code reviews, pull requests, and occasional feature or tooling work to build shared context with product engineering teams.
Qualifications
8+ years of experience in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.
Strong experience operating high-availability production systems.
Experience with cloud or hybrid cloud environments and tools such as Terraform or OpenTofu.
Strong knowledge of Linux, networking, automation, observability, and incident management.
Strong communication skills and ability to work with technical and non-technical stakeholders.
Operational knowledge of databases such as MongoDB, Elasticsearch, or Redis.
Experience with AWS, including core infrastructure services, cost optimization, and multi-account architecture.
Experience with Kubernetes, including networking, service discovery, ingress, and workload reliability.
Experience with Cilium or other Kubernetes networking/security solutions.
Experience supporting large-scale storage systems.
Experience with CDNs, caching, distributed systems, or real-time platforms.
Benefits
Work from anywhere! VRChat is a 100% remote company.
Health Benefits
401(k) for US employees and RRSP for Canadian employees
Stock Options
Generous paid holiday schedule
Unlimited/Flexible vacation time
Paid parental leave benefits