Software Engineer (Ray Core)

Build and optimize Ray's distributed systems for scalable ML applications

Develop and maintain Ray's C++ backend, focusing on performance, reliability, and scalability. Work on optimizing large-scale workloads, improving fault tolerance, and enhancing stability. Contribute to open-source software while mentoring junior team members.

Why This Role?

Directly impact Ray's distributed systems used by companies like OpenAI and Uber

Key Responsibilities

  • Optimize performance of large-scale workloads on Ray
  • Improve fault tolerance and high availability features
  • Develop and maintain Ray's C++ backend
  • Mentor junior team members on distributed systems
  • Contribute to open-source software development

Requirements

  • Experience with systems software
  • Knowledge of C++
  • Understanding of distributed systems
  • Experience with Ray or similar frameworks
  • Strong problem-solving skills

Required Skills

C++, distributed systems, systems software, testing, debugging, Ray, Open Source Development

Keywords

Software Engineer, Ray Core, Distributed Systems, C++, Machine Learning, Open Source

Original description from Ashby Job Boards

About Anyscale

At Anyscale https://www.anyscale.com/, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray https://docs.ray.io/en/latest/, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI https://thenewstack.io/how-ray-a-distributed-ai-framework-helps-power-chatgpt/, Uber https://www.uber.com/blog/horovod-ray/, Spotify https://engineering.atspotify.com/2023/02/unleashing-ml-innovation-at-spotify-with-ray/, Instacart https://www.youtube.com/watch?v=3t26ucTy0Rs&list=PLzTswPQNepXmLUiL4F_1VHrPcCz1OeILw&index=23&pp=iAQB, Cruise https://www.youtube.com/watch?v=gj0BqvfX_wI&list=PLzTswPQNepXmLUiL4F_1VHrPcCz1OeILw&index=46&pp=iAQB, and many more have Ray in their tech stacks to accelerate the progress of AI applications out into the real world.

With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.

We are proud to be backed by Andreessen Horowitz, NEA, and Addition https://www.wsj.com/articles/ai-startup-anyscale-adds-99-million-to-andressen-horowitz-led-funding-round-11661254200 with $250+ million raised to date.

About the role

Ray aims to provide a universal API for building distributed applications. Achieving this goal requires a distributed system with high levels of performance and reliability. We're looking for engineers with systems software experience who are interested in contributing to the Ray backend.

About the Ray Core Team

The Ray Core team develops and maintains the Ray C++ backend (e.g., distributed scheduler, language runtime integration, I/O and memory subsystems). We are responsible for the reliability, scalability, and performance of Ray, as well as for ensuring that Ray provides the right feature set to support higher-level libraries and use cases. The team works on a balance of new features and distributed libraries, test infrastructure improvements, debugging, and longer-term architectural improvements to Ray.

A snapshot of projects you can work on:

  • Optimizing performance of large-scale workloads on Ray
  • Stability and stress testing infrastructure
  • Improving fault tolerance (HA)

As part of this role, you will:

  • Lead cross-team projects while mentoring junior team members
  • Develop high-quality open source software to simplify distributed programming (Ray)
  • Identify, implement, and evaluate architectural improvements to Ray core
  • Improve the testing process for Ray to make releases as smooth as possible
  • Communicate your work to a broader audience through talks, tutorials, and blog posts

We'd love to hear from you if you have:

  • At least 5 years of relevant work experience
  • Experience in building scalable and fault-tolerant distributed systems
  • Extensive experience working in C/C++ and on low-level operating systems
  • Solid background in algorithms, data structures, and system design
  • Knowledge of distributed model training and inference (e.g. tensor parallel, pipeline parallel) is preferred
  • Knowledge of GPU programming is preferred

Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law.

Anyscale Inc. is an E-Verify company and you may review the Notice of E-Verify Participation https://drive.google.com/file/d/1Kt2S6_k_SjxaEdGowH4rngVdg2ApAQV3/view?usp=sharing and the Right to Work posters in English and Spanish https://drive.google.com/file/d/1K3Nz72xgsU2hngnVUEu53wEeZjbAMbnZ/view?usp=sharing


Company: Anyscale
Source: Ashby Job Boards
Job Type: full time
Location: Remote
Category:
Seniority: mid
Posted: May 14, 2026