Full-Stack/Backend Developers (LLM-Focused)

Develop objective rubrics to evaluate LLM agent performance and reliability

Develop objective, verifiable criteria (rubrics) to evaluate system performance and ensure outputs meet strict functional requirements. Review system logs and trajectories to refactor code, improve execution paths, and reach a 'Golden Path' of perfect reliability. Test systems for vulnerabilities including improper data exposure, unauthorized access escalations, and edge-case failures. Contribute to training advanced generative AI systems that...

Why This Role?

Help train the world's most advanced generative systems for autonomous agents like OpenClaw

Key Responsibilities

Develop objective, verifiable criteria (rubrics) to evaluate system performance
Review system logs and trajectories to refactor code and improve execution paths
Test systems for vulnerabilities including improper data exposure and unauthorized access
Identify edge-case failures and work toward a 'Golden Path' of perfect reliability
Provide clear, high-density technical feedback on complex system behaviors
Contribute to training LLMs to function as proactive, multi-step agents

Requirements

2+ years of experience in backend engineering, AI automation, or complex systems integration
Proven ability to build and maintain production-grade software with modular separation
Strong command of at least two major languages (e.g., Python, JavaScript, Go, or Java)
Experience working with SQL databases
Practical experience building for live, non-mocked environments
Outstanding attention to detail and ability to provide clear technical feedback

Required Skills

Python, JavaScript, Go, Java, SQL, AI Automation, backend engineering, LLM training, systems integration, rubric development, vulnerability testing, technical feedback

Keywords

LLM agent training, backend developer, AI automation, rubric development, system reliability, generative AI, remote work, software engineering

Original description from WeWorkRemotely

Headquarters: San Francisco
URL: https://clouddevs.com/
Location: LATAM (Americas) and Europe preferred.

Do you want to shape the future of autonomous agents like OpenClaw? We collaborate with leading AI organizations to train Large Language Models (LLMs) to function as proactive, multi-step agents. Our projects focus on teaching these systems how to design, coordinate, and optimize complex, real-world architectural workflows. Whether you are a passionate orchestration guru or an experienced software developer, we want you to help us train the world's most advanced generative systems.

About the opportunity:

Outlier is looking for skilled software experts to help train generative AI models. This freelance role is fully remote and offers flexible hours, so you can contribute whenever it fits your schedule.

You may contribute your expertise by…

Developing objective, verifiable criteria (rubrics) to evaluate system performance and ensure outputs meet strict functional requirements.
Reviewing system logs and "trajectories" to refactor code, improve execution paths, and reach a "Golden Path" of perfect reliability.
Testing systems for vulnerabilities, including improper data exposure, unauthorized access escalations, and edge-case failures.

We're looking for people with…

2+ years of experience in backend engineering, AI automation, or complex systems integration
Proven ability to build and maintain production-grade software with modular separation (e.g., distinct services for data parsing, logic processing, and reporting)
Strong command of at least two major languages (e.g., Python, JavaScript, Go, or Java) and experience working with SQL databases
Practical experience building for live, non-mocked environments and handling multi-turn system interactions
Outstanding attention to detail and the ability to provide clear, high-density technical feedback on complex system behaviors

Nice to have…

Expertise building multi-stage coordination tasks where data acquisition leads to reasoned output
Hands-on experience integrating agents with live tools such as Supabase, Gmail, and various APIs to solve real-world problems
A high level of comfort implementing persistent state and session discovery using MEMORY.md to track agent progress
Experience identifying subtle failures such as privacy leaks, authority escalation, or indirect prompt injections

To apply: https://weworkremotely.com/remote-jobs/clouddevs-full-stack-backend-developers-llm-focused

Company

CloudDevs

Source

WeWorkRemotely

Job Type

full time