Full-Stack/Backend Developers, LLM-Focused
Develop objective rubrics to evaluate LLM agent performance and reliability
Develop objective, verifiable criteria (rubrics) to evaluate system performance and ensure outputs meet strict functional requirements. Review system logs and trajectories to refactor code, improve execution paths, and reach a 'Golden Path' of perfect reliability. Test systems for vulnerabilities including improper data exposure, unauthorized access escalations, and edge-case failures. Contribute to training advanced generative AI systems that...
Why This Role?
Help train the world's most advanced generative systems for autonomous agents like OpenClaw
Key Responsibilities
- Develop objective, verifiable criteria (rubrics) to evaluate system performance
- Review system logs and trajectories to refactor code and improve execution paths
- Test systems for vulnerabilities including improper data exposure and unauthorized access
- Identify edge-case failures and work toward a 'Golden Path' of perfect reliability
- Provide clear, high-density technical feedback on complex system behaviors
- Contribute to training LLMs to function as proactive, multi-step agents
Requirements
- 2+ years of experience in backend engineering, AI automation, or complex systems integration
- Proven ability to build and maintain production-grade software with modular separation
- Strong command of at least two major languages (e.g., Python, JavaScript, Go, or Java)
- Experience working with SQL databases
- Practical experience building for live, non-mocked environments
- Outstanding attention to detail and ability to provide clear technical feedback
Original description from WeWorkRemotely
Headquarters: San Francisco
URL: https://clouddevs.com/
Location: LATAM (Americas) and Europe preferred.

Do you want to shape the future of autonomous agents like OpenClaw? We collaborate with leading AI organizations to train Large Language Models (LLMs) to function as proactive, multi-step agents. Our projects focus on teaching these systems how to design, coordinate, and optimize complex, real-world architectural workflows. Whether you are a passionate orchestration guru or an experienced software developer, we want you to help us train the world's most advanced generative systems.

About the opportunity: Outlier is looking for skilled software experts to help train generative AI models. This freelance role is fully remote and offers flexible hours, so you can contribute whenever it fits your schedule.

You may contribute your expertise by:
- Developing objective, verifiable criteria (rubrics) to evaluate system performance and ensure outputs meet strict functional requirements.
- Reviewing system logs and "trajectories" to refactor code, improve execution paths, and reach a "Golden Path" of perfect reliability.
- Testing systems for vulnerabilities, including improper data exposure, unauthorized access escalations, and edge-case failures.

We're looking for people with:
- 2+ years of experience in backend engineering, AI automation, or complex systems integration.
- Proven ability to build and maintain production-grade software with modular separation (e.g., distinct services for data parsing, logic processing, and reporting).
- Strong command of at least two major languages (e.g., Python, JavaScript, Go, or Java) and experience working with SQL databases.
- Practical experience building for live, non-mocked environments and handling multi-turn system interactions.
- Outstanding attention to detail and the ability to provide clear, high-density technical feedback on complex system behaviors.

Nice to have:
- Expertise building multi-stage coordination tasks where data acquisition leads to reasoned output.
- Hands-on experience integrating agents with live tools such as Supabase, Gmail, and various APIs to solve real-world problems.
- High level of comfort implementing persistent state and session discovery using MEMORY.md to track agent progress.
- Experience identifying subtle failures like privacy leaks, authority escalation, or indirect prompt injections.

To apply: https://weworkremotely.com/remote-jobs/clouddevs-full-stack-backend-developers-llm-focused