Full-Stack/Backend Developers (LLM-Focused)
Develop objective criteria for LLM performance
Contribute to training generative AI models by developing evaluation criteria and testing systems for vulnerabilities. You will help ensure outputs meet strict functional requirements.
Why This Role?
Contribute to training the world's most advanced generative systems
Original description from WeWorkRemotely
Headquarters: San Francisco
URL: https://clouddevs.com/
Location: LATAM (Americas) and Europe preferred.

Do you want to shape the future of autonomous agents like OpenClaw? We collaborate with leading AI organizations to train large language models (LLMs) to function as proactive, multi-step agents. Our projects focus on teaching these systems how to design, coordinate, and optimize complex, real-world architectural workflows. Whether you are a passionate orchestration guru or an experienced software developer, we want you to help us train the world's most advanced generative systems.

About the opportunity:
Outlier is looking for skilled software experts to help train generative AI models. This freelance role is fully remote and offers flexible hours, so you can contribute whenever it fits your schedule.

You may contribute your expertise by:
Developing objective, verifiable criteria (rubrics) to evaluate system performance and ensure outputs meet strict functional requirements.
Reviewing system logs and "trajectories" to refactor code, improve execution paths, and reach a "Golden Path" of perfect reliability.
Testing systems for vulnerabilities, including improper data exposure, unauthorized access escalation, and edge-case failures.

We're looking for people with:
2+ years of experience in backend engineering, AI automation, or complex systems integration.
Proven ability to build and maintain production-grade software with modular separation (e.g., distinct services for data parsing, logic processing, and reporting).
Strong command of at least two major languages (e.g., Python, JavaScript, Go, or Java) and experience working with SQL databases.
Practical experience building for live, non-mocked environments and handling multi-turn system interactions.
Outstanding attention to detail and the ability to provide clear, high-density technical feedback on complex system behaviors.

Nice to have:
Expertise building multi-stage coordination tasks where data acquisition leads to reasoned output.
Hands-on experience integrating agents with live tools such as Supabase, Gmail, and various APIs to solve real-world problems.
A high level of comfort implementing persistent state and session discovery using MEMORY.md to track agent progress.
Experience identifying subtle failures such as privacy leaks, authority escalation, or indirect prompt injections.

To apply: https://weworkremotely.com/remote-jobs/clouddevs-full-stack-backend-developers-llm-focused