Senior Machine Learning Engineer - Multimodal Data
Build data pipelines for multimodal agent training at Canva
As a Senior Machine Learning Engineer, you will design and build data pipelines for multimodal agent training at Canva. Your work includes collecting, filtering, and formatting data from a range of sources, and ensuring the data is of dependable quality. You will work with the research team to understand data needs and build reliable, scalable systems. You will also be responsible for ...
Why It's Interesting
Join Canva and contribute to building AI that lets anyone create designs with confidence.
Key Responsibilities
- Design and build data pipelines for agent training, including collection, filtering, deduplication, and formatting of data across text, image, and multimodal sources
- Build and maintain infrastructure for efficient, scalable data loading, storage, and retrieval
- Work with research scientists to translate research requirements into concrete data specifications
- Create evaluation datasets and benchmarks in collaboration with researchers, curating task distributions that surface real failure modes
- Develop tooling for dataset construction, including human annotation workflows, synthetic data generation, and preference data collection for RLHF/DPO-style training
Requirements
- Experience working with research teams to understand data needs and build reliable, scalable systems
Original description from SmartRecruiters
At Canva, our mission is to empower the world to design. We're building AI that feels magical and lands real impact for millions of people, helping anyone create with confidence. We're looking for a Machine Learning Engineer to own the data foundations that power our multimodal agent research: building the pipelines, datasets, and tooling that turn ambitious research ideas into trainable reality.
About the team
We are a cutting-edge post-training team developing new multimodal agentic systems. We work on all topics of multimodal modelling, post-training, and design agents. We explore multimodal agentic architectures, build scalable training and evaluation loops, and partner closely with product and platform teams to turn breakthroughs into delightful product features.
About the role
You'll be responsible for the data lifecycle that fuels our agent research: from collection and curation through to preprocessing, quality assurance, and delivery into training pipelines. You'll work closely with research scientists to understand what data is needed, then design and build the systems to make it happen, reliably and at scale. You'll have significant autonomy over how data problems get solved, while aligning on what problems matter most with the broader team.
What you'll do
- Design and build data pipelines for agent training: collection, filtering, deduplication, formatting, and versioning across text, image, and multimodal sources.
- Build and maintain infrastructure for efficient data loading, storage, and retrieval at scale (S3, distributed systems, streaming pipelines).
- Collaborate with research scientists to translate research requirements into concrete data specifications, and iterate as experiments reveal new needs.
- Create evaluation datasets and benchmarks in collaboration with researchers, curating task distributions that surface real failure modes.
- Develop tooling for dataset construction, including human annotation workflows, synthetic data generation, and preference data collection for RLHF/DPO-style training.
- Own data quality: build validation frameworks, monitor for drift and contamination, and establish standards that make datasets trustworthy and reproducible.
- Document datasets thoroughly: provenance, known limitations, intended use cases, and versioning history.
- Implement comprehensive test coverage for data pipelines and ML workflows, ensuring reliability and catching regressions early.
- Elevate codebase quality through code reviews, refactoring, and establishing engineering best practices that help research velocity scale sustainably.
- Contribute to team roadmaps by identifying data bottlenecks and proposing solutions that unblock research velocity.
You're likely a match if you have
- Strong software engineering skills in Python, with experience building production-grade data pipelines and ML DevOps.
- Practical experience with prompt engineering: designing, testing, and refining prompts for reliable LLM/VLM outputs.
- Experience with ML data workflows: large-scale data processing and loading (Ray, or similar), data versioning, and format considerations for training (tokenization, batching, sharding).
- Hands-on experience working with data pipelines for large-scale distributed ML training runs.
- Familiarity with annotation tooling and human-in-the-loop data collection (Label Studio or internal systems).
- Understanding of ML training requirements: you know what "good data" looks like for LLM/VLM fine-tuning and can anticipate downstream issues.
- Experience loading and writing large datasets to/from cloud infrastructure (AWS) and distributed storage systems.
- Strong communication skills: you can work with researchers to scope ambiguous problems and translate needs into actionable plans.
- A collaborative approach, comfortable taking ownership and iterating quickly.
Nice to have
- Experience with preference data collection for RLHF or reward modelling.
- Familiarity with multimodal data (image-text pairs, video, design assets).
- Experience building synthetic data generation pipelines using LLMs.
- Background in data quality metrics and monitoring systems.
- Contributions to dataset releases or benchmarks in the ML community.
What's in it for you?
Achieving our crazy big goals motivates us to work hard, and we do, but you'll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too. We also offer a range of benefits to set you up for every success in and outside of work. Here's a taste of what's on offer:
- Equity packages: we want our success to be yours too
- Inclusive parental leave policy that supports all parents & carers
- An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
- Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally
Check out lifeatcanva.com for more info.
Other stuff to know
We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process. We celebrate all types of skills and backgrounds at Canva, so even if you don't feel like your skills quite match what's listed above, we still want to hear from you! Please note that interviews are conducted virtually.