Senior Backend Engineer (Data, Search, Infrastructure)
Build and optimize data pipelines for academic literature search at scale
You will work across the systems that ingest, process, store, and serve a literature database of 250M+ academic papers, along with a growing body of user data. This includes building pipelines, optimizing search, handling PDFs at scale, and exposing clean APIs. The role focuses on reliable, data-heavy backend systems in production.
Why This Role?
Work on a literature database of 250M+ academic papers with real-world impact
Key Responsibilities
- Build and operate data ingestion pipelines for heterogeneous academic sources
- Deploy and maintain services on AWS for scalable backend operations
- Design and optimize full-text search systems including indexing and query tuning
- Develop and maintain reliable REST APIs for internal and external use
- Process PDFs at scale including extraction, transformation, and storage
- Ensure data quality through deduplication, consistency, and correctness checks
Requirements
- Strong backend engineering background with data-heavy systems in production
- Experience deploying and operating services on AWS
- Experience designing and maintaining data ingestion pipelines from messy sources
- Comfort with web scraping and third-party data APIs
- Familiarity with Node.js and TypeScript (or equivalent backend languages)
- Solid understanding of full-text search systems and relevance tuning
Original description from Jobspresso
Paperpile runs on data at scale, with a literature database of 250M+ academic papers and a growing body of user data accumulated over more than a decade. You’ll work across the systems that ingest, process, store, and serve this data reliably: building pipelines, optimizing search, handling PDFs at scale, and exposing clean APIs.
Requirements
- Strong backend engineering background with experience building and operating data-heavy systems in production.
- Experience deploying and operating services on AWS.
- Experience designing and maintaining data ingestion pipelines handling messy, heterogeneous sources. Comfortable with web scraping and working with third-party data sources and APIs.
- Familiarity with Node.js and TypeScript. It’s fine if you come from a different background, such as Java or Python, but you should be comfortable working in this environment.
- High standards for data quality. You think carefully about correctness, deduplication, and consistency.
- Solid understanding of full-text search systems including indexing strategy, relevance tuning, and query optimization.
- Proficient in building reliable REST APIs.
More useful experience
- Familiarity with academic publishing formats and data sources (PubMed, Crossref, arXiv…)
- Experience with PDF processing pipelines (extraction, transformation, storage and delivery at scale).
- Experience with LLM-based document processing or ML pipelines for extracting structured data from unstructured text.
- Large scale web crawling and scraping.
Benefits
- Base compensation €60,000–€90,000 based on the level of your experience
- Bonus/equity program.
- 4 weeks paid vacation + local holidays.
- We sponsor co-working space in your city.
- Learn and grow. Try out new things. We sponsor relevant courses, seminars, and conferences.