Senior Backend Engineer (Data, Search, Infrastructure)

Build and optimize data pipelines for academic literature search at scale

You will work across systems that ingest, process, store, and serve a literature database of 250M+ academic papers and user data. This includes building pipelines, optimizing search, handling PDFs at scale, and exposing clean APIs. The role focuses on reliable, data-heavy backend systems in production.

Why This Role?

Work on a literature database of 250M+ academic papers with real-world impact

Key Responsibilities

Build and operate data ingestion pipelines for heterogeneous academic sources
Deploy and maintain services on AWS for scalable backend operations
Design and optimize full-text search systems including indexing and query tuning
Develop and maintain reliable REST APIs for internal and external use
Process PDFs at scale including extraction, transformation, and storage
Ensure data quality through deduplication, consistency, and correctness checks

Requirements

Strong backend engineering background with data-heavy systems in production
Experience deploying and operating services on AWS
Experience designing and maintaining data ingestion pipelines from messy sources
Comfort with web scraping and third-party data APIs
Familiarity with Node.js and TypeScript (or equivalent backend languages)
Solid understanding of full-text search systems and relevance tuning

Required Skills

Node.js, TypeScript, AWS, Data Pipelines, REST APIs, Full-text Search, backend engineering, search optimization, PDF processing

Keywords

senior backend engineer, data infrastructure, academic data systems, AWS pipelines, search optimization, PDF processing at scale

Original description from Jobspresso

Paperpile runs on data at scale, with a literature database of 250M+ academic papers and a growing body of user data accumulated over more than a decade. You’ll work across the systems that ingest, process, store, and serve this data reliably: building pipelines, optimizing search, handling PDFs at scale, and exposing clean APIs.

Requirements

Strong backend engineering background with experience building and operating data-heavy systems in production.
Experience deploying and operating services on AWS.
Experience designing and maintaining data ingestion pipelines handling messy, heterogeneous sources. Comfortable with web scraping and working with third-party data sources and APIs.
Familiarity with Node.js and TypeScript. It’s fine if you come from a different background, such as Java or Python, but you should be comfortable working in this environment.
High standards for data quality. You think carefully about correctness, deduplication, and consistency.
Solid understanding of full-text search systems including indexing strategy, relevance tuning, and query optimization.
Proficient in building reliable REST APIs.

More useful experience

Familiarity with academic publishing formats and data sources (PubMed, Crossref, arXiv…)
Experience with PDF processing pipelines (extraction, transformation, storage and delivery at scale).
Experience with LLM-based document processing or ML pipelines for extracting structured data from unstructured text.
Large scale web crawling and scraping.

Benefits

Base compensation €60,000–€90,000 based on the level of your experience
Bonus/equity program.
4 weeks paid vacation + local holidays.
We sponsor co-working space in your city.
Learn and grow. Try out new things. We sponsor relevant courses, seminars, and conferences.

Company

Paperpile

Source

Jobspresso

Job Type

full time