Primary Data Analysis · v2 · Pre-registered · Taxonomy-validated

AI-Skill Demand: Indonesia vs Global Remote — Data Notes

Supporting AI-Skill Demand Gap: 7 Findings · Loker Dollar Research

10,000+ job listings. Taxonomy-first classifier with pre-declared regex banks, a 200-row hand-labeled gold set, and Wilson 95% confidence intervals on every proportion. Pre-registered before any results were computed. Not a peer-reviewed study — every claim traces back to one of the three datasets below.

Indonesia Local Dataset · JobStreet + Loker.id + Glints + Kalibrr

indonesia-it-jobs-2026-06.csv (free)

Columns: id, title, company, location, source, task_altitude, seniority, has_any_ai_skill, ai_skill_tags, ai_usage_mode, tools, language_barrier, industry, generalist_specialist
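A minimal sketch of a header check for the published CSVs, assuming the column names listed above appear verbatim in the first line of the file. The names `EXPECTED_COLUMNS` and `missingColumns` are hypothetical and not part of the repository.

```typescript
// Column names copied from the schema listed above.
const EXPECTED_COLUMNS = [
  "id", "title", "company", "location", "source", "task_altitude",
  "seniority", "has_any_ai_skill", "ai_skill_tags", "ai_usage_mode",
  "tools", "language_barrier", "industry", "generalist_specialist",
] as const;

// Returns the expected columns that are absent from a CSV header line.
// Assumption: header cells contain no embedded commas, so a plain split is enough.
function missingColumns(headerLine: string): string[] {
  const present = new Set(
    headerLine
      .replace(/^\uFEFF/, "") // drop a UTF-8 byte-order mark if present
      .trim()
      .split(",")
      .map((cell) => cell.trim().replace(/^"|"$/g, "")),
  );
  return EXPECTED_COLUMNS.filter((col) => !present.has(col));
}
```

An empty result means the file carries every documented column; anything else names the columns to investigate before analysis.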

Global Remote Dataset · Contra, WWR, RemoteOK, Remotive, HN + others

global-remote-2026-06.csv (free)

Same schema as Indonesia dataset. D1 corpus snapshot, June 2026.

AI-Skill Demand Summary · Dimension-level proportions with CIs

ai-skill-demand-2026-06.csv (free)

Market-level summary: n_total, n_any_ai_skill, pct, CI lo/hi, per sub-dimension counts.

Pre-Registration

Hypotheses, operational definitions, exclusion rules, and statistical methods were committed to the repository before any results were computed. The pre-registration document is version-controlled at scripts/research/indonesia-it-vs-global-v2-2026-06/PRE-REGISTRATION.md. This is the falsifiability anchor v1 lacked.

Six hypotheses were pre-declared (H1–H6). Each maps to a two-proportion z-test at α = 0.05. Results in the article report which hypotheses were supported and which were not.

Dataset Overview

| Dataset | Boards | Period | Notes |
| --- | --- | --- | --- |
| Indonesia local | JobStreet ID, Loker.id, Glints, Kalibrr | June 2026 | One-time scrape, raw archived |
| Global remote | Contra, WWR, RemoteOK, Remotive, HN, Adzuna, The Muse + others | June 2026 snapshot | Live D1 corpus, frozen export |

Classifier Architecture

Taxonomy-first approach

The classifier is taxonomy-first: deterministic regex banks do the bulk of the work. An LLM pass fires only for postings where signals are absent or conflicting for task_altitude and seniority — the two dimensions where short or ambiguous titles can leave the regex banks without a clear signal. Tool/language/AI-skill counts are pure presence matches and never need LLM resolution.
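The routing described above can be sketched as follows for the seniority dimension. This is an illustration only: the label set, the regex patterns, and the names `SENIORITY_BANK` and `resolveSeniority` are assumptions, not the repository's actual banks. A title that matches exactly one label is resolved deterministically; zero matches ("absent") or several matches ("conflicting") are handed to the LLM pass.

```typescript
type Seniority = "junior" | "mid" | "senior" | "lead";

// Hypothetical regex bank: one pattern per label.
const SENIORITY_BANK: ReadonlyArray<readonly [Seniority, RegExp]> = [
  ["junior", /\b(junior|jr|entry[- ]level|intern|graduate)\b/i],
  ["mid", /\b(mid[- ]level|intermediate)\b/i],
  ["senior", /\b(senior|sr|staff|principal)\b/i],
  ["lead", /\b(lead|head of|manager|director)\b/i],
];

type Resolution =
  | { path: "regex"; label: Seniority }
  | { path: "needs_llm"; label: null; reason: "absent" | "conflicting" };

// Deterministic first: only an unambiguous single hit is accepted.
function resolveSeniority(title: string): Resolution {
  const hits = SENIORITY_BANK
    .filter(([, pattern]) => pattern.test(title))
    .map(([label]) => label);
  const [only] = hits;
  if (hits.length === 1 && only !== undefined) {
    return { path: "regex", label: only };
  }
  return {
    path: "needs_llm",
    label: null,
    reason: hits.length === 0 ? "absent" : "conflicting",
  };
}
```

Returning the reason alongside the routing decision makes it straightforward to report what share of rows reached the LLM path and why.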

The taxonomy version v2-2026-06 is baked into every classified row. Re-running pnpm research:v2:analyze from the frozen dataset.jsonl reproduces every published figure exactly.

AI-skill taxonomy (six dimensions)

| Dimension | Representative anchors |
| --- | --- |
| Agent orchestration | LangChain, AutoGen, CrewAI, LlamaIndex, Haystack, multi-agent |
| Prompt engineering | prompt engineering, system prompts, few-shot, chain-of-thought |
| Eval / testing AI | LLM evaluation, RLHF, model benchmarking, red-teaming, hallucination |
| RAG & vector DBs | RAG, retrieval-augmented generation, Pinecone, Weaviate, pgvector, Qdrant |
| MLOps & inference | vLLM, BentoML, Ray Serve, ONNX, MLflow, LoRA, model serving |
| AI governance | responsible AI, AI ethics, model alignment, guardrails, EU AI Act |
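A minimal sketch of presence matching over the six dimensions, built only from the representative anchors in the table. The real regex banks are presumably larger; the dimension keys and the names `AI_SKILL_BANK` and `tagAiSkills` are hypothetical.

```typescript
// One pattern per dimension, using only the anchors listed in the table.
const AI_SKILL_BANK: ReadonlyArray<readonly [string, RegExp]> = [
  ["agent_orchestration", /\b(langchain|autogen|crewai|llamaindex|haystack|multi-agent)\b/i],
  ["prompt_engineering", /\b(prompt engineering|system prompts?|few-shot|chain-of-thought)\b/i],
  ["eval_testing", /\b(llm evaluation|rlhf|model benchmarking|red-teaming|hallucinations?)\b/i],
  ["rag_vector_db", /\b(rag|retrieval-augmented generation|pinecone|weaviate|pgvector|qdrant)\b/i],
  ["mlops_inference", /\b(vllm|bentoml|ray serve|onnx|mlflow|lora|model serving)\b/i],
  ["ai_governance", /\b(responsible ai|ai ethics|model alignment|guardrails|eu ai act)\b/i],
];

// Pure presence match: a dimension is tagged if any of its anchors occurs in the text.
function tagAiSkills(text: string): { has_any_ai_skill: boolean; ai_skill_tags: string[] } {
  const tags = AI_SKILL_BANK
    .filter(([, pattern]) => pattern.test(text))
    .map(([dimension]) => dimension);
  return { has_any_ai_skill: tags.length > 0, ai_skill_tags: tags };
}
```

The word boundaries keep short anchors such as "RAG" from firing inside longer words like "storage". Case-insensitive matching of short tokens (for example "LoRA" against the name "Lora") can still produce false positives, which is one reason a hand-labeled gold set is needed.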

Classifier accuracy disclosure

Taxonomy version: v2-2026-06
Gold-set size: 200 stratified, hand-labeled rows (100 Indonesia + 100 global; balanced across seniority, task altitude, and AI-skill presence)
Continuous-integration (CI) gate: overall macro-F1 ≥ 0.80 on the deterministic path, enforced in tests/unit/research/classify-posting.test.ts; a failing gate blocks the push.
LLM-path accuracy: measured offline; the percentage of rows resolved by the LLM is reported in analysis-report.md.
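For reference, macro-F1 is the unweighted mean of per-label F1 scores. The sketch below computes it for a single dimension from parallel arrays of gold and predicted labels. It is an illustration, not the code in the test file; the names `macroF1` and `passesGate` are hypothetical, and taking labels from the union of both arrays is an assumption.

```typescript
// Macro-F1: compute F1 per label, then average with equal weight per label.
function macroF1(gold: readonly string[], predicted: readonly string[]): number {
  if (gold.length === 0 || gold.length !== predicted.length) {
    throw new RangeError("gold and predicted must be non-empty and the same length");
  }
  // Assumption: every label seen in either array counts toward the average.
  const labels = Array.from(new Set([...gold, ...predicted]));
  let sum = 0;
  for (const label of labels) {
    let tp = 0;
    let fp = 0;
    let fn = 0;
    for (let i = 0; i < gold.length; i++) {
      const isGold = gold[i] === label;
      const isPred = predicted[i] === label;
      if (isGold && isPred) tp++;
      else if (isPred) fp++;
      else if (isGold) fn++;
    }
    // F1 = 2TP / (2TP + FP + FN); the denominator is at least 1 because
    // each label occurs in at least one of the two arrays.
    sum += (2 * tp) / (2 * tp + fp + fn);
  }
  return sum / labels.length;
}

// Gate check against the documented 0.80 floor.
function passesGate(score: number, floor = 0.8): boolean {
  return score >= floor;
}
```

Because each label contributes equally, a rare label that the classifier handles badly pulls the score down as much as a common one would.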

Statistical Methods

Every proportion reported in the article carries a Wilson 95% confidence interval. The Wilson interval is appropriate for proportions near 0 or 1 and for small samples, where the normal approximation is poor. It is hand-computed in scripts/research-v2-analyze.ts with no external stats library, to stay within the worker bundle budget.
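A minimal sketch of a hand-computed Wilson score interval, assuming the standard formula with z ≈ 1.96 for 95% coverage. The function name and return shape are hypothetical and may differ from what scripts/research-v2-analyze.ts actually does.

```typescript
// Wilson score interval for a binomial proportion.
// centre = (p + z^2 / 2n) / (1 + z^2 / n)
// half   = z / (1 + z^2 / n) * sqrt(p(1 - p) / n + z^2 / 4n^2)
function wilsonInterval(
  successes: number,
  n: number,
  z = 1.959964, // two-sided 95% normal quantile
): { p: number; lo: number; hi: number } {
  if (n <= 0) throw new RangeError("n must be positive");
  if (successes < 0 || successes > n) throw new RangeError("successes must be in [0, n]");
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const centre = (p + z2 / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  // Clamp to [0, 1] to absorb floating-point drift at the boundaries.
  return { p, lo: Math.max(0, centre - half), hi: Math.min(1, centre + half) };
}
```

Unlike the normal-approximation interval, this one never collapses to zero width when the observed proportion is exactly 0 or 1, which matters for rare AI-skill dimensions.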

Statistical tests use a two-proportion z-test (two-tailed, α = 0.05). Cells with n < 30 are flagged "directional only" and excluded from hypothesis conclusions. Results are observational — no causal claims.
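The test above can be sketched without a stats library as follows. Two details are assumptions not stated in the source: the standard error uses the pooled proportion, and the normal CDF comes from the Abramowitz and Stegun polynomial approximation of erf (absolute error around 1.5e-7). All function names are hypothetical.

```typescript
// Abramowitz-Stegun formula 7.1.26 approximation of the error function.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

function normalCdf(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

interface ZTestResult {
  z: number;
  pValue: number;           // two-tailed
  directionalOnly: boolean; // true when either cell has n < minN
  significant: boolean;     // never true for directional-only cells
}

function twoProportionZTest(
  x1: number, n1: number, x2: number, n2: number,
  alpha = 0.05, minN = 30,
): ZTestResult {
  if (n1 <= 0 || n2 <= 0) throw new RangeError("sample sizes must be positive");
  const pooled = (x1 + x2) / (n1 + n2);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  // If both samples are all-success or all-failure, there is no difference to test.
  const z = se === 0 ? 0 : (x1 / n1 - x2 / n2) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  const directionalOnly = n1 < minN || n2 < minN;
  return { z, pValue, directionalOnly, significant: !directionalOnly && pValue < alpha };
}
```

Keeping the `directionalOnly` flag in the result, rather than skipping small cells entirely, lets a report still show the direction of the difference while excluding it from hypothesis conclusions.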

Exclusion Rules

  • Blue-collar: isBlueCollarTitle(title) — physical-labour roles identified by title patterns (driver, chef, security guard, cleaning, waiter).
  • Non-IT titles: Sales executive, cashier, accounting, HR generalist, content writer, and similar non-technical roles identified by title patterns.
  • Empty titles: Rows with title length < 2.
  • Salary: Excluded from all analysis. Disclosure asymmetry (Indonesia rarely publishes salary ranges; global sources vary) would confound any cross-market comparison.
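The title-based exclusion rules above can be sketched as a single filter. Only the function name isBlueCollarTitle comes from the source; its body, the pattern lists (limited to the examples given in the bullets), the `exclusionReason` helper, and trimming whitespace before the length check are all assumptions.

```typescript
// Patterns built only from the examples named in the rules above.
const BLUE_COLLAR_PATTERN = /\b(driver|chef|security guard|cleaning|waiter)\b/i;
const NON_IT_PATTERN = /\b(sales executive|cashier|accounting|hr generalist|content writer)\b/i;

function isBlueCollarTitle(title: string): boolean {
  return BLUE_COLLAR_PATTERN.test(title);
}

type ExclusionReason = "empty_title" | "blue_collar" | "non_it";

// Returns why a row is excluded, or null if it stays in the dataset.
function exclusionReason(title: string): ExclusionReason | null {
  if (title.trim().length < 2) return "empty_title";
  if (isBlueCollarTitle(title)) return "blue_collar";
  if (NON_IT_PATTERN.test(title)) return "non_it";
  return null;
}
```

A naive pattern like this would also exclude titles such as "Linux Driver Engineer", so a production bank would need negative lookups or an allow-list for technical uses of these words.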

Reproducibility

The full pipeline is committed to the repository and can be re-run from scratch:

  1. pnpm research:v2:export — export global D1 slice
  2. pnpm exec tsx scripts/research/indonesia-it-vs-global-v2-2026-06/parse.ts — parse Indonesia scrape
  3. pnpm research:v2:build — merge + classify → dataset.jsonl
  4. pnpm research:v2:analyze — analysis report + public CSVs

The Indonesia raw scraped files, parser, gold set, dataset, and CSVs are all committed in the same PR per the Research Retention SOP.

Limitations

  • Non-probability sample. Indonesia-local data comes from listing-page scrapes, not a random sample of all Indonesian IT employers.
  • Listing ≠ hire. A posting reflects stated demand, not actual hiring outcomes.
  • Description quality varies. Indonesia-local listings are often shorter; regex classifiers may under-detect AI skills in brief postings (recall bias against Indonesia — conservative for H1).
  • Single point in time. Snapshot from June 2026; AI tool adoption is changing rapidly.
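  • Classifier errors. The gate only guarantees an overall macro-F1 of at least 0.80, so a non-trivial share of individual rows (loosely, up to about one in five) may be misclassified. Macro-F1 is not a direct error rate, so treat that figure as a rough guide. Aggregate proportions are more reliable than individual classifications.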

Version History

| Version | Date | Change |
| --- | --- | --- |
| v1 | May 2026 | 199 Indonesia + 1,010 global. No CIs, no pre-registration, parser lost in /tmp. |
| v2 | June 2026 | 10,000+ postings. Taxonomy-first classifier. Gold-set validation. Wilson CIs. Two-proportion z-tests. Pre-registered. All raw data and parser committed. |

Questions or corrections: contact us. Source code: github.com/kelvindesman/lokerdollar.com.