Case Study · Open Source

A primary-source intelligence platform for cross-border biotech

Biotech Intelligence translates Mandarin regulatory filings and corporate registries into structured, typed, queryable intelligence — then serves it as statically-rendered pages and an interactive ownership graph. This page is the engineering story behind it.

See the live graph

Next.js 14 RSC · TypeScript · SQLite · Python pipeline · MDX · Tailwind

The Problem

The intelligence that moves cross-border deals is locked in Mandarin primary sources

Cross-border biotech licensing between Asian and Western companies reached $137.7 billion in 2025 and accounted for 38% of Big Pharma's $50M+ transactions in the first half of the year. Yet the data that actually de-risks those deals (CDE drug filings, NMPA announcements, ultimate beneficial ownership, VIE structures, BIOSECURE exposure) lives in Chinese-language regulatory systems and corporate registries (Tianyancha, GSXT).

Deal teams are left stitching together machine translations and stale secondary reporting. There was no single, structured, primary-source view that both sides of a transaction could read from the same page. So I built one — and made it free and open source.

What had to be true

Read the primary sources in the original language — not machine glosses.
Normalise messy filings into one structured, queryable model.
Make ownership and deal relationships explorable, not buried in prose.

What the platform does

One platform, four primary-source data products

Weekly briefings

16 long-form MDX issues translating the week's CDE filings and deals.

Deal tracker

Licensing terms benchmarked against comparable transactions, sortable by value.

Company profiles

Pipeline, ownership chains, and BIOSECURE status from corporate registries.

BIOSECURE tracker

Enforcement timeline and BCC list, joined to the entities under analysis.

Architecture & Data Flow

From a Mandarin filing to a typed, statically-rendered page

A Python pipeline does the heavy ingestion offline; the web tier stays a thin, fast, typed read layer. The database is exported to static JSON at build time, so production never depends on a writable filesystem and pages are served pre-rendered instead of being assembled from live queries.

1. Scrape: CDE filings, NMPA, corporate registries, deal news
2. Translate + Enrich: Mandarin → English, entity resolution, BIOSECURE tagging
3. SQLite: entities · deals · filings · subsidiaries · trials
4. Typed API: app/api/intel/* (rate-limited, cached JSON)
5. Render: static RSC pages + interactive force graph
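As a rough illustration of the hand-off from the SQLite tables to the typed graph endpoint, the sketch below turns entity and subsidiary rows into a nodes-and-links payload. The row shapes, field names, and the buildGraph helper are all assumptions made for this example; the case study does not show the real schema.

```typescript
// Hypothetical row shapes: the real column names are not shown in the case study.
interface EntityRow {
  id: number;
  name: string;
  biosecureStatus: "listed" | "watch" | "clear";
}

interface SubsidiaryRow {
  parentId: number;
  childId: number;
  ownershipPct: number; // 0 to 100
}

interface GraphPayload {
  nodes: { id: number; label: string; status: EntityRow["biosecureStatus"] }[];
  links: { source: number; target: number; weight: number }[];
}

// Build the payload a graph endpoint could return as JSON.
function buildGraph(entities: EntityRow[], subsidiaries: SubsidiaryRow[]): GraphPayload {
  const known = new Set(entities.map((e) => e.id));
  return {
    nodes: entities.map((e) => ({ id: e.id, label: e.name, status: e.biosecureStatus })),
    // Drop edges whose endpoints are not in the entity set,
    // so the client never receives a dangling link.
    links: subsidiaries
      .filter((s) => known.has(s.parentId) && known.has(s.childId))
      .map((s) => ({ source: s.parentId, target: s.childId, weight: s.ownershipPct / 100 })),
  };
}
```

Because the return type is declared once, the same GraphPayload interface could be imported by both the route handler and the client component, which is one way to keep the typing end-to-end.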

Static surface

42 server-rendered pages

Briefings, deals, company profiles and the BIOSECURE tracker, pre-rendered for speed and SEO.

Interactive surface

Corporate-ownership graph

A client-side force simulation hydrated from the typed graph endpoint.

Build-time export

A prebuild step exports SQLite to static JSON in public/data/, so the serverless tier reads data without a writable disk.
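A minimal sketch of what such an export step might plan, assuming one JSON file per table. Only the public/data/ directory comes from the text above; the per-table file names, the ExportRow type, and planExport are hypothetical, and the actual disk write is left out so the function stays pure.

```typescript
// Hypothetical row type: a flat record as it might come back from a SELECT.
type ExportRow = Record<string, string | number | null>;

// Map each table name to the static file a prebuild script would write.
// A real script would follow this with a file write per entry.
function planExport(tables: Record<string, ExportRow[]>): Map<string, string> {
  const files = new Map<string, string>();
  for (const [table, rows] of Object.entries(tables)) {
    files.set(`public/data/${table}.json`, JSON.stringify(rows));
  }
  return files;
}
```

Keeping the serialisation separate from the write makes the export easy to unit-test without touching the filesystem.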

Resilient fetches

Every external call is wrapped in try/catch with a graceful fallback and hourly ISR, so a flaky API degrades a single page instead of failing the build.
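The pattern can be sketched as a small generic wrapper. The name withFallback and its signature are assumptions for illustration, not the repository's actual helper; the loader is passed in as a function so the wrapper works for any external call.

```typescript
// Run an async loader and return a caller-supplied fallback if it throws.
// Hypothetical helper: the real wrapper in the repo may look different.
async function withFallback<T>(load: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await load();
  } catch {
    // Swallow the failure so a flaky upstream yields fallback content
    // instead of an exception during build or revalidation.
    return fallback;
  }
}

// In a Next.js App Router page, hourly ISR is typically declared with
//   export const revalidate = 3600;
// next to a call such as:
//   const deals = await withFallback(() => loadDeals(), []);
// where loadDeals is a placeholder for any external fetch.
```

The trade-off is that errors become silent, so a production version would probably log the caught error before returning the fallback.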

One source of truth

The curated-entity filter and slug rule live in one module each, shared by SQL, API routes, and link builders — so a row always resolves to a real page.
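To make the idea concrete, here is one possible slug rule with a link builder that depends on it. The rule itself (lowercase, strip accents, collapse other characters to hyphens), the /companies/ route prefix, and both function names are assumptions; the case study does not state the actual rule.

```typescript
// Hypothetical slug rule. It assumes entity names are stored in Latin script;
// a name made only of Chinese characters would produce an empty slug.
function toSlug(name: string): string {
  return name
    .normalize("NFKD")                 // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "")   // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")       // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "");          // trim leading and trailing hyphens
}

// Link builders call toSlug instead of re-implementing it,
// so a link and the page it targets cannot drift apart.
function entityHref(name: string): string {
  return `/companies/${toSlug(name)}`;
}
```

For example, toSlug("WuXi AppTec Co., Ltd.") yields "wuxi-apptec-co-ltd". Any SQL expression that derives slugs would need to mirror the same steps, which is the kind of parity a test can pin down.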

The Tech Stack

Typed end-to-end, from the database row to the rendered cell

Next.js 14

App Router · React Server Components · ISR

TypeScript

End-to-end typing, DB rows to API to UI

Tailwind CSS

Editorial design system, dual light/dark theme

SQLite · better-sqlite3

Read-only intel store, prepared statements

MDX

16 long-form briefings as typed, queryable content

Python

Scrape + translate + enrich intel pipeline

Vitest

20 test files: lib, components, API, integration

Vercel

Static generation + serverless API, hourly revalidation

No runtime UI framework beyond React; the interactive graph is a hand-built force simulation rather than a heavyweight charting dependency.

Engineering Highlights

Three things worth opening the repo for

Corporate-ownership force graph

A client-rendered, physics-based network of biotech entities, their subsidiaries, VIE structures, and deal relationships, colour-coded by BIOSECURE exposure. It is hydrated from a typed graph endpoint and ships its own loading skeleton, so the dense full-screen UI never pops in abruptly on a cold load.

Open the graph
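For readers unfamiliar with force-directed layouts, the sketch below shows one simulation step of a generic, textbook-style scheme: pairwise repulsion, spring attraction along links, then damped integration. It is not the platform's actual simulation; the type names, the tick function, and every constant are illustrative assumptions.

```typescript
interface GraphNode { id: string; x: number; y: number; vx: number; vy: number; }
interface GraphLink { source: string; target: string; }

// One simulation step. All constants are illustrative defaults.
function tick(
  nodes: GraphNode[],
  links: GraphLink[],
  opts = { repulsion: 200, spring: 0.01, restLength: 60, damping: 0.5 },
): void {
  const byId = new Map<string, GraphNode>();
  for (const n of nodes) byId.set(n.id, n);

  // Every pair of nodes pushes apart with an inverse-square force (O(n^2)).
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const a = nodes[i], b = nodes[j];
      const dx = b.x - a.x, dy = b.y - a.y;
      const d2 = Math.max(dx * dx + dy * dy, 0.01); // clamp to avoid division by zero
      const d = Math.sqrt(d2);
      const f = opts.repulsion / d2;
      a.vx -= (dx / d) * f; a.vy -= (dy / d) * f;
      b.vx += (dx / d) * f; b.vy += (dy / d) * f;
    }
  }

  // Each link acts as a spring pulling its endpoints toward restLength.
  for (const l of links) {
    const a = byId.get(l.source), b = byId.get(l.target);
    if (!a || !b) continue; // ignore links to unknown nodes
    const dx = b.x - a.x, dy = b.y - a.y;
    const d = Math.max(Math.hypot(dx, dy), 0.1);
    const f = opts.spring * (d - opts.restLength);
    a.vx += (dx / d) * f; a.vy += (dy / d) * f;
    b.vx -= (dx / d) * f; b.vy -= (dy / d) * f;
  }

  // Damp velocities so the layout settles, then move the nodes.
  for (const n of nodes) {
    n.vx *= opts.damping; n.vy *= opts.damping;
    n.x += n.vx; n.y += n.vy;
  }
}
```

Calling tick once per animation frame and redrawing gives the familiar settling motion. The pairwise loop is quadratic in the number of nodes, which is usually acceptable for a few hundred entities.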

BIOSECURE compliance tracker

A primary-source timeline of the BIOSECURE Act, the 1260H list, and BCC designations, joined against the entity store so each tracked company carries a live compliance status. The same curated-entity filter drives the tracker, the graph, and every entity page from one place.

View the tracker
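The join described above might look roughly like the following, assuming a boolean curated flag on each entity and one designation row per entity per list. The types, field names, and the complianceView function are hypothetical; only the BCC and 1260H list names come from the text.

```typescript
interface TrackedEntity { id: number; name: string; curated: boolean; }
interface Designation { entityId: number; list: "BCC" | "1260H"; }

// The single curated-entity predicate, meant to be imported wherever entities are listed.
const isCurated = (e: TrackedEntity): boolean => e.curated;

// Join designations onto curated entities. An empty `lists` array means
// the entity appears on neither list in the supplied data.
function complianceView(
  entities: TrackedEntity[],
  designations: Designation[],
): { name: string; lists: string[] }[] {
  const listsById = new Map<number, string[]>();
  for (const d of designations) {
    const lists = listsById.get(d.entityId) ?? [];
    lists.push(d.list);
    listsById.set(d.entityId, lists);
  }
  return entities
    .filter(isCurated)
    .map((e) => ({ name: e.name, lists: listsById.get(e.id) ?? [] }));
}
```

Because the filter is a named function and not an inline condition, the tracker, the graph, and the entity pages can all apply exactly the same rule.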

Test suite + typed data layer

Twenty Vitest files cover the data layer, components, the intel API, and end-to-end content integrity — slug derivation parity between JS and SQL, deal-value math, markdown sanitisation, and SEO invariants. better-sqlite3 row generics keep query results typed instead of cast to any.

Read the source
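As an example of the kind of deal-value math such tests might pin down, the sketch below computes a headline value and an upfront share, and sorts deals by value. The DealTerms shape, the choice of millions of USD as the unit, and the function names are assumptions for illustration.

```typescript
// Hypothetical deal shape; amounts are in millions of USD.
interface DealTerms { name: string; upfrontUsdM: number; milestonesUsdM: number; }

// Headline value: upfront payment plus all potential milestones.
const headlineValue = (d: DealTerms): number => d.upfrontUsdM + d.milestonesUsdM;

// Fraction of the headline value paid upfront; 0 when no value is disclosed.
const upfrontShare = (d: DealTerms): number => {
  const total = headlineValue(d);
  return total === 0 ? 0 : d.upfrontUsdM / total;
};

// Sort a copy, largest headline value first, leaving the input untouched.
const byValueDesc = (deals: DealTerms[]): DealTerms[] =>
  [...deals].sort((a, b) => headlineValue(b) - headlineValue(a));
```

Small pure functions like these are cheap to cover in Vitest, and the zero-total guard is the sort of edge case a test would lock in.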

Built & Maintained By

Antony Tan

Computational biologist · Software engineer

Antony holds an MS in Computational Biology from the Harvard T.H. Chan School of Public Health and conducted research at the Broad Institute of MIT & Harvard, with a publication at NeurIPS 2025. He also holds a BS in Computer Science from the University of Toronto.

Fluent in English, Mandarin, and Cantonese, Antony reads CDE filings, NMPA documents, and corporate registries (Tianyancha, GSXT) in their original language — which is what makes the primary-source methodology behind this platform possible. This project pairs that domain fluency with the full-stack engineering above.

GitHub · LinkedIn · Portfolio · Google Scholar

Credentials

MS Computational Biology — Harvard T.H. Chan
Researcher — Broad Institute of MIT & Harvard
Publication — NeurIPS 2025
BS Computer Science — University of Toronto
Languages — English · Mandarin · Cantonese

Editorial Methodology

Every briefing is built from primary sources

The engineering exists to serve a rigorous editorial process. The pipeline gathers, but a human with computational-biology training does the reading and the judgement.

1. SOURCE

Monitor CDE filings, NMPA announcements, Chinese corporate registries (Tianyancha, GSXT), and deal announcements in both English and Chinese media.

2. TRANSLATE

Filings are read by a native Mandarin speaker with computational-biology training — not machine translation. Terminology is checked against pharmacological and regulatory standards.

3. ANALYSE

Deal terms are benchmarked against comparable transactions. Corporate structures are mapped through holding-company registries. BIOSECURE exposure is assessed against BCC designation criteria.

4. DELIVER

Analysis is structured for business-development and compliance professionals, closing with concrete takeaways relevant to current deal and regulatory activity.
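The benchmarking in step 3 can be illustrated with a simple median comparison. This is a simplified stand-in for the editorial method, not a description of it; the functions, the choice of the median as the reference point, and the sample figures in the note below are all hypothetical.

```typescript
// Median of a non-empty list of numbers.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("no comparables supplied");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Compare one deal's upfront payment with the median of its comparables.
// A ratio above 1 means the deal paid more upfront than the typical comparable.
function benchmark(dealUpfront: number, comparables: number[]): { median: number; ratio: number } {
  const m = median(comparables);
  return { median: m, ratio: dealUpfront / m };
}
```

With illustrative comparables of 80, 20, 50 and 100 (in any consistent unit), the median is 65, so a deal with an upfront of 130 benchmarks at a ratio of 2.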

Independence disclosure: Biotech Intelligence has no financial relationships with any company, institution, or government entity covered. The author holds no stock positions in any company covered. All analysis is independent and cites its sources. This is research, not investment advice, legal counsel, or policy advocacy.

The whole platform is open source

Read the code, fork the pipeline, or build on it. No account, no paywall.

Read the briefings

Open source · MIT licensed