Antony Tan

About

I build software that solves hard problems. Lately that's ML and healthcare, but the engineering is what I love.

Right now that means full-stack platforms, deep learning on large-scale data, and getting systems into production. I care about clean architecture and writing code that actually ships.

I'm finishing my MS in Computational Biology at Harvard, cross-registered at MIT EECS, and doing research at the Broad Institute and Harvard Medical School. Day to day I build full-stack applications, write ML pipelines, and work with large-scale data: the kind of problems where good engineering and good science have to work together.

Before Harvard, I studied CS and Biology at the University of Toronto, shipped production software at RBC, and spent three years teaching CS. I speak English, Cantonese, and Mandarin.

Engineering

Python, TypeScript, Java, SQL. React, Next.js, Node. AWS, GCP, Docker, PostgreSQL, Supabase. Production systems at scale.

Machine Learning

PyTorch, Transformers, foundation models, interpretable ML (SHAP), scikit-learn, multi-agent LLM systems

Data & Infrastructure

Large-scale data pipelines, ETL, REST APIs, NLP, real-time streaming, multi-source data integration

Domain: Healthcare & Bio

Proteomics, multi-omics, clinical data, drug target validation, pathway analysis

Publications

Nature Communications (in revision)

Evaluating Individual Level Performance of Polygenic Risk Scores Using Early Onset High Genetic Risk Coronary Artery Disease as a Benchmark

S. Liang*, M.S. Kim*, Y. Sui, Y. Tan, ..., A.C. Fahed*, Z. Yu*

Nature Communications (in revision)

Machine Learning Cross-Platform Proteomic Imputation Enables Protein Quality Scoring and Replication of Epidemiological Associations

L. Li, A. Alaa, Y. Tan, I. Demirel, S. Friedman, ..., A. Philippakis, P. Natarajan, Z. Yu

NeurIPS 2025 — ML & Physical Sciences Workshop

AstroCo: Self-Supervised Conformer-Style Transformers for Light-Curve Embeddings

Antony Tan, P. Protopapas, M. Cádiz-Leyton, G. Cabrera-Vives, C. Donoso-Oliva, I. Becker

Open-source model

SIGCSE 2024

A Systematic Literature Mapping of COVID-19 Papers in CS Education

B. Harrington, ..., Y. Tan, et al.

Education

MS Computational Biology & Quantitative Genetics

Harvard T.H. Chan School of Public Health

GPA 4.0/4.0 · Thesis: Biologically Informed Neural Networks for Proteomic Pathway Discovery · Advised by Zhi Yu, PhD (Broad / HMS / MGH)

Cross-registered, EECS

MIT

GPA 5.0/5.0 · Graduate ML, Generative AI in Computational Biology, Clinical Data Learning

BSc Computer Science Specialist (Software Eng.), Biology Major, Statistics Minor

University of Toronto

Azure DP-900 Certified · 3 years CS Teaching Assistant

Experience

Oct 2025 – Now

ML Researcher, Proteomics

Harvard Medical School / Broad Institute / MGH — Yu Lab

May 2025 – Now

ML Research Assistant, Foundation Models

Harvard SEAS — Protopapas Lab

Sep 2021 – Jan 2026

Teaching Fellow & Teaching Assistant

Harvard CS1090A Machine Learning (~400 students) · UofT (3 years)

Sep 2022 – Apr 2023

Software Developer Co-op

Royal Bank of Canada

Selected Work

Software that ships

Full-stack platforms, ML pipelines, and data-intensive applications. All deployed live on GCP.

TargetScout 2026

Enter a gene target, get back a complete validation dossier: disease associations, pathway biology, druggability, tissue safety, clinical trial landscape, and key literature. Built to replace weeks of manual lit review in early-stage pharma.

Pulls from OpenTargets, PubMed, ChEMBL, ClinicalTrials.gov, UniProt, and GTEx. Produces weighted confidence scores across 5 evidence axes with go/no-go recommendations. Validated against known targets like PCSK9, BRCA1, and TP53.
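The weighted scoring described above can be sketched roughly as follows. This is a minimal illustration, not TargetScout's actual implementation: the axis names, the weights, the 0.6 threshold, and the `score_target` function are all hypothetical, chosen only to show how per-axis evidence might roll up into a go/no-go call.

```python
from dataclasses import dataclass

# Hypothetical axis names and weights, for illustration only.
AXIS_WEIGHTS = {
    "disease_association": 0.30,
    "pathway_biology": 0.20,
    "druggability": 0.20,
    "tissue_safety": 0.15,
    "clinical_landscape": 0.15,
}


@dataclass
class Verdict:
    score: float
    recommendation: str


def score_target(evidence: dict[str, float], go_threshold: float = 0.6) -> Verdict:
    """Combine per-axis evidence scores in [0, 1] into one weighted score."""
    missing = set(AXIS_WEIGHTS) - set(evidence)
    if missing:
        raise ValueError(f"missing evidence axes: {sorted(missing)}")
    # Weighted average, normalised by the total weight.
    total = sum(AXIS_WEIGHTS[axis] * evidence[axis] for axis in AXIS_WEIGHTS)
    score = total / sum(AXIS_WEIGHTS.values())
    # The threshold is an assumed cut-off, not a value from the project.
    recommendation = "go" if score >= go_threshold else "no-go"
    return Verdict(round(score, 3), recommendation)
```

Calling `score_target` with one score per axis returns a `Verdict` holding the combined score and the recommendation. A missing axis raises an error instead of silently counting as zero.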

Drug Discovery · Target Validation · 6 Biomedical APIs · React · Open Source

Biomedical Intelligence Platform 2026

Tracks pharmaceutical deals, clinical trials, and regulatory filings across 100 biotech companies. Designed for BD teams and compliance analysts.

Processes 2,400+ records from ClinicalTrials.gov, SEC EDGAR, PubMed, ChEMBL, and international regulatory databases. Multilingual NLP pipeline handles foreign-language drug filings.

Pharma Intelligence · Clinical Trials · NLP · Next.js · Open Source

Research Showcase 2026

Interactive, tab-by-tab walkthroughs of five ML projects: an AlphaFold-3 structure dashboard, a NeurIPS conformer, a pathway-aware neural network, an AHA 2026 abstract, and a reproducible genetics pipeline.

Built for technical deep-dives. Light and dark mode, hash-routed, with a test suite. React, TypeScript, Tailwind.

Structural ML · Proteomics · React + TS

AI Council 2026

Multiple AI models debate scientific and clinical questions in real time, challenging each other instead of agreeing by default.

Orchestrates GPT-4, Claude, and Gemini with anti-sycophancy scoring and a judge model. Real-time streaming deliberation.

Scientific Debate · Hypothesis Testing · Multi-Agent LLM · Open Source

AstroCo 2025

Foundation model for irregular time-series data. The architecture applies to clinical time series like longitudinal lab values. Published at NeurIPS 2025.

Self-supervised Conformer-style transformer on 1.5M observations. 70% lower RMSE than benchmarks. Open-sourced on Hugging Face.

Clinical Time Series · NeurIPS 2025 · PyTorch

Biologically Informed Neural Networks 2025+

Neural networks encoding Reactome pathway structure so disease predictions come with mechanistic explanations. 50K+ plasma samples from major cohorts.

BMI prediction (R² = 0.73, 4.8x over Ridge), heart failure subtype classification (HFpEF vs HFrEF). SHAP interpretability across SomaLogic and Olink.
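The idea of encoding pathway structure in the network can be illustrated with a single masked layer, where each pathway node only receives input from its member proteins. This is a pure-Python sketch under stated assumptions, not the thesis code: the `masked_linear` function, the toy weights, and the three-protein, two-pathway mask are hypothetical, and a real model would take its mask from Reactome and train in a framework such as PyTorch.

```python
def masked_linear(x, weights, mask, bias):
    """One sparse layer: output j only sees input i where mask[j][i] == 1."""
    outputs = []
    for w_row, m_row, b in zip(weights, mask, bias):
        # Zeroed mask entries remove protein-to-pathway edges that do not exist.
        z = b + sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        outputs.append(max(0.0, z))  # ReLU activation
    return outputs


# Toy example: 3 proteins feeding 2 pathways (hypothetical membership).
protein_levels = [1.0, 2.0, 3.0]
weights = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]
pathway_mask = [[1, 1, 0],   # pathway A contains proteins 0 and 1
                [0, 0, 1]]   # pathway B contains protein 2
bias = [0.0, 0.0]

pathway_activations = masked_linear(protein_levels, weights, pathway_mask, bias)
```

Because every hidden unit corresponds to a named pathway, attribution methods such as SHAP can be read at the pathway level as well as the protein level.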

Proteomics · Heart Failure · Broad Institute · Interpretable ML

MS Thesis, in progress