Antony Tan

Software engineer and ML researcher. Production platforms, deep learning pipelines, large-scale data.


About

I build software that solves hard problems. Lately that's ML and healthcare, but the engineering is what I love.

Right now that means full-stack platforms, deep learning on large-scale data, and getting systems into production. I care about clean architecture and about writing code that ships.

I'm finishing my MS in Computational Biology at Harvard, cross-registered at MIT EECS, and doing research at the Broad Institute and Harvard Medical School. Day to day I build full-stack applications, write ML pipelines, and work with large-scale data. These are the kind of problems where good engineering and good science have to work together.

Before Harvard, I studied CS and Biology at the University of Toronto, shipped production software at RBC, and spent three years teaching CS. I speak English, Cantonese, and Mandarin.

Engineering

Python, TypeScript, Java, SQL. React, Next.js, Node. AWS, GCP, Docker, PostgreSQL, Supabase. Production systems at scale.

Machine Learning

PyTorch, Transformers, foundation models, interpretable ML (SHAP), scikit-learn, multi-agent LLM systems

Data & Infrastructure

Large-scale data pipelines, ETL, REST APIs, NLP, real-time streaming, multi-source data integration

Domain: Healthcare & Bio

Proteomics, multi-omics, clinical data, drug target validation, pathway analysis

Publications

Nature Communications (in revision)
S. Liang*, M.S. Kim*, Y. Sui, Y. Tan, ..., A.C. Fahed*, Z. Yu*

NeurIPS 2025 · ML & Physical Sciences Workshop
Antony Tan, P. Protopapas, M. Cádiz-Leyton, G. Cabrera-Vives, C. Donoso-Oliva, I. Becker

SIGCSE 2024
B. Harrington, ..., Y. Tan, et al.

Education

MS Computational Biology & Quantitative Genetics
Harvard T.H. Chan School of Public Health
GPA 4.0/4.0 · Thesis: Biologically Informed Neural Networks for Proteomic Pathway Discovery · Advised by Zhi Yu, PhD (Broad / HMS / MGH)

Cross-registered, EECS
MIT
GPA 5.0/5.0 · Graduate ML, Generative AI in Computational Biology, Clinical Data Learning

BSc Computer Science Specialist (Software Eng.), Biology Major, Statistics Minor
University of Toronto
Azure DP-900 Certified · 3 years CS Teaching Assistant

Experience

Oct 2025 – Now
ML Researcher, Proteomics
Harvard Medical School / Broad Institute / MGH · Yu Lab

May 2025 – Now
ML Research Assistant, Foundation Models
Harvard SEAS · Protopapas Lab

Sep 2021 – Jan 2026
Teaching Fellow & Teaching Assistant
Harvard CS1090A Machine Learning (~400 students) · UofT (3 years)

Sep 2022 – Apr 2023
Software Developer Co-op
Royal Bank of Canada

Selected Work

Software that ships

Full-stack platforms, ML pipelines, and data-intensive applications. All deployed live on GCP.

Biomedical Intelligence Platform (2026)

Tracks pharmaceutical deals, clinical trials, and regulatory filings across 100 biotech companies. Designed for BD teams and compliance analysts.

Processes 2,400+ records from ClinicalTrials.gov, SEC EDGAR, PubMed, ChEMBL, and international regulatory databases. Multilingual NLP pipeline handles foreign-language drug filings.

Pharma Intelligence · Clinical Trials · NLP · Next.js · Open Source

AI Council (2026)

Multiple AI models debate scientific and clinical questions in real time, challenging each other instead of agreeing by default.

Orchestrates GPT-4, Claude, and Gemini with anti-sycophancy scoring and a judge model. Real-time streaming deliberation.
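
The scoring idea can be illustrated with a small, framework-free sketch. It is an assumption-laden toy, not the project's code: `Turn`, `overlap`, `echo_scores`, and `judge` are hypothetical names, word-set overlap stands in for whatever anti-sycophancy metric the system really uses, a fixed threshold stands in for the judge model, and no model API is called.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Turn:
    model: str  # a label only; this sketch calls no model API
    text: str


def _tokens(text: str) -> set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w}


def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two responses."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def echo_scores(turns: list[Turn]) -> dict[str, float]:
    """Score every turn after the first by its overlap with the previous turn.

    A high score means the model mostly repeated the previous speaker.
    Assumes each model speaks once per round.
    """
    return {
        cur.model: overlap(prev.text, cur.text)
        for prev, cur in zip(turns, turns[1:])
    }


def judge(turns: list[Turn], threshold: float = 0.6) -> list[str]:
    """Stand-in judge: keep the models whose echo score is below the threshold."""
    scores = echo_scores(turns)
    return [t.model for t in turns if scores.get(t.model, 0.0) < threshold]
```

In this toy version, a round in which the second model repeats the first almost verbatim and the third raises a different point would keep the first and third turns and drop the second.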

Scientific Debate · Hypothesis Testing · Multi-Agent LLM · Open Source

AstroCo (2025)

Foundation model for irregular time-series data. The architecture also applies to clinical time series such as longitudinal lab values. Presented at the NeurIPS 2025 ML & Physical Sciences Workshop.

Self-supervised Conformer-style transformer trained on 1.5M observations. 70% lower RMSE than benchmarks. Open-sourced on Hugging Face.
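
One common way to feed irregularly sampled data to a transformer is to encode the real observation time instead of the sequence position. The sketch below shows that idea only. Nothing here describes how AstroCo actually encodes time: `time_encoding`, `dim`, and `max_period` are illustrative names and defaults.

```python
import math


def time_encoding(timestamps, dim=8, max_period=10_000.0):
    """Sinusoidal encoding of real-valued observation times.

    Unlike a positional encoding, the input is the measurement time itself,
    so unevenly spaced observations get unevenly spaced encodings.
    Returns one list of `dim` floats per timestamp.
    """
    if dim <= 0 or dim % 2:
        raise ValueError("dim must be a positive even number")
    encoded = []
    for t in timestamps:
        row = []
        for i in range(dim // 2):
            # Frequencies fall geometrically from 1 toward 1 / max_period.
            freq = 1.0 / (max_period ** (2 * i / dim))
            row.extend((math.sin(t * freq), math.cos(t * freq)))
        encoded.append(row)
    return encoded
```

For lab values drawn at days 0, 0.5, and 7.25, `time_encoding([0.0, 0.5, 7.25])` gives three vectors whose spacing reflects the actual gaps between draws.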

Clinical Time Series · NeurIPS 2025 · PyTorch

Biologically Informed Neural Networks (2025+)

Neural networks encoding Reactome pathway structure so disease predictions come with mechanistic explanations. 50K+ plasma samples from major cohorts.

Tasks: BMI prediction (R² = 0.73, 4.8x over ridge regression) and heart failure subtype classification (HFpEF vs HFrEF). SHAP interpretability across the SomaLogic and Olink proteomics platforms.
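
Encoding pathway structure in a network usually comes down to a binary mask that lets each pathway node see only its member proteins. The sketch below shows that mechanism in plain Python under stated assumptions: `build_mask` and `masked_forward` are hypothetical helpers, the protein and pathway names are placeholders rather than Reactome entries, and the real models are presumably PyTorch layers, not nested lists.

```python
def build_mask(proteins, pathways):
    """Return mask[j][i] = 1 if protein i belongs to pathway j, else 0.

    `pathways` maps a pathway name to its member proteins; members that are
    not in `proteins` are ignored. Rows follow the dict's insertion order.
    """
    index = {name: i for i, name in enumerate(proteins)}
    mask = []
    for members in pathways.values():
        row = [0] * len(proteins)
        for name in members:
            if name in index:
                row[index[name]] = 1
        mask.append(row)
    return mask


def masked_forward(x, weights, mask):
    """One sparse layer: each pathway node sums only its member proteins.

    Multiplying each weight by its mask entry zeroes every connection the
    pathway map does not contain, so a node's output can be attributed to
    a known set of proteins.
    """
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]
```

With three placeholder proteins and two placeholder pathways, `build_mask(["P1", "P2", "P3"], {"pathway_A": ["P1", "P2"], "pathway_B": ["P3"]})` returns `[[1, 1, 0], [0, 0, 1]]`.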

Proteomics · Heart Failure · Broad Institute · Interpretable ML

Blog

Writing

Thoughts on machine learning, computational biology, and building things.

Coming soon.