Open-Source Methodology

Our NIH funding rankings are built from public data using a fully open-source pipeline. Every step — from data ingestion to institution normalization to final ranking — is auditable and reproducible. Explore the data: Medical Schools, Departments, Disciplines, Investigators, Institutions, Geographic, Compare.

1. Data Sources

We use two official NIH data systems, both maintained by the Office of Extramural Research:

NIH RePORTER API v2

The primary source for current fiscal year data. api.reporter.nih.gov provides programmatic access to all active and recently completed NIH-funded projects, including award amounts, principal investigators, institutions, and department codes.

NIH ExPORTER

Bulk CSV downloads for historical fiscal years (FY2001–FY2024). exporter.nih.gov provides the same underlying data in downloadable flat files, enabling reproducible historical analysis.

What's included

  • Award types: All extramural grant mechanisms — Research (R01, R21, R34, etc.), Program (P01, P30, P50), Cooperative Agreements (U01, U54, UG3), Training (T32), Fellowships (F30, F31, F32), Career Development (K08, K23, K99), and other grant types.
  • Subcontracts: Included. The RePORTER API returns subcontract awards linked to their performing institution, which are counted toward that institution's total.
  • R&D contracts: Included in all-institution rankings. Excluded from medical school and department rankings for cleaner comparability (contracts are often large, institution-wide, and not department-specific).
  • Intramural funding: Excluded. NIH intramural research program (IRP) funding is not grant-based and is not reported through the same system.

2. Data Pipeline

Raw data goes through a multi-step pipeline before producing rankings. Each step is implemented as an auditable script in our open-source repository:

  1. Ingest

    Fetch all NIH awards for the target fiscal year from the RePORTER API (paginated, 500 records per request). For historical years, parse ExPORTER CSV files. Store raw award records with project number, award amount, fiscal year, organization name, department code, and PI details.
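
A minimal sketch of this step in TypeScript (Node 18+ for the global fetch; the ReporterPage shape and function names are illustrative assumptions, not the actual scripts/ingest-nih-data.ts code):

```ts
interface ReporterPage {
  meta: { total: number };
  results: unknown[]; // raw award records: project number, amount, organization, dept code, PI
}

// One page from the v2 search endpoint (500 records per request).
async function fetchPage(year: number, offset: number): Promise<ReporterPage> {
  const res = await fetch("https://api.reporter.nih.gov/v2/projects/search", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ criteria: { fiscal_years: [year] }, offset, limit: 500 }),
  });
  return res.json() as Promise<ReporterPage>;
}

// Walk every page for one fiscal year, accumulating raw records.
async function ingestFiscalYear(year: number): Promise<unknown[]> {
  const awards: unknown[] = [];
  let offset = 0;
  for (;;) {
    const page = await fetchPage(year, offset);
    awards.push(...page.results);
    offset += page.results.length;
    if (page.results.length === 0 || offset >= page.meta.total) break;
  }
  return awards;
}
```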

  2. Deduplicate

    NIH projects can have multiple sub-projects and supplements. We deduplicate by project number + fiscal year, summing award amounts across sub-projects. Each unique project is counted once toward grant counts.
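
As a sketch, the dedup key and summation can be expressed with a keyed Map (the RawAward shape is a simplified stand-in for the stored record):

```ts
interface RawAward {
  projectNum: string;
  fiscalYear: number;
  awardAmount: number;
}

// Collapse sub-projects and supplements: one total per project number + fiscal year.
function deduplicate(awards: RawAward[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const a of awards) {
    const key = `${a.projectNum}|${a.fiscalYear}`;
    totals.set(key, (totals.get(key) ?? 0) + a.awardAmount);
  }
  return totals; // each unique project counts once toward grant counts
}
```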

  3. Normalize Institutions

    The NIH database contains hundreds of name variations for the same institution (see Section 3 below). We apply ~200 alias mappings to consolidate these into canonical institution records, each with a unique slug, city, state, and organization type classification.

  4. Classify Departments

    Map NIH department codes to our 27 standard department categories (see Section 4 below). Awards with unrecognized or missing department codes are included in institution-level rankings but excluded from department-specific rankings.

  5. Classify Organization Types

    Each institution is classified as School of Medicine, University, Hospital, Research Institute, Government Lab, or Independent Organization. This classification determines which ranking tables an institution appears in — only Schools of Medicine appear in the medical school rankings.

  6. Aggregate & Rank

    For each ranking type (medical school, all institutions, department, state, PI), sum total funding across all qualifying awards and sort descending. Rank ties are broken by grant count, then PI count. No weighting, normalization, or adjustment is applied — rankings reflect raw total award dollars.
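
A minimal aggregation sketch, assuming simplified record fields (the real logic lives in scripts/compute-rankings.ts):

```ts
interface Award {
  institution: string; // canonical name after normalization
  pi: string;
  amount: number;
}

interface Totals { funding: number; grants: number; pis: Set<string> }

// Group by institution, summing dollars and counting grants and distinct PIs.
function aggregate(awards: Award[]): Map<string, Totals> {
  const byInstitution = new Map<string, Totals>();
  for (const a of awards) {
    const t = byInstitution.get(a.institution) ?? { funding: 0, grants: 0, pis: new Set<string>() };
    t.funding += a.amount;
    t.grants += 1;
    t.pis.add(a.pi);
    byInstitution.set(a.institution, t);
  }
  return byInstitution;
}
```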

  7. Store & Index

    Final rankings are stored in PostgreSQL with full-text search indexes on institution and PI names. Historical rankings are preserved for trend analysis. The web application queries this database directly for all ranking pages and API endpoints.
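
As an illustration of the kind of index this implies (the table and column names are assumptions, not the actual schema), a GIN index over a tsvector expression supports full-text search in PostgreSQL:

```ts
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the standard PG* env vars

// Hypothetical schema: a GIN index over to_tsvector backs name search.
await pool.query(`
  CREATE INDEX IF NOT EXISTS institutions_name_fts
  ON institutions
  USING gin (to_tsvector('english', name));
`);
```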

3. Institution Standardization

The single most critical step in producing accurate rankings is institution name normalization. The NIH database contains hundreds of variations for the same institution, which must be consolidated to avoid double-counting and produce meaningful rankings.

Why this matters

Without normalization, a university with three separately named entities in the NIH system would appear as three separate institutions in the rankings, each credited with only a fraction of the university's true funding. Standardization ensures that all funding flows to the correct canonical institution.

Alias mapping examples

  • Mayo Clinic Rochester, Mayo Clinic College of Medicine, Mayo Clinic Arizona, Mayo Clinic Florida → Mayo Clinic (single integrated institution across campuses)
  • Rutgers Biomedical and Health Sciences, Rutgers New Jersey Medical School, Rutgers Robert Wood Johnson → Rutgers University (schools within the same university)
  • Brigham and Women's Hospital, Massachusetts General Hospital, Beth Israel Deaconess, Dana-Farber Cancer Institute, Boston Children's Hospital → Harvard Medical School (Harvard-affiliated hospitals; SOM rankings only)
  • Mount Sinai School of Medicine, Icahn School of Medicine at Mount Sinai → Icahn School of Medicine at Mount Sinai (name changed in 2012)

We maintain approximately 200 alias mappings as open-source code. These are curated manually, cross-referenced against official NIH institution records (IPF numbers), and updated when institutions merge, rename, or restructure. The full alias list is available in our GitHub repository.
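
A sketch of how such a mapping is applied at ingest time; the aliases object below is a tiny illustrative excerpt, and the real curated list lives in scripts/standardize-institutions.ts:

```ts
// NIH name variant -> canonical institution name (illustrative subset).
const aliases: Record<string, string> = {
  "MAYO CLINIC ROCHESTER": "Mayo Clinic",
  "MAYO CLINIC ARIZONA": "Mayo Clinic",
  "MOUNT SINAI SCHOOL OF MEDICINE": "Icahn School of Medicine at Mount Sinai",
};

function canonicalInstitution(nihOrgName: string): string {
  const key = nihOrgName.trim().toUpperCase();
  return aliases[key] ?? nihOrgName; // pass-through fallback is a simplification for this sketch
}
```

In the pipeline, each canonical record also carries a slug, city, state, and organization type classification.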

Harvard: a special case

Harvard Medical School presents a unique challenge. NIH awards flow to individual hospitals (MGH, BWH, BIDMC, etc.) rather than to "Harvard" directly. For medical school rankings, we aggregate funding from all major Harvard-affiliated teaching hospitals under "Harvard Medical School." For the all-institution rankings, each hospital appears separately, reflecting where the research is actually performed.

4. Department Classification

NIH awards include a department code from the IMPAC II system, which we map to 27 standard medical departments organized into two categories. These correspond to departments as they typically exist in US medical schools.

See all departments and their aggregate funding on the Departments page and Disciplines overview.

Basic Science (8)

  • Anatomy & Cell Biology
  • Biochemistry
  • Biomedical Engineering
  • Genetics
  • Microbiology & Immunology
  • Neurosciences
  • Pharmacology
  • Physiology

Clinical Science (19)

  • Anesthesiology
  • Dermatology
  • Emergency Medicine
  • Family Medicine
  • Internal Medicine
  • Neurology
  • Neurosurgery
  • Obstetrics & Gynecology
  • Ophthalmology
  • Orthopedics
  • Otolaryngology
  • Pathology
  • Pediatrics
  • Physical Medicine & Rehabilitation
  • Psychiatry
  • Public Health
  • Radiology
  • Surgery
  • Urology

Department combining rules

Some NIH department codes map to combined categories. For example:

  • INTERNAL MEDICINE/MEDICINE — Combines internal medicine and general medicine divisions
  • MICROBIOLOGY/IMMUN/VIROLOGY — Combines microbiology, immunology, and virology departments
  • PUBLIC HEALTH & PREV MEDICINE — Combines public health and preventive medicine
  • RADIATION-DIAGNOSTIC/ONCOLOGY — Mapped to "Radiology" (includes diagnostic radiology and radiation oncology)
  • PHYSICAL MEDICINE & REHAB — Combines PM&R and rehabilitation medicine

Awards with department codes outside these 27 categories (e.g., Dentistry, Psychology, Veterinary Sciences) are included in institution-level and PI-level rankings but do not appear in the department breakdown tables.
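
A sketch of the code-to-category mapping, including the combining rules above and the unmapped case (the map is an illustrative excerpt, not the full table in scripts/classify-departments.ts):

```ts
// IMPAC II department code -> one of the 27 standard categories (excerpt).
const departmentMap: Record<string, string> = {
  "INTERNAL MEDICINE/MEDICINE": "Internal Medicine",
  "MICROBIOLOGY/IMMUN/VIROLOGY": "Microbiology & Immunology",
  "PUBLIC HEALTH & PREV MEDICINE": "Public Health",
  "RADIATION-DIAGNOSTIC/ONCOLOGY": "Radiology",
  "PHYSICAL MEDICINE & REHAB": "Physical Medicine & Rehabilitation",
};

// null = keep the award in institution-level rankings but out of department tables.
function classifyDepartment(code: string | null | undefined): string | null {
  if (!code) return null;
  return departmentMap[code.trim().toUpperCase()] ?? null;
}
```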

5. Ranking Types

ScienceDex produces six distinct ranking tables, each with its own inclusion criteria:

Medical Schools

Schools of Medicine ranked by total NIH extramural grant funding. R&D contracts excluded. Institution aliases consolidated. Currently tracking 133 schools.

All Institutions

Every institution receiving NIH extramural funding, including universities, hospitals, research institutes, and government labs. R&D contracts included. Currently tracking 2,871 institutions.

Departments

27 medical departments ranked within each medical school. Shows which departments at each school receive the most NIH funding. R&D contracts excluded.

Investigators

Individual principal investigators ranked by total NIH award amount. Includes all PIs with at least one active NIH grant in the fiscal year. Currently tracking 49,285 PIs.

Geographic

NIH funding aggregated by state, with per-capita calculations using US Census population estimates. Includes state-level and city-level breakdowns.

Disciplines

Cross-tabulation of all 27 departments — aggregate national funding, school counts, and funding trends. Shows basic science vs. clinical science funding distribution.

6. Ranking Computation

Rankings are computed by simple summation and sorting. We deliberately avoid complex weighting or normalization schemes to maintain transparency and reproducibility.

  • Metric: total award dollars (sum of all qualifying awards)
  • Sort order: descending by total funding
  • Tie-breaking: grant count (more grants ranked higher), then PI count
  • Weighting: none; all grant types contribute equally per dollar
  • Normalization: none applied to base rankings; per-capita figures are available on the geographic pages
  • Fiscal year: rankings are computed per fiscal year (Oct 1 – Sep 30), with no multi-year averaging
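
In code, the rules above reduce to a three-level comparator, roughly as follows (field names are assumptions):

```ts
interface RankedRow { name: string; funding: number; grants: number; pis: number }

// Descending by total funding; ties broken by grant count, then PI count.
const compareRows = (a: RankedRow, b: RankedRow): number =>
  b.funding - a.funding || b.grants - a.grants || b.pis - a.pis;

const rows: RankedRow[] = [
  { name: "School A", funding: 500_000_000, grants: 900, pis: 400 },
  { name: "School B", funding: 500_000_000, grants: 950, pis: 410 },
];
rows.sort(compareRows); // School B ranks first: equal dollars, more grants
```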

7. Geographic & Per-Capita Rankings

Geographic rankings aggregate NIH funding by state (using the institution's state field from the NIH database). Per-capita funding is calculated using US Census Bureau population estimates (2024 vintage).

  • Total funding by state: Sum of all NIH awards to institutions in that state
  • Per-capita funding: Total state funding / state population (Census 2024 estimates)
  • Year-over-year change: Percentage change in total state funding from the previous fiscal year
  • City-level breakdown: Available within each state detail page, aggregated by the city field in NIH records
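
The per-capita figure is straightforward division; a sketch with placeholder numbers (not actual Census or NIH figures):

```ts
// Per-capita NIH funding: state total divided by the Census population estimate.
const perCapita = (totalFunding: number, population: number): number =>
  totalFunding / population;

perCapita(3_200_000_000, 7_000_000); // ≈ 457 dollars per resident
```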

DC and Puerto Rico are included as separate entities. US territories other than Puerto Rico are not separately broken out due to small sample sizes.

8. Historical Data & Trends

Rankings are available from FY2001 through FY2024. Historical data enables:

  • Rank trajectories: How an institution's position has changed over time (visible as sparklines in tables and in the interactive bump chart)
  • Funding trends: Total dollar amounts by year for any institution, department, or state
  • Year-over-year movers: Institutions with the largest rank changes between consecutive fiscal years (see Trends)

Note on inflation: All dollar amounts are nominal (not adjusted for inflation or purchasing power). This is consistent with how NIH reports its own funding data. Users should account for inflation when comparing dollar amounts across widely separated fiscal years.

9. Known Limitations

We document the following limitations transparently:

  • Subcontract inclusion: The RePORTER API includes subcontract awards. This means some funding may be double-counted — once for the prime awardee and once for the subcontract performer. This primarily affects large multi-site clinical trials.
  • Data timing: NIH data is a point-in-time snapshot. Awards may be added, corrected, or reclassified after our ingestion date. We re-ingest when significant corrections are published.
  • Department codes: Not all NIH awards have department codes. Awards without department codes are included in institution-level rankings but cannot be attributed to specific departments.
  • Multi-PI awards: Awards with multiple PIs are attributed to the contact PI's institution. Funding is not split across co-PI institutions unless separate subcontracts exist.
  • Institution classification: Organization type classification (School of Medicine vs. University vs. Hospital) is based on NIH's own institutional profile, supplemented by manual curation. Edge cases exist for institutions with both a medical school and a broader university identity.
  • Nominal dollars: Rankings reflect nominal (current-year) dollars. No inflation adjustment is applied.

10. API Access

All ranking data is available programmatically via JSON API endpoints. These are the same endpoints that power the ScienceDex website.

  • /api/nih/medical-schools: medical school rankings with pagination, search, and year filter
  • /api/nih/medical-schools/[slug]: individual school profile with departments, history, and top PIs
  • /api/nih/institutions: all institution rankings
  • /api/nih/institutions/[slug]: individual institution profile
  • /api/nih/departments: department summary across all schools
  • /api/nih/departments/[slug]: department detail with per-school breakdown
  • /api/nih/investigators: PI rankings with search
  • /api/nih/investigators/[slug]: individual PI profile with grant history
  • /api/nih/geographic: state-level funding with per-capita data
  • /api/nih/geographic/[state]: state detail with city breakdown
  • /api/nih/bump-chart: rank trajectories over time for visualizations
  • /api/nih/bar-race: yearly funding frames for the bar chart race
  • /api/nih/department-trends: department funding trends over time
  • /api/nih/compare: side-by-side institution comparison

Common query parameters: year (fiscal year), limit (pagination), offset (pagination), search (text search), sort (sort field).
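
For example, a request for the FY2024 medical school rankings might look like the following, with paths relative to the ScienceDex host (the response shape is not shown here):

```ts
// Fetch one page of FY2024 medical school rankings.
const params = new URLSearchParams({ year: "2024", limit: "25", offset: "0" });
const res = await fetch(`/api/nih/medical-schools?${params.toString()}`);
const rankings = await res.json();
```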

11. Reproducibility & Open Source

Our entire pipeline is open-source under the MIT license:

github.com/ConductScience-Foundation/sciencedex-nih-pipeline
  • scripts/ingest-nih-data.ts — Data ingestion from RePORTER API and ExPORTER CSVs
  • scripts/standardize-institutions.ts — ~200 institution alias mappings
  • scripts/classify-departments.ts — Department code → 27-category mapping
  • scripts/compute-rankings.ts — Aggregation and ranking computation
  • scripts/populate-geographic.ts — State/city aggregation with Census population data

Anyone with access to the NIH RePORTER API can reproduce our rankings by running these scripts. We welcome corrections, additions, and improvements via pull requests or email.

This methodology page is versioned alongside our code. Last updated February 2026. Data through FY2024.