Longevity research spans tens of thousands of clinical and observational publications, yet no systematic, claim-level, quality-graded synthesis of the human literature exists. We present an end-to-end natural language processing pipeline that retrieves, screens, structures, normalises, validates, and quality-grades evidence claims from PubMed at scale. The pipeline uses a local large language model (LLM) for relevance screening and record splitting, and frontier LLMs for structured extraction, entity filtering, taxonomy normalisation, polarity correction, claim validation, and hallmark mapping. Applied to 108,431 retrieved records, the pipeline produced a final dataset of 2,987 quality-graded claims from 1,797 publications, merged into 2,641 factor–outcome claim pairs. The results reveal a broad but shallow evidence landscape: exercise and physical training account for 33.8% of the final corpus, only 1.1% of claims target direct survival or longevity outcomes, and 91.9% of claim pairs are supported by a single study. The main contribution is a modular, updatable NLP system for large-scale claim-level evidence synthesis, together with a public database available at longevityevidence.org.

This work is licensed under a Creative Commons Attribution 4.0 International License.