Abstract
Living evidence databases offer a robust and dynamic alternative to static systematic reviews but require a resilient technical infrastructure for continuous evidence processing. This working paper describes the architecture and implementation of a complete, end-to-end pipeline for this purpose, developed initially for the conservation science domain. Designed to operate on local infrastructure using self-hosted models, the system ingests and normalizes documents from academic publishers, screens them for relevance using a multi-stage process, and extracts structured data according to a predefined schema. Key features include a hybrid retrieval model, a human-AI collaborative process for refining inclusion criteria from complex protocols, and the integration of an established, statistically principled stopping rule to ensure efficiency. In a baseline evaluation against a prior large-scale manual review, the fully automated pipeline achieved 97% recall and identified a significant number of relevant studies not included in the original review, demonstrating its viability as a foundational tool for maintaining living evidence databases.
Supplementary materials
Title
Supplementary Materials
Description
These supplementary materials detail the inclusion criteria, terminology descriptions, prompts, and meta-prompts referred to in the main working paper.