Transformer-based large language models are receiving considerable attention because of their ability to analyse scientific literature. Small language models (SLMs), however, also have potential in this area, as they have smaller compute footprints and allow users to keep data in-house. Here, we quantitatively evaluate the ability of SLMs to (i) score references according to project-specific relevance and (ii) extract and structure data from unstructured sources (scientific abstracts). By comparing SLMs' outputs against those of a human on hundreds of abstracts, we found that (i) SLMs can effectively filter literature and extract structured information relatively accurately (error rates as low as 10%), although not with perfect yield (as low as 50% in some cases), (ii) there are trade-offs between accuracy, model size and computing requirements, and (iii) clearly written abstracts are needed to support accurate data extraction. We recommend advanced prompt engineering techniques, full-text resources and model distillation as future directions.
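To make the two tasks concrete, the sketch below shows one way a locally hosted SLM could be asked to score an abstract's relevance and return structured fields; it is illustrative only, not the pipeline used in this study. The Ollama-style endpoint, the model name, and the field schema (`relevance`, `material`, `method`, `key_result`) are assumptions chosen for the example; the JSON-parse fallback reflects the imperfect-yield behaviour noted above.

```python
import json
import requests

# Assumptions for illustration: a local SLM behind an Ollama-style HTTP API.
ENDPOINT = "http://localhost:11434/api/generate"
MODEL = "llama3.2:3b"  # any small, locally runnable model could be substituted

PROMPT_TEMPLATE = (
    "You are screening literature for a project on {topic}.\n"
    "1. Rate the relevance of the abstract below from 0 (irrelevant) to 10.\n"
    '2. Extract these fields as JSON: {{"relevance": int, "material": str, '
    '"method": str, "key_result": str}}.\n'
    "Return only the JSON object.\n\nAbstract:\n{abstract}"
)

def screen_abstract(abstract: str, topic: str) -> dict:
    """Ask the SLM to score project-specific relevance and extract structured fields."""
    prompt = PROMPT_TEMPLATE.format(topic=topic, abstract=abstract)
    resp = requests.post(
        ENDPOINT,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    raw = resp.json()["response"]
    try:
        return json.loads(raw)  # well-formed structured output
    except json.JSONDecodeError:
        # Imperfect yield: keep the raw text for manual review instead of discarding it.
        return {"relevance": None, "raw": raw}
```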