1. Introduction
According to the Standard Big Bang Nucleosynthesis model, only hydrogen, helium, and traces of light elements (e.g. lithium) were produced during the Big Bang. These baryons clustered with the structures defined by dark matter, eventually collapsing to form the first stars at
$z \gt10$
(e.g. Beers & Christlieb Reference Beers and Christlieb2005; Bromm et al. Reference Bromm, Yoshida, Hernquist and McKee2009; Frebel & Norris Reference Frebel and Norris2015; Fraser et al. Reference Fraser, Casey, Gilmore, Heger and Chan2017). The first stars, often referred to as Population III (or Pop. III), dominated the star formation history until their supernovae enriched the interstellar medium, allowing the formation of the next generations of stars: the Pop. II and Pop. I stars (e.g. Klessen & Glover Reference Klessen and Glover2023). Direct observations of Pop. III stars in the high-redshift Universe (e.g. Oh, Haiman, & Rees Reference Oh, Haiman and Rees2001; Scannapieco, Schneider, & Ferrara Reference Scannapieco, Schneider and Ferrara2003; Greif et al. Reference Greif, Johnson, Klessen and Bromm2009; Zackrisson et al. Reference Zackrisson, Rydberg, Schaerer, Östlin and Tuli2011; Zackrisson et al. Reference Zackrisson2012; Rydberg et al. Reference Rydberg, Zackrisson, Lundqvist and Scott2013; Mas-Ribas, Dijkstra, & Forero-Romero Reference Mas-Ribas, Dijkstra and Forero-Romero2016; Riaz, Hartwig, & Latif Reference Riaz, Hartwig and Latif2022) has so far been unsuccessful, with instruments like the James Web Space Telescope requiring a Pop. III star to be gravitationally lensed for it to be detectable (Zackrisson et al. Reference Zackrisson2024). The likelihood of low-mass Pop. III stars (
${M} \lt 0.8\,\mathrm{M}_\odot$
) forming, which could live until the present-day, is unknown (e.g. Ishiyama et al. Reference Ishiyama2016; Chandra & Schlaufman Reference Chandra and Schlaufman2021), but if they exist, forthcoming massive spectroscopic surveys such as 4MOST and WEAVE (e.g. de Jong et al. Reference de Jong2019; Jin et al. Reference Jin2024) will be able to put constraints on their existence.
One approach to understanding the properties of Pop. III stars is through the study of EMP stars, defined as
$[\text{Fe}/\text{H}]$
Footnote
a
$\leq -3.0$
dex (Beers & Christlieb Reference Beers and Christlieb2005). These stars may be direct descendants of Pop. III stars, with their chemical composition offering insights into the first stars and the evolution of the early Galaxy. Numerous stellar archaeological studies have provided insights into EMP stars by studying them in Galactic regions dominated by old stars such as the halo and/or bulge (e.g. Christlieb et al. Reference Christlieb2008; Schörck et al. Reference Schörck2009; Frebel Reference Frebel2010; Salvadori et al. Reference Salvadori, Ferrara, Schneider, Scannapieco and Kawata2010; Caffau et al. Reference Caffau2013; Howes et al. Reference Howes2015; Howes et al. Reference Howes2016; Da Costa et al. Reference Da Costa2019; Arentsen et al. Reference Arentsen2020; Yong et al. Reference Yong2021; Ishigaki et al. Reference Ishigaki2021; Li et al. Reference Li2022).
Recently, it was discovered that a population of EMP stars confined to disk-like orbits exists (e.g. Sestito et al. Reference Sestito2019; Sestito et al. Reference Sestito2020; Kielty et al. Reference Kielty2021; Fernández-Alvar et al. Reference Fernández-Alvar2021; Cordoni et al. Reference Cordoni2021; Chiti et al. Reference Chiti, Mardini, Frebel and Daniel2021b), with the majority on prograde rather than retrograde orbits. For example, studies such as Sestito et al. (Reference Sestito2020), which used the Pristine survey (Starkenburg et al. Reference Starkenburg2017), showed that at the
$5.0\sigma$
confidence level, the majority of their 1 027 metal-poor stars (with
$\left[\text{Fe/H}\right] \leq -2.5$
dex) are classified as prograde disk stars. The follow-up study Sestito et al. (Reference Sestito2021) used the NIHAO-UHD simulations to show that there is a distinct population of prograde metal-poor stars in the models. Similarly, the observational study Hong et al. (Reference Hong2024), using stellar parameters derived from Huang et al. (Reference Huang2022, Reference Huang2023), showed that from their sample of 11.5 million stars from the SkyMapper Southern Survey Data Release 2 (Onken et al. Reference Onken2019) and Stellar Abundance and Galactic Evolution Survey (Fan et al. Reference Fan2023), the majority of their disk EMPs are confined to prograde orbits with low eccentricities.
To understand the true nature of these stars, high-resolution spectroscopy of EMP prograde disk stars is needed to gain a detailed perspective from their chemistry. For example, the star SDSS J102915+172927, also known as the ‘Caffau’ star (Caffau et al. Reference Caffau2011, Reference Caffau2012) at
$\left[\text{Fe/H}\right]{} = -4.89 \pm 0.10$
has prograde disk orbit with
$z_{\mathrm{max}} \lt 3$
kpc and
$e = 0.12 \pm 0.01$
(Sestito et al. Reference Sestito2019). This star has no significant [C/Fe] enhancement, with an upper limit of
$\left[\text{C/Fe}\right] \lt 0.6$
derived with 3D hydrodynamical model atmospheres, assuming local thermodynamical equilibrium (LTE) (Lagae et al. Reference Lagae2023), a surprise as most stars at this metallicity range are carbon-enhanced (e.g. Placco et al. Reference Placco, Frebel, Beers and Stancliffe2014; Arentsen et al. Reference Arentsen2021). The nature of this star challenges models for early star formation that require [C/Fe] enhancement to allow low-mass star formation (Schneider et al. Reference Schneider2012).
Another EMP star confined on a prograde orbit is P1836849 (Dovgal et al. Reference Dovgal2024), which has
$\left[\text{Fe/H}\right]{} = -3.3 \pm 0.1$
dex. Compared with other metal-poor stars, P1836849 has very low abundances (with respect to iron) of
$\alpha$
-elements (Na, Mg, Si), together with large abundances for Cr and Mn. Comparisons with other EMPs in prograde disk orbits (e.g. Cordoni et al. Reference Cordoni2021; Yong et al. Reference Yong2021) shows that there is no obvious common formation origin for these stars when considering the differences in chemistry and kinematics.
These recent revelations provide a new perspective on the diverse process that contributed to the formation of the early Galaxy. Among the various possibilities, the disk may have formed through in-plane accretions (e.g. Dinescu et al. Reference Dinescu2002; Majewski et al. Reference Majewski2012; Myeong et al. Reference Myeong, Evans, Belokurov, Sanders and Koposov2018), in-situ events (e.g. Alfaro et al. Reference Alfaro2022; Conroy et al. Reference Conroy2022), or through stellar migration (e.g. Grenon Reference Grenon1999; Schönrich & Binney Reference Schönrich and Binney2009; Minchev et al. Reference Minchev2012). EMP stars on both prograde and retrograde disk orbits are thus important, as they can be used as tracers for the early build-up of the Galaxy (Sestito et al. Reference Sestito2021), but the significant lack of EMP disk stars makes this difficult. In this paper, we expand the pool of EMP disk stars by presenting a new survey using 2dF coupled with the AAOmega spectrograph (Saunders et al. Reference Saunders2004; Sharp et al. Reference Sharp2006), focused on the Galactic disk using stars selected from Gaia DR3 (Gaia Collaboration et al. Reference Collaboration2023). In what follows we present our sample selection (Section 2), the determination and validation of stellar parameters (Section 3), a chemo-dynamical analysis (Section 4) and a discussion of the results (Section 5). A detailed comparison of our spectroscopic parameters against those inferred from other analyses of Gaia XP spectra is provided in Appendix 1.
2. Observations
2.1. Sample selection
Gaia Data Release 3 includes 220 million low-resolution spectra from the Blue (BP) and Red (RP) spectrophotometers (Gaia Collaboration et al. Reference Collaboration2023), which cover the wavelength regions 330–680 nm and 640–1 050 nm, respectively, with a spectral resolving power ranging from 30 to 100 (De Angeli et al. Reference De Angeli2023). This wide wavelength coverage allows analysis of Gaia BP/RP (XP hereafter) spectra to provide reasonably good inferences on stellar parameters like metallicity and effective temperature from the spectral energy distributions (SED) (e.g. Yao et al. Reference Yao, Ji, Koposov and Limberg2024).
Numerous catalogues of stellar parameters have been published based on Gaia XP spectra (e.g. Andrae et al. Reference Andrae2023b; Andrae, Rix, & Chandra Reference Andrae, Rix and Chandra2023a; Fouesneau et al. Reference Fouesneau2023; Martin et al. Reference Martin2023; Li et al. Reference Li, Wong, Hogg, Rix and Chandra2024; Viswanathan et al. 2024b; Xylakis-Dornbusch et al. Reference Xylakis-Dornbusch2024). For our work, we adopted the Zhang et al. (Reference Zhang, Green and Rix2023) catalogue. These authors developed a data-driven model to provide estimates of stellar parameters (
$T_{\text{eff}}$
,
$\log g$
and
$\left[\text{Fe/H}\right]$
) and reddening for the entire Gaia XP spectra dataset.
We used that catalogue to select metal-poor star candidates, initially regardless of location on the sky, stellar parameters, evolutionary state, and reddening E(B
$-$
V), but adopting the suggested basic reliability cut
$\texttt{quality\_flags} \lt 8$
in order to maximise completeness at the cost of potential contamination from misclassified more metal-rich stars. The adopted reliability cut indicates, broadly, that the quality of the fit was acceptable and that the parallax estimate is within
$10\,\sigma$
agreement with the actual Gaia measurement. Nevertheless, uncertainties in
$T_{\text{eff}}$
,
$\log g$
or
$\left[\text{Fe/H}\right]$
may be significant.
In order to mitigate the effects of metallicity uncertainties on our selection, we specifically chose stars with
$\left[\text{Fe/H}\right] + \sigma_{\left[\text{Fe/H}\right]}\lt -1$
. We included the error term in our selection to adopt the 1
$\sigma$
upper bound on the metallicity. This process resulted in a selection of 3 979 676 stars.
We intended to observe the sample at Siding Spring Observatory (SSO) using the 2-degree field fibre-fed multi-object system (2dF) instrument (Lewis et al. Reference Lewis2002). For this, we further filtered the target list according to a blending criterion, based on experience with typical seeing at SSO. In particular we required:

where s is the separation between the target star and its neighbours in units of arcsec, and
$\Delta G_{BP}$
is the difference in brightness (in units of mag). The required separation for a contaminating star fainter by 1 mag is 3 arcsec, and for successively brighter contaminating stars the required separations are 5, 9, 15, 23, 33, 45 and 60 arcsec. Applying this criterion reduced our catalogue to 3 230 740 stars.
We then broadly grouped stars according to metallicity (actually
$\left[\text{Fe/H}\right] + \sigma_{\left[\text{Fe/H}\right]}$
) in bins of 0.5 dex, and used this as a primary priority marker. At a finer level, targets were further prioritised by brightness, with those at fainter magnitudes having lower priority than those with brighter magnitudes. This priority impacts the allocation of fibres in 2dF, where targets with greater priority (e.g. brighter targets) are more likely to be given allocations than those with lower priority (e.g. fainter targets). Further, in order to avoid observing known globular cluster members, we deselected stars within the tidal radius of all objects listed in the Harris (Reference Harris2010) catalogue of globular clusters (with the exception of NGC 362, see next Section). We tiled the sky with overlapping fields of
$2 \deg$
diameter, and prioritised the selection of fields according to the number of candidate metal-poor stars with
$\left[\text{Fe/H}\right] \lt -1$
. Fields close to the Galactic disk were preferentially selected over others. To ensure maximum use of the 2dF fibres, targets with lower priority were used to fill out the fibres once the higher priority targets were allocated. We did not use reddening either for selecting fields or for prioritising candidate stars.
Table 1. First three fields observed at the AAT, identified by the field centre coordinates in equatorial and Galactic coordinates, alongside the number of stars observed in each field. Field ra_0103-7059 is the field located by the SMC, containing globular cluster NGC 362 for validation. Full table available on the electronic version of the paper.


Figure 1. Location of fields observed on the AAT, shown in Galactic coordinates with Aitoff projection. The Gaia DR3 image is used as the background image (Gaia Collaboration et al. Reference Collaboration2023). Red circles mark observed fields, and blue star symbols indicate the vetted EMP stars. Fields near the centre of the figure are at
$l \sim 0^{\circ}$
, fields on the far-right are at
$l\sim 265^{\circ}$
, and the field near the SMC is at
$l \sim 300^{\circ}$
. The field near the SMC is a calibration field incorporating the globular cluster NGC 362.
2.2. Medium resolution spectroscopy
Stars selected from the Zhang et al. (Reference Zhang, Green and Rix2023) sample were followed up using 2dF+AAOmega on the AAT at SSO. The 2dF instrument allows us to target up to 392 targets simultaneously within a 2-degree field (Sharp et al. Reference Sharp2006). The real number of fibres allocated to targets is typically lower than this (
$\sim$
300) due to the need for fibres placed on the sky (typically 20–30) for sky subtraction, avoidance of collisions with other fibres (i.e. in crowded regions), and fibres that are broken.
The AAOmega spectrograph (e.g. Sharp et al. Reference Sharp2006) was used with the 580V grating in the blue arm covering the wavelength region
$3\,700 \leq \lambda \leq 5\,800$
Å at
$R \approx 1\,300$
, and the 1700D grating for the red arm covering the wavelength region
$8\,450 \leq \lambda \leq 9\,000$
Å at
$R \approx 10\,000$
. These were chosen as the 580V grating provides coverage of the Ca H & K lines (3 933 and 3 969 Å) for metallicity estimates (e.g. Beers, Preston, & Shectman Reference Beers, Preston and Shectman1992; Frebel et al. Reference Frebel2005; Norris et al. Reference Norris2007; Roederer et al. Reference Roederer2014), the CH G-band (
$\sim$
4 300 Å) for carbon abundances, and the Mg I b lines (5 167, 5 172, 5 183 Å) with relatively good throughput. The 1700D grating provides coverage of the Ca II triplet (8 498, 8 542 and 8 662 Å), another excellent indicator of metallicity (e.g. Armandroff & Da Costa Reference Armandroff and Da Costa1991; Starkenburg et al. Reference Starkenburg2010; Carrera et al. Reference Carrera, Pancino, Gallart and del Pino2013; Howes et al. Reference Howes2015; Da Costa Reference Da Costa2016), as well as accurate radial velocities. These constraints allowed us to probe down to an apparent G magnitude of about 17.5 mag, well matched by how faint XP spectra go for reliable parameter determination. This also returns a number of metal-poor candidates which broadly matches the available number of fibres. To avoid the impact of scattered light, we selected targets within a range of 2.5 mag. At
$m_G \sim 16$
across 4 410–4 500 Å, we achieved a signal-to-noise (S/N) ratio of
$\sim 25$
, and for 8 560–8 650 Å S/N
$\sim 30$
(see Fig. 2). Typical exposure times for our fields are
$4 \times 1\,800$
s, with
$\sim 2$
” seeing.
With this setup, 8 911 metal-poor star candidates from our sample were observed over 74 h across 10 nights in 2023 covering 28 fields. The list of fields is in Table 1, and their locations are shown by the red circles in Fig. 1. We have 21 fields above and below the Galactic centre (
$b \pm 20^{\circ}$
at
$l \sim 0$
), with six fields at
$l \sim 265^{\circ}$
and
$b \sim -8^{\circ}$
. The field near the Small Magellanic Cloud (SMC;
$l \sim 300^{\circ}$
and
$b \sim -46^{\circ}$
) contains the globular cluster NGC 362, targeted as part of our validation (see Section 3.4.2), but also including field metal-poor star targets. As discussed below, 15 EMPs were found in the data, shown by the blue star symbols in Fig. 1.
The data was reduced using version 8.03b of the 2dF reduction pipeline 2dfdr.Footnote
b
Here, fibre flat-field exposures were used to define fibre positions on the detector, whilst arc lamp exposures were used to wavelength calibrate the extracted spectra (e.g. Howes et al. Reference Howes2016; Da Costa Reference Da Costa2016). Accurate sky subtraction is crucial for the Ca II triplet region, so we performed this using skyflux(cor), which takes the strengths of night sky emission lines present in the raw spectra and uses them to normalise the fibre throughputs. Cosmic rays were then rejected using the bclean method, a simple ‘zap-ray’ method that is more general and efficient than the Laplacian edge detection methods like lacosmic and pycosmic, but is less effective (Dokkum Reference Dokkum2001). The wavelength-calibrated sky-subtracted spectra from the individual exposures were then combined using flux weighting with rejection to remove any remaining cosmic rays and to improve the overall signal-to-noise ratio. Fig. 2 shows the distribution of S/N values, calculated using the snr_derived method in specutils
Footnote
c
(Astropy-Specutils Development Team 2019). For the blue detector, we measured the S/N from the continuum at 4 410–4 500 Å, while for the red detector, we measured it from the continuum at 8 560–8 650 Å. Stars with S/N
$_{\text{red}} \leq 12$
Footnote
d
were removed due to poor data quality in both the blue and red arms. Stars with the
$\texttt{phot\_variable\_flag} = \texttt{VARIABLE}$
flag in the Gaia DR3 dataset were also removed, reducing the final sample to 5 795 stars. Examples of reduced AAOmega spectra are shown in Fig. 3.
3. Analysis techniques
3.1. Stellar parameters
To accurately extract stellar parameters for the sample, we developed a methodology that self-consistently derives
$T_{\text{eff}}$
and
$\log g$
, then uses the template fitting code RVSpecFit
Footnote
e
(Koposov et al. Reference Koposov2011; Koposov Reference Koposov2019) to get
$\left[\text{Fe/H}\right]$
and
$\left[\text{C/Fe}\right]$
. The code initialises with parameters from the Gaia XP analysis of Zhang et al. (Reference Zhang, Green and Rix2023), iterating through the process by determining
$\left[\text{Fe/H}\right]$
for a given
$T_{\text{eff}}$
and
$\log g$
, then redetermining
$T_{\text{eff}}$
and
$\log g$
with the new metallicity. This repeats until reaching convergence, typically in 2–3 iterations.
$\left[\text{C/Fe}\right]$
is then determined independently using the converged stellar parameters. Uncertainties for stellar parameters are described in Section 3.2. Below we describe our procedure in more detail.

Figure 2. S/N ratios for our sample in both the blue and red detectors. Stars with S/N
$_{\text{red}} \leq 12$
were removed due to insufficient data quality, shown by the grey points. The fit to the S/N data is shown by the black line. Upper panel: S/N ratios of the red arm, measured from the continuum at 8 560–8 650 Å. Lower panel: S/N ratios of the blue arm, measured from the continuum at 4 410–4 500 Å.
3.1.1. Effective temperature
To compute stellar effective temperatures we use Gaia DR3 photometry with the colte
Footnote
f
code (Casagrande et al. Reference Casagrande2021). This uses colour-
$T_{\text{eff}}$
relations derived implementing the InfraRed Flux Method in the Gaia and 2MASS system which have very mild dependency on metallicity and surface gravity. Reddening corrections used the Schlegel et al. (Reference Schlegel, Finkbeiner and Davis1998) extinction map, rescaled as described in Casagrande et al. (Reference Casagrande2019). To ensure consistency with our bolometric corrections (see Section 3.1.2), in colte we adopted the ‘COD’ extinction law (based on Cardelli, Clayton, & Mathis Reference Cardelli, Clayton and Mathis1989 and O’Donnell Reference O’Donnell1994).
3.1.2. Surface gravity
A precise parallax measurement together with an estimate of the luminosity and stellar mass can yield a highly precise estimate of surface gravity. Low-precision parallaxes, however, can be biased due to sampling effects, and require a more involved treatment that relies on knowing the true distribution of stellar distances to derive
$\log g$
values (e.g. Bailer-Jones et al. Reference Bailer-Jones, Rybizki, Fouesneau, Demleitner and Andrae2021).
We therefore opted for a two-pronged approach based on parallax quality, which we describe below. Distances and therefore absolute Gaia DR3 magnitudes were derived using the inverse of these processes once
$\log g$
has been found (see Section 3.2.2).
For stars with good parallax (
$\pi \geq 3 \sigma$
), we used the absolute bolometric magnitude,
$T_{\text{eff}}$
and stellar mass (assuming 0.8 M
$_{\odot}$
) directly to get the surface gravity. For the bolometric magnitude, we used bcutil
Footnote
g
(Casagrande & VandenBerg Reference Casagrande2018) with the Gaia DR3 G filter, which computes synthetic bolometric corrections for given values of
$T_{\text{eff}}$
,
$\left[\text{Fe/H}\right]$
,
$\log g$
(using the initial value) and
$E(B-V)$
, while adopting the Cardelli et al. (Reference Cardelli, Clayton and Mathis1989) and O’Donnell (Reference O’Donnell1994) extinction laws.
For stars lacking good parallaxes (
$\pi \lt 3 \sigma$
), we instead opted to employ isochrones calculated by the Dartmouth evolution code (Dotter et al. Reference Dotter2008).Footnote
h
The base isochrones cover a range of metallicities from
$\left[\text{Fe/H}\right]=0.5$
down to
$-4.0$
dex with a metallicity stepsize of 0.5 dex. We interpolated these isochrones to a finer stepsize of 0.1 dex using iso_interp_feh.Footnote
i
For all models, helium enrichment was set as
$Y = 0.245+1.5Z$
. We then selected the following isochrones: for
$\left[\text{Fe/H}\right]$
from 0.0 to
$-0.2$
dex, we assumed solar ages with [
$\alpha$
/Fe] = 0.0. For
$\left[\text{Fe/H}\right]$
from
$-0.2$
to
$-0.5$
dex, [
$\alpha$
/Fe] = 0.2 was adopted, again assuming solar ages. Isochrones with
$\left[\text{Fe/H}\right]$
from
$-0.6$
to
$-4.0$
dex had an assumed age of 12 Gyr with
$[\alpha/\mathrm{Fe}] = 0.4$
dex. Although the goal of this work is to identify and analyse metal-poor stars, which presumably are old, we note that for metal-rich stars selecting an older isochrone rather than a younger one results in an increase of
$\sim 0.3$
dex in
$\log g$
. In order to allow for a more precise
$T_{\text{eff}}$
-
$\log g$
relationship, we resampled the isochrones in the
$T_{\text{eff}}$
-
$\log g$
plane to double the sampling. We further assumed stars with
$\pi \lt 3 \sigma$
to be giants. In doing so, equivalent evolutionary points were used to restrict the isochrones to luminosities brighter than the turn-off.

Figure 3. The Ca II H and K region alongside the Ca II triplet region for stars ra_0816-5131_s154, ra_1648-2642_s70 and ra_1639-2632_s98. The names with their red S/N are written above the spectra in the right panel.
$T_{\text{eff}}$
,
$\log g$
and
$\left[\text{Fe/H}\right]$
values for each star are beneath the spectra on the same panel. All three stars have
$\text{m}_{\text{G}} \approx 16.25$
mag. Star ra_0816-5131_s154 (top) shows an example of a relatively hot spectrum at
$T_{\text{eff}} = 6\,900$
K, star ra_1648-2642_s70 (middle) is an example cool spectrum at
$T_{\text{eff}} = 4\,914$
K, and star ra_1639-2632_s98 (bottom) is an example very metal-poor star at
$\left[\text{Fe/H}\right]=-2.71$
. Left panel: Ca II H and K region from the 580V arm for the three stars. Vertical dashed lines show the Ca II H line at 3 933.66 Å, and the Ca II K line at 3 968.47 Å. Right panel: Ca II triplet region from the 1700D arm for the three stars. Vertical dashed lines show the first two of the Ca II triplet lines, at 8 498.02 and 8 542.09 Å respectively.
3.1.3. Metallicity and radial velocity
We derived metallicities, radial velocities and carbon abundances with our code, which performs a least-squares fit to a grid of synthetic template spectra with a Nelder-Mead searchFootnote
j
. The templates are fitted to observations through multiplying by a polynomial continuum over the wavelengths
$3\,800$
–
$5\,500$
Å for the blue arm, and
$8\,400$
–
$8\,800$
Å for the red arm. This is done for a set of stellar parameters, whereby
$T_{\text{eff}}$
and
$\log g$
are determined as per the previous sections, while
$\left[\text{Fe/H}\right]$
, [
$\alpha$
/Fe], and radial velocity RV are determined using RVSpecFit (see Li et al. (Reference Li2019) for more detail). We found through trial and error that we achieved good outcomes for our spectra when continuum fitting using fifth order Chebyshev polynomials for both arms, noting that the code by default uses radial basis functions.
The RVSpecFit code comes prepacked with the PHOENIX synthetic grid library (Husser et al. Reference Husser2013), but this lacked the ability to vary and fit [C/Fe]. To overcome this, we simultaneously fit both the blue and red arms as follows. For the 580V arm, we computed spectra that spanned a range of
$\left[\text{C/Fe}\right]$
using the LTE Turbospectrum code (Plez Reference Plez2012), while for the 1700D arm, we calculated spectra using the pySME code (Wehrhahn, Piskunov, & Ryabchikova Reference Wehrhahn, Piskunov and Ryabchikova2023). This utilities NLTE corrections that are necessary to accurately calculate the cores of the Ca-II triplet lines to determine
$\left[\text{Fe/H}\right]$
abundances. Fits to the blue and red spectra were iterated five times using updated stellar parameters at each step. Grids of synthetic template spectra were processed using a two-step interpolation procedure. First, spectra were broadened to the observed resolving power and sampled at appropriate steps in wavelength. Then a Delaunay triangulation (see e.g. Amidror Reference Amidror2002) was performed to interpolate between templates, providing smoothly changing spectral templates as a function of stellar atmospheric parameters. As old, metal-poor stars have generally long since spun down to rotational velocities well below the instrumental resolution of AAOmega, we opted to neglect rotational broadening.
3.1.4. Carbon abundances
RVSpecFit has the facility to fit
$\left[\text{Fe/H}\right]$
alongside another dimension in abundance. Typically this is [
$\alpha$
/Fe], but as our 580V spectra contained the CH G-band (
$\sim 4\,300$
Å), we fitted for [C/Fe] instead after determining the other stellar parameters. For this, we calculated
$\chi^2$
for the CH G-band region while simultaneously fitting the continuum slope. The range
$4\,150 \leq \lambda \leq 4\,450$
Å was considered for each star over a grid of synthetic spectra, while adopting the same
$T_{\text{eff}}$
,
$\log g$
and
$\left[\text{Fe/H}\right]$
values determined in the previous sections. Abundances were measured from
$-5.0 \leq \mathrm{[C/Fe]} \leq -\left[\text{Fe/H}\right] + 0.5$
at 0.01 dex stepsizes, then we took the lowest
$\chi^2$
as the accepted
$\left[\text{C/Fe}\right]$
value.
3.2. Stellar parameter error analysis and distances
With our data, we have two main sources of error: those that come directly from measurements on the spectrum (treated as an uncorrelated Gaussian error), and those that are related to the uncertainties in the stellar parameters that are propagated through the analysis.
For the stellar parameters, we perturbed input parameters (magnitudes, parallax and reddening) for each star 100 times randomly, sampling their uncertainties as we describe more in detail below. All errors excluding radial velocity were derived from the standard deviation of the randomisations. Median errors for each stellar parameter based on S/N quality is shown in Table 2. Within this procedure we also derived distances and distance errors for the stars, which we describe at the end of this section.
Table 2. Median error on stellar parameters, representing Monte Carlo propagated systematic uncertainties for
$T_{\text{eff}}$
,
$\left[\text{Fe/H}\right]$
,
$\log g$
and
$\left[\text{C/Fe}\right]$
. Errors on radial velocity were derived directly from RVSpecFit. Errors are split into stars with high S/N (
$\geq 25$
) and those with low S/N (
$\lt 25$
). For
$\left[\text{C/Fe}\right]$
, the mean errors exclude stars with no detections.

3.2.1. Stellar parameter errors
Uncertainty in effective temperature was determined by running colte with randomised Gaia magnitudes (using errors provided by Gaia DR3) and reddening values (taking errors to be 10%) to produce new
$T_{\text{eff}}$
estimates from the BP
$-$
RP colour. Nominal values of
$\left[\text{Fe/H}\right]$
and
$\log g$
derived from Section 3.1 were used for this.
After retrieving the randomised
$T_{\text{eff}}$
’s, we derive new bolometric corrections and
$\log g$
values using the methods outlined in Section 3.1.2. Given that the parallax was also randomised, several stars had their
$\log g$
derived either using the randomised parallax directly (
$\pi \geq 3\sigma$
) or using the randomised
$T_{\text{eff}}$
with initial
$\left[\text{Fe/H}\right]$
to interpolate over isochrones (
$\pi \lt 3\sigma$
).
In order to rapidly estimate how an uncertainty in
$T_{\text{eff}}$
will affect the derived
$\left[\text{Fe/H}\right]$
, we assumed a linear dependence between these two quantities. We estimated the slope using a test spectrum fit for each spectrum when perturbing
$T_{\text{eff}}$
by
$+100$
K. From tests on both the Wide-Field Spectrograph (WiFeS) (Dopita et al. Reference Dopita2010) data and on the globular clusters (see Section 3.3), we imposed a floor of the precision of 0.2 dex in metallicity. We found that RVSpecFit metallicity errors were unreliable due to residual sky emission, sometimes resulting in extremely large uncertainties that were not representative of the actual quality of the fit.
Uncertainties in the radial velocities were determined directly within RVSpecFit via the Nelder-Mead fit to the spectra. At the resolution of our spectra, the relatively small stellar parameter uncertainties on the radial velocity determination is smaller than the measurement error itself, and we therefore neglect it.
Uncertainties in
$\left[\text{C/Fe}\right]$
were found by inputting randomised
$T_{\text{eff}}$
,
$\log g$
and
$\left[\text{Fe/H}\right]$
into our explicit CH G-band fitting procedure described in Section 3.1.4 to get randomised
$\left[\text{C/Fe}\right]$
values.
As for determining
$2\sigma$
detection limits for
$\left[\text{C/Fe}\right]$
, we explicitly integrated the likelihood as a function of the
$\left[\text{C/Fe}\right]$
abundance. We used synthetic spectra covering
$-5.0 \le \left[\text{C/Fe}\right] \le -\left[\text{Fe/H}\right] + 0.5$
in steps of 0.01 dex, calculating the likelihood for each. We also calculated the likelihoods when comparing to the synthetic spectrum with
$\left[\text{C/Fe}\right] = -5.0$
, under the assumption that this had negligible CH absorption, and used this case as our null hypothesis. We adopted a
$2 \sigma$
detection limit at the abundance where our synthetic spectra differ at the
$2 \sigma$
level from the spectrum with
$\left[\text{C/Fe}\right] = -5.0$
. In cases where the best-fitting spectrum had an abundance below this
$2 \sigma$
detection limit, we report this abundance value as the upper limit to the fit. The fits of the CH G-band for our 15 EMPs are shown in Appendix 2.
3.2.2. Distance
Distances and their errors were determined by using the randomised parallax directly in the distance modulus if the parallax quality was good (
$\pi \geq 3 \sigma$
), or through isochrones with randomised
$\left[\text{Fe/H}\right]$
, BP
$-$
RP and
$E(B-V)$
values if the parallax quality was poor (
$\pi \lt 3 \sigma$
). The median was then taken as the final value, with the uncertainties being the standard deviation of the distribution.
For stars with
$\pi \geq 3 \sigma$
, we show in Fig. 4 that the two techniques correspond well with each other. The scatter decreases for stars with
$\pi \geq 5 \sigma$
, showing that our isochrone distances are reliable.
3.3. Methodology validation
The tools we developed in Sections 3.1 and 3.2 were tested with internal consistency checks: first involving the HR diagram with isochrones, and then on several independent datasets. These datasets involve comparing stellar parameters with data from WiFeS (Section 3.4.1), with stars in globular clusters observed with the same instrumental setup (Section 3.4.2), and with those from other Gaia XP surveys (Section 2.1). No mention of our stars is made in any high-resolution literature.
3.4. HR diagram
The sample in this work covers a broad range of evolutionary stages. We show this through the Hertzsprung-Russell (HR) diagram in Fig. 5 segmented by parallax (
$\pi$
) quality, where it is evident that both turn-off and red giant branch (RGB) stars are included in the survey. The majority of the stars on the RGB have poor parallaxes (
$\pi \leq 2\sigma$
and
$2\sigma \lt \pi \leq 3\sigma$
), while those at the turn-off generally have good parallaxes (
$\pi \gt 5\sigma$
). BP
$-$
RP colours and Gaia DR3 magnitudes were corrected for reddening using the Schlegel dustmap as described in Section 3.1.1, and MG was calculated using either parallax measurements or isochrone fits as described in Section 3.2.2.
Table 3. The initial and shifted stellar parameters for three EMPs in Fig. 5. The first two stars (ra_1639-2632-s419 and ra_1659-2154-s261) were shifted from
$(\mathrm{BP}_0 - \mathrm{RP}_0, \mathrm{M}_{\mathrm{G}}) \approx (0.95, 3.60)$
to
$(\mathrm{BP}_0 - \mathrm{RP}_0, \mathrm{M}_{\mathrm{G}}) = (0.86, 2.43)$
, whilst the third star (ra_1604-2712-s188) was shifted from
$(\mathrm{BP}_0 - \mathrm{RP}_0, \mathrm{M}_{\mathrm{G}}) \approx (0.66, 4.50)$
to
$(\mathrm{BP}_0 - \mathrm{RP}_0, \mathrm{M}_{\mathrm{G}}) = (0.62, 4.41)$
.


Figure 4. Distance comparison between those derived from inverting parallax (Dplx) and from isochrones (Diso), colour-coded by
$E(B-V)$
. Stars with parallax quality
$\pi \geq 3 \sigma$
is shown on the left panel. The right panel shows the comparison for stars with
$\pi \geq 5 \sigma$
.

Figure 5. HR diagram of the full sample using BP0 - RP0 (corrected for reddening, detailed in Section 3.1.1) and MG (using distances described in Section 3.2.2), sub-sectioned according to parallax quality. Stars are coloured by their Kernel Density Estimator density, where lighter colours indicate higher density. Our 15 vetted EMP stars (see Section 4 for description on the vetting process) are shown by the blue star symbols. Dartmouth isochrones transformed to the Gaia DR2 system are overplotted in blue from
$\left[\text{Fe/H}\right] =-1.00$
dex to
$\left[\text{Fe/H}\right] =-2.50$
dex in 0.50 dex step-size, assuming an age of 12 Gyr. Left panel: HR diagram for the full sample. Upper-middle panel: HR diagram for stars with parallax quality
$\leq 2\sigma$
. Lower-middle panel: HR diagram for stars with parallax quality
$3\sigma \lt \pi \leq 5\sigma$
. Upper-right panel: HR diagram for stars with parallax quality
$2\sigma \lt \pi \leq 3\sigma$
. Lower-right panel: HR diagram for stars with parallax quality
$\geq 5\sigma$
.
The location of three EMP stars on the HR diagram is inconsistent with their derived metallicities (see Table 5). The likely reason for this mismatch is the adoption of erroneous reddening values for the stars. We found that if we allow the reddening to be a free parameter, we are then able to locate the stars in the HR diagram at locations consistent with their metallicities using higher reddening values. The higher reddening implies hotter effective temperatures and slightly higher metallicities. With the larger reddenings, the first two stars increase their metallicities by
$\sim 0.4$
, with the third one increasing by
$\sim 0.02$
dex. The initial and shifted parameters for the three stars, together with the initial and final reddening values are shown in Table 3.
We note, however, that we do not use reddening as a free parameter for any other stars in the sample because of the difficulties in constraining it properly from spectral fits alone. Indeed the adjustments for these three stars highlights the difficulties in constraining reddening, and this uncertainty is kept into account in our error budget.
3.4.1. WiFeS
WiFeS is an Integral Field Unit (IFU) instrument on the 2.3 m telescope at SSO. We observed 20 relatively bright stars from our sample with WiFeS using the B3000 grating, which covers 3 200–5 900 Å at
$R \approx 3\,000$
. Stellar parameters were derived for these stars from the WiFeS spectra using the fitter code (see Norris et al. Reference Norris2013, with updates as described by Da Costa et al. Reference Da Costa2023). The accurate spectrophotometric calibration of the WiFeS spectra make them an ideal case to validate our stellar parameters. Fig. 6 shows the comparison for
$T_{\text{eff}}$
,
$\log g$
and
$\left[\text{Fe/H}\right]$
together with their differences. The WiFeS stars are listed in Table 4 with both sets of stellar parameters.
Table 4. First three stars observed with the WiFeS spectrograph ordered by RA. Source ID column refers to the Gaia DR3 source ID. If the star has a ‘good’ parallax of
$\pi \geq 3 \sigma$
, then it is given True. Otherwise it is given False. Stellar parameters from the WiFeS spectra are given as well as those from our analysis. Full table available on the electronic version of the paper.


Figure 6.
Upper panels: Direct comparison of
$T_{\text{eff}}$
,
$\log g$
and
$\left[\text{Fe/H}\right]$
values for 21 sample stars derived independently from the WiFeS and AAOmega spectra using different spectral analysis codes. The standard deviation and mean difference values are written into each subplot. The dashed line is the 1:1 relation. Lower panels: The difference between the parameters derived from the AAOmega and WiFeS spectra; the dashed line is for zero difference.
For
$T_{\text{eff}}$
the agreement between the WiFeS values and our spectral analysis is good (
$\sigma_w = 125$
K,
$\Delta T_{\mathrm{eff}, w} = 86$
K),Footnote
k
particularly for
$T_{\text{eff}} \lt 5\,500$
K, where the scatter is low. At higher temperatures, the scatter increases with increasing temperatures with the WiFeS data overestimating
$T_{\text{eff}}$
compared to our values. We note that Da Costa et al. (Reference Da Costa2023) also saw this when they compared WiFeS
$T_{\text{eff}}$
’s with GALAH DR3
$T_{\text{eff}}$
’s (see their fig. 2).
For
$\log g$
, the comparison is again good with
$\sigma_w = 0.34$
and
$\Delta \log g_w = -0.16$
dex. The WiFeS
$\log g$
values are quantised at the 0.125 dex level (so precision is at best half of this), but Da Costa et al. (Reference Da Costa2019) showed that the true estimate for the WiFeS
$\log g$
errors is likely around 0.3–0.35 dex. The
$\log g$
random errors for our sample are
$\sim$
0.3 dex also, so we would expect a scatter of
$\sim0.42$
dex in the differences. The
$\sigma_w$
value of 0.34 is close to this value from which we conclude that it is likely that our analysis yields similar precisions for
$\log g$
as the WiFeS analysis. It is worth noting that
$\log g$
values from WiFeS are independent from parallax (derived directly from spectrophotometric fits), so the lack of any systematic differences (
$\Delta \log g_w = -0.16$
) makes our approach reliable and consistent.
The
$\left[\text{Fe/H}\right]$
comparison between WiFeS and us (
$\sigma = 0.29$
dex,
$\Delta \left[\text{Fe/H}\right]_w = 0.07$
dex) is also good. In particular, the agreement is satisfactory across the entire metallicity range with no obvious systematic trend at the lowest metallicities.
3.4.2. Globular clusters
In Da Costa (Reference Da Costa2016), a study of RGB stars in Galactic globular and open clusters was performed using observational data collected using the same instrumental setup as this work (see Section 2.2). For consistency, we reduced the raw 2dF data from the Da Costa (Reference Da Costa2016) observations ourselves using the same version of 2dfdr (version 8.03b) as for our main sample. These globular cluster spectra therefore provide an excellent dataset on which to test our analysis code.

Figure 7. Stellar parameter results for globular clusters NGC 104 (47), NGC 362 (48), NGC 288 (30), NGC 1904 (17) and NGC 7099 (14). Each subplot has an accompanying metallicity distribution function (MDF) using the kernel density estimator of the cluster’s metallicity. The horizontal dashed lines represent the reference values provided by Carretta et al. (Reference Carretta, Bragaglia, Gratton, D’Orazi and Lucatello2009). The errors in this are shown by the shaded region. The weighted standard deviation (
$\sigma_W$
) and mean difference (
$\Delta \left[\text{Fe/H}\right]_w$
) for each cluster against the reference value is printed on the subplots.
The globular cluster results are shown in Fig. 7, with the addition of NGC 362, which was observed as part of our program for validation purposes. The Carretta et al. (Reference Carretta, Bragaglia, Gratton, D’Orazi and Lucatello2009) metallicities for the clusters are shown by the black dashed line.
Overall, we find that the metallicity bias and scatter for each cluster is less than 0.25 dex, which implies that our analysis is acceptably accurate. Crucially, we do not find significant biases for the warmest or coolest stars observed.
The results for all five of our sample clusters (NGC 104, NGC 362, NGC 288, NGC 1904 and NGC 7099) are in good agreement with literature. For any individual cluster star, the uncertainty we get from our analysis is a good representation of the true uncertainty in our main sample.
3.4.3. Gaia XP surveys
Surveys targeting metal-poor stars like Pristine or SkyMapper have proven to provide reliable estimates of metallicities thanks to their use of metallicity-sensitive photometric colours (e.g. Starkenburg et al. Reference Starkenburg2017; Da Costa et al. Reference Da Costa2019; Casagrande et al. Reference Casagrande2019; Chiti et al. Reference Chiti2021a). Newer surveys utilising Gaia XP data take advantage of more complex tools like forward modelling and machine learning to derive stellar parameters, which we explore in more detail in Appendix 1. Here we illustrate a comparison of our data to that of Martin et al. (Reference Martin2023), who used Gaia XP spectra to calculate synthetic Pristine photometry to infer metallicity estimates.

Figure 8. Metallicity comparison between our data and that of Martin et al. (Reference Martin2023) for 2 125 stars in common, 5 of which are identified in our sample as EMPs. Standard deviation and mean difference are shown on the plot. Colour represents density, as in Fig. 5.
A cross-match of their catalogue with our data revealed 2 125 stars in common, including 5 of our EMP stars. One of these is also an EMP star in Martin et al. (Reference Martin2023). Fig. 8 shows the metallicity estimates comparison. Overall, there is a good correlation between the two estimates with a low mean difference (
$\mu = 0.03$
) but a substantial standard deviation (
$\sigma = 0.53$
). Given the uncertainties in both metallicity determinations, the agreement is satisfactory. To understand how the Martin et al. (Reference Martin2023) metallicities inferred from the photometric indices derived Gaia XP spectra change with magnitude, we selected first stars with
$19 \leq S/N \leq 21$
from our spectra. We then separated these stars into ‘bright’ (
$m_G \leq 16$
; 24 stars) and ‘faint’ (
$m_G \gt 16$
; 59 stars) groups. We then saw that the scatter in the metallicity differences for the bright group (
$\sigma = 0.50$
) is lower than for the faint group (
$\sigma = 0.64$
). This implies that the metallicities inferred from Gaia XP colours downgrade with fainter magnitudes, whilst our metallicities remains consistent at fixed S/N. Our technique is apparently more robust for fainter stars than the XP spectra, driven by its degrading S/N quality at lower magnitudes.
Table 5. Stellar parameters for our 15 EMP stars, ordered by RA. The Gaia DR3 source ID follows our star IDs. The
$\left[\text{C/Fe}\right]$
values presented are our non-corrected values, with the evolutionary corrections,
$\Delta \left[\text{C/Fe}\right]$
, shown in the final column. The correction only applies to RGB stars (
$\log g \lt 3.0$
). Dwarfs automatically have a value of zero. Kinematic assignments are given (see Section 4.2), with HAL being halo, RET being retrograde disk, PRO being prograde disk, and GSE being Gaia Sausage-Enceladus.


Figure 9. Stellar parameter results for our sample, showing effective temperature (left), heliocentric radial velocity in km s
$^{-1}$
(centre) and [C/Fe] (right) results against
$\left[\text{Fe/H}\right]$
. In the right panel downward arrows indicate upper limits in cases where the CH G-band could not be detected. EMPs with
$\left[\text{Fe/H}\right]{} \lt -3$
have been individually vetted; the vertical dashed line indicates this limit for clarity. Note that the stars on the diagonal line seen in upper right of the
$\left[\text{C/Fe}\right]$
panel represents the synthetic grid edges.
4. Chemodynamics of the metal poor Galaxy
4.1. Chemistry
The stellar parameters for the stars in our sample cover a wide parameter space for
$T_{\text{eff}}$
,
$\log g$
,
$\left[\text{Fe/H}\right]$
and
$\left[\text{C/Fe}\right]$
. The derived values for the full sample are shown in Fig. 9 for effective temperature, radial velocity and [C/Fe], as a function of
$\left[\text{Fe/H}\right]$
. From the sample, we have identified 15 new candidate EMPs, vetted manually by visual inspection of the spectra and of their synthetic fits. Stars with
$\left[\text{Fe/H}\right] \lt -3.0$
dex that did not pass the vetting procedure were removed from the dataset due to poor fits. The lowest metallicity in the sample belongs to star ra_1633-2814_s284, with
$\left[\text{Fe/H}\right] = -4.0 \pm 0.20$
. Stellar parameters for the vetted EMP sample are given in Table 5. These stars are denoted with blue star symbols across the Figures in this paper; a vertical grey-dashed line drawn at
$\left[\text{Fe/H}\right] =-3.0$
dex in the panels of Fig. 9 shows the separation of the vetted EMPs from the rest of the dataset.

Figure 10. The [C/Fe] distribution for stars with
$\left[\text{Fe/H}\right]{} \leq -1.5$
based on their
$\left[\text{C/Fe}\right]$
values: carbon-depleted (
$\left[\text{C/Fe}\right] \lt 0$
) in black circles, carbon-normal (
$0 \leq \left[\text{C/Fe}\right] \lt 0.7$
) in red squares, and carbon-enhanced (
$\left[\text{C/Fe}\right] \geq 0.7$
) in blue triangles. Only stars with detectable CH G-band are shown; those with upper limits are ignored. Bins are 0.1 dex in size. The last bin (
$\lt -2.9$
) includes all stars below this metallicity. Shaded regions are errors on each frequency assuming Poisson statistics.
The left panel of Fig. 9 shows that in general, hotter stars are more metal-rich than their cooler counterparts. This trend is due to our survey selecting primarily RGB stars, with evolutionary effects prohibiting hot metal-poor stars being present. In general, hotter stars tend to be more massive and younger, therefore having higher metallicities (Nordström et al. Reference Nordström2004; Holmberg, Nordström, & Andersen Reference Holmberg, Nordström and Andersen2007; Holmberg et al. Reference Holmberg, Nordström and Andersen2009; Casagrande et al. Reference Casagrande2011). We see the largest concentration of stars at
$T_{\text{eff}} \sim 5\,100$
K, before the number of stars decreases with lower metallicities when keeping this
$T_{\text{eff}}$
fixed. A ‘typical’ star in our sample has
$T_{\text{eff}} \sim 5\,200$
K and
$\left[\text{Fe/H}\right] \sim -1.5$
(according to the highest concentrations in the plot). Our EMP candidates cover a sizeable temperature range from
$T_{\text{eff}} \approx 4\,600$
K to
$T_{\text{eff}} \approx 6\,200$
K with metallicities as low as
$\left[\text{Fe/H}\right]$
$\approx -4.0$
dex, emphasising the diversity of the population that our survey is sampling.
The heliocentric radial velocity results in the middle panel of Fig. 9 are broadly consistent with the expectations for a sample that moves from thick-disk dominance at higher metallicities with
$\sigma(rv) \sim 45\,$
km s
$^{-1}$
(e.g. Zwitter et al. Reference Zwitter2008; Kordopatis et al. Reference Kordopatis, Recio-Blanco, Schultheis and Hill2020), to an increasing halo dominance at lower metallicities with
$\sigma(rv) \sim 150$
km s
$^{-1}$
(e.g. Carollo et al. Reference Carollo2007; Carollo et al. Reference Carollo2010). At the lowest metallicities (
$\left[\text{Fe/H}\right] \lt -2.5$
) there are fewer stars but the dispersion apparently remains large.
Our [C/Fe] results, shown in the right panel of Fig. 9, highlights the significant variance we have in our carbon abundances. The distribution of
$\left[\text{C/Fe}\right]$
abundances reflects the range of evolutionary stages present in our sample, together with the uncertainties in the determinations, as well as any population dependent variations in intrinsic
$\left[\text{C/Fe}\right]$
values.

Figure 11. The [C/Fe] values for the 15 vetted EMP stars. Evolutionary mixing corrections have been applied to the carbon abundances for the RGB stars. Each star is marked either as a dwarf (log
$g \geq 3.0$
dex) or as a red giant (log
$g \lt 3.0$
dex). The plot is split into three regions based on [C/Fe] values. Uncertainties for both [C/Fe] and
$\left[\text{Fe/H}\right]$
are present. Downward arrows indicate
$2 \sigma$
upper limits on [C/Fe] abundances.
To further investigate the
$\left[\text{C/Fe}\right]$
results, the sample was split into regions based on their
$\left[\text{C/Fe}\right]$
values: carbon-depleted (
$\left[\text{C/Fe}\right] \lt 0$
), carbon-normal (
$0 \leq \left[\text{C/Fe}\right] \lt 0.7$
) and carbon-enhanced (
$\left[\text{C/Fe}\right] \geq 0.7$
). For the stars with
$\left[\text{Fe/H}\right] \leq -1.5$
, we show in Fig. 10 the distributions of measured
$\left[\text{C/Fe}\right]$
values as a function of [Fe/H] for each region. Stars with upper limits on
$\left[\text{C/Fe}\right]$
were excluded. In total, for those stars with
$\left[\text{Fe/H}\right] \leq -1.5$
, 2 672 stars are carbon-depleted, 2 735 are carbon-normal, and 388 are carbon-enhanced. We see that the portion of carbon-enhanced stars increases from 1.1% to 16.0% with decreasing metallicity. On the other-hand, the portion of carbon-depleted stars decreases from 71.3% to 13.0% when moving from
$\left[\text{Fe/H}\right] = -1.5$
to
$-3.0$
. This agrees with previous work that show similar trends (e.g. Placco et al. Reference Placco, Frebel, Beers and Stancliffe2014; Singh et al. Reference Singh, Hansen, Byrgesen, Reichert and Reggiani2020; Arentsen et al. Reference Arentsen2022). There are no obvious differences when separating the
$\left[\text{C/Fe}\right]$
results into our different kinematic groupings (defined in Section 4.2.1).
The [C/Fe] results for our EMP stars are explored further in Fig. 11. Defining dwarfs as
$\log g \geq 3.0$
and giants as
$\log g \lt 3.0$
, we have two dwarfs and three giants with only upper limits for the
$\left[\text{C/Fe}\right]$
values; these are represented by the downward arrows in the figure. Excluding these five non-detections, and after applying evolutionary corrections to the abundance ratios for the RGB stars,Footnote
l
we see that eight EMPs have normal carbon values (
$0 \leq \text{[C/Fe]} \lt 0.7$
), two are carbon-enhanced (
$\text{[C/Fe]} \geq 0.7$
), and none are carbon-depleted (
$\text{[C/Fe]} \leq 0$
). For the carbon-enhanced stars, one star is a giant with
$\left[\text{C/Fe}\right] = 0.79 \pm 0.03$
(the evolutionary correction is negligible for this star), and the other is a carbon-enhanced dwarf with
$\left[\text{C/Fe}\right] = 1.3 \pm 0.1$
. For the dwarf, given its low
$\left[\text{Fe/H}\right]$
abundance of
$-3.6 \pm 0.1$
, it may have obtained its abundances from a Pop. III progenitor that significantly enriched the surrounding gas with carbon. This is the explanation offered by, for example, Christlieb et al. (Reference Christlieb2002) for star HE0107–25240, and by Nordlander et al. (Reference Nordlander2019) for star SMSS J160540.18–144323.1, both of which are extremely carbon-rich ultra-metal-poor stars. In both cases, the enrichment in carbon may have arisen from a Pop. III ‘fallback’ supernova (e.g. Umeda & Nomoto Reference Umeda and Nomoto2002; Nomoto, Kobayashi, & Tominaga Reference Nomoto, Kobayashi and Tominaga2013). Both carbon-enhanced stars could also have received their carbon enhancement from mass transfer from a (now white dwarf) companion during its AGB star phase (e.g. Whitehouse et al. Reference Whitehouse, Farihi, Green, Wilson and Subasavage2018). Tests to investigate which of the two carbon-enhancement mechanisms is likely include measuring abundances of the slow neutron capture process elements like Ba, enhanced in the mass transfer scenario, but not in the Pop. III fallback scenario. The CH G-band fits to all vetted EMP stars are shown in Figs. B1 and B2
4.2. Orbital dynamics
To determine the dynamical properties of our sample, we used the Python code galpy
Footnote
m
that adopts the potentials in Bovy (Reference Bovy2015), a simple and accurate model for the gravitational potential of the Milky Way. We transform celestial coordinates to galactocentric coordinates by assuming that the Sun is 20.8 pc above the plane of the MW (Bennett & Bovy Reference Bennett and Bovy2019), and has a peculiar velocity of
$(U, V, W) = (11.1, 12.24, 7.25)$
km s
$^{-1}$
(Schönrich, Binney, & Dehnen Reference Schönrich, Binney and Dehnen2010). The Solar circular velocity was taken as 238 km s
$^{-1}$
at the Solar radius of 8.2 kpc (Bland-Hawthorn & Gerhard Reference Bland-Hawthorn and Gerhard2016). Orbits were then integrated using a 4th order symplectic integrator in galpy, given distances derived from Section 3.2.2, stellar coordinates and proper motions from Gaia DR3, and heliocentric radial velocities from Section 3.1.3. Orbital integration is computed over
$\pm 1$
Gyr using randomised distances, radial velocities and proper motions. The resulting distributions are then used to get uncertainties in the kinematic parameters.
4.2.1. Orbital selection
Based on the resulting orbital parameters, we classified each star in the full sample into prograde, retrograde, halo and Gaia-Sausage-Enceladus (GSE) (Helmi et al. 2018; Belokurov et al. Reference Belokurov, Erkal, Evans, Koposov and Deason2018) orbits using the action momentum plot shown in Fig. 12. This uses a combination of the azimuthal (
$J_\phi$
), vertical (
$J_z$
) and radial (
$J_r$
) components of the action vector (normalised by the total angular momentum,
$J_{\text{TOT}}$
). The equivalent angular moment vs energy plot is shown in Fig. 13, where we show that our current selection holds in Lz vs E space.

Figure 12. Action momentum space for the entire sample (upper left), then separated into metallicity bins below
$\left[\text{Fe/H}\right] = -1.5$
. The x-axis,
$J_\phi/J_{\text{TOT}}$
, is the azimuthal component of the action vector. The y-axis,
$(J_z - J_r)/J_{\text{TOT}}$
, is the difference between the star’s vertical action and its radial action. Both axes are normalised by the total angular momentum
$J_{\text{TOT}}$
. Stars with
$J_\phi/J_{\text{TOT}} \leq -0.75$
and
$z_{\mathrm{max}} \lt 3$
kpc are assigned as retrograde disk (cyan), whilst those with
$J_\phi/J_{\text{TOT}} \geq 0.75$
and
$z_{\mathrm{max}} \lt 3$
kpc are assigned as prograde disk (blue). Stars with
$-0.25 \leq J_\phi/J_{\text{TOT}} \leq 0.75$
and
$-0.75 \leq (J_{z} - J_{r})/J_{\text{TOT}} \leq 0.25$
are assigned to GSE orbits (purple). Everything else is assigned as halo (orange). ‘Disk’ stars that meet the
$J_\phi/J_{\text{TOT}}$
requirement but not the
$z_{\mathrm{max}} \lt 3$
requirement are assigned as halo. Note that in the final panel, two EMPs overlap with nearly identical values of
$J_\phi/J_{\text{TOT}} \sim 0.0$
and
$(J_{z} - J_{r})/J_{\text{TOT}} \sim -1.0$
.

Figure 13. The angular momentum against energy plot for our sample. Same layout as Fig. 12.
As was done in Sestito et al. (Reference Sestito2021), the plot is segmented into four different kinematic classifications based on values of
$J_\phi/J_{\text{TOT}}$
. Stars assigned with prograde disk orbits (
$\mathrm{N}_{\mathrm{pro}}/\mathrm{N}_{\mathrm{TOT}} = 62.9\%$
) have
$J_\phi/J_{\text{TOT}} \geq 0.75$
and
$z_{\mathrm{max}} \lt 3$
kpc.
The majority of these stars belong to the thick disk, with only 8.0% of the total prograde population have
$z_{\mathrm{max}} \lt 0.3$
kpc representative of the thin disk. Those assigned retrograde disk orbits (
$\mathrm{N}_{\mathrm{ret}}/\mathrm{N}_{\mathrm{TOT}} = 1.4\%$
) have
$J_\phi/J_{\text{TOT}} \leq -0.75$
and
$z_{\mathrm{max}} \lt 3$
kpc, with one star having
$z_{\mathrm{max}} \lt 0.3$
kpc.
The relative weight of retrograde against prograde disk populations is
$\mathrm{N}_{\mathrm{ret}}/(\mathrm{N}_{\mathrm{ret}}+\mathrm{N}_{\mathrm{pro}}) = 2.2\%$
across all metallicities in our sample. Stars with
$-0.25 \leq J_\phi/J_{\text{TOT}} \leq 0.75$
and
$-0.75 \leq (J_{z} - J_{r})/J_{\text{TOT}} \leq 0.25$
, regardless of their
$z_{\mathrm{max}}$
, are assigned to the GSE (
$\mathrm{N}_{\mathrm{gse}}/\mathrm{N}_{\mathrm{TOT}} = 3.9\%$
), with the remainder classified as halo stars (
$\mathrm{N}_{\mathrm{hal}}/\mathrm{N}_{\mathrm{TOT}} = 32.0\%$
).
The relative weight of GSE against halo populations is
$\mathrm{N}_{\mathrm{gse}}/(\mathrm{N}_{\mathrm{gse}}+\mathrm{N}_{\mathrm{hal}}) = 11.4\%$
. A large portion of the sample is located within the prograde disk, followed by halo stars passing through the disk. Kinematic classifications for our 15 EMPs are shown at the end of Table 5.

Figure 14. Orbital population distributions for stars with
$\left[\text{Fe/H}\right]{} \leq -1.5$
. Bins are 0.1 dex in size. Each bin is normalised by the total number of stars in that bin. Prograde disk is shown in blue circles, retrograde disk in cyan squares, halo in orange triangles, and GSE in purple diamonds. Layout and structure of plot is identical to Fig. 10.

Figure 15. Variation of the azimuthal action with respect to metallicity for our whole sample (left panel), the prograde disk (upper-middle panel), the retrograde disk (lower-middle panel), halo (upper-right panel) and GSE (lower-right panel) populations. The blue points over-plotted in each panel is the median and standard median error of each
$\left[\text{Fe/H}\right]$
bin (
$0.05$
in size) from
$-2.2$
to
$-0.1$
. For the halo and GSE panels, this was only done up to
$-1.2$
due to the lack of metal-rich stars. Given the lack of any stars in the retrograde sample, no medians were calculated. The grey points in the smaller panels is the whole data sample under-plotted.
The relative portions of these population groups change over decreasing metallicities, with clear trends present as shown in Fig. 14. At the
$\left[\text{Fe/H}\right]{}=-1.5$
bin, the prograde disk is dominant with 48.6% of the total population at this metallicity, before steadily decreasing to 20.7% in the lowest metallicity bin. Over the same metallicity range, the retrograde disk is initially at 1.9% in the higher metallicity bins, then peaks at 4.2% in the
$\left[\text{Fe/H}\right]{} = -2.1$
bin, but given the errors, the fraction remains relatively constant throughout. Taking the fractions of retrograde against prograde stars from our criteria above, in the highest metallicity bin we have
$\mathrm{N}_{\mathrm{ret}}/(\mathrm{N}_{\mathrm{ret}} + \mathrm{N}_{\mathrm{pro}}) = 3.8\%$
, increasing up to 13.1% in the
$\left[\text{Fe/H}\right] = -2.1$
bin, then remaining relatively constant at 10.3% for metallicity bins below this. The portion of retrograde disk stars against halo stars is relatively constant for all metallicities at
$\mathrm{N}_{\mathrm{ret}}/\mathrm{N}_{\mathrm{hal}} = 4.6\%$
, indicating similar formation origins.
The halo population is at its lowest in the
$\left[\text{Fe/H}\right] = -1.5$
bin at 45.3% of the total in this metallicity bin, then becoming increasingly prominent in lower metallicity bins, peaking at 65.5% in the
$\left[\text{Fe/H}\right] \lt -2.9$
bin. The GSE stars remain relatively consistent from
$\left[\text{Fe/H}\right] = -1.5$
to
$-2.3$
at 6.9%, then increasing to reach peak percentage at 14.8% in the
$\left[\text{Fe/H}\right] = -2.8$
bin. Again, the lack of stars in these low metallicity bins makes it hard to draw any definitive conclusions. The portion of GSE against halo stars is constant at
$\mathrm{N}_{\mathrm{gse}}/(\mathrm{N}_{\mathrm{gse}} + \mathrm{N}_{\mathrm{hal}}) = 11.7\%$
from
$\left[\text{Fe/H}\right] = -1.5$
to
$-2.3$
, then become more variable below this, peaking to 20% in the
$\left[\text{Fe/H}\right] = -2.8$
bin. Note that these ratios are likely dependent upon what selection cuts are used, so results may vary.
We investigate the evolution of the azimuthal action,
$J_{\phi}$
, in Fig. 15. By binning the data into
$\left[\text{Fe/H}\right]$
bins from
$-2.2$
to
$-0.1$
with
$0.05$
bin size, then taking the median and standard median error of each (blue points over-plotted in the figure), we see three distinct features in the sample: one at
$\left[\text{Fe/H}\right] \lt -1.5$
, with
$J_\phi$
consistent across metallicities from
$J_\phi = 300 \pm 80$
km s
$^{-1}$
kpc at
$\left[\text{Fe/H}\right] = -2.2$
, to
$440 \pm 70$
km s
$^{-1}$
kpc at
$\left[\text{Fe/H}\right] = -1.6$
. From
$\left[\text{Fe/H}\right] = -1.5$
to
$-1.1$
,
$J_\phi$
increases from
$380 \pm 50$
km s
$^{-1}$
kpc to
$1\,320 \pm 90$
km s
$^{-1}$
, before slowly increasing at the higher metallicities from
$J_\phi = 1\,300 \pm 100$
km s
$^{-1}$
at
$\left[\text{Fe/H}\right] = -1.0$
, to
$1\,670 \pm 70$
km s
$^{-1}$
at
$\left[\text{Fe/H}\right] = -0.1$
.
Looking at the specific kinematic groupings, the halo group has small
$J_\phi$
variation, from
$200 \pm 90$
km s
$^{-1}$
at
$\left[\text{Fe/H}\right] = -2.2$
, to
$300 \pm 200$
km s
$^{-1}$
at
$\left[\text{Fe/H}\right] = -1.2$
. Over the same range, the GSE also has small variation, from
$J_\phi = 0 \pm 20$
km s
$^{-1}$
to
$55 \pm 80$
km s
$^{-1}$
.
The prograde disk shows the most variation, particularly from
$\left[\text{Fe/H}\right] = -1.5$
to
$-1.1$
, where
$J_\phi$
increases from
$730 \pm 70$
km s
$^{-1}$
to
$1\,380 \pm 80$
km s
$^{-1}$
. At the higher metallicities, the
$J_\phi$
values steadily increase to
$1\,680 \pm 60$
km s
$^{-1}$
at
$\left[\text{Fe/H}\right] = -0.1$
. Given the lack of stars in the retrograde disk, we can not say anything about its behaviour.
The first feature is predominately seen in the halo group, with the second and third features shown in the prograde disk sample. These features suggests we are seeing the ‘spin-up’ of the disk, where it transitions from being chaotic to the ordered disk we see today. This transitional phase, shown by the significant increase in
$J_\phi$
with metallicity between
$-1.5$
and
$-1.1$
, is inline with the Aurora hypothesis suggested by Belokurov & Kravtsov (Reference Belokurov and Kravtsov2022), and the results shown by Viswanathan et al. (Reference Viswanathan, Horta, Price-Whelan and Starkenburg2024a). The details of this will be explored more in our discussion section (Section 5).
4.3. Metallicity Distribution Function
The metallicity distribution function (MDF) for our kinematically selected sub-samples is shown in Fig. 16 using stars with
$\left[\text{Fe/H}\right] = -1.5$
to
$-4.0$
. Stars with
$\left[\text{Fe/H}\right] \gt -1.5$
were excluded due to our original selection on Zhang et al. (Reference Zhang, Green and Rix2023) to only include stars with
$\left[\text{Fe/H}\right]_{\mathrm{Zhang}} \leq -1.5$
(see Section 2.1 for more detail). The population designations used assignments performed in Fig. 12, and the bin size is set to be 0.1 dex. This section will discuss both the apparent peak and the apparent slope in the MDFs for each kinematic group in detail, with comparisons made to literature if possible. All mention of ‘disk’ stars (both prograde and retrograde) are those meeting both the
$J_{\phi}/J_{\mathrm{TOT}}$
and
$z_{\mathrm{max}}$
requirements. Those just meeting the
$J_{\phi}/J_{\mathrm{TOT}}$
requirement are considered as halo.

Figure 16. The MDFs of our sample stars in the halo, prograde disk, retrograde disk and GSE classifications. The bin size is 0.1 dex. The slopes of the distributions from
$\left[\text{Fe/H}\right] = -2.0$
to
$-3.0$
are written within the legend. KDE fits are shown by the dot-dot, dash-dash, dot-dash and dot-dot-dash black lines for halo, prograde disk, GSE and retrograde disk populations, respectively. The faded distributions below
$\left[\text{Fe/H}\right] \leq -3.0$
are the stars that did not passing the vetting process, with solid histograms the 15 EMPs.
The first inference from Fig. 16 is that the halo MDF levels off and starts decreasing above
$\left[\text{Fe/H}\right] \sim -2.0$
, consistent with our current understanding of its MDF peaking at
$\left[\text{Fe/H}\right] \approx -1.7$
(e.g. Youakim et al. Reference Youakim2020).
Although not shown on Fig. 16, the prograde disk ‘peaks’ at
$\left[\text{Fe/H}\right] = -1.07 \pm 0.05$
, but given we deliberately selected metal-poor stars from Gaia XP with
$\left[\text{Fe/H}\right] \leq -1.5$
, we are systematically excluding the metal-rich stars, so this value is likely to be significantly biased. Regarding the retrograde disk population, we noticed that the MDF does not keep increasing above
$-2.0$
, an interesting result which suggests that the retrograde disk population is predominantly metal-poor. Thus, that the retrograde population may primarily be comprised of stars with
$\left[\text{Fe/H}\right] \sim -2.0$
, potentially also being members of the Aurora population (Belokurov & Kravtsov Reference Belokurov and Kravtsov2022).
For the GSE, we see that the MDF peaks at
$\left[\text{Fe/H}\right] = -1.75 \pm 0.02$
, inconsistent with studies like Feuillet et al. (Reference Feuillet, Feltzing, Sahlholdt and Casagrande2020) finding
$\left[\text{Fe/H}\right] = -1.17$
; Feuillet et al. (Reference Feuillet, Sahlholdt, Feltzing and Casagrande2021) getting
$\left[\text{Fe/H}\right] = -1.15$
, or Limberg et al. (Reference Limberg2022) finding the peak at
$\left[\text{Fe/H}\right] = -1.22$
. If we instead used the selection criteria defined in Feuillet et al. (Reference Feuillet, Sahlholdt, Feltzing and Casagrande2021) to define a different GSE sample, then again we find that the MDF peaks at
$-1.72 \pm 0.02$
. Therefore, the discrepancies are likely due to contamination from halo stars that we did not consider.
The slopes of the MDFs from
$\left[\text{Fe/H}\right] = -2.0$
to
$-3.0$
(binning the data into 7 bins) were calculated for all four kinematic groups. For the halo, we derived a slope of
$\Delta(\text{LogN})/\Delta\left[\text{Fe/H}\right] = 1.15 \pm 0.08$
,Footnote
n
and to be consistent with the literature, we also derived slopes for other metallicity ranges. For Youakim et al. (Reference Youakim2020), who used a metallicity range from
$\left[\text{Fe/H}\right] = -2.5$
to
$-3.4$
to derive a slope of
$1.0 \pm 0.1$
, in agreement with the value we find,
$1.3 \pm 0.2$
for the same metallicity range. Similarly, Da Costa et al. (Reference Da Costa2019), determined a slope of
$1.5 \pm 0.1$
over
$\left[\text{Fe/H}\right] = -2.75$
to
$-4.00$
, a value consistent within uncertainties from our slope of
$1.8 \pm 0.8$
for the same metallicity interval. However, the relative lack of EMP stars in our sample leads to larger uncertainty in the slopes making definitive comparisons.
For the prograde disk, the slope is
$1.4 \pm 0.1$
from
$\left[\text{Fe/H}\right] = -2.0$
to
$-3.0$
, which is within 2
$\sigma$
of the combined uncertainties of our halo value. It is also within the uncertainty of the retrograde disk slope at
$\Delta(\text{LogN})/\Delta\left[\text{Fe/H}\right] = 1.0 \pm 0.3$
. Recent studies have begun to reveal the unique nature of the disk-like kinematics of metal-poor stars in the prograde disk against those in the retrograde disk orbits (e.g. Sestito et al. Reference Sestito2019; Sestito et al. Reference Sestito2020; Cordoni et al. Reference Cordoni2021; Carter et al. Reference Carter2021; Carollo et al. Reference Carollo, Christlieb, Tissera and Sillero2023). It is suggested that the retrograde disk was formed via accretion, but the metal-poor component of the prograde disk may have formed this way too. Our data can not answer this from kinematics alone, but we can attempt to answer where these stars came from, particularly with our comparable MDF slopes, which we discuss more in Section 5.
The slope for the GSE is
$\Delta(\text{LogN})/\Delta\left[\text{Fe/H}\right] = 1.0 \pm 0.2$
, similar to both the halo and the retrograde disk. This suggests that these three groups underwent similar formation mechanisms, likely of an accreted origin with similar star formation efficiencies. To validate our slope of the GSE MDF, we used data provided by Bonifacio et al. (Reference Bonifacio2021) in their Table 2 to derive our own estimate of their MDF slope. At
$\Delta(\text{LogN})/\Delta\left[\text{Fe/H}\right] = 1.18 \pm 0.09$
over the same metallicity range, we agree within uncertainty with Bonifacio et al. (Reference Bonifacio2021).
5. Discussion
We have determined the chemodynamics for 5 795 stars observed with 2dF-AAOmega in regions close to the Galactic plane. Our chemical analysis reveals 2 571 stars with
$\left[\text{Fe/H}\right] \lt -1.5$
, whereof roughly half belong to the Galactic halo, half to the prograde disk, and smaller portions to the retrograde disk and to the GSE. On the tail end of the metallicity distribution are 15 previously unrecognised EMP stars with
$\left[\text{Fe/H}\right] \leq 3$
.
For stars with detections, the range of
$\left[\text{C/Fe}\right]$
values among the EMPs possibly indicates a range of processes for the origin of carbon relative to iron, independent of the difference in kinematic groups.
For our EMPs on metal-poor prograde disk orbits, there are two possible formation scenarios for their populations: either through in-situ (formed directly within the Galaxy), or via an in-plane accretion event (formed using material external to the Galaxy such as from a merger event). Previous studies like Sestito et al. (Reference Sestito2021) have shown that the retrograde disk was formed through accretion, but FIRE-2 simulations (Hopkins et al. Reference Hopkins2018) suggest something similar occurs for the prograde disk. By simulating Milky Way-mass galaxies, Santistevan et al. (Reference Santistevan2021) found that their stars with
$\left[\text{Fe/H}\right] \lt -2.0$
were predominately on prograde disk orbits, with their formation consistent with a merger with a LMC/SMC-mass gas rich merger 7–12.5 Gyr ago. Observationally, the study by Feuillet et al. (Reference Feuillet, Feltzing, Sahlholdt and Bensby2022) found a population of RGB and RR Lyare variable disk stars that match the chemical signatures of an accretion event. This has been done successfully for the GSE, with Buder et al. (Reference Buder2022) using chemistry to isolate the accreted structure (see their fig. 9). Obtaining detailed chemistry of our EMP prograde disk stars will therefore allow us to investigate whether their chemical signatures align with possible merger events in the metal-poor prograde disk.
Another interesting revelation is that the retrograde disk population is a relatively constant fraction of the halo population at 4.6% for all metallicities below
$\left[\text{Fe/H}\right] = -1.5$
, as shown in Fig. 14. This could suggest that the retrograde disk was part of the halo formation, which itself is dominated by accretion. Having detailed chemistry of the retrograde disk population for all metallicities below
$\left[\text{Fe/H}\right] = -1.5$
would allow us to compare with the halo at the same metallicities. The fraction of the retrograde disk population compared to the metal-poor prograde disk is also relatively constant at 10.3% for metallicities below
$\left[\text{Fe/H}\right] = -2.1$
. This may suggest that the metal-poor disk, both prograde and retrograde, likely shared similar formation processes as the halo.
Fig. 15 shows three features: one with low
$J_\phi$
at
$\left[\text{Fe/H}\right] \lt -1.5$
(ranging from
$300 \pm 80$
to
$440 \pm 70$
km s
$^{-1}$
kpc), seen in the halo and GSE sample. Another with
$J_\phi$
increasing from
$380 \pm 50$
to
$1\,320 \pm 90$
km s
$^{-1}$
kpc when
$-1.5 \leq \left[\text{Fe/H}\right] \lt -1.1$
, seen in the prograde disk sample. Then
$J_\phi$
flattening out when
$\left[\text{Fe/H}\right] \geq -1.1$
, reaching to
$1\,680 \pm 60$
km s
$^{-1}$
kpc at the highest metallicities, again seen in the prograde disk sample. We are seeing the ‘spin-up’ of the prograde disk as metallicity increases, first proposed by Belokurov & Kravtsov (Reference Belokurov and Kravtsov2022), whereby the Galaxy transitions from chaotic, majority-accreted systems (low
$J_\phi$
), to more structured and defined in-situ systems (high
$J_\phi$
) seen in the galaxy today. The Aurora population was defined as being the group of stars born before this period, occurring at consistent metallicity ranges seen in our sample. These results are also consistent with those in Viswanathan et al. (Reference Viswanathan, Horta, Price-Whelan and Starkenburg2024a), who used Gaia XP metallicities to see this ‘spin-up’ over similar metallicity ranges.
A further notable discovery is that between
$\left[\text{Fe/H}\right] = -2.0$
and
$-3.0$
, there are strong similarities in the MDFs in Fig. 16 amongst all four kinematic populations. The prograde disk has the steepest slope, which is within errors of the retrograde disk, but the difference between the halo and the GSE is small at
$2\sigma$
and
$1.8\sigma$
respectively. The inability to distinctively distinguish the four populations suggests that the star formation processes over this metallicity range is similar despite the different environments. GSE, halo and retrograde disk were suggested to form through accretion, and so we cannot rule out that the metal-poor end of the prograde disk was also formed through an accretion event. Programs are currently in progress to follow up on these EMPs using higher-resolution instruments, allowing us to perform detailed chemical analyses to confirm this.
6. Summary and conclusions
In this work, we have performed follow-up observations of 8 911 metal-poor star candidates selected from the Gaia DR3 Zhang et al. (Reference Zhang, Green and Rix2023) catalogue, using the medium resolution 2dF
$+$
AAOmega spectrograph, focusing towards the plane of the Galactic disk. Combining the capabilities of colte, isochrones and RVSpecFit, we have self-consistently derived the stellar parameters
$T_{\text{eff}}$
,
$\log g$
,
$\left[\text{Fe/H}\right]$
, [C/Fe] and radial velocity for 5 795 non-variable stars. Parameter uncertainties were found through Monte Carlo randomisations. Our parameters were tested and verified against parameters derived from WiFeS spectroscopy for 20 of our brightest stars, showing good agreement for all three stellar parameters (Fig. 6). Tests with RGB in five globular clusters also showed good agreement with literature
$\left[\text{Fe/H}\right]$
values from Carretta et al. (Reference Carretta, Bragaglia, Gratton, D’Orazi and Lucatello2009) (Fig. 7). Agreement with
$\left[\text{Fe/H}\right]$
values for 2 125 stars in common with Martin et al. (Reference Martin2023) was also excellent, with minimal scatter and small mean differences. We found that other, more complex approaches using Gaia XP did not perform as well as Martin et al. (Reference Martin2023).
The stellar parameters for the sample (Fig. 9) revealed the presence of 15 previously unknown candidate EMP stars (
$\left[\text{Fe/H}\right] \leq -3.0$
) that cover a variety of evolutionary stages from turn-off all the way to the top of the RGB (Fig. 5), with the lowest metallicity being
$\left[\text{Fe/H}\right] = -4.0 \pm 0.2$
dex. A closer look at the [C/Fe] results for our 15 EMPs (Fig. 11) revealed that most of our stars with detections are carbon-normal, with two carbon-enhanced stars. One is a giant with
$\left[\text{C/Fe}\right] = 0.79 \pm 0.03$
, and the other is a dwarf with
$\left[\text{C/Fe}\right] = 1.3 \pm 0.1$
. This dwarf, which has
$\left[\text{Fe/H}\right] = -3.6 \pm 0.1$
may have been enriched via a Pop. III ‘fallback’ supernovae or through mass transfer from a companion star (now white dwarf) during its AGB phase. High-resolution follow-up to detect (or not) slow neutron capture process elements, such as Ba, is required to first confirm, then differentiate these scenarios. For the whole sample, we saw that when we separated the stars into different regions of carbon enhancement (Fig. 10), the number of carbon-depleted stars decreases with lower metallicities, while the number of carbon-enhanced stars increases over the same range. This agrees with results shown by e.g. Placco et al. (Reference Placco, Frebel, Beers and Stancliffe2014).
We then analysed the orbital dynamics of the sample and used the selection cuts defined in Sestito et al. (Reference Sestito2021) to assign kinematic populations (Fig. 12). We select stars belonging to the prograde disk, retrograde disk (with both disks satisfying
$z_{\mathrm{max}} \lt 3$
kpc and their specific
$J_{\phi}/J{\mathrm{TOT}}$
requirements), halo and GSE. From this, the growth and decline of each group across various metallicity bins is revealed (Fig. 14). Of note is the decline of prograde star percentages with decreasing metallicity from
$\mathrm{N}_{\mathrm{pro}}/\mathrm{N}_{\mathrm{bin}} = 48.6\%$
in the
$\left[\text{Fe/H}\right] = -1.5$
bin, to 20.7% in the
$\left[\text{Fe/H}\right] \lt -2.9$
bin. The peak percentage of the total number of retrograde disk stars in the
$\left[\text{Fe/H}\right] = -2.1$
bin is 4.2%, then below this the portion of retrograde disk against prograde disk stars remains relatively constant at
$\mathrm{N}_{\mathrm{ret}}/(\mathrm{N}_{\mathrm{ret}} + \mathrm{N}_{\mathrm{pro}}) = 10.3\%$
. The portion of retrograde disk against halo stars is relatively consistent across all metallicities at
$\mathrm{N}_{\mathrm{ret}}/(\mathrm{N}_{\mathrm{ret}} + \mathrm{N}_{\mathrm{halo}}) = 4.6\%$
, indicating similar formations origins.
Using our
$\left[\text{Fe/H}\right]$
and
$J_\phi$
measurements (Fig. 15), we observe the ‘spin-up’ of the Milky Way from
$\left[\text{Fe/H}\right] = -1.5$
with
$J_\phi = 380 \pm 50$
km s
$^{-1}$
kpc, to
$\left[\text{Fe/H}\right] = -1.1$
with
$J_\phi = 1\,320 \pm 90$
km s
$^{-1}$
kpc. This is consistent with the Aurora in-situ population first hypothesised by Belokurov & Kravtsov (Reference Belokurov and Kravtsov2022).
The metallicity distribution function for the halo, prograde disk, retrograde disk and GSE stars (Fig. 16) were then evaluated. Between
$\left[\text{Fe/H}\right] = -2.0$
dex and
$\left[\text{Fe/H}\right] = -3.0$
dex, we derived the slopes (defined as d(logN)/d(
$\left[\text{Fe/H}\right]$
)) for the four regions, with the MDFs consistent amongst each other.
The retrograde slope agrees with the halo and GSE, confirming its likely accreted origins, though we see that within uncertainty, it also agrees with the metal-poor prograde disk. This suggests that the metal-poor prograde disk may also be formed through accretion. Future studies involving measuring
$\alpha$
-element abundances of disk EMP stars may unearth the true nature of the disk, particularly the metal-weak prograde disk.
Acknowledgements
We acknowledge the traditional owners of the land on which the AAT and ANU stand, the Gamilaraay, the Ngunnawal, and the Ngambri people. We pay our respects to Elders, past and present, and are proud to continue their tradition of surveying the night sky in the Southern hemisphere. We are grateful for the team operating and maintaining the AAT and 2dF
$+$
AAOmega spectrograph. Their efforts do not go unnoticed, as this work would not be possible without their dedication and commitment to keep the instruments and telescope running.
Parts of this research were supported by the Australian Research Council Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D), through project number CE170100013.
This work was supported by computational resources provided by the Australian Government through the National Computational Infrastructure (NCI) under the National Computational Merit Allocation Scheme (project y89). T.N. acknowledges support from the Knut and Alice Wallenberg Foundation.
We thank the anonymous referee for their insightful comments and suggestions. Their feedback has helped improve the quality and strength of this manuscript.
Data availability statement
The derived stellar and orbital parameters for our survey sample, the stellar parameters for the globular clusters, and full entries of Table 1 and Table 4 is available here https://zenodo.org/records/15321892.
Appendix A. Comparisons to Literature
Here, we compare our stellar parameters for the stars in common with those from the following surveys that make use of Gaia XP data: Zhang et al. (Reference Zhang, Green and Rix2023), Andrae et al. (Reference Andrae2023b), and Li et al. (Reference Li, Wong, Hogg, Rix and Chandra2024), noting the comparison with Martin et al. (Reference Martin2023) was discussed in Section 3.4.3). We also compare our data with that from the spectroscopic PIGS survey Ardern-Arentsen et al. (Reference Ardern-Arentsen2024), which used 2dF+AAOmega data like our sample.
Appendix A.1. Zhang et al. (Reference Zhang, Green and Rix2023)
We compare our spectroscopic stellar parameter estimates to those from the Zhang et al. (Reference Zhang, Green and Rix2023) catalogue in Fig. A1. Since Zhang et al. (Reference Zhang, Green and Rix2023) provide errors, we also show the weighted standard deviations and mean differences, which are written directly on the subplots. Note that the sample shown includes stars above the selection cut of
$\left[\text{Fe/H}\right]_{\mathrm{Zhang}} \leq -1.5$
dex. These were selected for the calibration field ra_0103-7050 containing globular cluster NGC 362.

Figure A1. Stellar parameter comparison between us and Zhang et al. (Reference Zhang, Green and Rix2023) for our sample of 5 783 stars. The weighted standard deviations and mean differences are shown directly on each subplot. The EMPs (blue star symbols) are found to be more metal-rich than our values suggest.
The
$T_{\text{eff}}$
correlation between the two datasets (
$\sigma_w = 317$
K,
$\Delta T_{\mathrm{eff}, w}=-164$
K) is reasonable for stars within the over-density region at
$T_{\text{eff}} = 5\,000$
K, and a smaller group of stars at
$T_{\text{eff}} = 6\,000$
K, but the majority are below the 1:1 line, showing that our
$T_{\text{eff}}$
’s are hotter than what Zhang et al. (Reference Zhang, Green and Rix2023) suggests. This is the consequence of reddening, as we can demonstrate that stars further to the right along the x-axis have higher reddening than those closer to the 1:1 line.
Similarly, the
$\log g$
relationship (
$\sigma_w = 0.42$
dex,
$\Delta \log g_w=-0.23$
dex) has two main over-density regions that agree with Zhang et al. (Reference Zhang, Green and Rix2023): largest at log
$g=4.0$
dex and at
$\log g = 2.5$
dex, but again, our analysis suggests that majority of our stars have greater
$\log g$
’s. The authors mentioned that a strong degeneracy between their inferred
$\log g$
’s and Gaia parallax was present, due to the inability to measure line widths in Gaia XP spectra as a result of the poor spectral resolution. The issue with parallax also increased uncertainty in our sample. Therefore, stars that are dwarfs in both datasets are generally in agreement with each other. For giants, the less precise parallax measurements result in larger differences in our results. We also see that between
$2.0 \leq \log g_{\text{Zhang}} \lt 2.5$
, the range in surface gravities from our approach is vast, covering every evolutionary stage with
$2.0 \leq \log g_{\text{This work}} \lt 4.5$
.

Figure A2. Stellar parameter comparison between us and Andrae et al. (Reference Andrae, Rix and Chandra2023a) for 7 659 stars. The comparison includes 13 of the 15 stars we identified as EMPs. Since errors were not provided, only the unweighted standard deviations and mean differences for each parameter are shown.
The
$\left[\text{Fe/H}\right]$
correlation (
$\sigma_w = 0.66$
dex,
$\Delta \left[\text{Fe/H}\right]_w=-0.61$
dex), again, has agreement in the over density region at
$\left[\text{Fe/H}\right] =-1.5$
dex, but no correlation is shown outside of this. The hard cut at
$\left[\text{Fe/H}\right] = -1.5$
in Zhang et al. (Reference Zhang, Green and Rix2023) is due to us selecting stars with metallicities below this. The stars between
$-1.0$
and
$-1.5$
result from filling out fibres for calibration field ra_0103-7050 containing globular cluster NGC 362. There may be a dependence on reddening, with stars with high
$E(B-V)$
values typically showing larger discrepancies than those less reddened, but it is not the main driving factor, making the
$T_{\text{eff}} - \left[\text{Fe/H}\right]$
correlation seen in Fig. 9 unclear. As with
$\log g$
, the reason is again most likely due to the low spectral resolution of the Gaia XP spectra.
Appendix A.2. Andrae et al. (2023)
Using the XGBoost algorithm trained on APOGEE stellar parameters, Andrae et al. (Reference Andrae, Rix and Chandra2023a) derived stellar parameters for
$\sim 175$
million Gaia XP stars. This catalogue includes 5 655 stars in common with our work (with variables removed), including 13 of the 15 stars we identify as EMPs. The stellar parameter comparison between for the stars in common is shown in Fig. A2.
The correlation between
$T_{\text{eff}}$
for the two datasets (
$\sigma = 493$
K,
$\Delta T_{\mathrm{eff}} = -250$
K) again shows the similar behaviour towards our
$T_{\text{eff}}$
’s seen before with Zhang et al. (Reference Zhang, Green and Rix2023), where for a certain
$T_{\text{eff, XGBoost}}$
, the range of
$T_{\text{eff, This work}}$
is large. Again, this is due to reddening. The 13 EMPs are close to the 1:1 line, showing that the agreement for these stars is good.

Figure A3. Stellar parameter comparison between us and Li et al. (Reference Li, Wong, Hogg, Rix and Chandra2024) for 1 454 stars, with one of them identified as an EMP. Weighted standard deviations and mean differences are also shown.
For surface gravity (
$\sigma = 0.70$
dex,
$\Delta \log g = -0.19$
dex), we see that a large portion of stars follow the 1:1 line, but a distinct population that has gathered below the line. Both analyses place these stars on the RGB (rather than on the horizontal branch). For this, reddening was again the main reason, with stars in the concentration of points below the 1:1 line (
$\log g_{\text{XGBoost}} \approx 2.3$
and
$\log g_{\text{This work}} \approx 3.3$
) overall having greater reddening (E(B-V)
$\geq 0.8$
) than those closer to the 1:1 line. Stars in the vertical streaks (the vertical lines at
$\log g_{\text{This work}} \approx 4.1$
and 4.2) are also highly reddened (E(B-V)
$\geq 1.0$
). The reason for these streaks is likely due to high
$T_{\text{eff}}$
estimates placing stars at the turn-off point. Because of this, we are selecting two distinct
$\log g$
values depending on their
$\left[\text{Fe/H}\right]$
values or on the age of the isochrones, causing the vertical bands seen in the data.
The metallicity comparison (
$\sigma = 0.54$
dex,
$\Delta \left[\text{Fe/H}\right] = 0.34$
dex) shows that from
$\left[\text{Fe/H}\right]_{\mathrm{This work}} = 0$
to
$-1.5$
, the metallicities from XGBoost remains relatively consistent at
$\left[\text{Fe/H}\right]_{\mathrm{XGBoost}} = -0.5$
, before they drop to lower values and closer to our values. Of note is very few of our EMPs have metallicities below
$\left[\text{Fe/H}\right]_{\mathrm{XGBoost}} = -2.0$
, with several even having values above
$\left[\text{Fe/H}\right]_{\mathrm{XGBoost}} = -1.0$
.
Appendix A.3. Li et al. (2023)
With the focus on RGB stars, Li et al. (Reference Li, Wong, Hogg, Rix and Chandra2024) estimated the metallicities of 23 million RGBs from Gaia XP as part of AspGap, a neural-network based regression system trained on high resolution spectra taken with APOGEE to infer stellar parameters from Gaia XP. Their catalogue includes 1 454 of our stars, including one EMP star. We show comparison of our stellar parameters to AspGap in Fig. A3.

Figure A4. Stellar parameter comparison between us and Ardern-Arentsen et al. (Reference Ardern-Arentsen2024) survey for 121 stars in common, with
$\left[\text{C/Fe}\right]$
shown as an additional comparison. Stars with non-detections in the
$\left[\text{C/Fe}\right]$
comparison are shown by leftward-facing arrows representing their upper limits. The weighted standard deviations and mean differences are shown directly on the plots.
For all stars in common, we find mean differences and standard deviations of (
$\sigma_w = 250$
K,
$\Delta T_{\mathrm{eff}, w}=15$
K) for
$T_{\text{eff}}$
, (
$\sigma_w = 0.65$
dex,
$\Delta \log g_w=-0.070$
dex) for
$\log g$
, and (
$\sigma_w = 0.40$
dex,
$\Delta \left[\text{Fe/H}\right]_w=0.53$
dex) for
$\left[\text{Fe/H}\right]$
. No obvious correlations are present.
For
$T_{\text{eff}}$
, we see a compact grouping at 5 000 K on both axes, consistent with typical
$T_{\text{eff}}$
’s for stars on the RGB. We also see a smattering of stars stretching towards high
$T_{\mathrm{eff, This work}}$
, again the result of reddening. For
$\log g$
, we see a similar grouping at 2.5 dex on both axes, but the distribution of
$\log g_{\mathrm{This work}}$
is wider than
$\log g_{\mathrm{AspGap}}$
. This is the result of the selection function used in AspGap from APOGEE to ensure their stars are RGB, which holds mostly true for our stars. For
$\left[\text{Fe/H}\right]$
, we again see a compact grouping, but interestingly, this one does not pass through the 1:1 line. For all our stars in common, the metallicities predicted by Li et al. (Reference Li, Wong, Hogg, Rix and Chandra2024) are all above the 1:1 line by
$\Delta \left[\text{Fe/H}\right]_w=0.53$
dex, showing that overall, they derive higher metallicities than what we find. This is particularly true for the one EMP star in common, having a difference of 1.8 dex in metallicity between our two analyses.
Appendix A.4. Ardern-Arentsen et al. (Reference Ardern-Arentsen2024)
The Pristine Inner Galaxy Survey (PIGS) (Starkenburg et al. Reference Starkenburg2017; Arentsen et al. Reference Arentsen2020) uses Ca H&K photometry from MegaCam on the 3.6m Canada-France-Hawaii Telescope (CFHT) to identify metal-poor stars within the inner Galactic bulge region. Candidate metal-poor stars are followed up on the AAT with 2dF+AAOmega, using a very similar instrumental setup to us. Ardern-Arentsen et al. (Reference Ardern-Arentsen2024) derived spectroscopic stellar parameters (including
$\left[\text{C/Fe}\right]$
) for 13 235 stars, with 117 of our stars in common. We show the comparison of stellar parameters in Fig. A4.
For all four stellar parameters, we see excellent consistency between our values and those of Ardern-Arentsen et al. (Reference Ardern-Arentsen2024). For
$T_{\text{eff}}$
(
$\sigma_w = 173$
K,
$\Delta T_{\mathrm{eff}, w}=168$
K), the scatter is minimal between us, with their temperatures on average slightly hotter. One star (the only one over
$T_{\text{eff}}{} = 6\,000$
K) has a discrepancy of 600 K, with it being hotter in our work. It shows significant differences in other stellar parameters, which mostly vanish if we adopted their
$T_{\text{eff}}$
value.
The comparison with
$\log g$
(
$\sigma_w = 0.50$
,
$\Delta \log g_{w}=0.24$
) is good, with majority of the stars having higher values in Ardern-Arentsen et al. (Reference Ardern-Arentsen2024). Their surface gravities were derived using a mix of spectroscopic and parallax information, so some temperature dependence is present. Given that their
$T_{\text{eff}}$
values are higher overall, the higher
$\log g$
’s may be the result of this.
For
$\left[\text{Fe/H}\right]$
(
$\sigma_w = 0.24$
,
$\Delta \left[\text{Fe/H}\right]{}_{w}=0.11$
), the comparison is excellent, showing the lowest scatter for all datasets we have compared to. With the WiFeS comparison in Fig. 6, we had
$\sigma_w = 0.29$
, and for our globular cluster analysis in Fig. 7, the weighted standard deviations varied from
$\sigma_w = 0.13$
to 0.23 dex. The
$\sigma_w$
values being similar to those from our WiFeS and globular cluster comparisons means our measurements and errors are reliable.
For the
$\left[\text{C/Fe}\right]$
comparison (
$\sigma_w = 0.25$
,
$\Delta \left[\text{C/Fe}\right]_{w}=0.10$
), we see that overall, Ardern-Arentsen et al. (Reference Ardern-Arentsen2024) has higher
$\left[\text{C/Fe}\right]$
abundances than us, but the trend is acceptable. The majority of the stars compared are carbon-depleted (
$\left[\text{C/Fe}\right] \lt 0.0$
), with a handful being carbon-normal (
$0.0 \leq \left[\text{C/Fe}\right] \lt 0.7$
).
Appendix B. CH G-Band EMP Fits
The CH G-band fits for the 15 EMP stars across wavelength region
$4\,150 \leq \lambda \leq 4\,450$
Å are shown in Figs. B1 and B2. Note, the two figures are zoomed in over the regions
$4\,260 \leq \lambda \leq 4\,370$
Å to highlight the CH G-band region. Stars with detections (red) have their best
$\left[\text{C/Fe}\right]$
value fitted, alongside their statistical fitting error represented by the shaded region. The
$\left[\text{C/Fe}\right]$
systematic abundance error estimates are shown in the legend. Stars with non-detections (blue) have their upper limit values fitted, which is then quoted in the legend. These values are shown in Table B1.
Table B1.
$\left[\text{C/Fe}\right]$
abundance values alongside their systematic abundance error estimates and
$1\sigma$
statistical fitting error for our EMP stars. Those with non-detections have their upper limits shown, with the fitting error left empty.


Figure B1. CH G-band fits for the 15 EMP stars across the wavelength region
$4\,260 \leq \lambda \leq 4\,370$
Å (using continuum regions 4 150–4 200 Å and 4 400–4 450 Å). Black line is the observed data, and for stars with detections: the thick red line is the best-fitted [C/Fe] value, and the shaded region is the
$1\sigma$
statistical error (values shown in Table B1). Stars with non-detections have a thick blue line showing their upper limits. Stellar parameters
$T_{\text{eff}}$
,
$\log g$
and
$\left[\text{Fe/H}\right]$
are in the legend, alongside the S/N measured in the red spectra. The total errors taking into account uncertainties in stellar parameters are quoted in the legend with the best-fitting
$\left[\text{C/Fe}\right]$
value. Stars ra_1604-2712_18 and ra_1633-2814_284 have non-detectable [C/Fe] measurements, with upper-limits quoted in legend. Continues onto 22.

Figure B2. Continuation of Fig. B1. Stars ra_1659-2154_114 and ra_1832-3457_438 also have non-detectable [C/Fe] measurements.