Extreme weather events and disasters continue to have devastating impacts on households, communities, and nations with increased morbidity, mortality, and infrastructural and economic damage. Disasters are unforeseen events that overwhelm local capacity, often resulting in a request at the national or international level for external assistance.1
The Federal Emergency Management Agency, Centers for Disease Control and Prevention, American Red Cross, and many local government agencies have socially marketed the importance of household disaster preparedness for years, yet the US public largely remains unprepared for disasters.2 Families must prepare for the conditions that all disasters can create, such as the loss of electricity and water and the inability to obtain supplies for several days because of sheltering in place or disruptions in the supply chain. A household is considered prepared if members have created a family communication and evacuation plan and assembled a disaster kit containing enough food and water to sustain each member for at least 1 week.Reference Heagele, McNeill and Adams3
In September 2021, flooding from Hurricane Ida killed at least 13 people in New York City (NYC).Reference Kriegstein, Parnell and Parisienne4 Eleven community members died in their basement-level homes in Queens, and nearly all the victims were residents of Asian descent who lived in low-income Asian immigrant communities.Reference Yam and Venkatraman5 According to 2020 Census data,6 Asians represented about 14.3% of the population in NYC and 26% of Queens. Asians in NYC are more likely to be foreign born than members of other major racial groups. Undocumented immigrants are particularly vulnerable to social marginalization, low socioeconomic status, lack of societal resources, and limited English proficiency, resulting in difficulties seeking help during emergencies. However, Asian Americans’ emergency preparedness has rarely been measured in disaster research.Reference Cong and Chen7 In response to the challenges encountered during Hurricane Ida, members of the Asian immigrant community expressed the need for household disaster preparedness education and community-based interventions, which they subsequently requested from the study team.
Prior to implementing a disaster preparedness intervention study with community members with limited English proficiency, there first needs to be a reliable instrument that can validly assess baseline preparedness and measure the effectiveness of the intervention. The Household Emergency Preparedness Instrument (HEPI) was developed to create a comprehensive, valid, and reliable tool that can assess household preparedness for emergencies, extreme weather events, and disasters.Reference Heagele, McNeill and Adams3 The 51-item HEPI assesses preparedness actions and the availability of essential supplies of individuals or households. The instrument was pilot tested with a diverse sample in NYC and demonstrated high reliability and validity.Reference Heagele, Adams and McNeill8 Because the HEPI was designed (and its validity supported) across diverse populations, it is well suited for assessing household disaster preparedness among non-English-speaking populations. This instrument translation study seeks to allow researchers and others to evaluate household disaster preparedness among Korean-speaking communities, helping to fill a gap in knowledge of disaster preparedness of immigrant populations.
Purpose
This instrument translation study field and pilot tested a Korean version of the Household Emergency Preparedness Instrument (K-HEPI) to perform psychometric testing on the instrument, generating reliability and validity data for an examination of cross-cultural equivalence. “Establishing evidence for reliability and validity is essential for credibility of the measurement results. Comparing the psychometric properties of the source and target language versions provides additional data for assessing equivalence.”Reference Waltz, Strickland and Lenz9
Research Questions (RQs) and Hypotheses (H)
RQ 1 (content validity): Is there consensus among a sample of native Korean bilingual participants that the K-HEPI measures an optimal breadth of content for Korean households?
H 1.1: A sample of 30 native Korean participants will agree that the K-HEPI measures content centrally relevant to disaster preparedness for Korean-speaking households.
H 1.2: The participants will agree that no centrally relevant content to Korean-speaking household disaster preparedness is left out of the K-HEPI.
RQ 2 (construct validity and reliability): Does the K-HEPI demonstrate similar levels of inter-item and test-retest reliabilities and a similar latent structure as the English version?
H 2.1: The K-HEPI will demonstrate reasonable test-retest reliabilities.
H 2.2: The inter-item reliability of the K-HEPI will be similar to that of the English version.
H 2.3: Confirmatory factor analysis will find that data from the K-HEPI is well fit by the same factor structure as the English version.
RQ 3 (convergent validity): Do K-HEPI scores correlate with another measure of disaster preparedness?
H 3.1: K-HEPI scores will correlate significantly with scores from a version of the Readiness Quotient that is also translated into Korean and completed by these same participants.
Methods
The English HEPI was translated and field tested according to the recommended instrument translation procedures of Waltz et al.Reference Waltz, Strickland and Lenz9 and Sperber.Reference Sperber10 The English-to-Korean translation followed a symmetrical translation, utilizing a decentered process; this means both the source and target languages were considered equally important, with both versions of the instrument remaining loyal to meanings and open to revision.Reference Waltz, Strickland and Lenz9 Investigators from 2 different cultures (US and South Korea) collaborated on the translation through the following steps:
1. Translation from the source (English) to the target (Korean) language was completed by 2 bilingual members of the study team working independently. These team members are native Korean speakers, fluent in English, and are ethnically and culturally representative of the population among whom the instrument was deployed. The source language translators were knowledgeable about the constructs being measured and how the instrument will be used.
2. Back translation from the target to the source language was then completed by 3 bilingual participants working independently. The target language translators were not knowledgeable about the intent and concepts underlying the instrument.
3. The study team and a HEPI developer then reviewed the translation and back translation for clarity and linguistic appropriateness, with translation errors corrected by team consensus.
4. Both versions of the instrument were then administered to 30 bilingual participants, and psychometric equivalence for the original and target versions was assessed. To detect possible discrepancies, field testers were asked to rate each item’s equivalences between the source and target versions by using a 7-point Likert response scale ranging from 1 (extremely comparable/extremely similar) to 7 (not at all comparable/not at all similar). Items rated low on equivalence or items with discrepancies were revised via consultation with the HEPI developer and K-HEPI translators.
5. The K-HEPI scores of the participants were compared to their English HEPI scores. Any items with different responses were reexamined and possibly retranslated.
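The item-level comparison in step 5 can be sketched as follows. This is a minimal illustration, not the study team's procedure: the function name, the data layout (participant ID mapped to item responses), and the sample responses are all hypothetical.

```python
from collections import Counter

def discrepant_items(english, korean):
    """Count, per item, how many participants answered the two versions differently.

    english, korean: dicts mapping participant id -> dict of item id -> response.
    Only participant/item pairs present in both versions are compared.
    """
    mismatches = Counter()
    for pid, en_answers in english.items():
        ko_answers = korean.get(pid, {})
        for item, en_value in en_answers.items():
            if item in ko_answers and ko_answers[item] != en_value:
                mismatches[item] += 1
    return dict(mismatches)

# Hypothetical responses from two bilingual field testers (1 = yes, 0 = no).
english = {"p1": {"item1": 1, "item2": 0}, "p2": {"item1": 1, "item2": 1}}
korean = {"p1": {"item1": 1, "item2": 1}, "p2": {"item1": 1, "item2": 1}}
flagged = discrepant_items(english, korean)
```

Items that appear in `flagged` would be candidates for reexamination and possible retranslation.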
This study was approved as “exempt with limited review status” by the institutional review board (IRB) of Hunter College of the City University of New York (protocol #2022-0542-Hunter) on September 9, 2022. The data for this study has not been approved to be shared beyond the study team.
The recruitment of field testers and data collection commenced in October 2023 and concluded in November 2023. K-HEPI back translators and field testers were purposively recruited via direct email correspondence with known colleagues and university students who are bilingual English and native Korean speakers/readers utilizing an IRB-approved recruitment and consent script.
Following the K-HEPI field testing, the instrument was pilot tested in a controlled before-after study to measure the level of household disaster preparedness of an at-risk population both before and after receiving the Nurses Taking on Readiness Measures (N-TORM) intervention to measure the effectiveness of the intervention.Reference Heagele, Hyun and Park11 The K-HEPI scores of the experimental group were compared with those of the control group at baseline and at the 1-month follow-up.
The English HEPI was designed to be comprised of 5 constructs; confirmatory factor analyses conducted by Heagele et al.Reference Heagele, Adams and McNeill8 supported this structure. The first 2 constructs include the Preparedness Actions and Planning (PAP) and Disaster Supplies and Resources (DSR) subscales, which represent disaster preparedness activities and supplies relevant to all households. The Special Actions (SA) Part 1 and Part 2 subscales are relevant to households with specific characteristics, such as those who have infants, children, pets, prescription eyeglasses or contacts, utility connections within the home, or reside in a community that has an emergency alerting system. The final subscale, Access and Functional Needs (AFN), is relevant only to those who have a disability, are aged 65 years or older, are dependent on at least 1 prescription medication, or are pregnant. The original HEPI’s factor structure was compared to the K-HEPI’s factor structure using pretest data (N = 399) from the experimental and control groups collected in the intervention study.Reference Heagele, Hyun and Park11
Before the development of the HEPI, no gold standard household disaster preparedness instrument, supported by a dedicated instrument development publication, was available.Reference Heagele12 However, the Readiness Quotient13 has been used in disaster preparedness research. The study team also translated the Readiness Quotient into Korean, then administered it to the participants of the controlled before-after study. To examine convergent validity, commonly considered a type of criterion validity, the K-HEPI scores were compared to the Korean Readiness Quotient scores.
Results
H 1.1 and H 1.2 K-HEPI Field Testing
Table 1 presents the demographic characteristics of the 30 K-HEPI field testers. When coding feedback from the testers, the study team considered K-HEPI instruction and item rating scores of 1-3 as “favorable” for the instructions and items being comparable to the English version; 4 as “neutral;” and 5-7 as “unfavorable.” Field testers agreed that the K-HEPI instructions and items were comparable to the English HEPI, with all instructions and items receiving at least 77% (range 77-100) consensus for comparability. Some minor edits to the K-HEPI were made in response to the qualitative data about the testers’ reflections about the instrument to address spelling errors and word spacing. Participants agreed that the K-HEPI measured centrally relevant disaster preparedness content of Korean-speaking households and no additional content was recommended, supporting H 1.1 and H 1.2.
Table 1. Demographic characteristics of the K-HEPI field testers (N=30)

In thinking about their experiences completing the K-HEPI, 100% of the field testers rated the clarity of the instructions on the survey as “somewhat clear” or “extremely clear,” 97% (n = 29) rated the clarity of the items on the survey as “somewhat clear” or “extremely clear,” 87% (n = 26) rated the length of the K-HEPI as “somewhat reasonable” or “extremely reasonable,” and 93% (n = 28) rated the difficulty of the K-HEPI as “neither easy nor difficult,” “somewhat easy,” or “extremely easy.”
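The coding of comparability ratings into favorable (1-3), neutral (4), and unfavorable (5-7) bands, and the resulting consensus percentage, can be illustrated with a short sketch. The function names and the rating distribution below are hypothetical; the source reports only the resulting percentages, not the underlying ratings.

```python
def code_rating(score):
    """Map a 1-7 comparability rating to the three coding bands described above."""
    if not 1 <= score <= 7:
        raise ValueError("rating must be between 1 and 7")
    if score <= 3:
        return "favorable"
    if score == 4:
        return "neutral"
    return "unfavorable"

def percent_favorable(ratings):
    """Percentage of ratings coded favorable, rounded to the nearest whole percent."""
    favorable = sum(1 for r in ratings if code_rating(r) == "favorable")
    return round(100 * favorable / len(ratings))

# Hypothetical ratings of one item by 30 field testers (23 favorable).
ratings = [1] * 15 + [2] * 6 + [3] * 2 + [4] * 4 + [5] * 3
consensus = percent_favorable(ratings)
```

With 30 raters, 23 favorable ratings round to 77%, which equals the lowest consensus value reported for any instruction or item.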
K-HEPI Pilot Testing
H 2.1 Test-retest reliability
Test-retest reliability was analyzed through correlations between K-HEPI scores at pretest and posttest, that is, before and after experimental-group participants completed the N-TORM intervention;Reference Heagele, Hyun and Park11 these correlations are presented in Table 2 for both the control and experimental groups.
Table 2. Pre-post correlations (test-retest reliabilities) of K-HEPI scores and subscores for participants in the N-TORM InterventionReference Heagele, Hyun and Park11

Note. K-HEPI = Korean version of the Household Emergency Preparedness Instrument. N-TORM = Nurses Taking on Readiness Measures intervention.
These test-retest reliabilities were around .54 for control-group participants and around .30 for experimental-group participants. These values would be considered “low” (< .70) by Nunnally and Bernstein.Reference Nunnally and Bernstein14 Participants in the N-TORM intervention achieved significantly higher K-HEPI scores at post-intervention,Reference Heagele, Hyun and Park11 which may account for the lower test-retest reliabilities among those group members.
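A test-retest correlation of this kind is a Pearson correlation between each participant's pretest and posttest scores. The sketch below assumes plain Pearson correlations (the source does not name the coefficient used) and uses hypothetical scores.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical pretest scores and two hypothetical posttest patterns.
pre = [10, 20, 30, 40]
post_uniform = [15, 25, 35, 45]  # everyone gains the same amount
post_uneven = [30, 22, 41, 38]   # gains differ across participants

r_uniform = pearson_r(pre, post_uniform)
r_uneven = pearson_r(pre, post_uneven)
```

A gain that is identical for every participant leaves the correlation unchanged, so higher posttest scores lower test-retest correlations only to the extent that participants improve by different amounts.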
H 2.2 Inter-item reliability
Table 3 presents coefficient αs for each of the K-HEPI subscale scores for the experimental group participants pre- and post-participation in the N-TORM intervention.Reference Heagele, Hyun and Park11 Table 3 also presents the coefficient αs from Heagele et al.’s8 similar analyses from the English HEPI.
Table 3. Coefficient αs for K-HEPI subscale scores at pre- and post-interventionReference Heagele, Adams and McNeill8, Reference Heagele, Hyun and Park11

Note. The English version statistics are reproduced from Heagele et al.Reference Heagele, Adams and McNeill8
The K-HEPI showed moderately good inter-item reliability at both pre- and post-intervention, especially for an inventory-like instrument where different items correspond to distinct preparedness actions or disaster kit supplies. The coefficient αs were around .75 at pretest and .80 at posttest. Coefficients were generally greater for the more unified PAP and DSR subscale scores; these values were quite similar to those reported by Heagele et al,Reference Heagele, Adams and McNeill8 supporting H 2.2.
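For readers unfamiliar with coefficient α, it can be computed from item-level responses as k/(k-1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The following sketch uses hypothetical dichotomous responses and is not the study's analysis code.

```python
def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def cronbach_alpha(rows):
    """Coefficient alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_variances = [sample_variance([row[j] for row in rows]) for j in range(k)]
    total_variance = sample_variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical yes/no (1/0) responses: 5 respondents, 3 items.
responses = [[1, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
alpha = cronbach_alpha(responses)

# Items that always agree with one another give alpha = 1.
alpha_perfect = cronbach_alpha([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]])
```

Because α rises with the number of items and with how strongly they covary, shorter subscales covering varied content tend to produce lower values.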
Confirmatory Factor Analysis Models
The study team compared 3 models to assess the factor structure of the K-HEPI among the pretest data:
1. A 5-factor model, where items were grouped into 5 orthogonal factors: Preparedness Actions and Planning, Disaster Supplies and Resources, Special Actions Part 1, Special Actions Part 2, and Access and Functional Needs.
2. A 4-factor model, which mirrored the 5-factor model except that items from Special Actions Part 1 and Part 2 were combined into a single Special Actions factor.
3. A 1-factor model, where all items loaded onto a single, general factor.
All 3 models resolved normally after about 150 iterations each, indicating that each of the models was able to fit the data without great difficulty.
H 2.3 Model fit indices. Common fit indices for each model are presented in Table 4. The χ2s for the models were large, indicating that none of the 3 models fit the data exceptionally well. Most items on the K-HEPI are dichotomous, though, which would account for much of this misfit since there is less variability per item than, for example, Likert-response items.
Table 4. Model fit indices for the 3 confirmatory factor analysis models

Note. RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root Mean Residual; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; BIC = Bayesian Information Criterion.
This misfit is further reflected in the Root Mean Square Error of Approximation (RMSEA) and Standardized Root Mean Residual (SRMR) indices, neither of which was below the commonly accepted thresholds of < .05 and < .08, respectively,Reference Hu and Bentler15 for any of the 3 models.
Nonetheless, both RMSEA and SRMR demonstrated the least misfit to the data for the 5-factor model. The other fit indices, the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI), further supported the better fit of the 5-factor model. Although none of the CFI or TLI values exceeded the ≥ .95 criterion commonly used for them,Reference Nunnally and Bernstein14 those for the 5-factor model came closer than the 4- or 1-factor model, supporting H 2.3.
H 2.3 Tests of model fits. The various indices for the 3 models are presented in Table 4. In that table, χ2/df, RMSEA, SRMR, and Bayesian Information Criterion (BIC) are discrepancy measures, evaluating how poorly a given model fits the data; smaller values for these indices denote better fit. CFI and TLI are relative fit measures, assessing how much better the given model fits than a null model that assumes no factor structure; larger values denote better fit. BIC is a data-dependent measure of absolute fit that accounts well for model complexity. All other measures are benchmarked against general criteria (given in Table 4 for each column). Both χ2s and BICs allow direct comparisons between these models, including tests of whether one model fits significantly better than another.
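To make the distinction between discrepancy and relative fit measures concrete, the sketch below computes RMSEA, CFI, TLI, and BIC from their common textbook definitions. These formulas are an assumption: the source does not state which software or estimator produced Table 4, and some programs use N rather than N - 1 in RMSEA or apply scaled statistics for dichotomous items. All input values are hypothetical and are not taken from Table 4.

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (smaller is better)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_null, df_null):
    """Comparative Fit Index relative to a null model (larger is better)."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null

def tli(chi2, df, chi2_null, df_null):
    """Tucker-Lewis Index relative to a null model (larger is better)."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2 / df
    return (ratio_null - ratio_model) / (ratio_null - 1.0)

def bic(log_likelihood, n_params, n):
    """Bayesian Information Criterion; the n_params * ln(n) term penalizes complexity."""
    return -2.0 * log_likelihood + n_params * math.log(n)

# Hypothetical model and null-model statistics for a sample of 399.
example_rmsea = rmsea(300.0, 100, 399)
example_cfi = cfi(300.0, 100, 2100.0, 120)
example_tli = tli(300.0, 100, 2100.0, 120)
```

RMSEA depends only on the tested model's own χ2, df, and sample size, whereas CFI and TLI change with how badly the null model fits, which is why the two families of indices can lead to different impressions of the same model.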
The discrepancy and relative fit indices in Table 4 indicate that none of the models fit these data especially well. However, all indices also suggest that the 5-factor model (that breaks the SA section into 2 subscales) performed best, which has also been found for the English HEPI,Reference Heagele, Adams and McNeill8 again supporting H 2.3.
The study team tested this impression formally through comparisons of the differences in how well each model fit the data. The 5-factor model fit these data significantly better than the 4-factor model (Δχ2 = 23, Δdf = 4, p < .001), indicating that separating SAs into 2 distinct factors improved model fit. Note, though, that the difference in BICs between the 5- and 4-factor models was negligible (a difference of 1, which is not significant, p = 0.317). This suggests that a more conservative comparison (that accounts for the relative complexity of the models) finds little effect of separating the SA subscale into 2 parts.
Both the 4- and 5-factor models provided significantly better fits than the 1-factor model (Δχ2s = 469 and 492, respectively, both ps < .001; ΔBICs = 433 and 433, both ps < .001). Therefore, this study found support that the K-HEPI generates the intended factor structure, even if the version dividing the SA section into 2 parts may fit better than the version keeping it as 1 subscale.
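The χ2 difference test for the 5- versus 4-factor comparison can be checked with a short calculation. The sketch below implements the upper-tail probability of a χ2 distribution for whole-number degrees of freedom using only the standard library; the function name is hypothetical, and the only inputs taken from the text are Δχ2 = 23 and Δdf = 4.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum of (x/2)^k / k! for k < df/2.
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail and add one term per extra 2 df.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return tail + math.exp(-half) * total

# Reported comparison of the 5- and 4-factor models.
p_5_vs_4 = chi2_sf(23.0, 4)
```

Here `p_5_vs_4` is roughly 0.0001, consistent with the reported p < .001. The same function gives about 0.317 for a value of 1 on 1 degree of freedom, which would match the p value reported for the BIC difference if that difference was referred to a χ2 distribution with 1 degree of freedom; the source does not state how that p value was obtained.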
H 3.1 Correlations Between K-HEPI and Readiness Quotient Scores
All correlations between K-HEPI scores and the overall Readiness Quotient13 scores were significant at both pre- and posttest (Table 5, found online in the supplemental material),Reference Heagele, Hyun and Park11 providing evidence that the K-HEPI validly measures the intended content, supporting H 3.1. The absolute magnitude of most of these correlations was not large; at pretest, correlations with overall Readiness Quotient scores for the K-HEPI total (with and without the SA and AFN subscales) were about 0.6 but went down to about 0.1 for the SA subscale. At posttest, correlations were around 0.2, but still significant. The items comprising the Readiness Quotient are rather specific, so relationships with parts of the K-HEPI dealing with, e.g., pet preparedness, were unsurprisingly lower (rs = .11 and .04 between Readiness Quotient scores and total pet-related K-HEPI items 36 and 37, respectively). Note that the study team only had Readiness Quotient scores from 1 time point (pretest), so comparisons could be made with K-HEPI pretest and posttest scores, but not with the intervention per se.
Limitations
External validity attends to how well the results of a study can be validly generalized to other populations, situations, and times. Females, young and middle-aged adults, and participants with higher education levels were over-represented in the field-testing portion of the study. Specifically, all field-testing participants (N=30) had completed at least some college, and the majority (n=20, 66.7%) achieved a graduate degree. In contrast, the number of male participants was low (approximately 10%), and no participants with less than a high school education provided input. These factors raise concerns about the representativeness of the field-testing sample, especially for adults over the age of 70 years who may have limited English proficiency and lower literacy levels.
The South Korean government recommends that documents disseminated to the general public should be written at or below a 9th-grade level.Reference Hyun-Cheol16 The readability of the K-HEPI was assessed using the Natmal Tool for Korean Text.Reference Park17 Overall, the difficulty level score of the K-HEPI was 1050, corresponding to a 10th-12th-grade reading level. This may limit its applicability among respondents with lower literacy levels. However, it is important to note that the adult literacy rate in South Korea is approximately 98%.18
Content validity concerns related to the representativeness of the K-HEPI field-test participants are partially alleviated when evaluating the responses to an open-ended question included in the data collection for the pilot-testing portion of the study, which had more diversity in sample characteristics. The pilot test participants were able to provide qualitative feedback on their experience with completing the K-HEPI and N-TORM intervention. Participants made no comments on missing home disaster preparedness items or actions on the K-HEPI but instead focused on the positive experience of completing the intervention and made recommendations for sourcing disaster supply kit items.Reference Heagele, Hyun and Park11
This study assessed the household disaster preparedness levels of Korean-speaking NYC residents, most of whom were born in Korea. The results may differ for populations with more US-born Korean Americans, as acculturation may affect their perceptions of the needs for household disaster preparedness. In addition, results might be different depending on geographic location. For example, results may differ for Korean immigrants who live in countries known for frequent disasters, such as Koreans living in Japan. Nonetheless, the results of this study were compared with those of similar studies conducted with different immigrant populations,Reference Bhandari, Rahman and Takahashi19–Reference Nishiyama and Glauberman21 and the findings were corroborated.
Discussion
Feedback from 30 field testers supports that the K-HEPI demonstrated good clarity in both the instructions and the items, aligning closely with the HEPI in its structure and content.
The confirmatory factor analyses supported a 5-factor model as the best-fitting model for the Korean version of the HEPI. This model, which separates SA into 2 parts (and distinguishes between PAP, DSR, and AFN subscales), provided a better fit than either the 4- or 1-factor model, although the 4-factor model fit nearly as well as the 5-factor model. This suggests that the K-HEPI follows a factor structure similar to that of the English version, providing evidence for the construct validity of the Korean version.
The K-HEPI demonstrated reasonable inter-item reliabilities (coefficient αs), especially for the PAP and DSR subscores. The coefficient αs for the other subscores (SAs and AFN) were lower, perhaps because these subscales are both briefer (SA Part 1 has 6 items; SA Part 2 has 5; AFN has 9; PAP and DSR have 19 and 12, respectively) and cover more disparate content (e.g., pets, medical devices, etc.).
The test-retest reliability of the K-HEPI, however, was low. Among participants in the N-TORM disaster-preparedness intervention,Reference Heagele, Hyun and Park11 these correlations were around 0.30 (range: 0.18-0.57); among those who did not participate, they were around 0.54 (range: 0.50-0.65). This finding is not unexpected, especially after a respondent has participated in the disaster preparedness intervention. The disaster preparedness of households is likely not a trait characteristic, but a state characteristic. The HEPI and K-HEPI measure the state characteristic of a household’s current disaster preparedness, which could change rather quickly with the creation of evacuation and communication plans and the purchasing of supplies, thus affecting the test-retest reliability of the instrument. Furthermore, as elaborated on below, simply completing the K-HEPI may constitute a sort of intervention. Although the scores of the control group participants did not significantly change,Reference Heagele, Hyun and Park11 the test-retest correlations among them were higher than among the experimental group participants, suggesting that the effect of the K-HEPI itself on attitudes and behaviors may also be affecting test-retest reliability.
Interestingly, the control group from the before-after study (who did not receive any formal preparedness intervention but merely completed the K-HEPI) also demonstrated slightly improved preparedness at posttest.Reference Heagele, Hyun and Park11 By presenting participants with content that prompts reflection on their preparedness behaviors and the gaps in them, the instrument appears to foster greater awareness and knowledge about disaster readiness. Consequently, completing the instrument may serve as an initial step toward enhancing preparedness, positioning it not only as a diagnostic tool but also as a potential catalyst for behavioral change.
Conclusions
The K-HEPI demonstrated satisfactory-to-good qualities as a valuable tool for assessing preparedness among Korean-speaking populations, enabling clinicians, emergency management professionals, researchers, and policymakers to obtain data that validly assesses readiness levels within these communities. Its psychometric comparability with the English version supports that the results are consistent and reliable across both language groups. By using these instruments, emergency management professionals can identify gaps in preparedness and develop targeted interventions. The K-HEPI can also be used to evaluate the effectiveness of disaster preparedness interventions over time, providing insights into how well interventions can improve Korean-speaking individuals’ readiness for emergencies and disasters.
As the K-HEPI has only been used in 1 feasibility studyReference Heagele, Hyun and Park11 to date, future research should include a larger and more diverse sample in terms of education level, gender, and regional background. Researchers and disaster preparedness interventionists may use the K-HEPI and HEPI for non-commercial purposes without cost under the following conditions: (a) the K-HEPI and HEPI developers are properly credited in publications and presentations; (b) if the K-HEPI or HEPI are modified in any way, these changes are disclosed in publications; and (c) psychometric data analysis for the instrument used is provided to the K-HEPI and HEPI developers to inform future modifications of the instruments. For a copy of the K-HEPI and instructions on how to use and score the K-HEPI, contact author JYS at js3149@hunter.cuny.edu. For the English version, contact author TH at th1591@hunter.cuny.edu.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/dmp.2025.10150.
Data availability statement
The data for this study has not been approved to be shared beyond the study team.
Use of artificial intelligence tool
Artificial intelligence tools were not used in the research and writing processes of this study.
Author contribution
WES – methodology development, data analysis, interpretation of results, writing the manuscript, reviewing and editing, validation; TH – conceptualization, methodology development, data analysis, interpretation of results, writing the manuscript, reviewing and editing, project administration, funding acquisition, supervision; JMH – methodology development, data collection, interpretation of results, writing the manuscript, reviewing and editing; SYP – methodology development, interpretation of results, writing the manuscript, reviewing and editing; JYS – conceptualization, methodology development, data collection, data analysis, interpretation of results, writing the manuscript, reviewing and editing, project administration, funding acquisition, supervision.
ClinicalTrials.gov identifier
NCT0554478
Hunter College of City University of New York Protocol Record 2022-0542-Hunter, Korean Translation and Validation of the K-HEPI by a Phase 1 Feasibility Study in NYC, is registered and posted on the ClinicalTrials.gov website.
Acknowledgments
We would like to thank our community partner, Korean Community Services of Metropolitan New York, and our study participants, who generously shared their time and experience for the purposes of this project. We would like to acknowledge the work of our back translators, Dr. Kyungra Yang PhD, Dr. Chilsook Kwon DNP, and Dr. Mikyung Lee PhD. We would also like to acknowledge the work of our research assistants Elaine Au, Kristen Cho, Amy Ding, Amber Javonero, Suyeon Lee, and Joyce Yeh, who contributed meaningfully to this important research while maintaining academic success in a rigorous Bachelor of Science in Nursing program. We would also like to acknowledge the work of Sarah Kaplan MSN, RN-BC, NPD-BC who contributed meaningfully to this project while completing her research practicum hours for a rigorous PhD in Nursing program.
Funding statement
This study was funded by 2 grants from the NIH Clinical and Translation Science Center HBSON-CTSC with a Community Engagement Project grant in the amount of $11,700 and a Pilot Award Seed Funding grant in the amount of $30,000. A third grant was received from the PSC-CUNY Research Award - Enhanced in the amount of $11,050. The funding organizations had no role in the design, implementation, interpretation, or reporting of this study.
Competing interests
None. The authors have no financial or personal relationships with other people or organizations that could inappropriately influence or bias this work. The authors have no commercial associations that could pose a conflict of interest or financial bias.
Institutional review board
This study was approved as “exempt with limited review status” by the institutional review board (IRB) of Hunter College of the City University of New York (protocol #2022-0542-Hunter) on September 9, 2022.