1. Introduction
It is becoming more common for Bayesian techniques to be used to analyze data from economic experiments. While these applications often aim to estimate the same kinds of parameters as their maximum-likelihood counterparts, such as parameters in utility functions, estimating a Bayesian model requires one additional modeling choice from the practitioner: the prior. This is where we express our beliefs about the values of our model’s parameters before we observe any data. For Bayesian models, the prior is as much a part of the model as the likelihood: the likelihood is our formal statement of how the model generates data conditional on parameter values, and the prior is our formal statement of our beliefs about the values of the model’s parameters before we observe new data.
Although we may have a good idea about which values of our model’s parameters are likely and which are unlikely, it is tempting to assign highly diffuse priors. If we want our estimates to be “driven mostly by our data”, and not too influenced by our priors, then we may be tempted to choose priors that do not place much mass on any small region of the parameter’s support. In the limit, this could be a uniform prior, but in practice this inclination typically manifests itself in the choice of an astronomically large prior variance. In addition to the obvious conflict between such diffuse priors and our domain expertise (e.g., our understanding of typical parameter values for our subject pool), these priors can also imply implausible distributions of economically meaningful quantities that we intend to calculate from our estimates.
In order to understand the implications of assigning highly diffuse priors, consider estimating a constant-only probit model:
$$y_t \sim \mathrm{Bernoulli}\left(\Phi(\theta)\right), \qquad \theta \sim N(m_\theta, s_\theta^2)$$
where $\Phi(\cdot)$ is the standard normal cumulative distribution function, and $m_\theta$ and $s_\theta$ are the prior mean and standard deviation of the model’s parameter θ, respectively. Since our data are binary, we might be interested in reporting the posterior distribution of $\Phi(\theta)$, which is the probability of success in the Bernoulli process. Suppose that we are happy setting $m_\theta=0$, which sets the median of $\Phi(\theta)$ to 0.5. However, since we do not want our prior for θ to influence our estimates too much, we decide to set $s_\theta$ to a large number. True, this will mean that the distribution of θ is nicely spread out, but what does it say about $\Phi(\theta)$, the quantity we actually care about? Figure 1 shows 10,000 draws from this distribution, varying the prior standard deviation $s_\theta$. Note that even for modest choices of this parameter, there are large accumulations of probability mass around 0 and 1. A highly diffuse prior for θ actually provides a lot of information about $\Phi(\theta)$: it says that the probability of success is unlikely to be anything other than 0 or 1. Put differently, if our data-generating process were a series of coin flips, a prior with $s_\theta=16$ means that we believe we are either flipping a coin with two heads, or a coin with two tails!
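To see this concretely, the following is a minimal prior predictive check (my own sketch in Python, assuming NumPy and SciPy; it is not code from the paper) that reproduces the pattern in Figure 1:

```python
# Draw theta from its N(0, s^2) prior and inspect the implied distribution
# of the success probability Phi(theta).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

for s_theta in (1, 2, 4, 16):
    theta = rng.normal(loc=0.0, scale=s_theta, size=10_000)  # prior draws
    p = norm.cdf(theta)                       # implied probability of success
    extreme = np.mean((p < 0.01) | (p > 0.99))  # mass on near-degenerate coins
    print(f"s_theta = {s_theta:>2}: Pr(p < 0.01 or p > 0.99) = {extreme:.2f}")
```

As $s_\theta$ grows, almost all of the prior mass on $\Phi(\theta)$ migrates to within a hair of 0 or 1.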
Fig. 1 Draws from the prior distribution of $\Phi(\theta)$, varying the standard deviation of θ
This may seem like a rather convoluted example; however, many of the structural models we take to our data are also models of coin-flipping. Our “coin” is a participant’s choice between two options, and our model describes how this coin is weighted based on the conditions of the experiment (e.g. the two lotteries a participant is choosing between) and the parameters we aim to estimate (e.g. a parameter in a utility function). These models often involve non-linear transformations of data and parameters in ways that, without careful consideration, can make predictions as implausible as the above example.
Even if we are comfortable with the dissonance between our actual priors as experts in the field and the priors we use in estimation, highly diffuse priors can greatly influence posterior estimates of economically meaningful quantities, such as welfare measures (e.g. Monroe, 2023). These quantities are typically not fundamental parameters of our model, but can be easily computed once we have estimated the model’s parameters. As such, it is unlikely that any prior we choose will have a negligible effect on our posterior estimates.
This calls for the use of the “Principled Bayesian Workflow”, as described by Betancourt (2020) and demonstrated in Schad et al. (2021). This means, among other things, careful introspection and testing of our priors: not only what they imply about the distribution of our model’s fundamental parameters, but also what they imply about other, perhaps more important, quantities that we intend to investigate. For example, our model’s fundamental parameter may describe risk preferences, but our aim might be to use this parameter to compute a certainty equivalent. We should then investigate our prior’s implications for both (i) the distribution of risk preferences, and (ii) the distribution of the certainty equivalents of interest. The former is important because calibrating our prior to plausible properties of our participants’ preferences provides an appropriate sense of scale for the distribution of a parameter. The latter is important because it is the inferential goal: we may be even more interested in this distribution.
Using the task of estimating a popular model of choice under risk as an example, I demonstrate through simulation that assigning highly diffuse priors to parameters can imply prior distributions on economically meaningful quantities that all but rule out plausible values. Even moderately diffuse priors can introduce substantial noise into the point estimates of a model’s structural parameters. Put differently, for the number of decisions a participant typically makes in an economic experiment, the prior is always going to carry some weight in the posterior estimates. It is therefore imperative that we (i) check that priors over our model’s parameters imply reasonable distributions over the quantities we care about, and (ii) check that our models, which include the prior, are going to recover the desired parameters or quantities of interest.
2. Example experiment and model
For the purposes of this exercise, I will use Harrison & Ng (2016) as an example experiment. In this experiment, 111 participants made 80 choices over pairs of lotteries. Each lottery is characterized by three monetary prizes ($10, $30, and $50 for all decisions) and a probability distribution over these prizes. I use data from the first 80 decisions made in this experiment, which are representative of many binary-choice experiments on decision-making under risk.
Perhaps the most common application of structural models to economic experiments is in estimating risk preferences assuming Expected Utility Theory (EUT). A simple one-parameter specification for the utility function over certain amounts of money is the constant relative risk-aversion (CRRA) utility function:
$$U_i(x) = \frac{x^{1-r_i}}{1-r_i}$$
Participant i’s expected utility over lottery $\mathcal L$ is then:

$$EU_i(\mathcal L) = \frac{E\left[U_i(X) \mid \mathcal L\right]}{U_i(\overline x) - U_i(\underline x)}$$
where $X\mid \mathcal L$ is the distribution of monetary prizes conditional on lottery $\mathcal L$ being chosen, and $\overline x$ and $\underline x$ are the maximum and minimum prizes possible. The denominator of this expression implements the “contextual utility” normalization (Wilcox, 2011).
We then combine this with a probabilistic choice rule, which assigns a probability distribution over the lotteries that participant i has to choose from. A popular choice for this is the (Fechner) logistic choice rule, or “softmax”, which for a binary choice between lotteries L(eft) and R(ight) is:
$$\Pr\left(y_i = L\right) = \frac{1}{1+\exp\left(-\lambda_i \left(EU_i(\mathcal L) - EU_i(\mathcal R)\right)\right)}$$
Note that the above expression is the likelihood if we were to observe just one decision. So for a dataset of T decisions made by participant i, the likelihood of all of their choices $\{y_{i,t}\}_{t=1}^{T}$ over lottery pairs $\{\mathcal L_{i,t},\mathcal R_{i,t}\}_{t=1}^{T}$ is (assuming that each choice is independently distributed):

$$p(y_i \mid r_i, \lambda_i) = \prod_{t=1}^{T} \Pr\left(y_{i,t} \mid \mathcal L_{i,t}, \mathcal R_{i,t}, r_i, \lambda_i\right)$$
The above model and parametric functions, coupled with decisions from an appropriately-designed experiment, are all the information we need to estimate $r_i$ and $\lambda_i$ if we were using maximum likelihood estimation. Bayesian estimation also requires a prior for $r_i$ and $\lambda_i$. That is, instead of maximizing $p(y_i\mid r_i,\lambda_i)$, we wish to determine the distribution:

$$p(r_i, \lambda_i \mid y_i) = \frac{p(y_i \mid r_i, \lambda_i)\, p(r_i, \lambda_i)}{p(y_i)} \propto p(y_i \mid r_i, \lambda_i)\, p(r_i, \lambda_i)$$

and so we must also specify the prior $p(r_i,\lambda_i)$.
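To make the pieces concrete, here is a minimal sketch of the participant-level log-posterior kernel in Python (my own construction, assuming NumPy and SciPy; the function and variable names are not from the paper). The prior used here (normal on $r_i$, log-normal on $\lambda_i$) anticipates the families chosen in the next section:

```python
# A sketch of log p(r, lambda | y) up to a constant: CRRA utility with the
# contextual-utility normalization, the logistic choice rule, and priors
# r ~ N(m_r, s_r^2), log(lambda) ~ N(m_lam, s_lam^2).
import numpy as np
from scipy.stats import norm

PRIZES = np.array([10.0, 30.0, 50.0])  # prizes used for all decisions

def crra(x, r):
    """CRRA utility; the r = 1 (log utility) special case is omitted."""
    return x ** (1.0 - r) / (1.0 - r)

def contextual_eu(p, r):
    """Expected utility of lotteries p over PRIZES, divided by
    u(x_max) - u(x_min) (the contextual utility normalization)."""
    u = crra(PRIZES, r)
    return p @ u / (u[-1] - u[0])

def log_posterior(r, lam, y, pL, pR, m_r, s_r, m_lam, s_lam):
    """y[t] = 1 if Left was chosen; pL and pR are (T, 3) arrays of prize
    probabilities for the Left and Right lotteries in each pair."""
    if lam <= 0.0:
        return -np.inf                      # lambda must be positive
    d = contextual_eu(pL, r) - contextual_eu(pR, r)   # EU(L) - EU(R)
    # Logistic choice rule: Pr(Left) = 1 / (1 + exp(-lambda * d)),
    # written with logaddexp for numerical stability.
    log_lik = np.sum(-y * np.logaddexp(0.0, -lam * d)
                     - (1.0 - y) * np.logaddexp(0.0, lam * d))
    log_prior = (norm.logpdf(r, m_r, s_r)
                 + norm.logpdf(np.log(lam), m_lam, s_lam) - np.log(lam))
    return log_lik + log_prior
```

In practice this kernel, or its equivalent in a probabilistic programming language such as Stan, would be handed to an MCMC sampler rather than evaluated by hand.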
In the next section, I focus on the task of estimating one model for each participant separately. This is in contrast to focusing on hierarchical models, which is where Bayesian techniques seem to be making the most headway into the analysis of experimental data. This is not because hierarchical models are impervious to this prior selection problem. Rather, I focus on participant-level estimation to demonstrate the techniques in the simplest situation in which one needs to carefully scrutinize and calibrate one’s priors. Furthermore, given the computational burden of hierarchical models, the focus on individual-level estimation permits a deeper look into the sampling properties of the estimators under consideration.
3. Considerations for choosing a prior
3.1. Priors informed by our understanding of the parameters
Our expert understanding of our model’s parameters should help us form our priors. This may be because the theory behind our model helps us interpret these parameters, or because we have some knowledge of reasonable values for them from past experiments. In order to choose our prior, it is useful to break the decision-making problem into two parts:
1. What is an appropriate family of distributions for the prior? This usually comes down to choosing a family with an appropriate support. Here, we should rule out nonsensical realizations of our parameters, but we should also be careful not to rule out realizations that are possible, however unlikely.
2. Within this family of distributions, what distribution should we choose?
To answer (1), note that any $r_i\in \mathbb R$ could constitute a valid utility function, so it is important that we choose a family of distributions that covers the real number line. When this is the case, it is common to choose the normal family of distributions, so assume a prior as follows:

$$r_i \sim N(m_r, s_r^2)$$
For λi, which measures the choice precision of the probabilistic decision rule, negative values are nonsensical: they would mean that the participant chooses the lottery that yields less utility more than 50% of the time. Additionally, we do not want to completely rule out very large positive values of λi, as these approximate the special case where the participant always chooses the action with greatest utility. We therefore seek a family of distributions that covers the positive real numbers. A common choice in these situations is the log-normal distribution:

$$\log \lambda_i \sim N(m_\lambda, s_\lambda^2)$$
At this point, we can move to the second question: choosing the prior parameters $m_r$, $s_r$, $m_\lambda$, and $s_\lambda$. Start with the prior for the risk-aversion parameter r. Since we are using a normal prior for this parameter, $m_r$ is its mean and median, and $s_r$ is its standard deviation. We can use our understanding of previous economic experiments to guide our choices of these parameters. For example, Holt & Laury (2002) (see their Table 3), for their “Low [incentives] real” treatment, place 75% of participants in the range $-0.15 < r < 0.68$. Setting $m_r=0.27$, the midpoint of this region, seems as good a place as any to start.
Now consider the scale parameter $s_r$. If we believed that the participants in Holt & Laury (2002) and in our experiment are drawn from a similar population, and that the experiments are similar enough, we could choose $s_r$ to match the 75% of participants who fall in this range, that is:

$$0.75 = \Pr(-0.15 < r_i < 0.68) = \Phi\left(\frac{0.68 - m_r}{s_r}\right) - \Phi\left(\frac{-0.15 - m_r}{s_r}\right)$$

which yields $s_r=0.36$. However, one may worry that this prior is “too informative”, and hence wish to choose a larger value of $s_r$. The flip-side, choosing $s_r$ to be smaller than this, understandably may also lead to problems with our estimator: in this case we would be presuming too much information about r before observing the data. Taken to its extreme, setting $s_r\to 0$ would negate any reason for running an experiment: we would already know a lot about $r_i$, and so should spend our experimental budget elsewhere! But there is also a problem with setting $s_r$ too large. To see this, Table 1 shows the distribution of risk preferences implied by three different prior scale parameters $s_r$, holding the location $m_r$ constant. Holt & Laury (2002) classify participants into nine groups based on their risk preferences, and the table shows the prior probability of each category implied by different prior standard deviations. Note that even for $s_r=3.6$, a ten-fold increase from the suggested value, almost all of the prior mass falls in the most extreme categories of “highly risk loving” and “stay in bed”. That is, by assuming such highly diffuse priors, we are assuming that we will observe very few participants in the range of risk preferences that experimental economists would typically expect. Therefore, $r_i\sim N(0.27,0.36^2)$ is a reasonable choice of prior for the risk-aversion parameter.
Table 1 Distribution across risk preference categories. The rightmost three columns show the implied prior distribution across these categories for three different prior standard deviations. The “Holt and Laury (2002)” column shows the estimated distribution from the “Low real” treatment of that experiment
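The quantile-matching step above is easy to verify numerically. The following is a short sketch (my own, assuming Python with SciPy) that solves for the $s_r$ placing 75% of a $N(0.27, s_r^2)$ prior on $-0.15 < r < 0.68$:

```python
# Solve Pr(-0.15 < r < 0.68) = 0.75 for the prior standard deviation s_r,
# holding the prior mean fixed at the midpoint m_r = 0.27.
from scipy.optimize import brentq
from scipy.stats import norm

m_r = 0.27
coverage = lambda s: norm.cdf(0.68, m_r, s) - norm.cdf(-0.15, m_r, s)
s_r = brentq(lambda s: coverage(s) - 0.75, 1e-6, 10.0)  # root-find on s
print(round(s_r, 2))  # ~0.36
```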
3.2. Priors informed by our understanding of transformations of the parameters
While we may have some good intuition to help us formalize our prior for some parameters, for others we may need to look elsewhere for guidance. In our example model, one might have trouble expressing prior uncertainty over the choice precision parameter λ. This parameter may be difficult to interpret because it is a nuisance parameter (for particular inferential goals): we may not be interested in estimating it, but it is necessary to include it in the model in order to estimate the parameter(s) we do care about. Additionally, its units (inverse utility) may be difficult to interpret. As such, we may know less about what constitutes “reasonable”, “likely”, and “unreasonable” values for λ than we do for r. Fortunately, all structural models with a likelihood necessarily make predictions about behavior, and we can explore how different values of our parameters affect our model’s predictions.
For λ, the probability of choosing the utility-maximizing lottery is equal to $(1+\exp(-\lambda |\Delta|))^{-1}$, where Δ is the difference in expected utility between the two lotteries. We can use this to explore the implications of different values of λ for a given lottery pair. In Figure 2 I plot this probability as a function of λ for a choice between $30 for sure and a 50-50 mix of $10 and $50. The figure shows how the prediction changes with λ for three different values of r. Based on our calibrated prior for r, r = 0.27 (dotted line) is the prior mean, and $r=-0.09$ (solid line) and r = 0.63 (dashed line) are one prior standard deviation either side of it. Looking at the dotted line, which corresponds to the calibrated prior mean of r = 0.27, most of the plausible choice probabilities lie in the region $\lambda\in(10,100)$: for λ > 100 choices seem very precise, and for λ < 10 choices seem very noisy. Given our choice of a log-normal prior for λ, we can calibrate this prior to these endpoints so that the prior probability that $\lambda\in(10,100)$ is (say) 95% as follows:

$$m_\lambda = \frac{\log(10) + \log(100)}{2} \approx 3.45, \qquad s_\lambda = \frac{\log(100) - \log(10)}{2\,\Phi^{-1}(0.975)} \approx 0.59$$
For perspective, this implies a prior mean of $E(\lambda)=\exp(3.45+0.5\times 0.59^2)\approx 37$, which is shown as the vertical red line in Figure 2.
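The arithmetic behind this calibration, together with the choice probabilities at the two endpoints, can be checked in a few lines (my own sketch in Python with SciPy; the endpoint check uses the same lottery pair as Figure 2):

```python
# Choice probability at lambda = 10 and 100 for the Figure 2 lottery pair,
# then the log-normal parameters placing 95% of prior mass on (10, 100).
import numpy as np
from scipy.stats import norm

r = 0.27                                   # calibrated prior mean for r
u = lambda x: x ** (1 - r) / (1 - r)       # CRRA utility
scale = u(50.0) - u(10.0)                  # contextual-utility normalization
delta = (u(30.0) - 0.5 * (u(10.0) + u(50.0))) / scale  # EU(safe) - EU(risky)

for lam in (10.0, 100.0):
    p = 1.0 / (1.0 + np.exp(-lam * abs(delta)))
    print(f"lambda = {lam:>5}: Pr(choose EU-maximizing lottery) = {p:.3f}")
# lambda = 10 gives ~0.62 (noisy); lambda = 100 gives ~0.99 (precise).

lo, hi = np.log(10.0), np.log(100.0)
m_lam = (lo + hi) / 2                      # midpoint on the log scale
s_lam = (hi - lo) / (2 * norm.ppf(0.975))  # half-width divided by 1.96
print(round(m_lam, 2), round(s_lam, 2))    # 3.45 0.59
print(round(np.exp(m_lam + 0.5 * s_lam**2), 1))  # prior mean E(lambda) ~ 37
```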
Fig. 2 Probability of choosing the utility-maximizing lottery between (i) $30 for sure and (ii) a 50-50 mix of $10 and $50
Even if we have a good understanding of how to interpret our parameters without transforming them into predictions or other quantities, investigating the implications of our prior for quantities of interest can be helpful. For example, with an experiment investigating risk preferences like Harrison & Ng (2016), we may be more interested in what our parameters imply about a certainty equivalent than in the parameters themselves. To this end, in Figure 3 I plot the prior distribution of the certainty equivalent of a lottery that is a 50-50 mix of $10 and $50. I do this for the calibrated prior (left panel) and an “inflated” prior (right panel), where I increase the prior standard deviations by a factor of 10 to demonstrate the effect of a highly diffuse prior. For the inflated prior, the certainty equivalent’s prior distribution has modes close to the endpoints of its allowable values ($10 and $50). This may not be a desirable implication of our prior, so checking for these implications is important.
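This prior predictive check is also straightforward to simulate. A sketch (mine, in Python with NumPy) under the CRRA specification, where the certainty equivalent of the 50-50 mix of $10 and $50 solves $U(CE) = E[U(X)]$ and so equals $(0.5\times 10^{1-r} + 0.5\times 50^{1-r})^{1/(1-r)}$ for $r\neq 1$:

```python
# Simulate the prior distribution of the certainty equivalent of a 50-50
# mix of $10 and $50 under the calibrated and inflated priors for r.
# (lambda does not enter the certainty equivalent, so it is not drawn.)
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

for label, s_r in [("calibrated (s_r = 0.36)", 0.36),
                   ("inflated   (s_r = 3.60)", 3.60)]:
    r = rng.normal(0.27, s_r, size=n)        # r = 1 has probability zero
    ce = (0.5 * 10.0 ** (1 - r) + 0.5 * 50.0 ** (1 - r)) ** (1 / (1 - r))
    near_edge = np.mean((ce < 11.0) | (ce > 49.0))
    print(f"{label}: median CE = {np.median(ce):5.1f}, "
          f"share within $1 of an endpoint = {near_edge:.2f}")
```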
Fig. 3 Prior distributions of a certainty equivalent for the calibrated prior and a prior inflating the standard deviations by a factor of 10
3.3. Priors guided by the performance of our estimator
We should also evaluate our prior, coupled with the rest of our model and experiment design, on how well it achieves our inferential goal. Since our structural model includes a likelihood, we can simulate data from our experiment for known parameter values; and since we also have a prior, we can simulate data from our experiment consistent with our prior beliefs. It is therefore possible, before running the experiment, to investigate the sampling properties of our estimator. This exercise might not only identify problems with our prior, if they exist, but also provide us with an opportunity to re-design the experiment if it reveals that we are not estimating our desired quantities as well as we would like.
To demonstrate this in a Monte Carlo exercise, I draw 1,000 sets of parameters from the calibrated prior. For each of these draws, I then simulate a dataset using these parameters and estimate the model four times: (i) with the calibrated prior, (ii) with both prior standard deviations inflated by a factor of 2, (iii) with both prior standard deviations inflated by a factor of 8, and (iv) using maximum likelihood estimation. The root mean squared errors of the posterior means of the model’s parameters and the certainty equivalent are shown in Table 2. The maximum likelihood estimator (right-most column) performs by far the worst of the four estimators under consideration, followed by the estimator using the most inflated prior. The Bayesian estimator with the calibrated prior estimates these quantities with the smallest error.
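A compressed version of this exercise is sketched below (my own Python code, not the paper’s). The lottery pairs are randomly generated stand-ins for the actual experimental design, a coarse two-dimensional grid approximation replaces the MCMC sampler, the number of replications is reduced, and only the error in the posterior mean of r is reported:

```python
# Monte Carlo check of the estimator: draw (r, lambda) from the calibrated
# prior, simulate 80 binary choices, and compare posterior means of r under
# the calibrated prior and a prior with standard deviations inflated by 8.
import numpy as np
from scipy.special import expit
from scipy.stats import norm

rng = np.random.default_rng(1)
prizes = np.array([10.0, 30.0, 50.0])
T = 80                                      # decisions per simulated subject
pL = rng.dirichlet(np.ones(3), size=T)      # hypothetical Left lotteries
pR = rng.dirichlet(np.ones(3), size=T)      # hypothetical Right lotteries

def eu_diff(r):
    """Contextual expected-utility difference (Left minus Right), per pair."""
    u = prizes ** (1 - r) / (1 - r)
    return (pL - pR) @ u / (u[-1] - u[0])

r_grid = np.linspace(-3.5, 3.51, 81)        # grid avoids r = 1 (log utility)
ll_grid = np.linspace(-2.0, 9.0, 81)        # grid over log(lambda)
D = np.array([eu_diff(r) for r in r_grid])  # precomputed, shape (81, T)

def posterior_mean_r(y, s_r, s_lam):
    """Grid approximation to E[r | y] with r ~ N(0.27, s_r^2) and
    log(lambda) ~ N(3.45, s_lam^2)."""
    lam = np.exp(ll_grid)[:, None]
    lp = np.empty((r_grid.size, ll_grid.size))
    for i in range(r_grid.size):
        p = np.clip(expit(lam * D[i]), 1e-12, 1 - 1e-12)  # Pr(Left), (81, T)
        lp[i] = ((y * np.log(p) + (1 - y) * np.log(1 - p)).sum(axis=1)
                 + norm.logpdf(r_grid[i], 0.27, s_r)
                 + norm.logpdf(ll_grid, 3.45, s_lam))
    w = np.exp(lp - lp.max())
    return float((w.sum(axis=1) / w.sum()) @ r_grid)

errors = {"calibrated": [], "inflated x8": []}
for _ in range(100):                        # 1,000 replications in the text
    r0 = rng.normal(0.27, 0.36)             # truth drawn from calibrated prior
    lam0 = np.exp(rng.normal(3.45, 0.59))
    y = (rng.random(T) < expit(lam0 * eu_diff(r0))).astype(float)
    errors["calibrated"].append(posterior_mean_r(y, 0.36, 0.59) - r0)
    errors["inflated x8"].append(posterior_mean_r(y, 8 * 0.36, 8 * 0.59) - r0)

for label, e in errors.items():
    print(label, "RMSE of posterior mean of r:",
          round(float(np.sqrt(np.mean(np.square(e)))), 3))
```

Even at this reduced scale, the pattern to look for is the one reported in Table 2: inflating the prior standard deviations should increase the root mean squared error.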
Table 2 Root mean squared error of posterior mean estimates and maximum likelihood estimates. The certainty equivalent is for a lottery that mixes 50-50 over prizes $10 and $50
Figure 4 shows the estimated certainty equivalents against their true values. A model that recovers its parameters well will have posterior means (dots) close to the 45° line (red dashed line). A good example of this is the top-left panel of the figure, which shows the results when using the calibrated prior. Estimates are slightly pulled toward the prior mean, which is to be expected for a Bayesian estimator. Looking at the bottom two panels, we can see that posterior estimates are noisier (posterior means are more spread out vertically), less precise (credible regions are larger), and exhibit more bias (the smoothed means are further from the 45° line).
Fig. 4 Simulated estimates of a certainty equivalent (vertical axis) against their true values (horizontal axis) for various prior specifications and maximum likelihood estimates. Dots show posterior means, vertical lines show a 50% Bayesian credible region (25th to 75th percentile). No expression of uncertainty is shown for the maximum likelihood estimates. The red dashed line is the 45° line. The blue curve shows a smoothed mean of the posterior means
4. Conclusion
This paper provides guidance for choosing priors when estimating structural models in economic experiments using Bayesian techniques. The choice of prior is not inconsequential for the sample sizes used in typical economic experiments, and so care must be taken to calibrate this prior so that (i) it reflects our underlying knowledge of our parameters and other quantities before we observe the data, and (ii) we are satisfied with our model’s ability to estimate these parameters from our experiment. However, one should not read this as a blanket endorsement of highly informative priors: the exercise of prior calibration should help us express our genuine uncertainty without implying implausible predictions.
Prior calibration serves (at least) two useful functions. Firstly, we formally state and justify what we believe is the economically meaningful range of our model’s parameters. Secondly, by simulating parameters, and then data, from the calibrated prior, we explore the behavior of our estimators on datasets which are hopefully similar to those we will see in our experiment. In doing so, we are able to assess our experiment’s ability to provide us with data that will answer our research question, and to decide whether we should modify our design before collecting data.
This paper focuses on estimating models of behavior at the individual participant level. However, the techniques it develops need not be limited to these applications. In particular, when estimating models that aggregate information from more than one individual, such as hierarchical models, the econometrician must choose priors over the model’s population-level “hyperparameters”, rather than the individual-level parameters. Highly diffuse priors on these hyperparameters can also imply implausible prior distributions of individual-level parameters and of the quantities of interest that can be derived from them. Hierarchical models estimated in such studies also provide us with suggestions about what reasonable priors are for individual-level parameters. Using posterior mean estimates from Gao et al. (2020) suggests that $r\sim N(0.338,0.268^2)$ could be an appropriate prior for the CRRA parameter r. This is not too dissimilar from the prior $r\sim N(0.27,0.36^2)$ I calibrated to Holt & Laury (2002), and is in fact slightly more “informative”.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/esa.2025.6.
Competing interests
The author declares no conflicts of interest.