Hierarchical Bayes Estimation

The need to estimate interaction terms in aggregate choice models has long been discussed in the literature. These discussions subsided in the late 1990s, when Hierarchical Bayes (HB) techniques, which allow robust estimation of individual utilities, were introduced: most of the interaction terms found earlier were recognized as effects of heterogeneity in the sample. Soon afterwards, an HB estimation method designed specifically for Choice-Based Conjoint (CBC) became commercially available from Sawtooth Software, Inc.

Hierarchical Bayes approach

A choice is an event conditioned on the choice set, so the choice probability is a conditional probability. At the same time, the behavior of each respondent is assumed to be different, yet drawn from some distribution of behavior in the sample. In this view there are two hierarchical levels of conditioning: the upper level relates to the distribution in the sample, and the lower level to the characteristics of the individual. Each of these relations has its own likelihood. The core of the estimation is the maximization of the likelihood obtained (usually) as the product of the two.
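The two levels can be written as two log-densities that are added, which is the logarithm of their product. The sketch below assumes a multinomial logit (MNL) model at the lower level and a multivariate normal sample distribution at the upper level. Both are common choices but are assumptions here, as are the array layout and all function names.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal


def mnl_loglik(beta, X, y):
    """Lower level: MNL log-likelihood of one respondent's observed choices.

    beta: (n_params,) part-worths of the respondent
    X:    (n_tasks, n_alts, n_params) coded attributes of every alternative in every choice set
    y:    (n_tasks,) index of the chosen alternative in each choice set
    """
    utilities = X @ beta                                   # (n_tasks, n_alts)
    log_probs = utilities - logsumexp(utilities, axis=1, keepdims=True)
    return log_probs[np.arange(len(y)), y].sum()


def log_prior(beta, mean, cov):
    """Upper level: log-density of the respondent's part-worths under the sample distribution."""
    return multivariate_normal.logpdf(beta, mean=mean, cov=cov)


def log_posterior(beta, X, y, mean, cov):
    """Log of the product of the two likelihoods (up to a normalizing constant)."""
    return mnl_loglik(beta, X, y) + log_prior(beta, mean, cov)
```

With all part-worths at zero, every alternative in a three-alternative choice set has probability 1/3, so `mnl_loglik` returns `n_tasks * log(1/3)`. Any nonzero distance from the sample mean lowers `log_prior`.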

If we drew a respondent from the sample at random and knew the distribution of behavior in the sample, we could predict the behavior of that individual with some probability. The most probable behavior would lie at the maximum of the probability density of the sample behavior. The probability density of an individual's behavior derived from the sample distribution is called the prior (pre-experimental) likelihood. At the beginning of an estimation we can start with an arbitrary but reasonable sample distribution. Using this distribution as the prior, we adjust the model parameters of each individual so that the product of the prior and the likelihood of that individual's observed choices is maximized. The resulting density for the individual is termed the posterior (post-experimental) likelihood, and its maximum is adopted as the point estimate for the respondent. The important property of this step is that the maximum for an individual is shifted away from the maximum for the sample: the more 'unique' the respondent, the farther the shift from the sample mean. The new point estimates are then used to update the sample distribution, which in turn supplies the prior for each individual in the next round. The process is repeated until the point estimates no longer change for any individual.
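The iterative procedure described above can be sketched as follows. The sketch assumes an MNL lower level and a multivariate normal sample distribution. It makes one labeled deviation from the plain description: the covariance update adds each respondent's posterior uncertainty, which is the inverse Hessian at the maximum (a Laplace approximation). Without that term, the covariance of the shrunken point estimates alone tends to understate the spread of preferences. All names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def neg_log_post(beta, X, y, mean, prec):
    """Negative log-posterior (MNL likelihood times Gaussian prior) and its gradient."""
    u = X @ beta                                          # utilities, (n_tasks, n_alts)
    log_p = u - logsumexp(u, axis=1, keepdims=True)
    p = np.exp(log_p)
    t = np.arange(len(y))
    d = beta - mean
    value = -(log_p[t, y].sum() - 0.5 * d @ prec @ d)
    grad_ll = (X[t, y] - np.einsum("tj,tjk->tk", p, X)).sum(axis=0)
    return value, -(grad_ll - prec @ d)


def neg_log_post_hessian(beta, X, prec):
    """Hessian of the negative log-posterior: MNL information matrix plus prior precision."""
    u = X @ beta
    p = np.exp(u - logsumexp(u, axis=1, keepdims=True))
    xbar = np.einsum("tj,tjk->tk", p, X)
    return np.einsum("tj,tjk,tjl->kl", p, X, X) - xbar.T @ xbar + prec


def empirical_bayes(data, n_params, tol=1e-4, max_iter=100):
    """data: list of (X, y) pairs, one per respondent. Returns estimates, sample mean, covariance."""
    mean, cov = np.zeros(n_params), np.eye(n_params)     # arbitrary but reasonable start
    betas = np.zeros((len(data), n_params))
    for _ in range(max_iter):
        prec = np.linalg.inv(cov)
        new_betas, post_covs = [], []
        for b0, (X, y) in zip(betas, data):
            # Posterior maximum for one respondent, given the current sample distribution
            b = minimize(neg_log_post, b0, args=(X, y, mean, prec), jac=True, method="BFGS").x
            new_betas.append(b)
            post_covs.append(np.linalg.inv(neg_log_post_hessian(b, X, prec)))
        new_betas = np.array(new_betas)
        # Update the sample distribution from the new point estimates
        mean = new_betas.mean(axis=0)
        dev = new_betas - mean
        # Assumption: Laplace-approximation term (average posterior covariance) added
        cov = dev.T @ dev / len(data) + np.mean(post_covs, axis=0)
        converged = np.max(np.abs(new_betas - betas)) < tol
        betas = new_betas
        if converged:
            break
    return betas, mean, cov
```

Each individual maximum starts from the previous estimate. The loop stops once no point estimate moves by more than `tol`, which mirrors the stopping rule in the text.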

This general approach is known as the empirical Bayes method. The prior distribution in the HB method may be thought of as a flexible constraint that grows stronger the farther the parameters estimated for an individual move from those estimated for the sample. The ability of HB techniques to exploit the properties of the sample and to keep the individual estimates within the credible interval of the sample is sometimes described as "borrowing information from the sample".
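The "flexible constraint" becomes explicit under the common assumption of a multivariate normal sample distribution with mean $\bar b$ and covariance $D$, an assumption made here for illustration. Here $x_{itj}$ codes alternative $j$ in choice task $t$ of respondent $i$, $y_{it}$ is the chosen alternative, and $T_i$ is the number of tasks. The penalty term is quadratic, and its gradient $D^{-1}(\beta_i - \bar b)$ grows linearly with the distance from the sample mean. The second line is the textbook one-parameter case with an approximately normal individual likelihood. In it, $\hat\beta_i$ is the estimate from the respondent's own data with sampling variance $s_i^2$, and $\sigma^2$ is the variance of the sample distribution. The fewer the respondent's data (the larger $s_i^2$), the more weight the estimate borrows from the sample mean.

```latex
\begin{aligned}
\log p(\beta_i \mid y_i, \bar b, D)
  &= \sum_{t=1}^{T_i} \log
     \frac{\exp(x_{i t y_{it}}^{\top} \beta_i)}{\sum_{j=1}^{J} \exp(x_{itj}^{\top} \beta_i)}
   - \tfrac{1}{2} (\beta_i - \bar b)^{\top} D^{-1} (\beta_i - \bar b) + \text{const.} \\[1ex]
\tilde\beta_i
  &= \frac{\sigma^2}{\sigma^2 + s_i^2}\,\hat\beta_i + \frac{s_i^2}{\sigma^2 + s_i^2}\,\bar b
\end{aligned}
```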

HB-MCMC – Hierarchical Bayes Markov Chain Monte Carlo method

HB-MCMC, a state-of-the-art variant of the empirical Bayes method, is the workhorse for estimating discrete choice models (DCM). It has many advantages over the traditional (non-Bayesian) aggregate maximum likelihood estimation methods.
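The following is a compact sketch of the usual HB-MCMC scheme for an MNL model. It assumes a multivariate normal upper level with a flat prior on the mean and an inverse Wishart prior on the covariance. The sampler alternates three draws: the sample mean, the sample covariance, and each respondent's part-worths (the last by a random-walk Metropolis step). This is the general textbook structure, not the algorithm of any particular commercial package. The prior settings, the 30 % target acceptance rate, the requirement of at least two parameters, and all names are assumptions.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import invwishart


def mnl_loglik(beta, X, y):
    """MNL log-likelihood of one respondent's choices; X is (n_tasks, n_alts, n_params)."""
    u = X @ beta
    return (u - logsumexp(u, axis=1, keepdims=True))[np.arange(len(y)), y].sum()


def hb_mcmc(data, n_params, n_iter=2000, burn_in=1000, seed=0):
    """Gibbs sampler with a Metropolis step for individual MNL part-worths (n_params >= 2).

    data: list of (X, y) pairs, one per respondent.
    Returns posterior draws of shape (n_iter - burn_in, n_respondents, n_params).
    """
    rng = np.random.default_rng(seed)
    n, k = len(data), n_params
    betas, D = np.zeros((n, k)), np.eye(k)
    loglik = np.array([mnl_loglik(b, X, y) for b, (X, y) in zip(betas, data)])
    rho, draws = 0.1, []
    for it in range(n_iter):
        # 1. Sample mean | part-worths, covariance: N(average part-worths, D / n)
        b_bar = rng.multivariate_normal(betas.mean(axis=0), D / n)
        # 2. Covariance | part-worths, mean: inverse Wishart (prior: k d.o.f., identity scale)
        dev = betas - b_bar
        D = invwishart.rvs(df=k + n, scale=k * np.eye(k) + dev.T @ dev, random_state=rng)
        # 3. Random-walk Metropolis step for each respondent's part-worths
        chol, prec = np.linalg.cholesky(D), np.linalg.inv(D)
        n_accepted = 0
        for i, (X, y) in enumerate(data):
            cand = betas[i] + rho * chol @ rng.standard_normal(k)
            cand_ll = mnl_loglik(cand, X, y)
            d_old, d_new = betas[i] - b_bar, cand - b_bar
            log_ratio = (cand_ll - 0.5 * d_new @ prec @ d_new) - (loglik[i] - 0.5 * d_old @ prec @ d_old)
            if np.log(rng.uniform()) < log_ratio:
                betas[i], loglik[i] = cand, cand_ll
                n_accepted += 1
        if it < burn_in:
            # Tune the step size towards ~30 % acceptance, during burn-in only
            rho *= 1.1 if n_accepted / n > 0.3 else 0.9
        else:
            draws.append(betas.copy())
    return np.array(draws)
```

Individual point estimates are commonly taken as the average of the retained draws, e.g. `draws.mean(axis=0)`. The full set of draws also describes the uncertainty of each respondent's part-worths.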

Strengths

Weaknesses

As an aside

An often underestimated aspect of a conjoint study is the parameter identification problem. In plain language, the model, the design, the number of parameters, the sample, and the type and amount of collected data must match one another so that the parameters can be estimated with satisfactory accuracy and precision. HB-MCMC software is very robust and has built-in provisions that let it appear to converge even with very bad data. Such data arise when the researcher yields to pressure from the client and accepts too many attributes, an excessively complicated design, too many prohibitions between attribute levels, too short an interview, a small or overly inhomogeneous sample, etc. The software then usually sets the inestimable values to zero or to some other value, e.g. an average over the sample. The latter results from "borrowing" data from other respondents and may lead to excessive aggregation. Because the distribution of preferences in the sample is almost never Gaussian (or even symmetric), such results may be strongly biased and may even contradict the collected data.
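One way to check identification for a single respondent is to inspect the information matrix of that respondent's MNL likelihood. Directions with (near-)zero eigenvalues are not informed by the respondent's own choices, so in those directions an HB estimate is necessarily borrowed from the sample. This is a hypothetical diagnostic sketch, not a feature of any particular HB package; the function names and the tolerance are assumptions.

```python
import numpy as np
from scipy.special import softmax


def mnl_information(beta, X):
    """Fisher information of one respondent's MNL choices at part-worths beta.

    X: (n_tasks, n_alts, n_params). Equals the sum over tasks of X_t' (diag(p_t) - p_t p_t') X_t.
    """
    p = softmax(X @ beta, axis=1)                          # choice probabilities, (n_tasks, n_alts)
    xbar = np.einsum("tj,tjk->tk", p, X)                   # probability-weighted mean attributes
    return np.einsum("tj,tjk,tjl->kl", p, X, X) - xbar.T @ xbar


def uninformed_directions(beta, X, rel_tol=1e-8):
    """Parameter directions about which the respondent's own choices carry (almost) no information."""
    eigval, eigvec = np.linalg.eigh(mnl_information(beta, X))
    return eigvec[:, eigval <= rel_tol * max(eigval.max(), 1.0)]


# Example: the third attribute never varies within a choice set (e.g. due to prohibitions)
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3, 3))
X[:, :, 2] = 1.0
print(uninformed_directions(np.zeros(3), X).round(3))
```

In the example, exactly one uninformed direction is found, the one belonging to the third attribute. For that parameter, the respondent's estimate is determined by the prior, that is, by the sample.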

Restricted HB-MLE – Hierarchical Bayes Maximum Likelihood Estimation

A number of methods for maximizing the posterior choice likelihood have been suggested. Some variants are closely related to latent class or latent factor methods, e.g. mixture regression models, which have been shown to give results similar to the HB-MCMC technique. One objection is the lower certainty of reaching the global optimum because of local maxima or saddle points; several restarts are usually required.
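The restart strategy does not depend on the particular model: a local optimizer is run from several random starting points, and the best optimum is kept. The sketch below uses `scipy.optimize.minimize` on a toy one-parameter objective with two minima. In practice, `fun` would be the negative log of the posterior choice likelihood, so minimizing it maximizes the likelihood. The function name, the number of starts, and the start distribution are assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def multistart_minimize(fun, n_params, n_starts=10, scale=1.0, seed=0, **minimize_kwargs):
    """Minimize fun from several random starts drawn from N(0, scale^2) and keep the best result.

    A single local run may stop at a local optimum or a saddle point,
    depending on where it starts.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        res = minimize(fun, rng.normal(scale=scale, size=n_params), **minimize_kwargs)
        if best is None or res.fun < best.fun:
            best = res
    return best


def toy(x):
    """Toy objective with a global minimum near x = -1 and a local one near x = +1."""
    return (x[0] ** 2 - 1) ** 2 + 0.3 * x[0]


result = multistart_minimize(toy, n_params=1)
```

A single run started at x = 1 stays in the local minimum near +1. Across the random starts, the global minimum near -1 is found.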

A simple variant of restricted HB-MLE, in which the variance matrix of the sample part-worth estimates is restricted to be diagonal, has been found useful for nonlinear formulations of the part-worths of value-based attributes. Because it is less robust than a method with a full sample variance matrix, it is not suitable for sparse data, e.g. from a short study with only a few choices. The procedure can be implemented with the programming tools available in standard statistical packages, but it must be programmed individually for each study.
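With a diagonal variance matrix, the upper level of the model decomposes into independent univariate terms. Updating the sample distribution then reduces to computing one mean and one variance per parameter. The sketch below shows only these two pieces, which would plug into an iterative procedure like the one described under Hierarchical Bayes approach. The lower-level likelihood, which may contain nonlinear part-worth formulations, is left to the individual study. The function names and the optional `post_var` correction are assumptions.

```python
import numpy as np


def diag_log_prior(beta, mean, var):
    """Gaussian log-prior with a diagonal variance matrix: K independent univariate terms."""
    return -0.5 * np.sum((beta - mean) ** 2 / var + np.log(2 * np.pi * var))


def update_diag_distribution(betas, post_var=None, min_var=1e-6):
    """Re-estimate the sample mean and the K variances from individual estimates.

    betas:    (n_respondents, K) current individual point estimates
    post_var: optional (n_respondents, K) posterior variances of those estimates
              (e.g. the diagonal of each inverse Hessian); adding their mean keeps
              the shrunken point estimates from understating the spread.
    Only K variances are estimated instead of K * (K + 1) / 2 covariance entries,
    so correlations between part-worths are ignored.
    """
    mean = betas.mean(axis=0)
    var = betas.var(axis=0)
    if post_var is not None:
        var = var + post_var.mean(axis=0)
    return mean, np.maximum(var, min_var)   # floor keeps every variance strictly positive
```

Fewer upper-level parameters make the restricted model simpler to program and optimize. The price is that correlations between part-worths are ignored, which is consistent with the remark that the method is less robust with sparse data.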
