No numerical method, however sophisticated, can redeem a deficient design.

[ Continually refreshed experience ]



The need to estimate interaction terms in aggregate choice models was long discussed in the literature. Those discussions faded in the late 1990s, when HB – Hierarchical Bayes techniques allowing robust estimation of individual utilities were introduced. Most of the earlier interaction terms were recognized as being due to heterogeneity in the sample. It did not take long before an estimation method designed specifically for CBC - Choice Based Conjoint became commercially available from Sawtooth Software, Inc.

Hierarchical Bayes approach

A choice is an event conditioned by the choice set, so the choice probability is conditional. At the same time, the behavior of each respondent is assumed to be different but to come from some distribution of behavior in the sample. In this view there are two hierarchical levels of conditioning: the upper level relates to the sample distribution and the lower level to the characteristics of the individual. Each of the relations has some likelihood. The core of the estimation is maximization of the likelihood obtained (usually) as the product of the two likelihoods.
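The two levels can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formulation of any particular package: the lower level is taken to be a multinomial logit likelihood and the upper level an independent normal distribution of part-worths, neither of which is specified in the text above. All function and parameter names are hypothetical.

```python
import math

def mnl_log_likelihood(beta, tasks):
    """Lower level: log-likelihood of one respondent's choices (assumed MNL).

    tasks is a list of (alternatives, chosen_index) pairs; each alternative
    is a list of attribute codes with the same length as beta.
    """
    total = 0.0
    for alternatives, chosen in tasks:
        utilities = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
        top = max(utilities)  # subtract the maximum for numerical stability
        log_denominator = top + math.log(sum(math.exp(u - top) for u in utilities))
        total += utilities[chosen] - log_denominator
    return total

def normal_log_prior(beta, mean, sd):
    """Upper level: log-density of beta under assumed independent normals."""
    return sum(
        -0.5 * math.log(2 * math.pi * s * s) - 0.5 * ((b - m) / s) ** 2
        for b, m, s in zip(beta, mean, sd)
    )

def log_posterior(beta, tasks, mean, sd):
    # The product of the two likelihoods becomes a sum on the log scale.
    return mnl_log_likelihood(beta, tasks) + normal_log_prior(beta, mean, sd)
```

Maximizing `log_posterior` over `beta` for one respondent, with `mean` and `sd` describing the sample, corresponds to the product-of-likelihoods maximization described above.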

If we drew a respondent from the sample at random and knew the distribution of behavior in the sample, we could predict the behavior of the individual with some probability. We would expect the highest probability at the maximum of the probability density of the sample behavior. The behavior probability density for an individual based on the sample distribution is called the prior (pre-experimental) likelihood. At the beginning of an estimation we can start with some arbitrary but reasonable sample distribution. Using this distribution, we can modify the parameters of the model for an individual so that the posterior likelihood is maximized. The important property of this step is that the likelihood maximum for an individual is shifted away from the maximum for the sample. The more 'unique' the respondent is, the farther the shift will be from the sample mean. The density of the new distribution for an individual is termed the posterior (post-experimental) likelihood. The new maximum is adopted as the point estimate for the respondent. The new point estimates are used to update the distribution in the sample, which in turn serves for estimation of the prior likelihood for each individual. The process is repeated until the point estimates no longer change for any individual.
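The loop described above can be illustrated with a deliberately simplified toy model. The sketch below is not a choice model: it assumes each respondent is described by a single number measured with normal noise, that the noise variance is known, and that the spread of the sample distribution is held fixed so that only the sample mean is updated. These simplifications, and all names, are assumptions made for illustration.

```python
def empirical_bayes_means(observations, noise_var, prior_var, tol=1e-10, max_iter=1000):
    """Alternate between individual point estimates and the sample mean.

    observations: one list of noisy measurements per respondent.
    noise_var: assumed known variance of a single measurement.
    prior_var: assumed fixed variance of the sample distribution.
    """
    sample_mean = 0.0  # arbitrary but reasonable starting distribution
    estimates = [0.0] * len(observations)
    for _ in range(max_iter):
        new_estimates = []
        for obs in observations:
            n = len(obs)
            own_mean = sum(obs) / n
            # Posterior mode of a normal prior times a normal likelihood:
            # a precision-weighted average of own data and the sample mean.
            weight = (n / noise_var) / (n / noise_var + 1.0 / prior_var)
            new_estimates.append(weight * own_mean + (1.0 - weight) * sample_mean)
        change = max(abs(a - b) for a, b in zip(new_estimates, estimates))
        estimates = new_estimates
        sample_mean = sum(estimates) / len(estimates)  # update the sample level
        if change < tol:  # stop once no individual estimate moves any more
            break
    return sample_mean, estimates
```

Each individual estimate ends up between the respondent's own average and the sample mean; the fewer observations a respondent has, the closer the estimate stays to the sample mean.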

This general approach is known as the empirical Bayes method. The prior distribution in the HB method may be thought of as a flexible constraint that gets stronger the farther the estimated parameters for an individual move from the estimated parameters for the sample. The ability of HB techniques to utilize the sample properties, and to keep the individual estimates inside the credible interval of the sample, is sometimes described as "borrowing information from the sample".

HB-MCMC – Hierarchical Bayes Markov Chain Monte Carlo method

HB-MCMC, a top-level variant of the empirical Bayes method, is the workhorse in estimation of DCM models. It has many advantages compared to the traditional (non-Bayes) aggregate maximum likelihood estimation methods.


  • HB-MCMC is very resistant to getting stuck in a local maximum. The popular advice to 'restart with different initial estimates', known from direct optimization routines for nonlinear problems, is seldom needed.
  • Estimation programs for conjoint-related problems are commercially available (e.g. from Sawtooth Software, Inc.).
  • The method is based on random draws, and tens of thousands of draws are required. The time needed to converge depends strongly on the problem. It may range from several minutes up to several days for some exceptional studies with "soft, ill-behaved" data.
  • Commercially available software can handle only discrete and linear value-based attributes.
  • Estimation of individual utilities is standard.
    • Estimation of interactions is seldom needed.
    • However, if appropriate, interactions can still be estimated provided a sufficient amount of data is available.
  • Estimates are robust. In addition, robustness can be controlled by setting prior distribution parameter values.
  • Selectivity corrections can often be completely avoided since the population variance influences individual utilities only indirectly.
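To give a feel for what "based on random draws" means in the list above, here is a minimal random-walk Metropolis sketch for the part-worths of a single respondent. It is only one building block under stated assumptions, not a full HB-MCMC implementation and not the algorithm of any commercial package: the likelihood is assumed to be multinomial logit, the prior is an independent normal that is held fixed (a full sampler would also redraw the sample-level parameters), and all names and the step size are hypothetical.

```python
import math
import random

def log_target(beta, tasks, prior_mean, prior_sd):
    """Log posterior up to a constant: assumed MNL likelihood plus normal prior."""
    value = 0.0
    for alternatives, chosen in tasks:
        utilities = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
        top = max(utilities)  # subtract the maximum for numerical stability
        value += utilities[chosen] - top - math.log(
            sum(math.exp(u - top) for u in utilities)
        )
    for b, m, s in zip(beta, prior_mean, prior_sd):
        value -= 0.5 * ((b - m) / s) ** 2
    return value

def metropolis_draws(tasks, prior_mean, prior_sd, n_draws=2000, step=0.3, seed=0):
    """Return n_draws successive states of a random-walk Metropolis chain."""
    rng = random.Random(seed)
    beta = list(prior_mean)  # start at the sample mean
    current = log_target(beta, tasks, prior_mean, prior_sd)
    draws = []
    for _ in range(n_draws):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_target(proposal, tasks, prior_mean, prior_sd)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        draws.append(list(beta))
    return draws
```

Averaging the draws after an initial burn-in period gives a point estimate for the respondent. Because the chain occasionally accepts moves to lower-probability points, it can leave a local maximum, which is one intuition for the robustness noted in the first bullet.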


The HB-MCMC estimation technique has only minor disadvantages compared to its advantages. Most of them come from the data rather than from the technique itself.
  • Estimation may take a very long time (days or weeks) for some real MR problems.
    • A simpler but less capable method must then be used. It has been shown that even a very simple HB approach such as Empirical Bayes Data Augmentation can give results much better than those obtained from an aggregate approach.
  • Estimated part-worths of (unacceptable) levels that never appeared in chosen profiles are usually too low. As this is due to the estimation algorithm, a remedy is possible only through the design of the DCM-based exercise.
  • Commercial software does not allow estimation of parameters of non-linear attribute functions. An approximation by a piecewise function composed of linear segments must be used, with the disadvantage of many more estimated parameters.
  • In some real MR problems, selectivity differences among individuals would lead either to (a) a prior distribution that is too broad and therefore inefficient, or (b) excessive influence of the prior distribution, making individual estimates too similar. There are several ways to avoid this problem.
    • If the range of products is too broad, CBS – Choice Based Sampling can be recommended to narrow the scope of the CBC - Choice Based Conjoint exercise for a respondent in order to obtain more focused individual data. Use of an HB-based estimation allows for merging the data from several DCM blocks.
    • An advanced alternative to the CBS (see above) is ACBC - Adaptive Choice Based Conjoint introduced by Sawtooth Software, Inc. (2010). The author has no hands-on experience with this method.
  • Using demographics as a segment covariate in estimation (or even as a predictor of expected behavior) is not reliable and often fails completely. Better results can be obtained by first finding the existing segments from an analysis of behavioral data, e.g. group utilities, and only then computing individual estimates. A latent class approach is a direct route to the solution.
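The piecewise-linear approximation mentioned in the list above can be sketched as a simple coding step. The function below is a hypothetical illustration, not part of any commercial package: it splits the value of an attribute (a price, say) into the amounts falling within each segment between assumed breakpoints, so that a linear-only estimator can fit one slope per segment.

```python
def piecewise_linear_code(x, knots):
    """Split x into the amounts that fall within each segment between knots.

    knots must be increasing; the result has len(knots) - 1 entries.
    """
    coded = []
    for lower, upper in zip(knots[:-1], knots[1:]):
        # Amount of x inside this segment, clipped to the segment width.
        coded.append(min(max(x - lower, 0.0), upper - lower))
    return coded

def piecewise_utility(x, knots, slopes):
    """Utility contribution of x given one slope per segment."""
    return sum(s * c for s, c in zip(slopes, piecewise_linear_code(x, knots)))

# Illustrative breakpoints and slopes (made-up numbers, not estimates).
example_code = piecewise_linear_code(15, [0, 10, 20, 30])
example_utility = piecewise_utility(15, [0, 10, 20, 30], [-0.1, -0.3, -0.6])
```

With three segments, three slopes have to be estimated where a single parametric curve might need only one or two parameters, which is the cost in extra parameters noted above.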
As an aside

Restricted HB-MLE – Hierarchical Bayes Maximum Likelihood Estimation

A number of methods for posterior choice likelihood maximization have been suggested. Some variants are closely related to the methods of latent classes or factors, e.g. mixture regression models, which have been shown to give results similar to the HB-MCMC technique. One objection is the lower certainty of reaching the global optimum due to the presence of local maxima or saddle points. Several restarts are usually required.

A simple variant of restricted HB-MLE, the restriction being a diagonal variance matrix of the sample part-worth estimates, has been found useful for nonlinear formulations of part-worths of value-based attributes. As it is less robust than a method with the full sample variance matrix, it is not suitable for sparse data, e.g. from a short study with only a few choices. The estimation procedure can be implemented using programming tools available in standard statistical packages but must be programmed for each study individually.