No numerical method, however sophisticated, can redeem a deficient design.

[ Continually refreshed experience ]

## Problem

The necessity of estimating interaction terms in aggregated choice models has long been discussed in the literature. In the late nineties, when HB (Hierarchical Bayes) techniques allowing robust estimation of individual utilities were introduced, those discussions faded: most of the earlier interaction terms were recognized as artifacts of heterogeneity in the sample. It did not take long before an estimation method designed specifically for CBC (Choice Based Conjoint) became commercially available from Sawtooth Software, Inc.

## Hierarchical Bayes approach

A choice is an event conditioned by the choice set, so the choice probability is conditional. At the same time, the behavior of each respondent is assumed to differ but to come from some distribution of behavior in the sample. In this view there are two hierarchical levels of conditioning: the upper level is related to the sample distribution, the lower level to the characteristics of the individual. Each of these relations has a likelihood, and the main process is maximization of the likelihood obtained (usually) as the product of the two likelihoods.
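The two levels and their product can be sketched in code. This is a minimal illustration only, not any vendor's implementation: the lower level is assumed to be a multinomial-logit choice probability, the upper level a normal sample distribution with diagonal covariance, and all names are hypothetical.

```python
import numpy as np

def choice_log_likelihood(beta, X, chosen):
    """Lower level: multinomial-logit log-probability of the observed
    choices, conditional on the individual's part-worths `beta`.
    X: (tasks, alternatives, attributes); chosen: chosen index per task."""
    ll = 0.0
    for t in range(X.shape[0]):
        u = X[t] @ beta              # utilities of the alternatives
        u = u - u.max()              # numerical stabilization
        ll += u[chosen[t]] - np.log(np.exp(u).sum())
    return ll

def prior_log_density(beta, mu, var):
    """Upper level: log-density of `beta` under the sample distribution,
    here assumed normal with diagonal covariance."""
    return -0.5 * np.sum((beta - mu) ** 2 / var + np.log(2 * np.pi * var))

def joint_log_likelihood(beta, X, chosen, mu, var):
    """The product of the two likelihoods becomes a sum on the log scale."""
    return choice_log_likelihood(beta, X, chosen) + prior_log_density(beta, mu, var)
```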

If we drew a respondent from the sample at random and knew the distribution of behavior in the sample, we could predict the behavior of that individual with some probability; the highest probability would be expected at the maximum probability density of the sample behavior. The behavior probability density for an individual based on the sample distribution is called the prior (pre-experimental) likelihood.

At the beginning of an estimation we can start with some arbitrary but reasonable sample distribution. Using this distribution, we can modify the parameters of the model for an individual so that the posterior likelihood is maximized. The important property of this step is that the likelihood maximum for an individual is shifted away from the maximum for the sample: the more 'unique' the respondent is, the farther the shift from the sample mean. The density of the new distribution for an individual is termed the posterior (post-experimental) likelihood, and the new maximum is adopted as the point estimate for the respondent. The new point estimates are then used to update the distribution in the sample, which serves for estimation of the prior likelihood for each individual. The process is repeated until no change in the point estimates is seen for any individual.
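The iterative scheme above can be sketched roughly as follows. It assumes a multinomial-logit lower level, a normal prior with diagonal covariance, and plain gradient ascent for the per-respondent posterior maximization; all names and tuning constants are illustrative, not the commercial algorithm.

```python
import numpy as np

def map_update(beta, X, chosen, mu, var, steps=300, lr=0.02):
    """Gradient ascent on one respondent's posterior log-density:
    MNL choice log-likelihood plus normal log-prior (a MAP point estimate)."""
    for _ in range(steps):
        grad = -(beta - mu) / var            # prior pulls toward sample mean
        for t in range(X.shape[0]):
            u = X[t] @ beta
            p = np.exp(u - u.max()); p /= p.sum()
            grad += X[t][chosen[t]] - p @ X[t]
        beta = beta + lr * grad
    return beta

def empirical_bayes(X_all, chosen_all, n_par, iters=20):
    """Outer loop: alternate individual MAP estimates with an update of the
    sample distribution (mean and diagonal variance) until stable."""
    n = len(X_all)
    betas = np.zeros((n, n_par))
    mu, var = np.zeros(n_par), np.ones(n_par)
    for _ in range(iters):
        for i in range(n):
            betas[i] = map_update(betas[i], X_all[i], chosen_all[i], mu, var)
        mu = betas.mean(axis=0)                        # new sample mean
        var = np.maximum(betas.var(axis=0), 0.1)       # floor keeps the prior from collapsing
    return betas, mu, var
```

The variance floor is a crude stand-in for the more careful prior-variance handling a production implementation would use.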

This general approach is known as the empirical Bayes method. The prior distribution in the HB method may be thought of as a flexible constraint that gets stronger as the estimated parameters for an individual move farther from the estimated parameters for the sample. The ability of HB techniques to utilize the sample properties, and to keep the individual estimates inside the credible interval of the sample, is sometimes described as "borrowing information from the sample".

## HB-MCMC – Hierarchical Bayes Markov Chain Monte Carlo method

HB-MCMC, a top-level variant of the empirical Bayes method, is the workhorse of DCM model estimation. It has many advantages over traditional (non-Bayesian) aggregate maximum likelihood estimation methods.

### Strengths

• HB-MCMC is very resistant to getting stuck in a local maximum. The popular remedy 'restart with different initial estimates', known from direct optimization routines for nonlinear problems, is seldom needed.
• Estimation programs for conjoint-related problems are commercially available (e.g. from Sawtooth Software, Inc.).
• The method is based on random draws; tens of thousands of draws are required. The time needed to converge depends strongly on the problem, from several minutes up to several days for some exceptional studies with "soft, ill-behaved" data.
• Commercially available software can handle only discrete and linear value-based attributes.
• Estimation of individual utilities is standard. Estimation of interactions is seldom needed; however, if appropriate, interactions can still be estimated provided a sufficient amount of data is available.
• Estimates are robust. In addition, robustness can be controlled by setting prior distribution parameter values.
• Selectivity corrections can often be completely avoided, since the population variance influences individual utilities only indirectly.
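For intuition about the random draws mentioned above, a single random-walk Metropolis-Hastings step for one respondent's part-worths might look like the toy sketch below. It assumes a multinomial-logit likelihood and a normal prior; it is not Sawtooth Software's implementation, and all names are hypothetical.

```python
import numpy as np

def mnl_loglik(beta, X, chosen):
    """Multinomial-logit log-likelihood of one respondent's choices."""
    ll = 0.0
    for t in range(X.shape[0]):
        u = X[t] @ beta
        u = u - u.max()
        ll += u[chosen[t]] - np.log(np.exp(u).sum())
    return ll

def mh_step(beta, X, chosen, mu, var, scale, rng):
    """One random-walk Metropolis-Hastings draw of an individual's
    part-worths, given the current sample-level mean and variance."""
    cand = beta + rng.normal(0, scale, size=beta.shape)
    def log_post(b):
        return mnl_loglik(b, X, chosen) - 0.5 * np.sum((b - mu) ** 2 / var)
    if np.log(rng.uniform()) < log_post(cand) - log_post(beta):
        return cand, True        # accepted: the chain moves
    return beta, False           # rejected: the chain stays
```

A full estimation would interleave tens of thousands of such draws per respondent with updates of the sample-level mean and covariance, which is why run times vary so widely.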

### Weaknesses

The HB-MCMC estimation technique has only marginal disadvantages compared to its advantages; most of them come from the data rather than from the technique itself.

• Estimation may take a very long time (days or weeks) for some real MR problems, so a simpler but less competent method must be used. It has been shown that even a very simple HB approach, such as Empirical Bayes Data Augmentation, can give results much better than those obtained from an aggregate approach.
• Estimated part-worths of (unacceptable) levels that never appeared in chosen profiles are usually too low. As this is due to the estimation algorithm, a remedy is possible only through the design of the DCM-based exercise.
• Commercial software does not allow estimation of the parameters of non-linear attribute functions. An approximation as a piecewise function composed of linear segments must be used, with the disadvantage of many more estimated parameters.
• In some real MR problems, selectivity differences among individuals would lead either (a) to a too broad, and therefore inefficient, prior distribution, or (b) to an excessive influence of the prior distribution, leading to overly similar individual estimates. There are several possibilities to avoid this problem. If the range of products is too broad, CBS – Choice Based Sampling – can be recommended to decrease the scope of the CBC (Choice Based Conjoint) exercise for a respondent in order to obtain more focused individual data. Use of an HB-based estimation allows merging the data from several DCM blocks. An advanced alternative to CBS is ACBC – Adaptive Choice Based Conjoint – introduced by Sawtooth Software, Inc. (2010). The author has no hands-on experience with this method.
• Using demographics as a segment covariate in estimation (or even as a predictor of expected behavior) is not reliable and often fails completely. Better results can be obtained by first finding the existing segments from analysis of behavioral data, e.g. group utilities, and only then estimating individual utilities. A latent class approach is a direct way to the solution.
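The piecewise-linear approximation of a non-linear attribute mentioned above can be illustrated by a small design-coding helper. The function name and knot positions are hypothetical; the point is that each linear segment consumes one extra parameter.

```python
import numpy as np

def piecewise_linear_design(x, knots):
    """Code a continuous attribute (e.g. price) as piecewise-linear
    segments: one design column per segment, so each segment gets its own
    slope parameter -- many more parameters than a single smooth function,
    which is the drawback noted above."""
    x = np.asarray(x, dtype=float)
    cols = [np.clip(x, knots[0], knots[1]) - knots[0]]
    for lo, hi in zip(knots[1:-1], knots[2:]):
        cols.append(np.clip(x, lo, hi) - lo)   # segment is flat outside [lo, hi]
    return np.column_stack(cols)
```

For knots at 0, 10, 20, the value 15 is coded as (10, 5): the first segment is saturated and the second contributes 5 units, so a utility that is linear in these columns is piecewise-linear in the original attribute.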
### As an aside

A potentially underestimated aspect of a conjoint study is the parameter identification problem. In plain language: the model, the design, the number of parameters, the sample, and the type and number of collected data must match each other so that the parameters are estimable with satisfactory accuracy and precision. HB-MCMC software is very robust and has built-in provisions to seemingly converge even with very bad data. This can happen when the researcher yields to the pressure of the client and accepts too many attributes, an excessively complicated design, too many prohibitions between attribute levels, a short interview, or a small or overly inhomogeneous sample. The software usually sets the inestimable values to zero or to some other value, e.g. an average over the sample. The latter case is due to the "borrowing" of data from other respondents and may lead to excessive aggregation. As the distribution of preferences in the sample is almost never Gaussian (or even symmetrical), such results may be strongly biased and even in discrepancy with the collected data.

## Restricted HB-MLE – Hierarchical Bayes Maximum Likelihood Estimation

A number of methods for posterior choice likelihood maximization have been suggested. Some variants are closely related to latent class or latent factor methods, e.g. mixture regression models, which have been shown to give results similar to the HB-MCMC technique. An objection may be a lower certainty of reaching the global optimum due to the presence of local maxima or saddle points; several restarts are usually required.

A simple variant of restricted HB-MLE, the restriction being a diagonal variance matrix of the sample part-worth estimates, has been found useful in the case of nonlinear formulations of part-worths of value-based attributes. As it is less robust than a method with a full sample variance matrix, it is not suitable for sparse data, e.g. from a short study with only a few choices. The estimation procedure can be implemented using programming tools available in standard statistical packages, but it must be programmed for each study individually.
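A minimal sketch of such an individually programmed procedure might look as follows, assuming a multinomial-logit likelihood, the diagonal-variance restriction, and the several restarts mentioned above. All names and tuning values are illustrative, not a definitive implementation.

```python
import numpy as np

def penalized_loglik(beta, X, chosen, mu, var):
    """MNL log-likelihood plus a diagonal-variance normal log-prior --
    the 'restricted' objective (no off-diagonal sample covariances)."""
    ll = -0.5 * np.sum((beta - mu) ** 2 / var)
    for t in range(X.shape[0]):
        u = X[t] @ beta
        u = u - u.max()
        ll += u[chosen[t]] - np.log(np.exp(u).sum())
    return ll

def estimate_with_restarts(X, chosen, mu, var, n_restarts=5, steps=300, lr=0.02, seed=0):
    """Maximize the restricted objective by gradient ascent from several
    random starting points and keep the best optimum, guarding against
    local maxima and saddle points."""
    rng = np.random.default_rng(seed)
    best_beta, best_ll = None, -np.inf
    for _ in range(n_restarts):
        beta = mu + rng.normal(0, 1, size=mu.shape)
        for _ in range(steps):
            grad = -(beta - mu) / var
            for t in range(X.shape[0]):
                u = X[t] @ beta
                p = np.exp(u - u.max()); p /= p.sum()
                grad += X[t][chosen[t]] - p @ X[t]
            beta = beta + lr * grad
        ll = penalized_loglik(beta, X, chosen, mu, var)
        if ll > best_ll:
            best_beta, best_ll = beta, ll
    return best_beta, best_ll
```

In practice the gradient-ascent inner loop would be replaced by the optimizer of the statistical package at hand, and `mu` and `var` would themselves be re-estimated from the individual estimates as in the empirical Bayes scheme described earlier.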