## Problem

Product utility values obtained from the analysis of a DCM experiment are interval-scaled "raw" utilities. The differences in utilities reflect the preferences among the products they represent, but the utility values provide no measure of the individual's willingness to purchase a product. The question inviting a choice is often understood as "if the shown products were the only ones available and you had to choose one"; it therefore leads to a conditional choice with only a loose linkage to the product's potential on the market.

Using raw utilities in a simulation is useful for comparing competing products in a preference share simulation. However, share models do not account for the market acceptability of the products in the choice set. If no product is acceptable to an individual, the assumption of a 100% total share for that individual lets the unacceptable products influence the computed shares. The lowest utilities are always estimated with the highest error, and this error is projected into the simulation. Weighting individuals by their average consumption does not help in this case.
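The forced 100% total share can be illustrated with a minimal multinomial logit sketch. The utility values below are hypothetical and stand for an individual who finds none of the tested products acceptable:

```python
import math

def logit_shares(utilities):
    """Multinomial logit preference shares: each individual's shares sum to 1."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw utilities for one individual with no acceptable
# product in the choice set (all utilities far below any purchase threshold).
raw = [-6.0, -6.5, -7.0]

shares = logit_shares(raw)
print([round(s, 3) for s in shares])
# The shares still sum to 1, so this individual contributes a full
# "vote" to the aggregate shares even though they would buy nothing.
print(round(sum(shares), 6))
```

The point is that the share model normalizes whatever utilities it receives, however low, which is exactly why market acceptability needs a separate treatment.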

In general, DCM analytic methods without additional non-DCM assumptions, or without accounting for external market data, lead to relative preference values one can call "raw" utilities. Raw utilities are interval-scaled with the reference (zero) point chosen arbitrarily. Only relative comparisons among simulated values are meaningful. All estimates are conditional: they represent "trials" under the conditions the individuals were interviewed in. The estimates are subject to error due to the unknown willingness to buy under real market conditions.

## Solution

If a utility is to reflect the probability of some event or intention, typically purchase, it must be calibrated. A calibration transforms the "raw" utility into a value that reflects the probability of an event, typically purchase, or an intention to act, as stated by the individual. This makes it possible to compute the "as if stated" acceptance of any simulated product and thus estimate a more appropriate contribution of an individual to the simulated aggregate values. The most important aspect is that utilities obtain a reference (zero) value that reflects the probability of a reference event or intention. The transformation of raw utilities is usually linear, with two parameters determining the location and scaling.
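The two-parameter linear transformation can be sketched as follows; the location and scale values are purely hypothetical placeholders for parameters that would be estimated from a calibration task:

```python
import math

def calibrate(u_raw, location, scale):
    """Linear calibration: shift and rescale a raw utility so that zero
    corresponds to the reference statement (e.g. "cannot decide")."""
    return location + scale * u_raw

def acceptance(u_cal):
    """Logistic link: a calibrated utility of 0 maps to 50% acceptance."""
    return 1.0 / (1.0 + math.exp(-u_cal))

# Hypothetical parameters, as if estimated from a calibration task.
location, scale = -1.2, 0.8

u_raw = 2.0                           # raw utility of a simulated product
u_cal = calibrate(u_raw, location, scale)
print(round(acceptance(u_cal), 3))    # acceptance implied by the statement scale
print(acceptance(0.0))                # reference point: exactly 0.5
```

The zero point of the calibrated scale carries the meaning of the reference statement, which is what makes the resulting acceptances interpretable.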

Calibrated utilities differ from raw utilities. Calibrated utilities are ratio-scaled, with the reference (zero) utility defined for a reference event, most often the respondent's statement "cannot decide" with an acceptance equal to 50%. The acceptance computed from a calibrated utility adopts the meaning of the calibration statement. It is conditional with respect to the question formulation in the calibration task. In most cases, the acceptance can be understood as a choice probability given that "there is nothing else available". While a calibration is not necessary for a preference share estimation, it may improve it thanks to a more realistic scaling of the individual utilities. A calibration is essential in estimating the competitive potential of a product. It can be omitted if a new product is intended as a replacement for an old one, since the old product can then serve as a reference.

As a rule of thumb, a change in the location factor of the transformation (a shift of values) is often much more important than a change in the scaling (which alters the differences between utility values, i.e. the sensitivity).

A calibration is sometimes replaced by aggregate weights of the sample segments obtained from an external source such as market data. This is equivalent to shifting the utilities by a value common to all individuals in the segment. However, if the differences in product acceptance between individuals in the segment are not uniform, the results of such a calibration may be misleading.
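Why the shift matters can be shown with a small numeric sketch (the utility values are hypothetical): a common shift leaves within-set logit shares unchanged, yet changes every individual acceptance, which is exactly what a segment-level weight cannot capture per individual.

```python
import math

def logit_shares(us):
    """Multinomial logit shares over one choice set."""
    exps = [math.exp(u) for u in us]
    s = sum(exps)
    return [e / s for e in exps]

def acceptance(u):
    """Logistic acceptance of a single (calibrated) utility."""
    return 1.0 / (1.0 + math.exp(-u))

base = [0.2, -0.4, 1.1]               # hypothetical utilities of one individual
shifted = [u + 2.0 for u in base]     # common shift, e.g. a segment-level weight

# Preference shares are invariant to a common shift...
print([round(x, 4) for x in logit_shares(base)])
print([round(x, 4) for x in logit_shares(shifted)])
# ...but the acceptances are not.
print(round(acceptance(base[0]), 3), round(acceptance(shifted[0]), 3))
```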

Some examples of the importance of utility calibration for marketing decisions can be found in a public source.

## Calibration assumptions and execution

The probabilities entering a calibration may be revealed ones (based on market or diary data) but are more often obtained in the form of stated acceptances. They must be probabilities of the assumed actions. This underlines the importance of the wording of the question in a calibration task. The action concerned is often conditional, e.g. "Given you were to purchase a product of this kind, how likely would you ...".

The selection of calibration profiles should be made with a good knowledge of current and future market expectations. In particular, profiles with excessive attractiveness should be avoided, as they would make respondents overly reject all other profiles. The actual implementation of calibration questions varies widely. The most common format is, for example, implemented in the Sawtooth conjoint module. Provided managerially designed profiles are available, an efficient calibration format is the SCE - Sequential Choice Exercise, which relies on ranking. The ranking fully eliminates tied (equal) answer values without excessively prolonging the interview.

A calibration can be omitted from a study based on a DCM model in the following cases:
• All subjects in the study are users of the simulated product(s).
• Only differences in preferences among the products, rather than their actual values, are sought.
• Only a preference share simulation is desired and satisfactory.
• The range of the tested products covers nearly the whole market category.

As an aside:
• Calibration procedures implemented in most commercial programs are based on linear regression of stated utilities using least squares (OLS - Ordinary Least Squares). The OLS method too often fails with calibration data, and experienced analysts avoid it. More reliable is a regression of stated utilities weighted by multinomial densities of the calibration profiles, or, better, an equivalent Bayesian regression.
• Non-linear calibration transformations of product utilities are possible but hardly ever used. On the other hand, nonlinear models for part-worths of quantitative attribute levels are quite common.
• No information about differences between the levels of two different attributes can be obtained by any calibration method or technique. If this is desirable, the MXD - Maximum Difference Scaling method for product aspects should be used.

## Calibration parameters

The calibration process introduces positional and scaling factors into the multinomial logit model. If appropriate, additional components, such as the outer goods, may be added to the model.

• The positional (location) factor is essentially a constant added to the raw utilities. It reflects the probability above which a product is accepted by the respondent as worth considering. It can be based on a neutral attitude of the subject to an event or intention, such as the statement "cannot decide", with a probability equal to 1/2, i.e. 50%. This probability is conditional and should be understood in a way close to "given there is nothing else available to be purchased". The corresponding utility has a zero logit value. If an acceptance threshold is to be applied in a competitive potential estimation, the zero value should reflect a much higher stated probability to match the fact that there are many acceptable products on the market.
• The scaling factor is essentially a constant the raw utility is multiplied by. The sensitivity of a respondent to changes in the attributes of the offered stimuli in the laboratory situation differs from that on the market. It may be corrected by asking additional questions on a metric scale convertible to a probability measure.
• Choice-based utilities often have a scaling factor close to the market value, and only a subtle modification is needed.
• Metric-based utilities have a vague scaling, as the factor depends on the metric of the answer variable.
• The utility of the outer goods, sometimes called "synthetic none", is a value related to products not included in the test. It can be estimated only in rare instances of well-defined calibration data and their relationship to the market data.
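The interplay of the factors above can be sketched in a share simulation that includes an outer-good alternative; all parameter values here are hypothetical illustrations, not estimates:

```python
import math

def simulate_shares(raw_utils, location, scale, u_outer):
    """Preference shares from calibrated utilities, with an 'outer goods'
    alternative representing products not included in the test."""
    cal = [location + scale * u for u in raw_utils]
    exps = [math.exp(v) for v in cal] + [math.exp(u_outer)]
    total = sum(exps)
    shares = [e / total for e in exps]
    return shares[:-1], shares[-1]        # product shares, outer-goods share

# Hypothetical values for one individual.
product_shares, outer_share = simulate_shares(
    raw_utils=[1.0, 0.2, -0.5],
    location=-0.8,     # positional factor (shift)
    scale=0.9,         # scaling factor (sensitivity)
    u_outer=0.5,       # utility of the outer goods ("synthetic none")
)
print([round(s, 3) for s in product_shares], round(outer_share, 3))
```

Raising the outer-goods utility drains share from all tested products at once, which is how the model acknowledges the rest of the market.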

Whatever experimental method is used for a calibration, stated data are always censored. For example, in a purchase intention question using a 5-step Likert scale, all concepts below a certain utility value will get the answer "definitely no", and all concepts above some (high) utility value the answer "definitely yes". Traditional estimation techniques (such as OLS - Ordinary Least Squares) cannot fit the data correctly even if the data were completely noiseless, because (1) the distribution of calibration answers is truncated and (2) ties between experimental values are common. Both issues can be rectified in a calibration carried out as an SCE - Sequential Choice Exercise and using Bayesian regression. Bayesian priors of the Likert scale steps estimated from choice orders in SCE have proved especially useful in cases with many ties.
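The censoring can be made concrete with a toy mapping of noiseless utilities to a 5-point scale through fixed cut points (the threshold values are hypothetical): everything beyond the outer thresholds collapses into the end categories, producing the ties that defeat OLS.

```python
# Hypothetical cut points (logit units) for a symmetrical 5-point scale.
THRESHOLDS = [-3.0, -1.0, 1.0, 3.0]
LABELS = ["definitely no", "rather no", "neither-nor",
          "rather yes", "definitely yes"]

def likert_answer(utility):
    """Map a utility to the first category whose upper cut point exceeds it."""
    for cut, label in zip(THRESHOLDS, LABELS):
        if utility < cut:
            return label
    return LABELS[-1]

for u in [-8.0, -5.0, 0.0, 3.5, 9.0]:
    print(u, "->", likert_answer(u))
# Utilities -8 and -5 tie at "definitely no"; 3.5 and 9 tie at
# "definitely yes": the answers are censored at both ends.
```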

As an aside:
• Experimenting with extended Likert or "percent of likelihood" scales did not show any observable improvement in calibration fit. Respondents tended to crowd their answers into a relatively narrow region of the broader scale, with no improvement in discrimination between the tested items. Nor has the "dual-response none" method proved useful, as it often leads to evasive choices such as the cheapest item in the choice set. A symmetrical 5-point Likert scale (with the alternatives "definitely yes", "rather yes", "neither-nor", "rather no" and "definitely no") bounded in the interval [-4, 4] (logit units) has proved a dependable standard. Non-response is usually omitted from the answering options, as it is assumed anybody can have beliefs about the calibrated items.
• The wording of a calibration question is crucial. The formulation is usually conditional, as it should reflect some predetermined action, typically purchase.
• The "none" alternative often used in CBC - Choice Based Conjoint is ignored in calibration as it has an undefined meaning.
• The numerical procedure is based on the two-parameter logistic model of IRT - Item Response Theory, where the "ability" trait is the stated willingness of the person to make a decision (such as a purchase, a switch of brand or provider, churn, etc.) conditional on the item. To make up for the truncated data distribution, a version of weighted empirical Bayes regression is used to estimate the IRT parameters.
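The two-parameter logistic (2PL) model referred to above has a standard closed form; this sketch only shows the model itself, with made-up parameter values, not the weighted empirical Bayes estimation:

```python
import math

def two_pl(theta, a, b):
    """Two-parameter logistic IRT model: probability that a person with
    trait value theta (stated willingness to act) endorses an item with
    discrimination a and difficulty (location) b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters for one calibration profile.
a, b = 1.4, 0.6
for theta in [-1.0, 0.6, 2.0]:
    print(theta, round(two_pl(theta, a, b), 3))
# At theta == b the endorsement probability is exactly 0.5.
```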

## Properties of calibrated utilities

The most distinctive value of a calibration is the possibility to estimate the market acceptance and competitive potential of a product in a what-if simulation or a separate estimation procedure. Calibrated utilities are ratio-scaled, i.e. have a defined zero value set at the selected statement level. In the case of purchase intention, the reference statement is usually "cannot decide". Only the scaling factor of attribute level part-worths can be calibrated; the positional factor is related to the utility of the whole product profile, i.e. the sum of the part-worths. Acceptances computed in a simulation correspond to the evaluations obtained in the interview from the calibration statements. Averaged acceptances of the simulated profiles can be compared with one another. This is in contrast to utilities, whose mean cannot be computed as an average. A calibration is usually based on statements rather than on market data (e.g. panel or scanner data). The results of a simulation are therefore also "stated" rather than "revealed".

A calibration makes it possible to reflect the conditions of external effects, and may substantially change the results of a simulation compared to one based on "raw" utilities. However, wrong assumptions, question conditions, or a flawed numerical procedure may invalidate the results of a study.

An error implicit in any calibration procedure is the assumption that the stated purchase probability is proportional to the expected purchase probability. Both measures are conditional on the choice set, the question condition, the assumed situation, and the choice mode with respect to the assumed action (e.g. intentional, occasional or impulsive purchase, trial or repeat), all of which influence both the test and the market events. If possible, it is useful to correct the calibration bias, at least in part, using additional data for the subjects.

## Future development

Calibration methods have evolved as a reaction to the demand for more market-like numbers obtained from a simulation. The inherent disadvantage of many calibration methods is the use of acceptance as the target variable. Acceptance, in principle, is not a measure directly related to expected sales; it is just a characteristic of a product as seen and stated by respondents in the interview. The value of 50%, reflecting hesitation whether to buy or not, is fallacious: usually there are many other products on the market with a much higher acceptance. A product with a stated acceptance of 50% or less has, in most real cases, nearly no chance to be successful.

In contrast, the CSDCA - Common Scale Discrete Choice Analysis, with the possibility to determine and include an acceptability threshold for product aspects, allows for non-compensatory estimation and simulation using a nested logit model. A perceptance, being 0% or negative if the aspect or product is unacceptable, can be estimated. It is believed this approach can replace the standard calibration approach and give a more realistic view of the expected behavior of customers.