"One man’s constant is another man’s variable."
Alan Perlis
The answer to the title might be "the more the better". However, as CBC tasks belong to the most hated by respondents, designers tend to take the opposite direction. Based on our own experience and that of many others, two rules of thumb for the minimal number of tasks have been suggested and used since about 2002, for sets of attributes of about the same length (cardinality, to be precise) and up to 5 alternatives per choice set.
The rule is based on showing each level of the longest attribute (after a correction for the number of constraints, if any) twice and multiplying the obtained number of tasks by the number of attributes. For example, for 6 attributes of which the longest has 5 (effective) levels, with choice sets of 4 alternatives, the number of tasks should be (5 × 2) / 4 × 6 = 15 or more. In the case of a MXD (MaxDiff with best choices only) with 16 statements, the number of 4-statement tasks should be at least (16 × 2) / 4 × 1 = 8.
In a choice task with J alternatives, the number of preferences of the chosen alternative over the other alternatives is (J − 1). The number of estimated parameters in CBC is the total number of effective levels (after a correction for the number of constraints, if any) minus the number of attributes. The rule is based on the number of preferences, in a way analogous to degrees of freedom in linear regression: the number of tasks should be at least the number of parameters divided by the number of preferences, and doubled. For the same examples as above, the number of tasks would be at least 16 or 10, respectively.
As an aside: of the two rules above, this second rule is the one preferred and used by us for simple self-contained CBC exercises, nevertheless with some upward corrections when there are more than 5 alternatives.
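For concreteness, the two rules of thumb can be sketched as short functions; the names and the list-of-effective-levels representation are illustrative choices, not from the source:

```python
def tasks_longest_attribute(effective_levels, n_alternatives):
    """Rule 1: show each level of the longest attribute twice,
    then scale by the number of attributes."""
    longest = max(effective_levels)
    return longest * 2 / n_alternatives * len(effective_levels)

def tasks_preferences(effective_levels, n_alternatives):
    """Rule 2: parameters divided by preferences per task (J - 1), doubled."""
    n_params = sum(effective_levels) - len(effective_levels)
    return n_params / (n_alternatives - 1) * 2

# 6 attributes with 5 effective levels each, 4 alternatives per task
print(tasks_longest_attribute([5] * 6, 4))  # 15.0
print(tasks_preferences([5] * 6, 4))        # 16.0
# MaxDiff with 16 statements, 4-statement tasks
print(tasks_longest_attribute([16], 4))     # 8.0
print(tasks_preferences([16], 4))           # 10.0
```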
Both rules are only indicative for tasks with no more than 5 alternatives and should be used with caution, for many reasons. The weight of a choice in maximum likelihood estimation does not grow linearly with the number of alternatives in a choice set. In real cases, attributes have different lengths and different importance. For designs with many alternatives featuring FMCG/CPG products, specific alternatives, overlapping classes of products and, in particular, problems exploiting several independent discrete choice blocks serving as soft constraints, the simple rules are not satisfactory. A more substantiated way of controlling the number of the various types of tasks belonging to a DCM study had to be devised.
Let us consider a choice task with J alternatives and an indicator variable y_{j} for alternative j. The multinomial model requires that a choice always be made. If the jth alternative is chosen, y_{j} = 1 and y_{¬j} = 0. The parameters on which the probabilities depend are estimated by maximizing the LL (log-likelihood) function over all choices obtained in the exercise. For our purpose, the part of the conventional multinomial LL related to a choice task can be written as
LL = ∑_{j=1}^{J} y_{j} × ln(p_{j})  :  ∑_{j=1}^{J} p_{j} = 1;  ∑_{j=1}^{J} y_{j} = 1  (1)
where p_{j} are the instantaneous values of the choice probabilities being estimated for the items in the choice task. Note that the probabilities always sum to 1. However, only the value for the actual choice enters the estimation, while all the other values are neglected.
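As a check of eq. 1, the LL contribution of one task can be computed directly; the probability values below are made up for illustration:

```python
from math import log

# Hypothetical estimated probabilities for a 4-alternative task (sum to 1)
p = [0.4, 0.3, 0.2, 0.1]
y = [0, 1, 0, 0]  # indicator: alternative 2 was chosen

# Eq. 1: only the chosen alternative contributes to LL
ll = sum(y_j * log(p_j) for y_j, p_j in zip(y, p))
print(ll == log(0.3))  # True
```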
The null hypothesis at the stage of designing a CBC exercise is our belief that all the items in a choice task have the same chance of being selected. The initial LL value for a CBC choice task is therefore
LL_{ini} = ∑_{j=1}^{J} (1/J) × ln(1/J)  :  ∑_{j=1}^{J} (1/J) = 1  (2)
When the CBC exercise is answered and the conventional LL maximized, the true LL becomes
LL = ∑_{j=1}^{J} p_{j} × ln(p_{j})  :  ∑_{j=1}^{J} p_{j} = 1  (3)
where p_{j} values are the estimated choice probabilities.
Values of these LL functions differ according to the state they represent. 

In discrete phenomena problems, maximum likelihood estimation is in fact minimization of information entropy as a measure of information disorder. Eq. 3 is the basic formula for the negatively taken Shannon information entropy expressed in nat units. To obtain entropy in the more comprehensible bit units, a value in nats must be divided by ln(2) = 0.693147. For given settings of a CBC exercise, we need entropy values representing two states, namely those before and after a choice.
Entropy of the initial state of a choice task is described by the null hypothesis and given as LL_{ini} from eq. 2. Entropy of the final state of a choice task (eq. 3) is unknown; however, we can compute the highest achievable value, which is obtained for a hypothetical alternative j chosen with probability p_{j} = 1. As the predicted probability of all other alternatives must then be p_{¬j} = 0, the corresponding LL = 0, and so is the information entropy. The maximal gain of information H(J) obtainable from a CBC task with J alternatives is the difference between the two values. Since the entropy of the final state is zero, to get the obtainable information value we need to consider only the number of alternatives in the task.
H(J) = −LL_{ini} / ln(2)  (4)
The amount of information H(J) computed this way is as if the decision maker had an absolute preference among the J alternatives in each task. We call this value "preference bits" to distinguish it from number of preferences mentioned above. Values of H(J) for choice set sizes up to 10 alternatives are in the table below.
J     Alternatives in a CBC task:   2     3     4     5     6     7     8     9     10
H(J)  Obtainable preference bits:   1.00  1.58  2.00  2.32  2.58  2.81  3.00  3.17  3.32
The information value increases with the number of alternatives in a task, but with a diminishing tendency. The values are easy to remember: doubling the number of alternatives increases the information gain by 1 bit. With the knowledge of these values, the power of a CBC exercise can be assessed for a given number of tasks and alternatives per task. A reasonable approach is to ensure the necessary number of preference bits per estimated parameter.
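The tabulated values can be reproduced from eqs. 2 and 4; a minimal sketch (the function name is ours):

```python
from math import log

def preference_bits(j):
    """H(J) per eqs. 2 and 4: -LL_ini / ln(2), which equals log2(J)."""
    ll_ini = sum((1 / j) * log(1 / j) for _ in range(j))  # eq. 2
    return -ll_ini / log(2)                               # eq. 4

print([round(preference_bits(j), 2) for j in range(2, 11)])
# → [1.0, 1.58, 2.0, 2.32, 2.58, 2.81, 3.0, 3.17, 3.32]
```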
Existence of a maximum likelihood estimate requires parameter separability, finiteness and uniqueness. Separability is achieved by a choice from uncorrelated alternatives in a choice set, uniqueness by the linear model, and finiteness by Bayesian estimation. The simplest case of a single CBC choice set with two alternatives is equivalent to the estimation of 1 individual-based parameter by gaining 1 preference bit from the answer. When there are P parameters to be estimated with the same credibility from several choice tasks, it seems natural that the total information gain should be P preference bits. The minimal number T_{min}(J) of choice tasks, each with J orthogonal alternatives, should be
T_{min}(J) = P / H(J)  (5) 
This result is intuitive and needs to be confirmed.
Quality comparison of models with different numbers of parameters estimated by the ML method is possible using an information criterion. Most often cited are the Akaike (AIC), Schwarz Bayesian (BIC) and Hannan–Quinn (HQC) criteria. In the form related to a single observation point, all these criteria can be written as
c(P) = −2 × LL(P) / n + φ × P / n  (6)
where n is the number of observations and φ is the penalty coefficient for the number of parameters P. The penalty coefficients are φ_{AIC} = 2 (independent of n), φ_{BIC} = ln(n), and φ_{HQC} = 2 × ln(ln(n)). In our view of an individual having answered T(J) independent tasks with J alternatives, the log-likelihood is expressed in terms of information gain.
LL(P, J) = T(J) × H(J) × ln(2)  (7) 
Our aim is to obtain the minimal number of observations, i.e. the number of choice sets with J alternatives described by P parameters, that would provide the same quality of parameter estimates as a choice from a choice set with 2 alternatives described by 1 parameter. For these two options we have the criteria:
c(1) = −2 × 1 × 1 × ln(2) / 1 + φ × 1 / 1  :  P = 1, J = 2, n = 1  (8)
c(PJ) = −2 × T(J) × H(J) × ln(2) / T(J) + φ × P / T(J)  :  n = T(J)  (9)
Setting the penalty coefficient to the constant φ = 2 × ln(2) ≅ 1.386 and assuming the same quality of both estimates, c(1) = c(PJ), eq. 5 is obtained.
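This equivalence can be checked numerically. The sketch below treats eq. 7 as a positive information gain entering the −2 × LL / n term of the criterion, an assumption on signs consistent with eqs. 6 to 9 above; with φ = 2 × ln(2), c(1) = c(PJ) holds when T(J) = P / H(J):

```python
from math import log, log2

PHI = 2 * log(2)  # penalty coefficient used in the text

def c_baseline():
    # eq. 8: one task, P = 1, J = 2, n = 1, gain = 1 preference bit
    return -2 * 1 * 1 * log(2) / 1 + PHI * 1 / 1

def c_cbc(p, j):
    # eq. 9 with the task count taken from eq. 5: T = P / H(J)
    h = log2(j)
    t = p / h
    return -2 * t * h * log(2) / t + PHI * p / t

# 5^6 design: P = 24 parameters, 4 alternatives per task
print(abs(c_baseline() - c_cbc(24, 4)) < 1e-9)  # True
```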
The value of φ used corresponds to n_{BIC} = 4 or n_{HQC} ≅ 7.39, both values being in the range used in marketing research practice for an individual.
The minimal numbers of CBC tasks estimated from eq. 5 for the former two examples (5^6 and 16^1 designs, 4 alternatives per task) are 12 and 7.5, respectively. These numbers are lower than those from the rules of thumb (15 and 8, or 16 and 10). A more comprehensive comparison of the three rules is given in the table below, where just-saturated regular orthogonal arrays of strength 2 are used as examples.
Orthogonal array   Estimated      CBC              Number of CBC tasks by rule of
OA(N, L^A)         parameters P   alternatives J   longest attribute   number of preferences   preference bits
OA(1, 2^1)         1              2                2                   2                       1
OA(4, 2^3)         3              2                6                   6                       3
OA(8, 2^7)         7              2                14                  14                      7
OA(8, 2^7)         7              4                7                   4.7                     3.5
OA(9, 3^4)         8              3                8                   8                       5.0
OA(12, 2^11)       11             4                11                  7.3                     5.5
OA(16, 4^5)        15             4                10                  10                      7.5
OA(18, 3^7)        14             3                14                  14                      8.8
OA(25, 5^6)        24             5                12                  12                      10.3
OA(64, 4^21)       63             8                21                  18                      21
OA(81, 9^10)       80             9                20                  20                      25.2
OA(121, 11^12)     120            12               22                  21.8                    33.5
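The rows of the table can be reproduced from the three rules; a sketch under the assumption that OA(N, L^A) denotes A attributes of L levels each (the function name is ours):

```python
from math import log2

def tasks_by_rules(levels, attrs, alts):
    """Return the three task counts for an L^A design, J alternatives per task."""
    params = attrs * (levels - 1)          # estimated parameters P
    longest = levels * 2 / alts * attrs    # longest-attribute rule
    prefs = params / (alts - 1) * 2        # number-of-preferences rule
    bits = params / log2(alts)             # preference-bits rule, eq. 5
    return longest, prefs, bits

# OA(16, 4^5) row with 4 alternatives per task
print(tasks_by_rules(4, 5, 4))  # (10.0, 10.0, 7.5)
```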
For small CBC studies with 15 or fewer parameters, the rules of thumb (by the longest attribute and by the number of preferences) seem to suggest unnecessarily large numbers of tasks. For larger studies with not more than about 60 parameters and up to 5 alternatives per choice set, the rules of thumb give only slightly higher numbers than the information preference rule. If attributes are of substantially different lengths, the rule using the longest attribute length might be preferred.
An insufficient number of tasks is generated by the rules of thumb for CBC studies with more than 60 parameters or more than 5 alternatives per choice set. Demand for such studies, nearly always with substantially differing attribute lengths, has become quite common. Huber and Zwerina (1996) show that choice design efficiency requires orthogonality, level balance, minimal overlap and utility balance. Even if these requirements are met in a standard CBC design, level partworths of long attributes are always determined with lower precision than those of short attributes, simply because of their less frequent appearance in tasks. A remedy, known as soft constraints, is to create auxiliary blocks of choice questions for the levels of long attributes and add them to the CBC tasks to obtain an extended set of tasks for estimation.
Using soft constraints is not without caveats. The design of choice tasks from each block of questions should be set so that no attribute (or some of its levels) prevails over the other attributes. Because product utility in the standard DCM model is a sum of partworths, the precision and accuracy of all determined parameters in the study should be approximately equal. This can be achieved by setting the information gain from each contributing data block so that the required total information gain is balanced over all attributes. This approach rests on the assumptions of mutual independence of attributes (with no or negligible interactions), independence of choices and orthogonality of alternatives, conditions essential for additivity of the information gain from each choice. This idea was the main motivation for using the theory of information.
The standard method of DCM parameter estimation is the hierarchical Bayesian maximum likelihood method. The number of tasks obtained from eq. 5 should be viewed as a threshold below which the sample means gain excessive influence and the sample covariances may become ill conditioned. It is important to realize that some substantive conditions, such as a known or expected narrow consideration set of items (typically classes of products, products with special features, etc.), make the effective length of attributes shorter.
It is impossible to develop generalized instructions for the estimation of attribute effective length; proper knowledge of the studied problem is essential. Let us consider a simple brand-price study. If an individual is supposed to select at most 5 of the tested brands, the effective length of the brand attribute can be taken as 8 to leave some reserve. If the brands belong to different price categories and/or several package sizes are tested, 5 CBC design levels (e.g. −10%, −5%, ±0%, +5% and +10%) may span a broad range of prices, which might require about 20 (or more) distinct price values. An unconstrained estimate with a separate partworth for each of these values is unnecessary: the price partworth in a conjoint study can always be constrained, as nothing like "too cheap to be good" exists. The effective length of the price attribute can then be guessed as the presumed "virtual" number of linear segments with which the curvature of partworths over the logarithm of price could be approximated. We usually set this effective length between 4 and 8.
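As a worked illustration of the above, eq. 5 can be applied to such a brand-price study; the effective lengths (8 for brand, 6 for price) and the 3-alternative tasks are assumed values within the suggested ranges, not a prescription:

```python
from math import ceil, log2

effective_levels = [8, 6]  # assumed: brand ~ 8, price ~ 6 effective levels
n_alternatives = 3

# parameters P: total effective levels minus number of attributes
n_params = sum(effective_levels) - len(effective_levels)  # 14 - 2 = 12
t_min = n_params / log2(n_alternatives)                   # eq. 5
print(ceil(t_min))  # 8 tasks at minimum
```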
In our experience, estimates with a satisfactory ability to discriminate between individuals are obtained with the number of tasks from eq. 5, provided the effective length of attributes is respected. The discrimination can be improved by increasing the number of tasks by 50%. A higher increase is believed to have negligible effect, but no systematic research has been done in this respect.
Information entropy has the advantage of being additive not only over a single exercise, but also over several choice exercises and, in principle, over the whole interviewed sample, provided the choices are independent stochastic events and the alternatives in choice sets are uncorrelated.
Application of the theory of information has allowed for some extensions and refinements of the DCM approach:
