
From the Institute of Animal Breeding and Husbandry

of the Faculty of Agricultural and Nutritional Sciences

of the Christian-Albrechts-University of Kiel

___________________________________________________________________

DEVELOPMENT OF A MULTI-CRITERIA EVALUATION

SYSTEM TO ASSESS ANIMAL WELFARE

Dissertation

submitted for the doctoral degree

of the Faculty of Agricultural and Nutritional Sciences

of the Christian-Albrechts-University of Kiel

presented by Ing. agr. Paula Martín Fernández

from Madrid, Spain

Dean: Prof. Dr. Eberhard Hartung

First referee: Prof. Dr. Joachim Krieter

Second referee: Prof. Dr. Eberhard Hartung

Date of the oral examination: 23.01.2015

___________________________________________________________________

This dissertation was prepared with the gratefully acknowledged financial support of the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung) within the framework of the PHÄNOMICS competence network for agricultural and nutritional research.

To my parents

TABLE OF CONTENTS

GENERAL INTRODUCTION

CHAPTER ONE
Comparison of methods to develop a multi-criteria evaluation system to assess animal welfare

CHAPTER TWO
Development of a multi-criteria evaluation system to assess growing pig welfare

CHAPTER THREE
Validation of a multi-criteria evaluation model for animal welfare

Annex

GENERAL DISCUSSION

GENERAL SUMMARY

ZUSAMMENFASSUNG (GERMAN SUMMARY)

ACKNOWLEDGMENTS

CURRICULUM VITAE

GENERAL INTRODUCTION

Concern about livestock living conditions has increased considerably in the last few

years. Consumers are increasingly linking animal welfare indicators with food safety

and quality. These consumers’ preferences create economic incentives for stakeholders

to meet animal welfare standards, as established by legislation or voluntary certification

schemes (Vapnek and Chapman, 2010). It is a generally accepted fact that animal

welfare is a multi-dimensional concept which comprises several aspects such as the

absence of thirst, hunger, discomfort, disease, pain, injuries and stress, and the presence

of normal behavioural expressions (the classical five freedoms (Farm Animal Welfare

Council (FAWC), 1992)). The EU Welfare Quality® (WQ) project developed several

protocols for the assessment of welfare of cattle, pigs and poultry (Botreau et al., 2009).

The inputs for the WQ protocols are the on-farm welfare measures described in the

protocols. Information at measure level may be useful for farm management purposes;

however, labelling purposes require a certain level of aggregation of the measures into

overall scores. Therefore, a multi-criteria evaluation model is required for the

evaluation of an animal unit (farm, slaughterhouse). The WQ protocols proposed a

multi-criteria evaluation system to aggregate the information of the welfare measures

into an overall assessment. Different operators (e.g., I-spline functions, decision trees,

weighted sums or Choquet integrals) were used for this purpose (Botreau et al., 2008).

The main drawback of the multi-criteria evaluation system proposed in the WQ

protocols is that it lacks transparency and flexibility with respect to the I-spline

functions and the different aggregation operators used. There are other ways of

approaching the multi-criteria evaluation problem that differ from the ones used by the

WQ multi-criteria evaluation model, e.g., the multi-attribute utility theory (MAUT),

ELECTRE or the Analytic Hierarchy Process (AHP). In the MAUT, uni-dimensional

utility functions, which correspond to each criterion, are aggregated into a single global

utility function combining all the criteria (Keeney and Raiffa, 1976), whereas in ELECTRE (an outranking procedure) only the preference relations on pairs of alternatives are aggregated (Roy, 1971), and in the Analytic Hierarchy Process ‘children’ nodes of a common ‘parent’ are aggregated using pairwise comparisons (Saaty, 1980). This thesis focuses on the MAUT. The application of MAUT consists of

two separate steps: utility function determination and aggregation function determination. A large number of methods have been proposed to determine the utility function in MAUT, for instance the standard sequences method described by Bouyssou et al. (2000) and the MACBETH method described by Bana e Costa et al. (1999).

Examples of aggregation functions in MAUT are the weighted sum, the ordered

weighted average (Yager, 1988) and the Choquet integral (Choquet, 1953; Murofushi and Sugeno, 1989; Grabisch, 1997).

Chapter One contains a comparison of different MAUT methods which can be applied

to produce an overall evaluation of animal welfare in the context of certification

schemes. This was performed with regard to the potential of these methodologies to address the main difficulties that, according to the literature, such a model faces: criteria may differ in importance, and interactions may exist between them. This is a key aspect, since the welfare criteria may not fully compensate for each other (Botreau et al., 2007). Two utility function determination methods (the standard sequences

method and the MACBETH method), and two aggregation functions (the weighted sum

and the Choquet integral (CI)) were compared. In the framework of MAUT, the use of

the MACBETH method together with the CI appeared to be the model that best addressed these difficulties.

In order to compare the different methodologies which could be used in the context of

MAUT, a theoretical model of a welfare assessment for growing pigs was used, considering only four criteria: good feeding, good housing, good health and appropriate behaviour. For this reason, in Chapter Two the application of the MACBETH method together with the CI to a real welfare assessment, the WQ protocol for growing pigs (Welfare Quality, 2009), was presented by means of examples. Throughout this study, the different multi-criteria methods used in the WQ protocol were also compared with the methodology proposed in this study.

After the development of any multi-criteria evaluation system, a validation of the model

must be carried out in order to prove that it works as intended in practical conditions

(Qureshi et al., 1999). In Chapter Three, the MAUT methodology proposed in

Chapter Two was implemented to aggregate welfare data collected on different growing-pig farms in Schleswig-Holstein, Germany. In total, 44 observations

were carried out. The whole WQ assessment protocol for growing pig farms was

implemented in each observation. The results obtained for each observation were

compared with the results obtained by implementing the multi-criteria methodology

proposed in the WQ protocol. Also, the influence of variations in the welfare measure

values was estimated in order to assess the sensitivity of the model.

Overall, the thesis provides a multi-criteria evaluation model for animal welfare, which has been implemented in the context of the Welfare Quality® protocol for growing pigs.

References

Bana e Costa CA, de Corte JM and Vansnick JC 1999. The MACBETH approach: Basic ideas, software, and an application. In: Meskens N and Roubens M (Eds.), Advances in Decision Analysis. Mathematical Modelling: Theory and Applications, vol. 4. Kluwer Academic Publishers, pp. 131-157.

Botreau R, Bracke MBM, Perny P, Butterworth A, Capdeville J, van Reenen CG and Veissier I 2007. Aggregation of measures to produce an overall assessment of animal welfare. Part 2: Analysis of constraints. Animal 1, 1188-1197.

Botreau R, Capdeville J, Perny P and Veissier I 2008. Multi-criteria evaluation of animal welfare at farm level: an application of MCDA methodologies. Foundations of Computing and Decision Sciences 33, 1-18.

Botreau R, Veissier I and Perny P 2009. Overall assessment of animal welfare: Strategy adopted in Welfare Quality. Animal Welfare 18, 363-370.

Bouyssou D, Marchant T, Perny P, Pirlot M, Tsoukias A and Vincke P 2000. Evaluation and decision models: A critical perspective. Kluwer, Dordrecht.

Choquet G 1953. Theory of capacities. Annales de l’Institut Fourier 5, 131-295.

Farm Animal Welfare Council 1992. FAWC updates the five freedoms. The Veterinary Record 17, 357.

Grabisch M 1997. k-order additive discrete fuzzy measures and their representation. Fuzzy Sets and Systems 92, 167-189.

Keeney RL and Raiffa H 1976. Decisions with multiple objectives: Preferences and value tradeoffs. Wiley, New York.

Murofushi T and Sugeno M 1989. An interpretation of fuzzy measures and the Choquet integral as an integral with respect to a fuzzy measure. Fuzzy Sets and Systems 29, 201-227.

Qureshi ME, Harrison SR and Wegener MK 1999. Validation of multi-criteria analysis models. Agricultural Systems 62, 105-116.

Roy B 1971. Problems and methods with multiple objective functions. Mathematical Programming 1, 239-266.

Saaty TL 1980. The Analytic Hierarchy Process: Planning, priority setting, resource allocation. McGraw-Hill, New York.

Vapnek J and Chapman M 2010. Legislative and regulatory options for animal welfare. FAO Legislative Study 104. FAO, Rome.

Welfare Quality 2009. Welfare Quality® Assessment Protocol for Growing Pigs. Welfare Quality® Consortium, Lelystad.

Yager RR 1988. On ordered weighted averaging aggregation operators in multicriteria decisionmaking. IEEE Transactions on Systems, Man and Cybernetics 18, 183-190.

CHAPTER ONE

Comparison of methods to develop a multi-criteria

evaluation system to assess animal welfare

P. Martín¹, I. Traulsen¹, C. Buxadé² and J. Krieter¹

¹ Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Kiel, Germany

² Animal Production Department, Polytechnic University, Madrid, Spain

Abstract

The aim of this paper was to create a model to review different methodologies which

can be applied to produce an overall evaluation of animal welfare in the context of

certification schemes. This was performed with regard to the potential of these methodologies to address the main difficulties that, according to the literature, such a model faces. Welfare Quality® distinguishes four welfare criteria (good feeding, good housing, good health and appropriate behaviour). Data for growing-pig farms were generated, with each farm receiving one score for each welfare criterion. Ten farms were used as learning data, and the complete generated dataset was used to illustrate the differences between the methods. The multi-attribute utility theory (MAUT) was used to produce an overall value of welfare. The utility functions and the aggregation function were constructed in two separate steps. First, utility functions for each

criterion were determined in two different ways, using the standard sequences method

(SS) and the MACBETH software. In the second step, the weighted sum (WS) and the

Choquet integral (CI) were used as aggregation functions. The utilities derived from

MACBETH allowed the preferences of the decision-maker regarding the different importance of the criteria and the interactions between them to be modelled more adequately than the utilities derived from the SS method. The WS and CI results obtained from each method were compared. The results showed that there were interactions between the criteria and that assuming independence among the criteria led to important differences in the classification of the farms.

Keywords: Animal Welfare, assessment, methods, pigs.

1 Introduction

Concern about livestock living conditions has increased considerably in the last few

years and consumers have also been increasingly linking animal welfare indicators with

food safety and quality. These consumer preferences create economic incentives for

stakeholders to meet animal welfare standards, as established by legislation or voluntary

certification schemes (Vapnek and Chapman, 2010). It is a generally accepted fact that

animal welfare is a multidimensional concept which comprises several aspects such as the absence of thirst, hunger, discomfort, disease, pain, injuries and stress, and the presence of normal behavioural expressions (the classical five freedoms (Farm Animal Welfare Council (FAWC), 1992)). Consequently, the assessment of animal welfare must be based on several measures. Information at measure level may be useful for farm

management purposes; however, labelling purposes require a certain level of

aggregation into overall scores (Blokhuis et al., 2010). To determine an overall level of

animal welfare, measures need to be combined. Although it has been argued that

science should not attempt to perform overall welfare assessments because value

judgements are inherently involved (Fraser, 1995), others state that an overall welfare

assessment is not arbitrary and a high level of accuracy can be achieved (Bracke et al.,

1999). In spite of the different viewpoints, various models have been developed to

assess overall levels of animal welfare. More recently, Welfare Quality (WQ) has

developed several protocols for the overall assessment of the welfare of cattle, pigs and

poultry (Welfare Quality, 2009).

A common feature of all the approaches in multi-criteria decision-making is the need

for an aggregation operator. In the multi-attribute utility theory (MAUT), uni-

dimensional utility functions which correspond to each criterion are aggregated into a

single global utility function combining all the criteria (Keeney and Raiffa, 1976),

whereas in ELECTRE (an outranking procedure) the preference relations on pairs of alternatives are aggregated (Roy, 1971), and in the Analytic Hierarchy Process (AHP) ‘children’ nodes of a common ‘parent’ are aggregated using pairwise comparisons (Saaty, 1980). Examples of aggregation functions in MAUT are the weighted sum

(WS), the ordered weighted average (Yager, 1988) and the Choquet integral (CI)

(Murofushi and Sugeno, 1989). The most common aggregation tool still used today is

the WS, with all its well-known drawbacks. The WS can be used as an aggregator when

mutual preferential independence among criteria is assumed. However, in practice, this

mutual preferential independence is rarely verified. In order to take into account the interaction between the criteria, Sugeno (1974) proposed replacing the weight vector involved in the calculation of the WS with a fuzzy measure (also called

capacity). The fuzzy integrals, such as the CI, are defined from the concept of a fuzzy

measure. The capacity with respect to the CI can be seen as an extension of the weight

vector with respect to the WS (Grabisch et al., 2008). The distinguishing feature of a CI

is that it is able to represent a certain kind of interaction, ranging from redundancy

(negative interaction) to synergy (positive interaction) (Grabisch, 1996).

The aim of this study was to create a model to compare different methodologies which

can be used in the context of the MAUT. These could then be applied to develop a

multidimensional estimation system in order to produce an overall evaluation of animal

welfare in the context of certification schemes. In the framework of MAUT, a

comparison was undertaken between two methods of utility function determination (the

standard sequences method and the MACBETH method) and two aggregation methods

(the WS and the CI). These different methods were used with the objective of finding the method that best addresses the main difficulties such a model faces according to the literature: criteria may have different levels of importance, and interactions may exist between them. This is a key aspect, since the welfare criteria may not fully compensate for each other (Botreau et al., 2007b).

2 Material and methods

2.1 Data

In order to compare the different methodologies which can be used in the context of

MAUT, a theoretical model of a welfare assessment for growing pigs was used

considering four criteria, good feeding (F), good housing (Ho), good health (He) and

appropriate behaviour (B), corresponding to the four main WQ principles. Each of these

criteria was assessed by a different number of measures. Each measure could take the value 0 or 1 (absence or presence). In this way, and considering a linear combination (sum) of the values of the measures to produce the criterion value, a criterion assessed by three measures can take four different values: 0, 1, 2 and 3. Thus, good feeding was defined by four measures and could therefore vary between 0 (worst) and 4 (best); good housing by seven measures, varying between 0 and 7; good health by 13 measures, varying between 0 and 13; and appropriate behaviour by four measures, varying between 0 and 4. These scales were defined in this way, instead of as intervals between 0 and 100, so that they represent raw data not yet interpreted in terms of welfare. This allows the potential of the different methods to be studied for a future step of the project, with measures collected in different units or scales.

Data from ten farms regarding the four criteria were selected as learning data (Table 1)

from which the decision-maker (DM) had to express his preferences. These consisted of

giving a partial weak order over the set of weights related to each criterion (W in Table

1), the sign of interaction between the 6 pairs of criteria ((F, Ho), (F, He), (F,B), (Ho,

He), (Ho, B), (He, B)) and a partial weak order (R) over the farms (Table 1) taking into

account both the different importance of the criteria and the interactions between them.

Farms a, b, c, d and e were selected to assess how the DM perceived the different

importance of the criteria. For these 5 farms, 3 of the criteria were assigned a good

value and only one of the criteria corresponded to a medium value. Farms g, h, i and j

were selected to assess how the DM perceived the interaction between a bad grade in

one criterion and medium values in the other criteria.

A second dataset consisting of all 2,800 possible combinations of the criterion values (5 × 8 × 14 × 5 levels for Feeding, Housing, Health and Behaviour) was generated in order to obtain an absolute impression of the influence of using the different methods, not limited to the relative comparison of a small dataset (learning data).
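The construction of the criterion scales and of the second dataset can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the names `N_MEASURES`, `criterion_score` and `full_factorial_dataset` are hypothetical, and each measure is assumed to be coded 0/1 and summed into its criterion, as described above.

```python
from itertools import product

# Number of binary (0/1) measures per criterion, as stated in the text.
N_MEASURES = {"Feeding": 4, "Housing": 7, "Health": 13, "Behaviour": 4}


def criterion_score(measures):
    """Criterion value as the sum of its binary measures (0 = absence, 1 = presence)."""
    if any(m not in (0, 1) for m in measures):
        raise ValueError("each measure must be 0 or 1")
    return sum(measures)


def full_factorial_dataset():
    """Every combination of criterion values; a criterion with n measures takes n + 1 values."""
    ranges = [range(n + 1) for n in N_MEASURES.values()]
    return [dict(zip(N_MEASURES, combo)) for combo in product(*ranges)]


print(criterion_score([1, 0, 1]))  # 2: a three-measure criterion ranges from 0 to 3
farms = full_factorial_dataset()
print(len(farms))                  # 2800 = 5 * 8 * 14 * 5
```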

Table 1. Criteria values for each selected farm (learning data) and initial preferences of

the decision-maker.

Farm   Feeding¹   Housing²   Health³   Behaviour⁴   R
a      2          5          10        3            1
b      3          3          10        3            2
c      3          5          7         3            3
d      3          5          10        2            4
e      3          5          6         3            5
f      2          3          6         2            6
g      0          3          6         2            7
h      2          1          6         2            8
i      2          3          4         2            9
j      2          3          6         1            10
W      +          ++         +++       +++

¹ Feeding values can vary between 0 (worst) and 4 (best).
² Housing values can vary between 0 (worst) and 7 (best).
³ Health values can vary between 0 (worst) and 13 (best).
⁴ Behaviour values can vary between 0 (worst) and 4 (best).
R: DM’s ranking over the farms.
W: Initial notions of the DM about the importance of the weights.
Bad grade; medium grade; good grade.

2.2 General methodology

The MAUT was used to produce an overall value of welfare starting from the data

regarding the four main criteria. The utility functions and the aggregation functions

were constructed in two separate steps (Figure 1). A comparison was made between

the two methods of utility function determination, i.e. the standard sequences method

(SS) described by Bouyssou et al. (2000) and the MACBETH method described by

Bana e Costa et al. (1999), and also two aggregation methods, i.e. the WS and the CI.

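Figure 1. General methodology followed in the study: individual utilities determination (Feeding, Housing, Health and Behaviour utilities with MACBETH) and aggregation into an overall utility (weighted sum or Choquet integral).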

The results obtained via the different utility function determination methods and aggregation operators were also compared. First, the rankings of the overall utilities obtained for the 10 farms selected as learning data were compared. However, in order to obtain an absolute impression of the influence of taking the interactions between the criteria into account, not limited to the relative comparison of a small dataset, four welfare categories were defined that match those proposed by Welfare Quality (2009): unacceptable (overall utility < 20), acceptable (overall utility > 20 but < 55), enhanced (overall utility > 55 and < 80) and excellent (overall utility > 80). The MACBETH overall utilities obtained for the complete dataset (2,800 farms) through the WS and the CI were classified into one of the four categories, and the number of farms assigned to each welfare category was compared between aggregation methods.
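As an illustration of this classification step, the following minimal Python sketch assigns an overall utility to one of the four welfare categories and counts the farms per category. The function names are hypothetical, and because the text does not state to which category a value exactly equal to 20, 55 or 80 belongs, the sketch assumes that such values fall into the higher category.

```python
from collections import Counter


def welfare_category(overall_utility):
    """Classify an overall utility (0-100) into the four welfare categories.

    Assumption: values exactly equal to 20, 55 or 80 are placed in the
    higher category, since the text leaves these boundaries open."""
    if overall_utility < 20:
        return "unacceptable"
    if overall_utility < 55:
        return "acceptable"
    if overall_utility < 80:
        return "enhanced"
    return "excellent"


def category_counts(overall_utilities):
    """Number of farms per category, e.g. to compare WS and CI outputs."""
    return Counter(welfare_category(u) for u in overall_utilities)


print(category_counts([12.0, 47.5, 55.0, 63.2, 91.0]))
```

Running `category_counts` once on the WS utilities and once on the CI utilities of the same farms gives the two distributions to be compared.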

2.3 MAUT - Utilities determination

For the utility function determination, each criterion was considered separately. The

utility function ui represents the preferences of the DM over the criterion Xi. The utilities can be seen as providing a numerical representation of the attractiveness of the different values of the criteria for the DM. A large number of methods have been proposed to determine the utility functions in an additive multi-attribute utility model; see von Winterfeldt and Edwards (1986) for an accessible account of such methods. There are

essentially two families of methods, one based on direct numerical estimations and the

other on indifference judgements. We chose two methods from the latter category, the MACBETH method (Bana e Costa et al., 1999) and the SS method (Krantz et al., 1971; von Winterfeldt and Edwards, 1986; Wakker, 1989; Bouyssou et al., 2000), because spontaneously stated utilities might not be as reliable as utilities constructed with a structured methodology; see Bouyssou et al. (2006) and Bana e Costa et al. (2004) for a deeper review. These two methods were chosen for two reasons: first, we wanted to compare a methodology based on qualitative judgements (MACBETH) with a method based on quantitative judgements (SS); and second, an extensive literature is available on both methods.

2.3.1 Standard sequences method

To elicit a utility function (ui), for example uHo corresponding to Housing, the SS method starts by considering two hypothetical farms which differ only in the Feeding and Housing criteria; the performance levels of the other criteria are held constant (ceteris paribus). It is then assumed that the two farms differ in Feeding by a noticeable amount (1 point, for instance). An interval of this amplitude is located in the middle of the range for Feeding, say 1-2, and a value for Housing is also set in the middle of its range, say 3. The DM is then asked to assess a value of Housing (XHo) such that he would be indifferent between the two farms (1, 3) and (2, XHo), where each pair gives the (Feeding, Housing) scores. The second question uses his answer to the first: he is asked to assess the value X’Ho of Housing that would leave him indifferent between the farms (1, 2) and (2, X’Ho). Continuing along the same line would lead, for instance, to the following sequences of indifference:

(1, 3) ~ (2, 2)

(1, 2) ~ (2, 0)

Then, similar questions are asked for the upper half of the range of Housing, which may lead to the following sequences of indifference:

(1, 5) ~ (2, 3)

(1, 7) ~ (2, 5)

In other words, the DM considers that a farm with a score of 1 in Feeding and 3 in Housing (ceteris paribus in the other two criteria) is equivalent in terms of preference to a farm with a score of 2 in Feeding and 2 in Housing. Likewise, a farm scoring 1 in Feeding and 2 in Housing is considered equivalent to a farm scoring 2 in Feeding and 0 in Housing; a farm scoring 1 in Feeding and 5 in Housing is considered equivalent to one scoring 2 in Feeding and 3 in Housing; and, finally, a farm scoring 1 in Feeding and 7 in Housing is considered equivalent to one scoring 2 in Feeding and 5 in Housing. Because each indifference equates a Housing interval with the same Feeding step 1-2, the Housing intervals 0-2, 2-3, 3-5 and 5-7 are all worth the same utility increment to the DM. Such a sequence gives the analyst an approximation of the single-attribute utility function for Housing, uHo. The final step is to normalise the utility function of each criterion to the interval 0-100 in order to be able to aggregate the marginal utility functions of the different criteria.

To determine uHe (Health) and uB (Behaviour) in the same way as for Housing, intervals on the Health and Behaviour scales that would exactly compensate the Feeding interval 1-2 in terms of preference were searched for successively. Finally, the same procedure was applied to Feeding itself (uF), fixing a reference interval on the Housing scale, for instance 2-3.
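The arithmetic behind the standard sequence for Housing can be made explicit. Each indifference equates one Housing interval (0-2, 2-3, 3-5, 5-7) with the same Feeding step 1-2, so the four intervals carry equal utility increments; normalised to 0-100, the elicited levels 0, 2, 3, 5 and 7 receive utilities 0, 25, 50, 75 and 100. The Python sketch below is hypothetical (the names `HOUSING_SEQUENCE` and `standard_sequence_utility` are ours), and the linear interpolation used for the levels that were not elicited directly (1, 4 and 6) is an assumption of the sketch, not part of the SS method as described.

```python
# Housing levels from the indifferences (1, 2) ~ (2, 0), (1, 3) ~ (2, 2),
# (1, 5) ~ (2, 3) and (1, 7) ~ (2, 5): each consecutive interval equals the
# same Feeding step 1 -> 2 in preference, hence the same utility increment.
HOUSING_SEQUENCE = [0, 2, 3, 5, 7]


def standard_sequence_utility(points, x):
    """Normalised (0-100) utility at level x from a standard sequence.

    points: increasing levels x0 < x1 < ... < xk whose consecutive intervals
    are equally preferred steps. Levels between elicited points are linearly
    interpolated (an assumption of this sketch)."""
    k = len(points) - 1
    if x <= points[0]:
        return 0.0
    if x >= points[-1]:
        return 100.0
    for j in range(k):
        lo, hi = points[j], points[j + 1]
        if lo <= x <= hi:
            return 100.0 * (j + (x - lo) / (hi - lo)) / k
    raise ValueError("points must be increasing")


print({x: standard_sequence_utility(HOUSING_SEQUENCE, x) for x in range(8)})
# {0: 0.0, 1: 12.5, 2: 25.0, 3: 50.0, 4: 62.5, 5: 75.0, 6: 87.5, 7: 100.0}
```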

2.3.2 MACBETH

MACBETH (Measuring Attractiveness by a Categorical Based Evaluation Technique)

is a methodology described by Bana e Costa et al. (1999), which requires only

qualitative judgements to quantify the relative attractiveness (utilities) of options

(farms). To elicit a marginal utility function (ui) using the MACBETH software, for

example uHo corresponding to Housing, the first step is to fill in a matrix with qualitative judgements about the difference in attractiveness between the quantitative performance levels of the criterion. For Housing, for instance, the quantitative performance levels vary between 0 and 7. The qualitative judgements of difference can be rated as ‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’ (Figure 2a).

Figure 2a. MACBETH matrix of qualitative judgements. Quantitative performance

levels for Housing.

As each judgement is given, the software automatically verifies the matrix’s consistency

(Figure 2b), and suggests judgement modifications which can be made to fix any

detected inconsistency (Figure 2c).

Figure 2b. MACBETH matrix of qualitative judgements. Example of building a

consistent matrix.

Figure 2c. MACBETH matrix of qualitative judgements. Example of inconsistency.

From the complete and consistent matrix of judgements, MACBETH creates a

numerical scale (Figure 2d). With the numerical scale, MACBETH produces the

marginal utility function (u) for each criterion. The range in which the utilities vary was

defined in this study as 0-100 for consistency with the SS method.

Figure 2d. MACBETH matrix of qualitative judgements. Complete matrix of

judgements and numerical scale.
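A MACBETH numerical scale is an interval scale, so any positive affine transformation preserves the information it carries, namely the ratios of differences in attractiveness. The following Python sketch, with hypothetical function names and a hypothetical raw scale for Housing (not the scale obtained in this study), shows the rescaling to the 0-100 range used here and checks that a ratio of differences is unchanged.

```python
# Hypothetical raw MACBETH-type numerical scale for Housing levels 0-7
# (illustrative numbers only, not the scale derived in this study).
raw_housing = {0: 0, 1: 2, 2: 5, 3: 9, 4: 12, 5: 14, 6: 15, 7: 16}


def rescale_0_100(scale):
    """Positive affine map sending the worst level to 0 and the best to 100."""
    lo, hi = min(scale.values()), max(scale.values())
    if hi == lo:
        raise ValueError("the scale needs at least two distinct values")
    return {level: 100.0 * (value - lo) / (hi - lo) for level, value in scale.items()}


def difference_ratio(scale, a, b, c, d):
    """(u(a) - u(b)) / (u(c) - u(d)); invariant under positive affine maps."""
    return (scale[a] - scale[b]) / (scale[c] - scale[d])


housing_utility = rescale_0_100(raw_housing)
print(housing_utility[0], housing_utility[3], housing_utility[7])  # 0.0 56.25 100.0
print(difference_ratio(raw_housing, 3, 2, 2, 0),
      difference_ratio(housing_utility, 3, 2, 2, 0))               # 0.8 0.8
```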

2.4 MAUT - Aggregation methods

In the second step, all the criteria were considered together. Here, the weighted sum (WS) and the Choquet integral (CI) were used as aggregation functions in order to evaluate how the output differs when the interactions between criteria are taken into account and when the welfare criteria are treated as independent.

2.4.1 Weighted sum

Applying the SS technique and the MACBETH method yielded four utility functions each, in which 0 was the worst and 100 the best performance for each criterion. Weights were then needed to combine these values additively with the WS. The DM was asked to provide some initial notions on the importance of the weights (W in Table 1). A test was then performed to determine whether the same weighting vector was obtained when two different methods were used to elicit the weights.

Firstly, the weights were elicited following a method suggested by Bouyssou et al. (2006) and first described by Keeney and Raiffa (1976). The interest of this technique is that the weights are not obtained by asking the DM to give the values of the parameters directly (direct rating procedure). Instead, the DM is asked to rank alternatives, and the relative importance of the criteria is determined from this ranking, following a defined procedure that uses the previously determined utility functions.

Secondly, the weighting of the criteria was performed within the MACBETH software

following the same procedure as described for the elicitation of the utilities, in other

words, giving qualitative judgements regarding the difference in attractiveness between

criteria.

The same weighting vector was obtained using the Keeney and Raiffa (1976) technique

and the MACBETH methodology. The utilities calculated by the SS method and the

MACBETH methodology were aggregated with the WS using the weighting vector

obtained.
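The WS aggregation itself is straightforward. In the Python sketch below, the numerical weights are hypothetical values chosen only to respect the DM's initial ordering (W in Table 1: Feeding < Housing < Health = Behaviour); the weighting vector actually obtained in the study is not reproduced here, and the function name is illustrative.

```python
# Hypothetical weights: they only respect the DM's ordering in Table 1
# (Feeding < Housing < Health = Behaviour); they are not the study's values.
WEIGHTS = {"Feeding": 0.10, "Housing": 0.20, "Health": 0.35, "Behaviour": 0.35}


def weighted_sum(utilities, weights=WEIGHTS):
    """Additive aggregation of 0-100 utilities; weights are normalised to sum to 1."""
    total = sum(weights.values())
    return sum(weights[c] / total * utilities[c] for c in weights)


farm = {"Feeding": 50, "Housing": 75, "Health": 40, "Behaviour": 100}
no_health = {"Feeding": 100, "Housing": 100, "Health": 0, "Behaviour": 100}
print(round(weighted_sum(farm), 2))       # 69.0
print(round(weighted_sum(no_health), 2))  # 65.0
```

The second call illustrates the full compensation implied by the WS: a farm with the worst possible Health utility still obtains 65, which would place it in the enhanced category defined earlier.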

2.4.2 Choquet integral

In order to combine the 4 utility functions calculated by the SS technique or by the

MACBETH method using the CI, the first step was the capacity identification.

Capacities can be regarded as the counterpart of the weighting vector used in weighted sums. Seen as an aggregation operator, the CI with respect to a capacity takes into account both the different importance of the criteria and the interactions between them. The overall importance of a criterion can be measured by

its Shapley value and the interaction between criteria can be measured by the interaction

indices. The interaction phenomena among criteria can be very complex and difficult to

identify. Different forms of dependence exist, for instance, correlation,

substitutive/complementary, and preferential dependence (Marichal, 2000). In this

study, the DM regarded the criteria as complementary (positive interaction) or

substitutive (negative interaction). According to Marichal (2000), two criteria are substitutive when the decision-maker considers that satisfying only one of them produces almost the same effect as satisfying both; it is of course better to perform well on both, but this matters less. In this study, two criteria i and j were therefore regarded as substitutive when it was important that farms performed well on criterion i or j, in other words, when compensation between them was allowed; they were regarded as complementary when, for the DM, satisfying only one criterion produced a very weak effect compared with satisfying both.
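For two criteria, the CI with respect to a 2-additive capacity can be written as v_i x_i + v_j x_j - (1/2) I_ij |x_i - x_j|, where v_i and v_j are the Shapley values (summing to 1) and I_ij is the interaction index. The hypothetical Python sketch below applies it to a balanced farm (50, 50) and an unbalanced farm (100, 0) with equal Shapley values: a positive interaction (complementarity) penalises the imbalance, a negative one (substitutivity) lets the better criterion compensate, and the extremes I_ij = 1 and I_ij = -1 reduce the CI to the minimum and the maximum of the two utilities.

```python
def choquet_two_criteria(x_i, x_j, v_i=0.5, v_j=0.5, i_ij=0.0):
    """2-additive Choquet integral for two criteria.

    v_i, v_j: Shapley values (importance), assumed to sum to 1.
    i_ij: interaction index in [-1, 1]; > 0 complementary, < 0 substitutive.
    """
    return v_i * x_i + v_j * x_j - 0.5 * i_ij * abs(x_i - x_j)


balanced, unbalanced = (50, 50), (100, 0)  # hypothetical utilities (0-100)
for label, i_ij in [("independent", 0.0), ("complementary", 0.5),
                    ("substitutive", -0.5), ("min", 1.0), ("max", -1.0)]:
    print(label, choquet_two_criteria(*balanced, i_ij=i_ij),
          choquet_two_criteria(*unbalanced, i_ij=i_ij))
# independent 50.0 50.0
# complementary 50.0 25.0
# substitutive 50.0 75.0
# min 50.0 0.0
# max 50.0 100.0
```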

The number of coefficients that define a capacity increases exponentially with the number of criteria. For reasons of simplicity, it may be preferable to restrict the model to 2-additive or 3-additive capacities (Grabisch et al., 2008), which in this study corresponded to 10 and 14 coefficients respectively. We proposed restricting the model to the 2nd order, thus assuming that no interaction exists among more than two criteria. Although only four criteria were considered in this example, so that the difference in the number of coefficients between a 2nd- and a 3rd-order model was small, more criteria may have to be considered in a later step of the project. If, for instance, six health criteria are aggregated, 21 coefficients will be needed with a 2-additive model and 41 with a 3-additive one.

Let us consider a decision problem involving a set $X$ of $n$ elements, here the criteria. Defining a capacity on $X$ requires the definition of $2^n - 1$ coefficients, one for each non-empty subset of $X$. This could be too complex to handle if $n$ goes beyond, say, 8 (Grabisch, 1997). As a consequence, the capacity is frequently assumed to be additive, in which case the Choquet integral reduces to the weighted arithmetic mean (Marichal, 2000) and can be defined with only $n$ coefficients. This avoids the complexity of non-additive capacities, but at the price of a very poor modelling tool that loses their richness (Kojadinovic, 2007). The notion of k-additive capacity proposed by Grabisch (1997) makes it possible to find an intermediate solution between the complexity of the representation and the richness of the model. k-additive capacities with $k < n$ need fewer than $2^n - 1$ coefficients to be defined: only $n$ coefficients are needed for $k = 1$ (additive capacity), $n(n+1)/2$ for $k = 2$, and in general $\sum_{i=1}^{k} \binom{n}{i}$ for k-additive capacities.
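The coefficient counts quoted above (10 and 14 for four criteria, 21 and 41 for six) follow from this sum. A minimal Python sketch; the function name `n_coefficients` is ours and purely illustrative:

```python
from math import comb


def n_coefficients(n: int, k: int) -> int:
    """Number of coefficients defining a k-additive capacity on n criteria.

    One coefficient per non-empty subset with at most k criteria, i.e.
    sum_{i=1}^{k} C(n, i); for k = n this equals 2**n - 1.
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return sum(comb(n, i) for i in range(1, k + 1))


# Additive, 2-additive and 3-additive models for 4 and 6 criteria
for n in (4, 6):
    print(n, [n_coefficients(n, k) for k in (1, 2, 3)])
# 4 [4, 10, 14]
# 6 [6, 21, 41]
```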

According to Mayag et al. (2011), given the individual utilities $(x_1, \ldots, x_n)$ for the criteria, in this study $(x_F, x_{Ho}, x_{He}, x_B)$ for Feeding, Housing, Health and Behaviour respectively, the CI with respect to a 2-additive capacity $\mu$ can be written as follows:

$$C_\mu(x_1, \ldots, x_n) = \sum_{i=1}^{n} v_i \, x_i - \frac{1}{2} \sum_{\{i,j\} \subseteq X} I_{ij} \, |x_i - x_j|$$

where $v_i$ represents the importance (Shapley value) of criterion $i$ and $I_{ij}$ represents the interaction between criteria $i$ and $j$.
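To make the formula concrete, the following minimal Python sketch evaluates the 2-additive CI directly from Shapley values and interaction indices. It is not the Kappalab implementation used in the study, and the function and variable names are illustrative. With the MV’ coefficients obtained for MACBETH (Table 4) and the partial utilities of farm a (Table 3), it returns 64.52, the value reported in Table 3.

```python
from itertools import combinations

CRITERIA = ("F", "Ho", "He", "B")


def choquet_2additive(x, shapley, interaction):
    """2-additive Choquet integral in its Shapley/interaction form.

    C(x) = sum_i v_i * x_i - 1/2 * sum_{i<j} I_ij * |x_i - x_j|

    x, shapley: dict criterion -> value
    interaction: dict frozenset({i, j}) -> I_ij (missing pairs count as 0)
    """
    linear = sum(shapley[c] * x[c] for c in x)
    penalty = sum(
        interaction.get(frozenset((i, j)), 0.0) * abs(x[i] - x[j])
        for i, j in combinations(x, 2)
    )
    return linear - 0.5 * penalty


# MV' coefficients for the MACBETH utilities (Table 4) and farm a (Table 3)
shapley_mvp = {"F": 0.139, "Ho": 0.241, "He": 0.309, "B": 0.311}
interaction_mvp = {frozenset(p): 0.05 for p in combinations(CRITERIA, 2)}
farm_a = {"F": 55, "Ho": 75, "He": 65, "B": 65}

print(round(choquet_2additive(farm_a, shapley_mvp, interaction_mvp), 2))  # 64.52
```

With all interaction indices set to zero, the expression reduces to a weighted sum of the utilities with the Shapley values as weights; positive indices subtract a penalty that grows with the spread between criteria, which is how compensation is limited.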

Different methods for capacity identification have been proposed in the literature. Most of them can be stated as optimisation problems; the main differences between them are the objective function and the preferential information they require as input. The minimum variance approach (Kojadinovic, 2007) was used, which requires only a partial order over the farms as preference information. Capacity identification was implemented with the Kappalab R package following the method described by Grabisch et al. (2008). The utilities calculated with MACBETH and with the SS method for the criteria of the 10 farms were used as the learning data against which the capacity was identified, so that the CI numerically represents the preferences of the DM. The partial weak order over the farms given by the DM (R in Table 1) was used for the implementation of the minimum variance approach (MV). A non-negative indifference threshold for the ranking over the farms was defined, so that the partial weak orders previously mentioned were translated into partial semi-orders with fixed indifference thresholds (see Grabisch et al. (2008) for a deeper review). The values of the thresholds had to be chosen carefully, since a very large indifference threshold could have made the program infeasible (see Marichal and Roubens (2000) for a deeper review). The indifference threshold for the ranking of the alternatives was established as 0.05.

After an initial calculation of the CI with the MV approach, a progressive interactive approach (MV’) was developed in order to bring the model into accordance with the DM’s initial preferences regarding the importance of the criteria (Shapley values) and the interaction indices. Non-negative indifference thresholds were defined for the Shapley values and for the interaction indices: two criteria were regarded as differing in importance when their Shapley values differed by at least 0.05, and the minimal absolute value of an interaction index to be considered significantly different from zero was set at 0.05.
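As an illustration of the ranking constraint with its 0.05 indifference threshold, the Python sketch below tests whether every strict preference a over b satisfies u(a) − u(b) ≥ 0.05 for a given set of overall utilities. It assumes, for illustration only, that the DM's order is the strict chain a > b > ... > j, and it uses the CI (MV) and WS utilities for MACBETH reported in Table 3. The helper name is hypothetical, and this only checks a solution; it is not the optimisation performed by Kappalab.

```python
def satisfies_ranking(utilities, preferred_pairs, delta=0.05, tol=1e-9):
    """True if u(a) - u(b) >= delta for every pair (a, b) with a preferred to b.

    tol absorbs floating-point error when a difference sits exactly on the threshold.
    """
    return all(utilities[a] - utilities[b] >= delta - tol for a, b in preferred_pairs)


farms = "abcdefghij"
# Assumed strict order a > b > ... > j (illustrative reading of the DM ranking)
chain = list(zip(farms, farms[1:]))

# Overall utilities for MACBETH (Table 3)
ci_mv = dict(zip(farms, [65.21, 65.16, 63.07, 63.02, 60.12, 42.49, 35.22, 35.17, 35.12, 32.31]))
ws = dict(zip(farms, [66.11, 63.33, 60.56, 60.56, 57.22, 40.56, 34.45, 32.78, 32.22, 28.89]))

print(satisfies_ranking(ci_mv, chain))  # True: several gaps are exactly 0.05
print(satisfies_ranking(ws, chain))     # False: farms c and d are tied
```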

Additional constraints were imposed on the Shapley values, so that the importance of the criteria followed the order previously determined from the DM’s preferences (W in Table 1), and on the interaction indices, so that the criteria were regarded as complementary and compensation between them was limited, i.e. a positive interaction was required for all 6 pairs of criteria: (F, Ho), (F, He), (F, B), (Ho, He), (Ho, B) and (He, B).
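The MV’ constraints can be written as simple inequalities on the Shapley values and interaction indices. The Python sketch below only checks whether a given 2-additive solution satisfies them; it does not solve the quadratic programme that Kappalab solves. Two choices are our own assumptions: ‘equal importance’ (Health = Behaviour) is read as a Shapley difference below the 0.05 threshold, and the monotonicity condition of a 2-additive capacity, v_i − ½ Σ_j |I_ij| ≥ 0, is added because any valid capacity must satisfy it. The coefficients are those reported for MACBETH in Table 4.

```python
from itertools import combinations

CRITERIA = ("F", "Ho", "He", "B")


def check_mv_prime(shapley, interaction, strict, ties, delta_s=0.05, delta_i=0.05, tol=1e-9):
    """Check a 2-additive solution against MV'-style constraints (illustrative only)."""
    # Shapley values of a normalised capacity sum to one (table values are rounded)
    normalised = abs(sum(shapley.values()) - 1.0) <= 1e-3
    # strict: pairs (i, j) with i more important than j by at least delta_s
    order_ok = all(shapley[i] - shapley[j] >= delta_s - tol for i, j in strict)
    # ties: pairs treated as equally important (difference below delta_s), our assumption
    ties_ok = all(abs(shapley[i] - shapley[j]) < delta_s for i, j in ties)
    # complementarity: every interaction index at least delta_i
    compl_ok = all(interaction[frozenset(p)] >= delta_i - tol for p in combinations(CRITERIA, 2))
    # monotonicity of a 2-additive capacity: v_i - 1/2 * sum_j |I_ij| >= 0
    mono_ok = all(
        shapley[c] - 0.5 * sum(abs(v) for pair, v in interaction.items() if c in pair) >= -tol
        for c in CRITERIA
    )
    return normalised and order_ok and ties_ok and compl_ok and mono_ok


# Health = Behaviour > Housing > Feeding
strict = [("He", "Ho"), ("B", "Ho"), ("Ho", "F")]
ties = [("He", "B")]

# MACBETH coefficients (Table 4); pair order F,Ho F,He F,B Ho,He Ho,B He,B
mv = ({"F": 0.228, "Ho": 0.233, "He": 0.266, "B": 0.273},
      {frozenset(p): v for p, v in zip(combinations(CRITERIA, 2),
                                       [-0.048, 0.019, 0.019, 0.018, 0.007, 0.022])})
mv_prime = ({"F": 0.139, "Ho": 0.241, "He": 0.309, "B": 0.311},
            {frozenset(p): 0.05 for p in combinations(CRITERIA, 2)})

print(check_mv_prime(*mv, strict, ties))        # False
print(check_mv_prime(*mv_prime, strict, ties))  # True
```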

2.5 Estimation of the importance of the interactions between criteria

The utility functions previously determined with MACBETH were used to produce a utility value for each criterion for the 2,800 farms. In order to demonstrate the importance of taking the interaction between the criteria into account when producing an overall assessment of farm animal welfare, the individual utilities for each criterion were aggregated both additively (with a weighting vector, WS) and non-additively (with the CI). For the CI aggregation, the coefficients (Shapley values and interaction indices) obtained by the MV’ approach for the MACBETH method were used; for the WS aggregation, only the Shapley values of the MV’ approach were used as weights (WSMV’). The objective was to estimate the number of farms that changed their welfare category due to the inclusion in the model of the interactions between the criteria and the resulting limitation of the compensation between them. Each of the 2,800 farms was assigned to a welfare category (unacceptable, acceptable, enhanced or excellent). The number of farms assigned to each welfare category was compared between the case in which the criteria were treated as independent and the case in which the interactions between the criteria were taken into account, limiting the compensation between them.
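The comparison described above amounts to classifying each farm twice and cross-tabulating the two classifications. The category cut-offs are not given in this section, so the Python sketch below uses hypothetical thresholds of 20, 55 and 80 on the 0-100 utility scale purely for illustration; the function names and the toy utilities are likewise hypothetical.

```python
from bisect import bisect_right
from collections import Counter

CATEGORIES = ("unacceptable", "acceptable", "enhanced", "excellent")
# Hypothetical cut-offs on the 0-100 utility scale, NOT the ones used in the study
CUTOFFS = (20.0, 55.0, 80.0)


def classify(utility, cutoffs=CUTOFFS):
    """Map an overall utility to a welfare category (a cut-off value belongs to the upper class)."""
    return CATEGORIES[bisect_right(cutoffs, utility)]


def transition_counts(u_ci, u_ws):
    """Count farms per (category with CI, category with WS) pair."""
    return Counter((classify(a), classify(b)) for a, b in zip(u_ci, u_ws))


# Toy utilities for three farms (invented for illustration)
u_ci = [15.0, 50.0, 70.0]
u_ws = [25.0, 60.0, 70.0]
for (before, after), n in sorted(transition_counts(u_ci, u_ws).items()):
    print(f"{before:>12} -> {after:<12} {n}")
```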

3 Results

3.1 Utility function determination methods

The differences between the utility functions calculated using the SS method and the MACBETH method were in general minor, except for the lowest value of Behaviour, where the utilities of the two methods differed by more than 10 (Figure 3).

Figure 3. Utility functions calculated using the SS method (−−−) and the MACBETH

method (───) for Feeding, Housing, Health and Behaviour

3.2 Aggregation methods - Weighted sum

The weighting vectors obtained with the Keeney and Raiffa technique and with the MACBETH method matched well and were in accordance with the initial preferences of the DM (W in Table 1). For both methods, the importance of the criteria conformed to the following sequence:

Health (0.3333) = Behaviour (0.3333) > Housing (0.2223) > Feeding (0.1111)

These weights were used for the aggregation of the individual utilities calculated using

the SS and the MACBETH methods.
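The additive aggregation is a plain weighted sum. A minimal Python sketch, assuming the rounded weights above and taking the MACBETH partial utilities of a few farms from Table 3 (the dictionary layout and function name are illustrative only):

```python
# Rounded weights reported above (they sum to 1.0000)
WEIGHTS = {"F": 0.1111, "Ho": 0.2223, "He": 0.3333, "B": 0.3333}


def weighted_sum(x, weights=WEIGHTS):
    """Additive aggregation of the partial utilities of one farm."""
    return sum(weights[c] * x[c] for c in weights)


# Partial utilities calculated with MACBETH for some farms (Table 3)
macbeth = {
    "a": {"F": 55, "Ho": 75, "He": 65, "B": 65},
    "c": {"F": 80, "Ho": 75, "He": 40, "B": 65},
    "d": {"F": 80, "Ho": 75, "He": 65, "B": 40},
    "j": {"F": 55, "Ho": 50, "He": 30, "B": 5},
}
for farm, x in macbeth.items():
    print(farm, round(weighted_sum(x), 2))
# a 66.11
# c 60.56
# d 60.56
# j 28.89
```

Because Health and Behaviour receive the same weight, swapping their utilities (farms c and d) cannot change the weighted sum, so the WS cannot reproduce a preference of c over d.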

3.2.1 Standard sequences

The ranking of the 10 farms obtained by aggregating with the WS the individual utilities calculated with the SS method (Table 2) differed from the ranking over the farms given by the DM as initial preferences (Table 1). For farms a, b, c, d, e and f, the ranking of the utilities coincided with the initial DM preferences, but it was completely different for farms g, h, i and j.

Table 2. Partial utilities calculated with the standard sequences method and overall utilities and rankings (R) computed using the weighted sum (WS) and the Choquet integral (CI) with the different approaches implemented, the minimum variance (MV) and the minimum variance with Shapley value and interaction indices constraints (MV’).

Farm  F     Ho     He    B       WS overall utility   R      CI (MV) overall utility   R     CI (MV’) overall utility
a     50    75     70    66.66   67.78                1      67.70                     1     NS
b     75    50     70    66.66   65                   2      66.38                     2     NS
c     75    75     40    66.66   60.55                3      61.86                     3     NS
d     75    75     70    33.33   59.44                4      59.02                     4     NS
e     75    75     30    66.66   57.22                5      58.97                     5     NS
f     50    50     30    33.33   37.78                6      38.18                     6     NS
g     0     50     30    33.33   32.22                7=8    32.50                     7     NS
h     50    12.5   30    33.33   29.44                10     32.45                     8     NS
i     50    50     10    33.33   31.11                9      32.40                     9     NS
j     50    50     30    16.66   32.22                7=8    32.35                     10    NS

F: Feeding; Ho: Housing; He: Health; B: Behaviour. NS: No solution.

3.2.2 MACBETH

The ranking obtained by aggregating with the WS the individual utilities calculated with the MACBETH method (Table 3) was equal to the ranking over the farms provided by the DM as initial preferences (Table 1), except for farms c and d: the WS of the MACBETH utilities did not distinguish between them (60.56 for both), whereas the DM preferred farm c to farm d.

Table 3. Partial utilities calculated with MACBETH and overall utilities and rankings (R) computed using the weighted sum (WS) and the Choquet integral (CI) with the different approaches implemented, the minimum variance (MV) and the minimum variance with Shapley values and interaction indices constraints (MV’).

Farm  F    Ho   He   B    WS overall utility   R      CI (MV) overall utility   R     CI (MV’) overall utility   R
a     55   75   65   65   66.11                1      65.21                     1     64.52                      1
b     80   50   65   65   63.33                2      65.16                     2     61.22                      2
c     80   75   40   65   60.56                3=4    63.07                     3     58.51                      3
d     80   75   65   40   60.56                3=4    63.02                     4     58.46                      4
e     80   75   30   65   57.22                5      60.12                     5     54.67                      5
f     55   50   30   40   40.56                6      42.49                     6     39.27                      6
g     0    50   30   40   34.45                7      35.22                     7     29.77                      7
h     55   15   30   40   32.78                8      35.17                     8     29.72                      8
i     55   50   5    40   32.22                9      35.12                     9     29.67                      9
j     55   50   30   5    28.89                10     32.31                     10    26.26                      10

F: Feeding; Ho: Housing; He: Health; B: Behaviour.

3.3 Aggregation methods - Choquet integral.

3.3.1 Standard sequences

The overall utilities for the 10 farms computed using the CI with respect to the 2-additive solutions are given in Table 2. For the MV approach, the results follow the partial weak order provided by the DM at the beginning and comply with the indifference threshold established by the DM (0.05). Note that the differences between the overall utilities of consecutive farms among g, h, i and j are exactly 0.05, i.e. exactly the indifference threshold. The Shapley values and the interaction indices of the 2-additive solution obtained by means of the MV approach are given in Table 4.

Table 4. Coefficients of the weighted sum (WS) and of the Choquet integral obtained by the minimum variance approach (MV) and by the minimum variance approach with constraints on the Shapley values and on the interaction indices (MV’), used to aggregate the individual utilities calculated using the SS method and the MACBETH method.

         Shapley values                  Interaction indices
         F       Ho      He      B       F,Ho     F,He    F,B      Ho,He   Ho,B    He,B
SS method
WS       0.111   0.222   0.333   0.333   -        -       -        -       -       -
MV       0.183   0.226   0.278   0.312   -0.151   0.008   -0.029   0.027   0.054   -0.014
MV’      NS      NS      NS      NS      NS       NS      NS       NS      NS      NS
MACBETH
WS       0.111   0.222   0.333   0.333   -        -       -        -       -       -
MV       0.228   0.233   0.266   0.273   -0.048   0.019   0.019    0.018   0.007   0.022
MV’      0.139   0.241   0.309   0.311   0.05     0.05    0.05     0.05    0.05    0.05

F: Feeding; Ho: Housing; He: Health; B: Behaviour. NS: No solution.

For the MV approach, the importance of the criteria followed the order Behaviour > Health > Housing > Feeding. This order was not completely in accordance with the initial preferences of the DM. Regarding the interaction indices, there was a strong negative interaction between Feeding and Housing (−0.151). Feeding also interacted negatively with Behaviour, and Health interacted negatively with Behaviour. There was no solution for the MV’ approach, because the model was not compatible with the three constraints imposed: the ranking over the farms, Behaviour = Health > Housing > Feeding, and all criteria regarded as complementary (with indifference thresholds of 0.05, 0.05 and 0.05, respectively).

3.3.2 MACBETH

The overall utilities computed using the CI with respect to the 2-additive solutions for

the 10 farms are given in Table 3. Note that, as expected, for the MV approach the

results follow the partial weak order provided by the DM as an initial preference. It

should also be noted that the differences between the overall utilities of farms a and b, c

and d, g and h, and between h and i, are exactly equal to 0.05, which is the indifference

threshold. The Shapley values and the interaction indices of the 2-additive solutions for

MACBETH are given in Table 4. For the MV approach, the importance of the criteria

followed the order: Behaviour > Health > Housing > Feeding, which was not

completely in accordance with the initial preferences of the DM. All pairs of criteria

interacted positively except for Feeding and Housing, which interacted negatively.

For the MV’ approach, the constraints imposed by the DM on both the interaction indices (indifference threshold 0.05) and the Shapley values (indifference threshold 0.05) were satisfied: all the criteria were complementary (positive interaction), and the Shapley values followed the order Health = Behaviour > Housing > Feeding (Table 4). When these utilities (MV’) were compared with the initial ones obtained without any constraint (MV), two main facts could be noticed: first, the ranking over the farms remained the same; second, all farms obtained lower utilities when the compensation between the criteria was limited (MV’), and this effect was stronger for farms g, h, i and j, which were the farms elicited to evaluate compensation between good and bad grades.

3.4 General dataset

When the number of farms assigned to each welfare category using the MV’ Shapley values and interaction indices was compared with that obtained using the MV’ Shapley values as if they were the coefficients of a weighted sum (WSMV’), 485 farms were classified as unacceptable, 1,788 as acceptable, 475 as enhanced and 52 as excellent in the first case, whereas in the second case 407 farms were classified as unacceptable, 1,574 as acceptable, 697 as enhanced and 122 as excellent. The number of farms which changed to a higher or lower classification when the interaction indices were not used in the aggregation is shown in Table 5.

Table 5. Number of farms changing to a higher or lower classification when the Shapley values of the MV’ approach were used as the coefficients of a weighted sum (WSMV’) instead of the minimum variance approach with Shapley value and interaction index constraints (MV’).

Original class (MV’)      Farms changed to class (WSMV’):
                          Unacceptable   Acceptable   Enhanced   Excellent
Unacceptable (n=485)      373            112          0          0
Acceptable (n=1788)       34             1455         299        0
Enhanced (n=475)          0              7            398        70
Excellent (n=52)          0              0            0          52

4 Discussion

The animal welfare multi-criteria evaluation was constructed in two separate steps.

First, utility functions for each criterion were determined in two different ways, using

the SS method and the MACBETH software. In the second step, the WS and the CI

were used as aggregation functions. For the CI capacity identification, minimum

variance (MV) and minimum variance with constraints (MV’) approaches were used.

The main problem found in the determination of the utility functions with the SS method was that they are determined on the basis of a linear transformation. For the utility function of Behaviour (Figure 3), an increase in Behaviour from a score of 2 to a score of 3 had a utility for the DM of one unit, an increase from 3 to 4 also had a utility of one unit, and an increase from 0 to 2 likewise corresponded to a utility of one unit. Due to the linear transformation underlying the model, an increase in Behaviour from 0 to 1 automatically corresponds to an increase of half a unit. There is no opportunity to assign a lower or a higher value, which can lead to overestimating or underestimating the utility a DM would like to assign to a given performance level of a criterion. It must be pointed out that, when the number of performance levels of a criterion decreases, this under- or overestimation can become larger and may even make the model infeasible.
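The effect can be sketched numerically. The Python snippet below assumes the Behaviour example in the text, where each of the steps 0 to 2, 2 to 3 and 3 to 4 is worth one utility unit, rescales the units to 0-100 and interpolates linearly between the elicited points; the function name is hypothetical. The utility of a score of 1 is then forced to be half that of a score of 2, whatever the DM would have preferred. The resulting values appear consistent with the Behaviour utilities of Table 2 (16.66, 33.33 and 66.66), although the exact scores behind those values are not stated here.

```python
def interpolate(score, points):
    """Piecewise-linear utility through the elicited (score, utility) points."""
    for (s0, u0), (s1, u1) in zip(points, points[1:]):
        if s0 <= score <= s1:
            return u0 + (u1 - u0) * (score - s0) / (s1 - s0)
    raise ValueError("score outside the elicited range")


# Standard-sequence points for Behaviour: each step is worth one utility unit
units = [(0, 0.0), (2, 1.0), (3, 2.0), (4, 3.0)]
top = units[-1][1]
scale = [(s, 100.0 * u / top) for s, u in units]  # rescaled to 0-100

print([round(interpolate(s, scale), 2) for s in range(5)])
# [0.0, 16.67, 33.33, 66.67, 100.0]
```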

The rankings obtained by aggregating with the WS the individual utilities calculated using the SS method (Table 2) and the MACBETH method (Table 3) were very different. Compared to the ranking over the farms given by the DM as initial preferences, MACBETH was the method that better fitted the DM preferences, differing only in the ranking of farms c and d, whereas the SS method presented several ranking reversals between farms g, h, i and j. These farms were elicited to estimate how the methods behave when one criterion has a very low value and the other criteria present medium-high values; in other words, they were elicited to study the preferences of the DM regarding the compensation between good and bad grades. This difference between the rankings appeared to be related to the problem presented above: the SS method did not allow the DM to assign lower values for Housing and Behaviour, which led to an inaccurate interpretation of the DM’s preferences, so that the ranking of the overall utilities differed from the DM’s initial ranking.

The results of the MV approach, both for the SS method (Table 2) and the MACBETH method (Table 3), followed the partial weak order provided at the beginning. The Shapley values obtained using both methods conformed to the same sequence, which was not completely in accordance with the DM preferences, although the differences were minor. The major difference between the methods and the DM preferences, however, lay in the values of the interaction indices. The DM considered all the criteria as complementary; however, there was a negative interaction between Feeding and Housing for the MACBETH method and a strong negative interaction between Feeding and Housing for the SS method. For the SS method, there were also negative interactions between Feeding and Behaviour and between Health and Behaviour. In an initial calculation of the capacity with no additional constraints imposed on the model, it is usual that the results do not completely fit the preferences of the DM, due to the small dataset from which the capacity is determined. This issue can be solved by imposing additional constraints on the Shapley values and on the interaction indices. However, for the SS method there was no solution compatible with the constraints (MV’), whereas for the MACBETH method there was a compatible solution. In the case of the SS method, both the poor fit to the DM preferences in the MV approach and the infeasibility of the MV’ model appear to be related to the problem with the SS utility function determination method. In the case of the MACBETH method, the fact that the Shapley values and interaction indices of the first approach (MV) did not completely satisfy the DM preferences appeared to be related more to the limited learning data than to a poor interpretation of the DM preferences, since there was a compatible solution after imposing the constraints.

In summary, the problem in the determination of the utility functions with the SS method lay in the quantitative performances of the criteria. These performances were a mere simulation. Real welfare measures, as proposed in Welfare Quality® (2009), may be used in a further step of the project. The quantitative performances of WQ measures range, for instance, between 0 and 100% of animals presenting the measure. In this scenario, it could be assumed that the utility functions determined using the SS method would fit the DM preferences as well as those determined with MACBETH. However, we prefer MACBETH to the SS method for several reasons. First, information is available on how to use this method to facilitate a consensus between stakeholders (Parnell et al., 2013; Bana e Costa et al., 2014), which may be one of the difficulties when a panel of different DMs is consulted to determine the utility functions and the aggregation parameters in a further step of the project. Second, this method makes it easier to judge differences in attractiveness between options as the number of criteria increases, owing to its interactive software and to its use of qualitative judgements on a scale of categories of difference in attractiveness (‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’) (Bana e Costa et al., 2004). Third, the process of determining the utilities is more transparent with the MACBETH method and, thanks to its interactive software, easier to explain to the stakeholders than the SS method. Fourth, MACBETH allows not only qualitative performance levels but also quantitative performances to be compared, with no need for a previous conversion of the quantitative scales into a qualitative scale, which solves one of the problems presented by Botreau et al. (2007b).

The results of the MV and MV’ approaches corroborated that MAUT with an aggregation process based on the WS is not a valid method to develop an overall assessment of animal welfare, because the criteria do not behave as independent criteria, which is an assumption of this aggregator (Vincke, 1992). The estimation of how the classification of the farms would change if the DM decided to use an additive value model (WSMV’), in spite of all its well-known drawbacks, showed that the main differences occurred in the number of farms classified as unacceptable and enhanced. Of the 485 farms classified as unacceptable with the MV’ approach, 112 were classified as acceptable with the WSMV’, and of the 1,788 farms classified as acceptable with the MV’ approach, 299 were classified as enhanced with the WSMV’. In other words, not taking the interaction between the criteria into account led to a considerable decrease in the number of farms classified as unacceptable (from 17.3% of the farms to 14.5%) and acceptable (from 63.9% to 56.2%), and to a noticeable increase in the number of farms classified as enhanced (from 17.0% to 24.9%) and excellent (from 1.9% to 4.4%). Note that the percentage of farms in each welfare category may vary if the thresholds established for each category are modified. The large difference in the number of farms classified as unacceptable appeared to be related to the limitation of compensation between bad and good grades. This revealed the potential impact of not taking the interactions between the criteria into account when producing an overall assessment of animal welfare in the context of certification schemes, an impact which might have gone unnoticed had only the small initial subset of farms been used to compare the aggregation methods.
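The percentages quoted in this discussion follow directly from the margins of Table 5. A short Python check (the variable names are ours):

```python
CATEGORIES = ("unacceptable", "acceptable", "enhanced", "excellent")
# Table 5: rows = class with MV', columns = class with WSMV'
TABLE5 = [
    [373, 112, 0, 0],
    [34, 1455, 299, 0],
    [0, 7, 398, 70],
    [0, 0, 0, 52],
]

total = sum(map(sum, TABLE5))
mv_prime = [sum(row) for row in TABLE5]             # row totals
ws_mv_prime = [sum(col) for col in zip(*TABLE5)]    # column totals
upgraded = sum(TABLE5[i][j] for i in range(4) for j in range(4) if j > i)
downgraded = sum(TABLE5[i][j] for i in range(4) for j in range(4) if j < i)

for cat, a, b in zip(CATEGORIES, mv_prime, ws_mv_prime):
    print(f"{cat:<13}{a:>5} ({100 * a / total:4.1f}%) -> {b:>5} ({100 * b / total:4.1f}%)")
print(total, upgraded, downgraded)  # 2800 481 41
```

Read this way, the table also shows that 481 farms moved to a higher category and 41 to a lower one when the interactions were ignored.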

5 Conclusions

In summary, in the aggregation of animal welfare criteria it is of major importance to choose an aggregation method which allows interactions between the criteria to be taken into account, such as the CI, and which allows compensation between the criteria to be limited when they are considered complementary by the DMs. Choosing a simpler aggregation method, such as the WS, which allows compensation between the criteria, would lead to an important misclassification of farms in the context of certification schemes, as demonstrated here. In this study, it was concluded that the MACBETH method represented the preferences of the DM better than the SS method. The interpretation of the DM preferences through the utility functions was found to be crucial in the determination of the CI aggregation coefficients. A utility function which does not reflect the preferences of the DM adequately would lead to an incompatible solution when additional constraints are imposed on the capacity determination model.

6 Acknowledgements

The present study is part of the PHENOMICS research project, which is funded by the German Federal Ministry of Education and Research.

7 References

Basic ideas, software, and an application. In Advances in Decision Analysis (eds

N Meskens and M Roubens), vol. 4, pp.131-157. Kluwer Academic Publishers,

Dordrecht, Netherlands.

Bana e Costa CA, de Corte JM and Vansnick JC 2004. On the mathematical

foundations of MACBETH. In MCDA, Multiple Criteria Decision Analysis (eds J

Figueira, S Greco and M Ehrgott), pp. 409 - 442. Kluwer Academic Publishers,

Bana e Costa CA, Lourenço JC, Oliveira MD and Bana e Costa JC 2014. A socio-

technical approach for group decision support in public strategic planning: The

Pernambuco PPA case. Group decision and negotiation 23, 5-29.

Blokhuis HJ, Veissier I, Miele M and Jones B 2010. The Welfare Quality® project and beyond: Safeguarding farm animal well-being. Acta Agriculturae Scandinavica, Section A, Animal Science 60, 129-140.

Botreau R, Bonde M, Butterworth A, Perny P, Bracke MBM, Capdeville J and Veissier

I 2007a. Aggregation of measures to produce an overall assessment of animal

welfare. Part 1: A review of existing methods. Animal 1, 1179-1187.

Veissier I 2007b. Aggregation of measures to produce an overall assessment of

Botreau R, Butterworth A, Engel B, Forkman B, Jones B, Keeling L, Kjærnes U, Manteca X, Miele M, Perny P, van Reenen CG and Veissier I 2009. An Overview of the Development of the Welfare Quality® Assessment Systems. In Welfare Quality Reports® no. 12 (ed. L Keeling). Cardiff University, UK.

Foundations of Computing and Decision Science 33, 1-18.

Bracke MBM, Spruijt BM and Metz JHM 1999. Overall animal welfare assessment reviewed. Part 1: Is it possible? Netherlands Journal of Agricultural Science 47, 279-291.

and decision models: A critical perspective. Kluwer Academic Publishers,

and decision models with multiple criteria: Stepping stones for the analyst.

Springer, New York, USA.

Fraser D 1995. Science, values and animal welfare: Exploring the ‘inextricable

connection’. Animal Welfare 4, 103-117.

Record 17, 357.

Grabisch M 1996. The application of fuzzy integrals in multi-criteria decision making.

European Journal of Operational Research 89, 445-456.

Grabisch M 1997. k-order additive discrete fuzzy measures and their representation.

Fuzzy Sets and Systems 92, 167-189.

Grabisch M, Kojadinovic I and Meyer M 2008. A review of capacity identification

methods for Choquet Integral based multi-attribute utility theory. Applications of

the Kappalab R package. European Journal of Operational Research 186, 766-

values tradeoffs. Wiley, New York, USA.

Krantz DH, Luce RD, Suppes P and Tversky A 1971. Foundations of measurement, vol.

1: Additive and polynomial representations. Academic Press, New York, USA.

Kojadinovic I 2007. Minimum variance capacity identification. European Journal of

Operational Research 177, 498-514.

Labreuche C and Grabisch M 2003. The Choquet integral for the aggregation of interval

scales in multi-criteria decision making. Fuzzy sets and Systems 137, 11-16.

Marichal JL 2000. An axiomatic approach of the discrete Choquet integral as a tool to aggregate interacting criteria. IEEE Transactions on Fuzzy Systems, vol. 8, no. 6.

Marichal JL and Roubens M 2000. Determination of weights of interacting criteria from

a reference set. European Journal of Operational Research, vol. 124, no 3, 641-

Mayag B, Grabisch M and Labreuche C 2011. A characterization of the 2-additive

Choquet integral through cardinal information. Fuzzy sets and Systems 184, 84-

Merad M, Dechy N, Serir L, Grabisch M and Marcel F 2013. Using a multi-criteria

decision aid methodology to implement sustainable development principles within

an organization. European Journal of Operational Research 224, 603-613.

integral as an integral with respect to a fuzzy measure. Fuzzy sets and systems 29,

201-227.

Parnell GS, Brensik TA, Tani SN and Johnson ER 2013. Handbook of decision

analysis. John Wiley and sons, New York, USA.

Programming 1, 239-266.

allocation. McGraw-Hill, New York, USA.

Sugeno M 1974. Theory of fuzzy integrals and its applications. PhD thesis, Tokyo

Institute of Technology. Tokyo, Japan.

making. IEEE Transactions on Systems, Man and Cybernetics 18, 183-190.

Vapnek J and Chapman M 2010. Legislative and regulatory options for animal welfare.

FAO Legislative study 104. FAO, Rome, Italy.

Vincke P 1992. Multi-criteria Decision-aid. Wiley, New York, USA.

von Winterfeldt D and Edwards W 1986. Decision analysis and behavioral research.

Cambridge University Press, Cambridge, UK.

Wakker PP 1989. Additive representations of preferences: A new foundation of

decision analysis. Kluwer Academic Publishers, Dordrecht, Netherlands.

Welfare Quality 2009. Welfare Quality® Assessment Protocol for Growing Pigs. Welfare Quality® Consortium, Lelystad, Netherlands.

CHAPTER TWO

Development of a multi-criteria evaluation system to assess

growing pig welfare

P. Martín 1, I. Traulsen 1, C. Buxadé 2 and J. Krieter 1

Abstract

The aim of this paper was to present an alternative multi-criteria evaluation model to assess animal welfare on farms based on the Welfare Quality® (WQ) project, using an example of welfare assessment of growing pigs. The WQ assessment protocol follows a three-step aggregation process: measures are aggregated into criteria, criteria into principles, and principles into an overall assessment. This study focused on the first step of the aggregation. Multi-attribute utility theory (MAUT) was used to produce a value of welfare for each criterion. The utility functions and the aggregation function were constructed in two separate steps. The MACBETH method was used for utility function determination and the Choquet integral (CI) was used as an aggregation operator. The WQ decision-makers’ preferences were fitted in order to construct the utility functions and to determine the CI parameters. The methods were tested with generated datasets for farms of growing pigs. The results obtained using MAUT were similar to those obtained applying the WQ protocol aggregation methods. It can be concluded that, due to the use of an interactive approach such as MACBETH, this alternative methodology is more transparent for stakeholders and more flexible than the methodology proposed by WQ, since it allows the model to be modified according, for instance, to new scientific knowledge.

Keywords: Growing pigs, Welfare Quality, multi-criteria evaluation.

1 Introduction

Concern about livestock living conditions has increased considerably in the last few years. Also, consumers have been increasingly linking animal welfare indicators with food safety and quality. These consumer preferences create economic incentives for certification schemes (Vapnek and Chapman, 2010). Due to the lack of a standard assessment of animal welfare, the standards vary from one certification scheme to another. This situation was the origin of the EU Welfare Quality® project (WQ), which aimed to propose an overall system to assess the welfare of cattle, pigs and poultry (Botreau et al., 2008).

Animal welfare is a multi-dimensional concept, and its assessment should be based on a variety of measures related to several aspects such as the absence of thirst, hunger, discomfort, disease, pain, injuries and stress, and the presence of normal behavioural expressions (Farm Animal Welfare Council (FAWC), 1992). A multi-criteria evaluation model is therefore required for the evaluation of an animal unit (farm, slaughterhouse). All such multi-criteria decision-making approaches share the need for an aggregation operator. Information at the level of the measures may be useful for farm management purposes; however, labelling purposes require a certain level of aggregation of the measures into overall scores. Considerable efforts continue to be made in order to develop overall assessment systems for different farm animal species (e.g. the WQ project, the Bristol Welfare Assurance Programme and the Animal Welfare Indicators project, AWIN). WQ developed animal welfare multi-criteria evaluation models for different livestock species (Botreau et al., 2009). The inputs for the WQ animal welfare multi-criteria evaluation model are the on-farm welfare measures described in the WQ assessment protocol (Welfare Quality, 2009). The WQ multi-criteria evaluation model uses different aggregation methods (e.g. decision trees, weighted sums or the Choquet integral) to aggregate measures into an overall assessment (Botreau et al., 2008). There are other ways of approaching the aggregation problem that differ from the ones used by the WQ multi-criteria evaluation model, e.g. multi-attribute utility theory (MAUT), ELECTRE or the Analytic Hierarchy Process (AHP). In MAUT, uni-dimensional utility functions corresponding to each criterion are aggregated into a single global utility function combining all the criteria (Keeney and Raiffa, 1976); with ELECTRE (an outranking procedure), only the preference relations on pairs of alternatives are aggregated (Roy, 1971); whilst the Analytic Hierarchy Process derives the priorities of criteria and alternatives from pairwise comparisons organised in a hierarchy (Saaty, 1980).

In the present study, we focused on the MAUT. A large number of methods have been

proposed to determine the utility functions in MAUT, for instance the standard

sequences method described by Bouyssou et al. (2000) and the MACBETH method

described by Bana e Costa et al. (1999). Examples of aggregation functions in MAUT

are the weighted sum, the ordered weighted average (Yager, 1988) and the Choquet

integral (CI) (Murofushi and Sugeno, 1989). The most common aggregation tool still

used today is the weighted sum, with all its well-known drawbacks: with this

aggregator, different importance can be attached to the criteria, but no interaction

between the criteria is taken into account. The distinguishing feature of the CI is that it

can represent interactions between criteria, ranging from redundancy (negative

interaction) to synergy (positive interaction) (Grabisch, 1996). Within the framework of

the MAUT, the MACBETH method was used here for utility function determination,

and the CI as the aggregation method.
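The difference between the two aggregators can be made concrete with a small sketch. The function below computes the discrete Choquet integral with respect to a capacity (a set function giving a weight to every subset of criteria) using the usual sorting formula. The criteria names A and B and all capacity values are hypothetical, chosen only to contrast an additive capacity, for which the CI reduces to a weighted sum, with a complementary one (μ({A}) + μ({B}) < μ({A, B})).

```python
def choquet(x, capacity):
    """Discrete Choquet integral of the scores in `x` (criterion -> score)
    with respect to `capacity` (frozenset of criteria -> weight in [0, 1]).

    Sorting formula: visit criteria by increasing score and add
    (x_(k) - x_(k-1)) * mu({criteria scoring at least x_(k)}).
    """
    order = sorted(x, key=x.get)  # criteria by increasing score
    total, previous = 0.0, 0.0
    for k, criterion in enumerate(order):
        upper = frozenset(order[k:])  # this criterion and all better-scored ones
        total += (x[criterion] - previous) * capacity[upper]
        previous = x[criterion]
    return total


def weighted_sum(x, weights):
    return sum(weights[c] * x[c] for c in x)


A, B = "A", "B"
# Hypothetical capacities on two criteria
additive = {frozenset({A}): 0.5, frozenset({B}): 0.5, frozenset({A, B}): 1.0}
complementary = {frozenset({A}): 0.2, frozenset({B}): 0.2, frozenset({A, B}): 1.0}

unbalanced = {A: 100.0, B: 0.0}
balanced = {A: 60.0, B: 60.0}

print(weighted_sum(unbalanced, {A: 0.5, B: 0.5}))  # 50.0
print(choquet(unbalanced, additive))               # 50.0, same as the weighted sum
print(choquet(unbalanced, complementary))          # 20.0, the imbalance is penalised
print(choquet(balanced, complementary))            # 60.0
```

With the complementary capacity, an excellent score on one criterion cannot make up for a very poor score on the other, which is the kind of limited compensation sought for animal welfare.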

The aim of this paper is to present an alternative multi-criteria evaluation model to

assess animal welfare on farms within the WQ framework, using the welfare assessment

of growing pigs as an example. The objective was to find a model that solves the main

difficulties faced by a multi-criteria aggregation model for animal welfare, as described

by Botreau et al. (2007b) (for instance, that interactions may exist between measures

and that measures may differ in their importance for animal welfare), while remaining

more transparent and flexible than the model proposed in the WQ protocol. In other

words, we looked for a model which can be easily understood by the stakeholders and

which allows the parameters to be changed according to new scientific knowledge. The

paper is organised as follows: Section 2 presents the general methodology followed in

the WQ protocol and the methodology we propose to construct the multi-criteria

evaluation model. Section 3 presents the construction of criteria from the initial

measures by means of examples. Finally, Section 4 discusses the strengths and

weaknesses of the model.

2 General methodology

2.1 Welfare Quality®

The WQ assessment protocol for growing pigs consists of 27 welfare measures, which

were aggregated following a three-step process (Welfare Quality, 2009): the 27 welfare

measures were combined into 12 criteria, these were aggregated into 4 principles, and

the 4 principles were aggregated into an overall assessment. Different types of

operators were used in this aggregation process, such as decision trees, weighted sums,

conversion to ordinal scores, least-squares spline fitting and the CI. To parameterise the

operators used for the aggregation of the welfare measures and criteria, datasets were

presented to expert panels of 13 animal scientists, who individually ranked farms and

gave an absolute score on a scale of 0-100 for each of the farms presented in each

dataset (Botreau et al., 2008). Partners of the WQ project and members of the

Management Committee and Advisory Committee (i.e. stakeholder representatives)

were consulted to agree upon parameters for the aggregation of principles into an

overall classification (Botreau et al., 2009).

2.1.1 First step of the aggregation process

In the first step, welfare measures were aggregated into the 12 corresponding criteria.

WQ used different types of aggregation of measures into criteria (Figure 1). For some

criteria, the frequencies of moderate and severe problems were first combined with a

weighted sum, producing a measure index on a scale from 0 (worst) to 100 (best).

These index values were then converted into measure scores (expressed on the same

0-100 scale) using spline functions (Ramsay, 1988) fitted by least-squares methods.

Finally, the CI was used to combine the scores for the different measures into a score

for the criterion (a in Figure 1). For other criteria, the measures were first transformed

into an ordinal scale by assigning warnings or alarms depending on the value of the

measures. The numbers of warnings and alarms were then combined into an index for

the criterion, which was converted into a criterion score using I-spline functions (b in

Figure 1). For the remaining criteria, decision trees were used to produce the criterion

score (c in Figure 1). Further information on the development and use of these

operators can be found in Botreau et al. (2008, 2009) and Veissier et al. (2011).

Figure 1. Outline of the three different methodologies followed in the Welfare Quality®

project to aggregate the measures into criteria (adapted from Welfare Quality, 2009).

2.1.2 Second step of the aggregation process

In the second step, a CI was used to aggregate the 12 criteria into four principles. This

integral uses weights to combine the different criterion scores into one principle score

(expressed on the 0-100 scale), while limiting the possibility that a poor score of a

criterion is compensated by other excellent scores (Botreau et al., 2007b; Veissier et al.,

2011).

Figure 1 (diagram): (a) Measure1 … Measuren → previous calculations → I-spline curve fitting → Score1 … Scoren → aggregation (Choquet integral) → criterion score; (b) Measure1 … Measuren → previous calculations → ordinal measure1 … ordinal measuren → weighted sum → criterion index → I-spline curve fitting → criterion score; (c) Measure1 … Measuren → decision tree → criterion score.

2.1.3 Third step of the aggregation process

In the third and final step, the four principles were combined into one overall

assessment. The herds were classified into four welfare categories, ‘unacceptable’,

‘acceptable’, ‘enhanced’ or ‘excellent’, based on reference profiles for these four

principles (Botreau et al., 2009). To be classified as ‘excellent’, a herd had to score >55

for each principle and >80 for two principles; to be classified as ‘enhanced’, each

principle had to be >20 and at least two principles had to be >55; to be classified as

‘acceptable’, each principle had to be >10 and at least three principles had to be >20.

Herds which did not comply with these minimum scores were classified as

‘unacceptable’, which means that at least one principle was ≤10 or at least two

principles were ≤20.
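The classification rules above translate directly into a small function. This is only a sketch: the function name is ours, the thresholds are the ones stated in the text with strict inequalities, and "two principles" for ‘excellent’ is read as "at least two".

```python
def wq_overall_category(principles):
    """Classify a herd from its four WQ principle scores (0-100 scale)."""
    if len(principles) != 4:
        raise ValueError("expected exactly four principle scores")

    def n_above(threshold):
        return sum(score > threshold for score in principles)

    if n_above(55) == 4 and n_above(80) >= 2:
        return "excellent"
    if n_above(20) == 4 and n_above(55) >= 2:
        return "enhanced"
    if n_above(10) == 4 and n_above(20) >= 3:
        return "acceptable"
    # Not acceptable: at least one principle <= 10, or at least two <= 20
    return "unacceptable"


print(wq_overall_category([90, 85, 60, 56]))  # excellent
print(wq_overall_category([90, 80, 60, 60]))  # enhanced (80 is not > 80)
print(wq_overall_category([15, 25, 25, 25]))  # acceptable
print(wq_overall_category([15, 15, 25, 25]))  # unacceptable
```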

2.2 Multi-attribute utility theory (MAUT)

As presented before, the WQ assessment protocol follows a three-step aggregation

process: measures are aggregated into criteria, criteria into principles, and principles

into an overall assessment (Welfare Quality, 2009). This study focused on the first step

of the aggregation, introducing an alternative to the methodology proposed in the WQ

protocol by means of examples illustrated using growing pigs. MAUT was used to

produce a welfare value for each criterion. The application of the MAUT consisted of

two separate steps: utility function determination and aggregation function

determination. The MACBETH method was used to determine the utilities and the CI

was used as the aggregation method (Figure 2).

Figure 2. Outline of the alternative methodology proposed in this study to aggregate the

Welfare Quality® measures into criteria.

Figure 2 (diagram): Measure1 … Measuren → previous calculations → utility function determination (MACBETH) → Utility1 … Utilityn → aggregation (Choquet integral) → criterion utility.

2.2.1 Utility function determination (MACBETH)

The utility function gives value to the measure in terms of welfare; it represents the

preferences of the decision-maker (DM) for the measures and their different values. For

example, 5% of lameness on a farm may be interpreted as a worse situation than 5% of

wounds on the body. There are different methods for utility function determination; we

chose MACBETH (Measuring Attractiveness by a Categorical Based Evaluation

Technique) for several reasons:

First, there is published guidance on how to use this method to facilitate consensus

between stakeholders (Parnell et al., 2013; Bana e Costa et al., 2014), which may be

one of the main difficulties arising when a panel of different DMs is consulted to

determine the utility functions and the aggregation parameters in a later stage of the

project. Second, the method makes it easier to judge differences in attractiveness

between options as the number of criteria increases, because it relies on qualitative

judgements expressed on a scale of semantic categories (‘very weak’, ‘weak’,

‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’) (Bana e Costa et al., 2004).

Third, the process of determining the utilities remains transparent, owing to the

extensive literature on the method (Bana e Costa et al., 1999, 2004), and it is easier to

explain to the stakeholders thanks to the interactive software provided (M-MACBETH).

Fourth, MACBETH allows not only quantitative but also qualitative performance levels

to be compared, with no need to convert the qualitative scales into a quantitative scale

beforehand, which solves one of the problems presented by Botreau et al. (2007b).

MACBETH is a methodology which requires only qualitative judgements to quantify

the relative attractiveness (utilities) of options (farms). To elicit a marginal utility

function with MACBETH, the first step is to define whether the measure is quantitative

or qualitative and which its quantitative/qualitative performance levels are. The next

step is to fill in a matrix with qualitative judgements about the difference in

attractiveness between the performance levels of the measure. The qualitative

judgements can be rated as ‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or

‘extreme’. As each judgement was given, the consistency of the matrix was

automatically verified with an interactive algorithm based on linear programming

(Mayag et al., 2010), and modifications to the judgements were suggested to fix any

detected inconsistency. Once the judgements were consistent, MACBETH proposed a

numerical scale. From this numerical scale, MACBETH produces the marginal utility

function (u) for each measure. In order to aggregate the different measures into

criteria, the method also allows the user to normalise the raw data expressed on

different scales into an absolute value scale ranging, for example, from 0 to 100,

where 0 is the worst situation one can find on a farm and 100 the best situation.
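M-MACBETH returns scores for the performance levels that were judged; to evaluate a farm whose value lies between two levels, the utility function also has to be defined between them. The sketch below assumes linear interpolation between consecutive levels and applies the 0-100 anchoring described above (worst level mapped to 0, best level to 100). The function names, performance levels and raw scale values are hypothetical placeholders, not the MACBETH scale obtained in this study.

```python
from bisect import bisect_right


def anchor(raw_scores, worst, best):
    """Rescale raw MACBETH-type scores so that `worst` -> 0 and `best` -> 100."""
    return [100.0 * (s - worst) / (best - worst) for s in raw_scores]


def piecewise_linear_utility(levels, utilities):
    """Marginal utility u(x) interpolating linearly between (level, utility)
    pairs; `levels` must be strictly increasing. Values outside the range
    are clamped to the first/last utility."""
    def u(x):
        if x <= levels[0]:
            return utilities[0]
        if x >= levels[-1]:
            return utilities[-1]
        k = bisect_right(levels, x) - 1  # index of the segment containing x
        x0, x1 = levels[k], levels[k + 1]
        u0, u1 = utilities[k], utilities[k + 1]
        return u0 + (u1 - u0) * (x - x0) / (x1 - x0)
    return u


# Hypothetical example: % affected animals (0 = best) and raw scale values
levels = [0, 5, 10, 15, 25]
raw = [30, 15, 8, 5, 0]
utility = piecewise_linear_utility(levels, anchor(raw, worst=0, best=30))

print(utility(0))    # 100.0
print(utility(2.5))  # 75.0 (halfway between 100 and 50)
print(utility(40))   # 0.0 (clamped beyond the last level)
```

Finer spacing of the levels where the preferences change fastest (as done for lameness in this study) makes the linear segments follow the DMs' judgements more closely.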

After the initial calculation of the MACBETH scale, a check was performed to ensure

that it adequately represented the relative magnitude of the WQ DMs’ judgements; if

not, the scores were adjusted.

2.2.2 Aggregation with the Choquet integral

In a second step, the CI was used to aggregate the different measures into the

corresponding criteria. To combine the measures (individual utilities calculated with

MACBETH) into the corresponding criteria with the CI, the first task is capacity

identification. Capacities can be regarded as a generalisation of the weighting vector

used in a weighted sum: they assign a weight not only to each measure but to every

subset of measures. Seen as an aggregation operator, the CI takes into account both the

different importance of the measures and the interactions between them. These

interactions can be complementary (positive) or substitutive (negative). The number of

coefficients which define a capacity increases exponentially with the number of

measures. To keep things simple, it may be preferable to restrict the model to

two-additive capacities.
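The growth in the number of coefficients can be quantified with a short sketch. It assumes the usual conventions: a general capacity on n measures needs one value per subset, minus the two fixed values μ(∅) = 0 and μ(all measures) = 1, whereas a two-additive model is described by n Shapley values and one interaction index per pair of measures. The function names are ours.

```python
from math import comb


def general_capacity_coefficients(n):
    """Free values of a capacity on n measures: 2^n subsets minus the
    two fixed ones (empty set -> 0, full set -> 1)."""
    return 2 ** n - 2


def two_additive_parameters(n):
    """n Shapley values plus one interaction index per pair of measures
    (before imposing that the Shapley values sum to 1)."""
    return n + comb(n, 2)


for n in (3, 6, 13):  # e.g. 3 injury measures, 6 disease areas, 13 disease measures
    print(n, general_capacity_coefficients(n), two_additive_parameters(n))
# 3 6 6
# 6 62 21
# 13 8190 91
```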

In this study, capacity identification based on the least squares (LS) approach was

implemented using the Kappalab R package, following the method described by

Grabisch et al. (2008). To use the LS identification method, the utilities calculated with

MACBETH for the examples’ data were used as subsets against which the initial

preferences of the WQ DMs were expressed.

The results of the aggregation of the examples’ data following the WQ protocol were

used as initial preferences in order to fit the model to the WQ DMs’ preferences.

After an initial calculation of the CI, this methodology allows a progressive, interactive

approach in which additional constraints can be imposed on the Shapley values (which

measure the overall importance of a measure or criterion) and on the interaction indices,

in order to fit the WQ DMs’ preferences more precisely.
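As we understand it, Kappalab's least-squares identification solves a quadratic program with monotonicity and normalisation constraints. The sketch below is a deliberately simplified, unconstrained NumPy version: it assumes that all interaction indices are non-negative (complementary measures), so that the two-additive CI becomes linear in its parameters and an ordinary least-squares solve applies. It does not enforce monotonicity of the capacity, so a real fit should use a constrained solver such as Kappalab. The function names are ours, and the synthetic data are generated from the Table 2 parameters only as a plausible, internally consistent example.

```python
import itertools

import numpy as np


def design_matrix(X):
    """Columns: x_i for each measure, then min(x_i, x_j) for each pair i < j."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    pairs = list(itertools.combinations(range(n), 2))
    columns = [X[:, i] for i in range(n)]
    columns += [np.minimum(X[:, i], X[:, j]) for i, j in pairs]
    return np.column_stack(columns), pairs


def fit_two_additive(X, y):
    """Unconstrained least-squares fit of a two-additive Choquet integral,
    ASSUMING every interaction index is >= 0, so that
    C(x) = sum_i a_i x_i + sum_{i<j} I_ij min(x_i, x_j) is linear in (a, I).
    Returns (Shapley values, {pair: interaction index})."""
    A, pairs = design_matrix(X)
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    n = A.shape[1] - len(pairs)
    shapley = coef[:n].copy()
    interactions = {}
    for (i, j), value in zip(pairs, coef[n:]):
        shapley[i] += value / 2  # v_i = a_i + 1/2 * sum_j I_ij
        shapley[j] += value / 2
        interactions[(i, j)] = value
    return shapley, interactions


# Synthetic check: generate overall scores from known parameters, then refit.
true_shapley = np.array([0.500, 0.174, 0.326])
true_interactions = np.array([0.347, 0.652, 0.000])  # pairs (0,1), (0,2), (1,2)
a = true_shapley - 0.5 * np.array([
    true_interactions[0] + true_interactions[1],
    true_interactions[0] + true_interactions[2],
    true_interactions[1] + true_interactions[2],
])
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(10, 3))  # ten hypothetical farms, three utilities
A, _ = design_matrix(X)
y = A @ np.concatenate([a, true_interactions])

shapley, interactions = fit_two_additive(X, y)
print(np.round(shapley, 3))  # recovers the generating Shapley values
```

With real preference data the residuals are not zero, and the unconstrained solution may violate the capacity constraints; that is precisely what the constrained LS formulation guards against.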

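According to Mayag et al. (2011), given the individual utilities (x1, x2, …, xn) of the

different measures, the CI with respect to a two-additive capacity µ can be written as

follows:

$$C_\mu(x_1, \dots, x_n) = \sum_{\{i,j\}:\, I_{ij} > 0} (x_i \wedge x_j)\, I_{ij} + \sum_{\{i,j\}:\, I_{ij} < 0} (x_i \vee x_j)\, |I_{ij}| + \sum_{i=1}^{n} x_i \Big( v_i - \frac{1}{2} \sum_{j \neq i} |I_{ij}| \Big)$$

where $v_i$ represents the importance of measure i and corresponds to the Shapley

value of µ (capacity), $I_{ij}$ represents the interaction between measures i and j, and

$\wedge$ and $\vee$ denote the minimum and the maximum, respectively.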
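The two-additive Choquet integral of Mayag et al. (2011) is straightforward to evaluate once the Shapley values and interaction indices are known. The sketch below implements it with our own function name and argument layout, and applies it to farm b of Table 1 with the parameters of Table 2; the result agrees with the last column of Table 1 (90.30) up to the rounding of the published three-decimal parameters.

```python
def choquet_two_additive(x, shapley, interactions):
    """Two-additive Choquet integral.

    x            -- utilities of the n measures
    shapley      -- Shapley values v_i (importance of each measure)
    interactions -- {(i, j): I_ij} for pairs i < j
    """
    n = len(x)
    total = 0.0
    abs_interaction = [0.0] * n  # sum over j of |I_ij| for each measure i
    for (i, j), value in interactions.items():
        if value > 0:    # complementary pair: counts through the weaker utility
            total += value * min(x[i], x[j])
        elif value < 0:  # substitutive pair: counts through the stronger utility
            total += -value * max(x[i], x[j])
        abs_interaction[i] += abs(value)
        abs_interaction[j] += abs(value)
    for i in range(n):
        total += x[i] * (shapley[i] - abs_interaction[i] / 2)
    return total


# Farm b of Table 1 (lameness, wounds on the body, tail biting) with Table 2
utilities_b = [90.37, 90.16, 93.84]
v = [0.500, 0.174, 0.326]
I = {(0, 1): 0.347, (0, 2): 0.652, (1, 2): 0.000}
print(round(choquet_two_additive(utilities_b, v, I), 2))  # 90.3
```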

3 Examples of the aggregation of measures into criteria

To illustrate the methodology proposed for the construction of the criteria, three

examples are given: absence of injuries, absence of disease and absence of pain induced

by management procedures. The WQ protocol distinguishes three types of aggregation

of measures into criteria, and each of these three criteria is calculated in a different

way in the WQ protocol (Figure 1), whereas this study proposes a single methodology

for all the criteria (Figure 2).

3.1 Example 1: Criterion ‘Absence of injuries’

Absence of injuries is assessed by three measures: lameness, wounds on the body and

tail-biting. These measures have in common that they are recorded at the individual

level, on a scale which generally represents the severity of the problem, so that the

proportion of surveyed animals in each category can easily be calculated (e.g.

percentage of animals walking normally, percentage of moderately lame animals and

percentage of severely lame animals).

3.1.1 Welfare Quality®

Briefly, the WQ protocol first produced an ‘Index’ ($I_n$) by combining the percentages

of animals in each severity category, for lameness and wounds on the body. It consists

of a weighted sum, where n can be substituted by lameness or wounds on the body. For

instance, for lameness (l):

$$I_l = 100 - (0.4 \times \text{lameness}_1 + \text{lameness}_2)$$

where $\text{lameness}_1$ and $\text{lameness}_2$ are the percentages of moderately and

severely lame animals. For example, a farm with 10% moderately lame animals

(lameness1) and 1% severely lame animals (lameness2) will achieve an Index for

lameness ($I_l$) of 95.

This ‘Index’ is then transformed with a non-linear function (I-spline function) into a

‘Score’ ($S_l$). For lameness, the I-spline function is defined piecewise, with one

expression for $I_l \le 85$ and another for $I_l \ge 85$.

For example, the farm presented before, with $I_l = 95$, will achieve a Score for

lameness ($S_l$) of 51.35.

Figure 3 shows an example of the WQ I-spline function for lameness.

Figure 3. Scores for lameness according to the Index calculated for the % of lame pigs.

For tail biting, the I-spline function is applied directly: only the presence or absence of

tail biting is recorded, so there is no need for a weighted sum to combine scores

regarding the severity of the problem.

To produce the criterion score, the partial scores previously obtained with the I-spline

functions are combined with the CI (Welfare Quality, 2009).

3.1.2 MAUT

Before determining the utility functions of lameness and wounds on the body, we

produced an Index as in the WQ protocol, in order to combine the percentage of

animals with a moderate problem and the percentage of animals with a severe problem

($I'_n$, where n can be lameness or wounds on the body). We implemented the same

weights as those used in the WQ protocol, but on a scale where 0 is the best situation.

For instance, for lameness:

$$I'_l = 0.4 \times \text{lameness}_1 + \text{lameness}_2$$

For example, a farm with 10% moderately lame animals and 1% severely lame animals

will achieve an Index for lameness ($I'_l$) of 5.
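The lameness index can be reproduced with a two-line weighted sum. The weights used below (0.4 for moderately lame and 1 for severely lame animals) are the ones that reproduce the worked examples in the text (5, or 95 on the WQ orientation, for 10% and 1%) and the combined lameness values listed in Table 1 (e.g. 17.15 for farm h); treat them as reconstructed from those numbers rather than quoted from the WQ protocol. The function names are ours.

```python
def lameness_index(moderate_pct, severe_pct, w_moderate=0.4, w_severe=1.0):
    """Weighted sum of the % of moderately and severely lame animals
    (0 = no lameness). Weights reconstructed from the worked examples."""
    return w_moderate * moderate_pct + w_severe * severe_pct


def wq_lameness_index(moderate_pct, severe_pct):
    """Same index with the WQ orientation: 100 = best, decreasing with lameness."""
    return 100.0 - lameness_index(moderate_pct, severe_pct)


print(lameness_index(10, 1))               # 5.0
print(wq_lameness_index(10, 1))            # 95.0
print(round(lameness_index(40.37, 1), 2))  # 17.15, farm h in Table 1
```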

The utility function for the percentage of animals with tail biting was calculated

directly.

-Utility function determination (MACBETH)

The measures which form this criterion were defined as quantitative measures in

MACBETH, and their quantitative levels were defined according to the WQ protocol.

Figure 3 shows that the scores assigned by the WQ DMs decreased rapidly over the

Index range from 100 to 85 (reflecting 0 to 15% lame animals), and more slowly after

this point. Performance levels were therefore established in steps of one unit between

0 and 15% lame animals and, where the slope of the I-spline function became

homogeneous, in intervals of 10 units, as can be seen in Figure 4 with an example of the

utility function for lameness calculated with MACBETH.

Figure 4. Utility function for lameness calculated with MACBETH

For example, the farm presented before, with $I'_l = 5$, will achieve a utility for

lameness of 51.35.

-Aggregation with the Choquet integral

Ten farms were used as learning data to determine the CI aggregation parameters (Data

in Table 1). The utilities calculated with MACBETH for these ten farms were used as a

subset to express the WQ DMs’ preferences (Utilities in Table 1). The results of the

aggregation of the ten farms’ data following the WQ protocol were used as the WQ

DMs’ initial preferences in order to identify the capacity using the LS-based approach

(WQ overall scores in Table 1).

Table 1. Absence of injuries. Measures’ values, individual utilities and overall utilities for each selected farm.

Farm | L¹ | L² | W¹ | W² | BT² | Index L | Index W | BT | Utility L | Utility W | Utility BT | WQ score (criterion) | Overall utility (criterion)

a | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 100 | 100 | 99 | 100

b | 1 | 0.31 | 4 | 1 | 1 | 0.72 | 3.67 | 1 | 90.37 | 90.16 | 93.84 | 89.99 | 90.30

c | 5.10 | 0.03 | 7.46 | 2 | 3 | 2.08 | 6.97 | 3 | 75.04 | 81.95 | 82.46 | 75.01 | 75.04

d | 4.67 | 1 | 12.71 | 5 | 5 | 2.87 | 13.47 | 5 | 67.59 | 67.64 | 72.41 | 67.49 | 67.59

e | 1 | 3.77 | 25.37 | 1 | 7 | 4.17 | 17.91 | 7 | 57.08 | 59.29 | 63.61 | 57 | 57.08

f | 5.53 | 3 | 5 | 5.6 | 10 | 5.21 | 8.98 | 10 | 50.07 | 77.21 | 52.52 | 50.01 | 50.07

g | 1.92 | 7 | 36 | 10 | 19 | 7.77 | 33.33 | 19 | 37.69 | 37.7 | 31.58 | 33.81 | 33.70

h | 40.37 | 1 | 37.31 | 5 | 33 | 17.15 | 29.87 | 33 | 24.57 | 41.52 | 19.57 | 21.17 | 21.31

i | 33.25 | 30 | 89.55 | 20 | 55 | 43.3 | 79.70 | 55 | 10.12 | 14.40 | 10.49 | 10 | 10.12

j | 0 | 100 | 0 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0

¹Percentage of animals affected with lameness (L)/wounds on the body (W) scored 1.

²Percentage of animals affected with lameness/wounds on the body/bitten tails (BT) scored 2.

Comparing the results for lameness, wounds on the body and tail-biting obtained with

the WQ method and the MAUT (Table 1), the overall utilities of the farms, obtained by

aggregating the MACBETH utilities with the CI, closely fit the criterion scores obtained

with the WQ I-spline functions. The Shapley values for each measure are shown in

Table 2: lameness was considered more important than tail-biting, which was in turn

considered more important than wounds on the body. Furthermore, Table 2 shows that

all the interactions between the measures were positive or null; thus, the measures were

defined as complementary, in accordance with the WQ protocol.

Table 2. Shapley values and interaction indices to aggregate the measures’ utilities into the criterion with the Choquet integral.

Measure | Shapley value | Interaction with lameness | Interaction with wounds on the body | Interaction with tail biting

Lameness | 0.500 | - | 0.347 | 0.652

Wounds on the body | 0.174 | 0.347 | - | 0.000

Tail biting | 0.326 | 0.652 | 0.000 | -

3.2 Example 2: Criterion ‘Absence of disease’

Absence of disease is assessed by 13 measures. The measures used to check this

criterion lead to data expressed on different scales.

3.2.1 Welfare Quality®

Due to the different nature of the measures (for instance, mortality is recorded as the

percentage of mortality on the farm during the last 12 months, whilst coughing and

sneezing are assessed as the average frequency of coughs/sneezes per animal over 5

minutes), WQ decided to compare the data to alarm thresholds, which represent the limit

between what is considered normal and what is considered abnormal. When the

incidence observed for a measure reaches approximately half the alarm threshold, a

warning is attributed. The measures are grouped into six areas: mortality, respiratory,

digestive, liver, skin and hernias. The severity of the problem is estimated per area: if,

within an area, the frequency of a symptom is above the warning threshold but below the

alarm threshold, a warning is attributed to the area; if the frequency of a symptom is

above the alarm threshold, an alarm is attributed to the area; if neither occurs, no

problem is recorded. The numbers of alarms and warnings detected on a farm are then

combined with a weighted sum into an ‘Index’ for the absence of disease criterion ($I_{ad}$).

For instance, a farm with a warning in two areas and an alarm in another will achieve an

index for absence of disease ($I_{ad}$) of 63.33.

Finally, the ‘Index’ is transformed into a score using an I-spline function, defined

piecewise with one expression for $I_{ad} \le 10$ and another for $I_{ad} \ge 10$.

For instance, the farm presented before, with $I_{ad} = 63.33$, will achieve a score of

48.42.

3.2.2 MAUT

The measures employed to check this criterion were first transformed into an ordinal

scale before determining the utility functions. The data were compared to the warning

and alarm thresholds defined in the WQ protocol, and the measures were grouped into

the six areas defined in the WQ protocol. An area was attributed a warning (or an alarm)

when at least one of its measures was above the warning (or alarm) threshold.

The utility function was calculated per area. We defined the six disease areas as

qualitative measures with three performance levels: ‘no problem’, a ‘warning’

attributed to the area and an ‘alarm’ attributed to the area. In MACBETH, a utility of 100

was assigned to an area with no problem recorded, a utility of 40 to an area attributed a

warning and a utility of 0 to an area attributed an alarm (Figure 5).

Figure 5. Utilities assigned to the performance levels of the absence of disease areas

For instance, the farm presented before will achieve a utility of 0 in the area attributed

an alarm, a utility of 40 in both areas attributed a warning, and utilities of 100 in the

remaining areas.
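The ordinal step and the utility assignment can be sketched as follows. An area receives an alarm if any of its measures is above its alarm threshold, otherwise a warning if any measure is above its warning threshold, and its utility is then read from the MACBETH scale used here (100, 40, 0). The threshold values, the example farm and the function names are hypothetical placeholders, not the WQ thresholds or data.

```python
AREA_UTILITY = {"no problem": 100, "warning": 40, "alarm": 0}


def area_status(values, warning, alarm):
    """Ordinal status of one disease area from its measures and the
    per-measure warning/alarm thresholds (same order as `values`)."""
    if any(v > a for v, a in zip(values, alarm)):
        return "alarm"
    if any(v > w for v, w in zip(values, warning)):
        return "warning"
    return "no problem"


def area_utilities(farm, thresholds):
    """Map every area of a farm to its MACBETH utility."""
    return {
        area: AREA_UTILITY[area_status(values, *thresholds[area])]
        for area, values in farm.items()
    }


# Hypothetical thresholds: area -> (warning thresholds, alarm thresholds)
thresholds = {
    "mortality": ([1.5], [3.0]),
    "respiratory": ([10.0, 10.0], [20.0, 20.0]),
    "digestive": ([1.0, 10.0], [2.0, 20.0]),
    "liver": ([5.0], [10.0]),
    "skin": ([5.0], [10.0]),
    "hernias": ([2.0], [4.0]),
}
# Hypothetical farm with two warnings and one alarm
farm = {
    "mortality": [2.0],
    "respiratory": [6.0, 24.0],
    "digestive": [0.5, 12.0],
    "liver": [0.0],
    "skin": [1.0],
    "hernias": [0.5],
}
print(area_utilities(farm, thresholds))
# {'mortality': 40, 'respiratory': 0, 'digestive': 40, 'liver': 100, 'skin': 100, 'hernias': 100}
```

The six area utilities obtained this way are then the inputs of the CI aggregation described next.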

Again, ten farms were used as learning data to determine the CI aggregation parameters

(Data in Table 3; the data were highlighted in grey or dark grey when they were above

the corresponding WQ warning or alarm thresholds respectively). The utilities obtained

for the ten farms with MACBETH were used as subsets to express the WQ DMs’

preferences (Utilities in Table 3). The results of the aggregation of the ten farms

following the WQ protocol (WQ overall scores) were used as initial preferences in order

to use the least squares-based approach for capacity identification (WQ in Table 3).

Table 3. Absence of disease. Measures’ values, individual utilities and overall utilities for each selected farm.

Areas: Mortality (M); Respiratory condition (C, S, LB, TS); Digestive condition (RP, LF); Parasites (P); Skin condition (SC); Hernias (H⁵, H⁶).

Farm | M¹ | C² | S² | LB³ | TS³ | RP³ | LF⁴ | P | SC⁵ | H⁵ | H⁶ | WQ score (criterion) | Overall utility (criterion)

a | 0.3 | 5 | 2 | 0.2 | 0.1 | 0.1 | 2 | 0 | 0.4 | 0.5 | 0.1 | 99.99 | 100.00

b | 0.7 | 12 | 5 | 0.3 | 0.2 | 0.8 | 3 | 0 | 1 | 1 | 0.3 | 83.97 | 83.80

c | 1 | 14 | 24 | 1.4 | 1 | 0.6 | 20 | 0 | 3 | 2.3 | 0.3 | 74.13 | 73.00

d | 1.3 | 16 | 10 | 0.5 | 0.3 | 0.3 | 6 | 0 | 1.3 | 1.5 | 0.5 | 69.46 | 69.46

e | 1.8 | 20 | 16 | 1 | 0.7 | 0.5 | 10 | 0 | 2.4 | 2 | 0.8 | 56.38 | 58.30

f | 2 | 6 | 24 | 1.4 | 1 | 0.7 | 12 | 0 | 9 | 2.4 | 0.9 | 48.42 | 48.42

g | 3 | 30 | 38 | 1.8 | 1.3 | 1 | 10 | 0 | 3.6 | 3 | 1 | 34.23 | 41.81

h | 2.6 | 33 | 42 | 2 | 1.6 | 1.2 | 16 | 0 | 4 | 3.2 | 1.1 | 27.94 | 31.00

i | 3 | 37 | 44 | 6.1 | 2 | 1.5 | 17 | 0 | 4.3 | 7 | 1.2 | 16.88 | 14.00

j | 5.3 | 50 | 46 | 3 | 2.4 | 1.7 | 18 | 0 | 9.7 | 3.8 | 1.7 | 7.67 | 3.01

¹Percentage of mortality (M) on farm during the last 12 months. ²Average frequency of coughs (C)/sneezes (S) per animal during 5 minutes. ³Percentage of pigs with evidence of laboured breathing (LB)/twisted snouts (TS)/rectal prolapse (RP). ⁴Percentage of pigs in herd with liquid faeces (LF). ⁵Percentage of pigs scored as 2 in skin condition (SC)/hernias (H). ⁶Percentage of pigs scored as 1 in hernias (H). Grey: data over the warning threshold; dark grey: data over the alarm threshold.

For instance, the farm presented before corresponds to farm f, which is assigned an

overall utility of 48.42 after the aggregation of the individual utilities for each area with

the CI.

The initial Shapley values obtained when aggregating the utilities with the CI varied

slightly between areas, whereas in the WQ protocol all the areas are considered equally

important. After additional constraints were imposed on the Shapley values, the same

importance was attached to all the areas, and the overall utilities remained unchanged.

The interaction indices (Table 4) differed between the initial and the constrained

calculation of the CI, but in both cases all the areas behaved as complementary

measures.

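Table 4. Shapley values and interaction indices to aggregate the utilities of the absence of disease areas into the criterion with the Choquet integral.

Area | Shapley value | Interaction with mortality | Interaction with respiratory | Interaction with digestive | Interaction with liver | Interaction with skin | Interaction with hernias

Mortality | 0.165 | - | 0.024 | 0.046 | 0.029 | 0.018 | 0.024

Respiratory | 0.167 | 0.024 | - | 0.017 | 0.055 | 0.046 | 0.035

Digestive | 0.168 | 0.046 | 0.017 | - | 0.077 | 0.037 | 0.025

Liver | 0.163 | 0.029 | 0.055 | 0.077 | - | 0.056 | 0.049

Skin | 0.166 | 0.018 | 0.046 | 0.037 | 0.056 | - | 0.021

Hernias | 0.168 | 0.0214 | 0.035 | 0.025 | 0.049 | 0.021 | -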

3.3 Example 3: Criterion ‘Absence of pain induced by management procedures’

Absence of pain induced by management procedures is assessed by two qualitative

measures: castration and tail docking. These measures are taken at farm level. Farms

are classified according to the presence or absence of these mutilation procedures and,

where they are performed, whether or not anaesthetics are used.

3.3.1 Welfare Quality®

WQ used a lexicographic valuation tree for these types of measures (Figure 6).

Figure 6. Tree created in the MACBETH decision support system for the criterion

‘Absence of pain induced by management procedures’

For instance, a farm on which pigs were castrated using anaesthetics and tail docking

was performed without anaesthetics will achieve an index of 35 for the absence of pain

induced by management procedures.

3.3.2 MAUT

Castration and tail docking were defined in MACBETH as qualitative measures in this

study. Following the WQ protocol, their performance levels were established as no

castration/no tail docking, castration/tail docking with anaesthetics and castration/tail

docking without anaesthetics. Figure 7 shows the MACBETH scales for each measure.

Figure 7. Utilities assigned to the performance levels of the Absence of pain induced by

management procedures

For instance, the farm we presented before will achieve a utility of 60 for castration and

a utility of 0 for tail docking.

Nine farms were used as learning data to determine the CI aggregation parameters (Data

in Table 5). The utilities calculated with MACBETH for these farms were used as

subsets to express the WQ DMs’ preferences (Utilities in Table 5). To enable the use of

the LS-based approach for capacity identification, the results of aggregating the 9

farms’ data following the WQ protocol were used as the WQ DMs’ initial preferences

(WQ overall scores in Table 5). Considering that the WQ DMs were satisfied, we

decided not to impose any additional constraint when aggregating the absence of

injuries criterion. Table 5 shows how the utilities for castration and tail docking in the 9

possible farm situations were fitted as closely as possible to the WQ scores for this

criterion. When adjusting the utilities to the WQ DMs’ preferences, the CI parameters

obtained indicated that tail docking was considered more important than castration,

with Shapley values of 0.539 and 0.461, respectively. Both measures also behaved in a

complementary way, with an interaction index of 0.109.

For instance, the farm presented before (farm f) is assigned an overall utility of 24.37

after the aggregation of the individual utilities for castration and tail docking with the

CI.

Table 5. Absence of pain induced by management procedures. Measures’ values, individual utilities and overall utilities for each selected farm.

Farm | Castration | Tail docking | Utility castration | Utility tail docking | WQ score (criterion) | Overall utility (criterion)

a | No | No | 100 | 100 | 100 | 100

b | No | Yes (with anaesthetics) | 100 | 45 | 60 | 67.34

c | No | Yes (without anaesthetics) | 100 | 0 | 38 | 40.62

d | Yes (with anaesthetics) | No | 60 | 100 | 77 | 79.36

e | Yes (with anaesthetics) | Yes (with anaesthetics) | 60 | 45 | 53 | 51.09

f | Yes (with anaesthetics) | Yes (without anaesthetics) | 60 | 0 | 35 | 24.37

g | Yes (without anaesthetics) | No | 0 | 100 | 47 | 48.40

h | Yes (without anaesthetics) | Yes (with anaesthetics) | 0 | 45 | 27 | 21.78

i | Yes (without anaesthetics) | Yes (without anaesthetics) | 0 | 0 | 8 | 0
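Because this criterion has only two qualitative measures, the whole MAUT chain fits in a few lines: the performance levels are mapped to the utilities of Table 5, and the two utilities are combined with a two-additive CI using the parameters reported above (Shapley values 0.461 for castration and 0.539 for tail docking, interaction index 0.109). This is a sketch with our own function and dictionary names; since the published parameters are rounded to three decimals, the results match the overall utilities of Table 5 only to within about 0.05.

```python
CASTRATION = {"no": 100, "with anaesthetics": 60, "without anaesthetics": 0}
TAIL_DOCKING = {"no": 100, "with anaesthetics": 45, "without anaesthetics": 0}


def pain_utility(castration, tail_docking, v_c=0.461, v_t=0.539, i_ct=0.109):
    """Overall utility for 'Absence of pain induced by management procedures'
    as a two-additive Choquet integral of the two measure utilities."""
    x_c, x_t = CASTRATION[castration], TAIL_DOCKING[tail_docking]
    if i_ct >= 0:
        pair = i_ct * min(x_c, x_t)   # complementary: the weaker utility counts
    else:
        pair = -i_ct * max(x_c, x_t)  # substitutive: the stronger utility counts
    half = abs(i_ct) / 2
    return pair + x_c * (v_c - half) + x_t * (v_t - half)


for c in CASTRATION:
    for t in TAIL_DOCKING:
        print(f"{c:>20} | {t:>20} | {pain_utility(c, t):6.2f}")
```

For farm f (castration with anaesthetics, tail docking without) this gives 24.39, against 24.37 in Table 5.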

4 Discussion and conclusions

4.1 General methodology

Using the MAUT, we have shown that the main difficulties faced by a multi-criteria

aggregation model, as described by Botreau et al. (2007b), can be solved: the method

allows different importance to be assigned to the measures, limits the compensation

between them and works with data collected on different types of scales.

Furthermore, the model’s flexibility allowed us to fit the WQ assessment, obtaining

results comparable to those obtained by implementing the WQ protocol.

Compared to the I-spline functions used in the WQ protocol to interpret the measures in

terms of welfare, the use of MACBETH presented several advantages:

First, by using MACBETH the assessment remained more transparent, which could help

to explain the results to the stakeholders and to identify the causes of poor welfare,

while encouraging them to take efficient remedial measures that would affect the

results.

Second, the assessment remains more flexible. With this method all the parameters can

be changed according to new scientific knowledge (inclusion or exclusion of measures

based on new studies of their influence on animal welfare), to changes in societal

expectations (if the welfare of animals improves significantly on all farms, stakeholders

may want to be more selective when considering a farm as excellent), etc. The main

drawback of using MACBETH was related to the M-MACBETH software

implementation, as it does not allow the utility function formulae to be exported to

other environments, and typing the information into the software can be extremely

tedious when working with large amounts of data.

With regard to other methods proposed for the overall evaluation of animal welfare,

such as the sum of ranks and the sum of scores (Botreau et al., 2007a), the use of the CI

as an aggregator presented an important advantage, since it allowed interactions

between measures to be taken into account, thus making it possible to limit the

compensation between them and, in this way, solving one of the main problems

described by Botreau et al. (2007b). The CI was also used in the WQ protocol for the

aggregation of some measures into criteria and for the aggregation of criteria into

principles (Welfare Quality, 2009).

The main difficulty in implementing the least squares-based approach for CI capacity

identification is that it depends on information which the DM cannot always provide,

such as the overall score for each criterion (Grabisch et al., 2008). Because our aim was

to fit the WQ DMs’ preferences, the results obtained from the WQ model could be used

as initial preferences, thus avoiding this issue. However, according to Merad et al.

(2013), in other circumstances it may be difficult for the DMs to provide overall scores.

Nevertheless, easier methods for capacity identification have been proposed in the

literature, such as the minimum variance approach, which requires only a partial order

over the farms as preference information. See Grabisch et al. (2008) for a review of

different methods for capacity identification.

4.2 Examples

In applying this methodology to the particular case of an animal welfare assessment, we

identified some key points to take into account:

4.2.1 Absence of injuries

Defining the performance levels in MACBETH to which the DM has to react is extremely important for these kinds of measures. Although these measures can theoretically vary between 0 and 100%, under real conditions they usually vary within a much narrower interval. For instance, Temple et al. (2011) found values between 0 and 5.8% of animals with wounds on the body, between 0 and 8.1% for tail biting and between 0 and 1.8% for severe lameness. It is therefore in the lower intervals of the measures that the utility functions have to fit the DMs' preferences most closely. For lameness (Figure 4), for example, we set the performance levels at intervals of one unit between 0 and 15% lame animals and at intervals of ten units above this point. In this way, we were able to fit the preferences of the DM more precisely in the lower interval of the measure.
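The level grid described above, together with a piecewise-linear utility between the judged levels, can be sketched as follows. Assumptions: the coarse levels are taken as 25, 35, ..., 95 plus 100, utilities are interpolated linearly between levels, and the utility values are placeholders rather than the ones elicited in this study.

```python
from bisect import bisect_right
from typing import List


def performance_levels(fine_max: int = 15, fine_step: int = 1,
                       coarse_step: int = 10, upper: int = 100) -> List[int]:
    """Fine grid on the low, frequently observed range; coarse grid above it."""
    levels = list(range(0, fine_max + 1, fine_step))
    levels += list(range(fine_max + coarse_step, upper + 1, coarse_step))
    if levels[-1] != upper:
        levels.append(upper)  # always include the theoretical maximum
    return levels


def interpolate_utility(x: float, levels: List[int], utilities: List[float]) -> float:
    """Piecewise-linear utility between the levels the DM has judged."""
    if not levels[0] <= x <= levels[-1]:
        raise ValueError("x outside the range of performance levels")
    k = bisect_right(levels, x) - 1
    if k == len(levels) - 1:
        return utilities[-1]
    x0, x1 = levels[k], levels[k + 1]
    u0, u1 = utilities[k], utilities[k + 1]
    return u0 + (u1 - u0) * (x - x0) / (x1 - x0)


levels = performance_levels()
# Placeholder utilities standing in for MACBETH output (hypothetical, decreasing)
utilities = [100.0 * (1 - level / 100) ** 4 for level in levels]
print(levels)
print(round(interpolate_utility(2.5, levels, utilities), 2))
```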

Another key feature that could be reviewed and modified in later stages of the study is the use of linear combinations (weighted sums) to combine measures defined in two severity categories, in this study lameness and wounds on the body. A linear combination assumes that the measures can compensate for each other. Thus, with the WQ weights, a farm with, for example, 0% moderately lame animals and 10% severely lame animals is regarded, in terms of welfare, as equivalent to a farm with 10% moderately lame animals and 6% severely lame animals.
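The equivalence above implies that the weight of the severe category is 2.5 times that of the moderate category (10 w_s = 10 w_m + 6 w_s). The sketch below checks this numerically; the absolute weights 2 and 5 are hypothetical, chosen only for their ratio, and any later rescaling of the index in the WQ protocol is not reproduced.

```python
# Hypothetical weights: only their ratio (5 / 2 = 2.5) matters here, and it is
# the ratio implied by the equivalence described in the text.
W_MODERATE = 2.0
W_SEVERE = 5.0


def lameness_index(pct_moderate: float, pct_severe: float) -> float:
    """Weighted sum of the two severity categories (higher = worse)."""
    return W_MODERATE * pct_moderate + W_SEVERE * pct_severe


farm_a = lameness_index(0.0, 10.0)   # no moderate, 10 % severe lameness
farm_b = lameness_index(10.0, 6.0)   # 10 % moderate, 6 % severe lameness
print(farm_a, farm_b, farm_a == farm_b)

# Every farm on the line 2 * moderate + 5 * severe = 50 is treated alike
for severe in (0.0, 4.0, 10.0):
    moderate = (50.0 - W_SEVERE * severe) / W_MODERATE
    print(f"{moderate:.0f}% moderate + {severe:.0f}% severe -> {lameness_index(moderate, severe)}")
```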

Although it was emphasised throughout the development of the WQ model that welfare scores should not compensate for each other (Botreau et al., 2007b; Veissier et al., 2011), compensation occurred in the first stages of the aggregation through the use of linear combinations. Providing an individual utility function for each severity measure and then aggregating these utilities with the CI could be an alternative solution. On the one hand, this would increase the accuracy of the model; on the other hand, it would also increase the complexity of the decision process, as the DMs would have to interpret a larger number of measures in terms of welfare.

4.2.2 Absence of disease

To simulate the WQ DMs' preferences, we compared the data for the absence of disease measures with the warning and alarm thresholds established in the protocol. However, during the development of the methodology we showed that, after converting the original quantitative data into an ordinal scale (three qualitative levels: no problem recorded, warning or alarm), the model could not distinguish between herds that slightly exceeded the thresholds and herds that greatly exceeded them. Therefore, the conversion into an ordinal scale should be reconsidered, and the measures should be treated as quantitative, with the warning and alarm thresholds serving as references for the DM when building the utility functions.
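The information lost by the ordinal conversion can be shown with a minimal sketch. The thresholds below are hypothetical (the WQ values differ from measure to measure), and treating a value equal to a threshold as not exceeding it is also an assumption.

```python
def classify(value: float, warning: float, alarm: float) -> str:
    """Convert a quantitative disease measure into the three ordinal levels."""
    if value > alarm:
        return "alarm"
    if value > warning:
        return "warning"
    return "no problem"


# Hypothetical thresholds (% pigs affected)
WARNING, ALARM = 2.0, 5.0

for herd, value in {"herd 1": 5.5, "herd 2": 30.0, "herd 3": 1.0}.items():
    print(herd, value, "->", classify(value, WARNING, ALARM))
```

Herds 1 and 2 both receive an alarm although herd 2 has more than five times as many affected pigs; once only the ordinal level is passed on, no later step of the model can recover that difference.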

To stay in line with the preferences expressed in the WQ protocol, we created one utility function per disease area rather than one utility per measure. This method allows considerable compensation between the measures of a disease area. For instance, looking in Table 3 at the warnings and alarms attributed to the measures of the respiratory area, a warning is attributed to the respiratory area both on a farm which has only one of the measures classified with a warning (Farm E) and on a farm which has the fourth measure classified with a warning (Farm G). The compensation between disease area measures is a crucial point that must be studied further.
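Why an area-level classification allows compensation between measures can be shown with a toy rule. The worst-case rule and the measure lists below are hypothetical; they illustrate the kind of information loss discussed here and are neither the WQ rule nor the data of Table 3.

```python
from typing import List

SEVERITY = {"no problem": 0, "warning": 1, "alarm": 2}


def area_status(measure_classes: List[str]) -> str:
    """Hypothetical worst-case rule: the area takes the most severe class of its measures."""
    if not measure_classes:
        return "no problem"
    return max(measure_classes, key=SEVERITY.__getitem__)


one_warning = ["warning", "no problem", "no problem", "no problem"]
four_warnings = ["warning", "warning", "warning", "warning"]
for name, classes in (("one warning", one_warning), ("four warnings", four_warnings)):
    print(name, "->", area_status(classes), f"({classes.count('warning')} measures with a warning)")
```

Under such a rule, the area-level result is the same whether one or all four measures raise a warning.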

4.2.3 Absence of pain induced by management procedures

In the WQ protocol, a decision tree was used for these types of measures. With this method, the two measures are considered together, and the DMs directly assign a score to each possible scenario; the approach can therefore be regarded as direct rating. Although our methodology gave similar results, following Bouyssou et al. (2006) it can be concluded that a direct rating method (for example, a decision tree) makes the methodology less intuitive than considering each measure separately and combining them with an aggregation method based on an intuitive process that can easily be revised.
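The structural difference between direct rating and a separate-utilities approach can be sketched as follows. The two binary measures, all scores and the aggregation weight are hypothetical, and a weighted sum is used for the aggregation step only to keep the sketch short (this chapter uses the CI).

```python
from itertools import product

# Direct rating: the DM scores every combination of two binary measures
# (hypothetical measures A and B; True = procedure performed; hypothetical scores).
DIRECT_RATING = {
    (False, False): 100.0,
    (True, False): 55.0,
    (False, True): 45.0,
    (True, True): 20.0,
}

# Separate approach: one utility per measure (hypothetical values) ...
UTILITY_A = {False: 100.0, True: 40.0}
UTILITY_B = {False: 100.0, True: 30.0}


def separate_score(a: bool, b: bool, weight_a: float = 0.5) -> float:
    """... followed by an explicit aggregation step."""
    return weight_a * UTILITY_A[a] + (1 - weight_a) * UTILITY_B[b]


for a, b in product((False, True), repeat=2):
    print(a, b, DIRECT_RATING[(a, b)], separate_score(a, b))
```

Revising the utility of one measure, or the aggregation weight, updates every scenario consistently, whereas a direct rating table must be re-scored entry by entry, and its size grows with the product of the numbers of levels of the measures.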

5 Acknowledgements

6 References

Foundations of Computing and Decision Science 33, 1-18.

adopted in Welfare Quality. Animal Welfare 18, 363-370.

and decision models: A critical perspective. Kluwer, Dordrecht.

and decision models with multiple criteria: Stepping stones for the analyst.

Springer, New York, USA.

Record 17, 357.

Grabisch M 1996. The application of fuzzy integrals in multi-criteria decision making.

Grabisch M, Kojadinovic I and Meyer M 2008. A review of capacity identification methods for Choquet integral based multi-attribute utility theory. Applications of

Mayag B, Grabisch M and Labreuche C 2010. An interactive algorithm to deal with

inconsistencies in the representation of cardinal information, in: Hüllermeier E,

Kruse R and Hoffmann F (Eds), Information processing and management of

uncertainty in knowledge-based systems. Theory and Methods. Springer, Book

Series: Communication in computer and information science, vol. 80, pp. 148-157.

Mayag B, Grabisch M and Labreuche C 2011. A characterization of the 2-additive

Choquet integral through cardinal information. Fuzzy Sets and Systems 184, 84-

Merad M, Dechy N, Serir L, Grabisch M and Marcel F 2013. Using a multi-criteria

decision aid methodology to implement sustainable development principles within

an organization. European Journal of Operational Research 224, 603-613.

201-227.

analysis. New York: John Wiley and sons.

Ramsay JO 1988. Monotone regression splines in action. Statistical Science 3, 425-442.

Programming 1, 239-266.

allocation. McGraw-Hill, New York.

Temple D, Dalmau A, Ruiz de la Torre JL, Manteca X and Velarde A 2011. Application of

the Welfare Quality® protocol to assess growing pigs kept under intensive

conditions in Spain. Journal of Veterinary Behaviour 6, 138-149.

making. IEEE Transactions on Systems, Man and Cybernetics 18, 183-190.

Vapnek J and Chapman M 2010. Legislative and regulatory options for animal welfare.

FAO Legislative study 104, FAO, Rome.

Veissier I, Jensen KK, Botreau R and Sandoe P 2011. Highlighting ethical

decisions underlying the scoring of animal welfare in the Welfare Quality scheme.

Animal Welfare 20, 89–101.

Welfare Quality 2009. Welfare Quality® Assessment Protocol for Growing Pigs.

Lelystad: Welfare Quality® Consortium.

Winckler C 2013. Progress in, the present state of, and challenges for on-farm animal

welfare assessments in Europe. UFAW International Animal Welfare Science

Symposium, 4-5 July 2013. Universitat Autònoma de Barcelona, Spain.

CHAPTER THREE

Validation of a multi-criteria evaluation model for animal

welfare

P. Martín 1, I. Czycholl 1, C. Buxadé 2 and J. Krieter 1

Abstract

The aim of this paper was to validate an alternative multi-criteria evaluation system to assess animal welfare on farms based on the Welfare Quality® (WQ) project, using the welfare assessment of growing pigs as an example. This alternative methodology aimed to be more transparent for stakeholders and more flexible than the methodology proposed by WQ. The WQ assessment protocol for growing pigs was implemented to collect data on different farms in Schleswig-Holstein, Germany. In total, 44 observations were carried out. The aggregation system proposed in the WQ protocol follows a three-step process: measures are aggregated into criteria, criteria into principles, and principles into an overall assessment. This study focused on the first two steps of the aggregation. Multi-attribute utility theory (MAUT) was used to produce a value of welfare for each criterion and principle. The utility functions and the aggregation function were constructed in two separate steps. The MACBETH method was used to determine the utility functions and the Choquet integral (CI) was used as the aggregation operator. The WQ decision-makers' preferences were fitted in order to construct the utility functions and to determine the CI parameters. The validation of the MAUT model was divided into two steps: first, the results of the model were compared with the results of the WQ project at criteria and principle level; second, a sensitivity analysis of our model was carried out to demonstrate the relative importance of welfare measures in the different steps of the multi-criteria aggregation process. The MAUT model produced results similar to those obtained with the WQ protocol aggregation methods, at both criteria and principle level. Thus, this model could be implemented to produce an overall assessment of animal welfare in the context of the WQ protocol for growing pigs. Furthermore, this methodology could also be used as a framework to produce an overall assessment of welfare for other livestock species. The sensitivity analysis yielded two main findings: first, a limited number of measures had a strong influence on improving or worsening the level of welfare at criteria level; second, at principle level the MAUT model was not very sensitive to an improvement or a worsening of single welfare measures. The use of weighted sums and the conversion of disease measures into ordinal scores should be reconsidered.

Keywords: Growing pigs, Welfare Quality, multi-criteria assessment, sensitivity

analysis

1 Introduction

Animal welfare is a multi-dimensional concept, and its assessment should be based on a variety of measures related to several aspects, such as the absence of thirst, hunger, discomfort, disease, pain, injuries and stress, and the presence of normal behavioural expressions (Farm Animal Welfare Council (FAWC), 1992). Consequently, a multi-criteria evaluation model is required to evaluate an animal unit (e.g., a farm or slaughterhouse). In animal welfare, as in other areas, developing a multi-criteria evaluation system requires considerable effort because of its complexity. The complexity of this kind of model lies in the high number of measures involved, the varied nature of these measures (qualitative or quantitative, recorded on different scales, with different precision, different ranges of variation, etc.), the different importance of the measures, the interactions between them and, last but not least, the number of stakeholder groups involved, which makes it difficult to reach decisions that accommodate stakeholders' wants and needs (Botreau et al., 2007).

Welfare Quality® (WQ) developed multi-criteria animal welfare evaluation models for

different livestock species (Botreau et al., 2009). The inputs for the WQ multi-criteria

animal welfare evaluation model are on-farm welfare measures described in the WQ

assessment protocol (Welfare Quality, 2009). The WQ multi-criteria evaluation model

uses different aggregation methods (e.g., decision tree, weighted sum or Choquet

integral) to aggregate measures into an overall assessment (Botreau et al., 2008).

Usually, the greatest efforts are made in developing a model, and less attention is paid to its credibility. However, validation is crucial for building sufficient confidence in the model for it to be used for practical purposes. According to Qureshi et al. (1999) and Harrison (1991), model validation can be divided into three components: verification, validation and sensitivity analysis. Verification refers to building the model correctly (O’Keefe et al., 1991). It ensures that the model has been developed in a formally correct manner in accordance with a specified methodology (Geissman and Schultz, 1991). In the case of a mathematical model implemented as a computer program, verification establishes that the program has been written correctly and that it behaves as intended. Validation refers to building the correct model (O’Keefe et al., 1991). Most attempts at model validation

check agreement between the model and real system outputs or between the model and

expert opinions (Qureshi et al., 1999). Sensitivity analysis examines the extent of

variation in predicted performances when parameters are varied over some range of

interest. Sensitivity analysis provides information on the priority areas for refinement if

further versions of the model are to be developed (Qureshi et al., 1999).
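A one-at-a-time design is one simple way to carry out such a sensitivity analysis: each input is varied while the others are held at a baseline. The sketch below is not necessarily the design used later in this chapter; the toy model, its weights and the input names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Model = Callable[[Dict[str, float]], float]


def one_at_a_time(model: Model, baseline: Dict[str, float],
                  delta: float) -> Dict[str, Tuple[float, float]]:
    """Change each input by -delta and +delta, keeping the others at baseline.

    Returns, per input, the change in model output for (-delta, +delta).
    """
    reference = model(baseline)
    effects = {}
    for name in baseline:
        down = dict(baseline, **{name: baseline[name] - delta})
        up = dict(baseline, **{name: baseline[name] + delta})
        effects[name] = (model(down) - reference, model(up) - reference)
    return effects


# Toy model with hypothetical weights, standing in for a criterion score
def toy_model(x: Dict[str, float]) -> float:
    return 0.7 * x["lameness"] + 0.3 * x["wounds"]


print(one_at_a_time(toy_model, {"lameness": 50.0, "wounds": 50.0}, delta=10.0))
```

Inputs with the largest output changes are the priority areas for refinement mentioned above.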

The WQ multi-criteria evaluation model was tested on commercial European farms during the WQ project and partly adjusted according to the results. In addition, the classification of some of these farms was compared with the general impression of the observers who audited the farms (Botreau et al., 2009). Since the publication of the protocols, several studies on the validation of the measures used in the protocol have been carried out (Temple et al., 2011a, b, 2012a, b, 2013), assessing whether the measures included in the protocol are sensitive enough to distinguish between different types of housing systems and between farms. However, few studies have assessed whether the model is sensitive at criteria, principle or overall assessment level, and whether it can distinguish between different farms (de Vries et al., 2013).

The aim of this paper was to validate an alternative multi-criteria evaluation model to assess animal welfare on farms within the WQ framework, using the welfare assessment of growing pigs as an example. The objectives were to compare the results obtained with our approach with those obtained with the approach proposed in the WQ protocol, to assess the model's sensitivity in distinguishing between commercial growing pig farms, and to demonstrate the relative importance of welfare measures in the different steps of the multi-criteria aggregation process.

2 Material and methods

2.1 Data

Data collection took place between January 2013 and January 2014 on eight growing pig farms in Schleswig-Holstein, Germany. All farms were assessed by the same observer, who had been trained in the use of the WQ assessment protocol for growing pigs (Welfare Quality, 2009) by members of the WQ project group. The pigs on the farms were housed either conventionally or according to the guidelines of the German animal welfare label “Tierwohllabel” of the German animal welfare organisation “Deutscher Tierschutzbund e.V.” (Tierschutzbund, 2013). Each farm was visited six times during two consecutive growing periods. During each growing period, three assessments took place: the first assessment two weeks after the pigs entered the growing unit, at an average weight of 40 kg (Farm Visit 1); the second in the middle of the growing period, at an average weight of 75 kg (Farm Visit 2); and the third two weeks before the start of deliveries to the slaughterhouse, at an average weight of 100 kg (Farm Visit 3). Because management changed on one of the farms, this farm was assessed only twice. In total, the protocol was applied 44 times. The entire WQ protocol for growing pigs was carried out at each farm visit. Data were collected at pig or herd level, depending on the type of measurement, and were afterwards expressed as welfare measures at herd level. These welfare measures could be either quantitative or qualitative and were expressed on different scales depending on the measure (e.g., percentage of lame animals or coughs per animal in 5 minutes), following the WQ protocol (Welfare Quality, 2009).
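The conversion from pig-level records to a herd-level measure can be sketched for lameness. This is a minimal sketch assuming individual gait scores coded 0 (not lame), 1 (moderately lame) and 2 (severely lame); the function name and the sample are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def lameness_percentages(gait_scores: List[int]) -> Dict[str, float]:
    """Herd-level lameness measures from individual gait scores.

    Assumed coding: 0 = not lame, 1 = moderately lame, 2 = severely lame.
    """
    if not gait_scores:
        raise ValueError("no pigs observed")
    counts = Counter(gait_scores)
    n = len(gait_scores)
    return {
        "lameness_1": 100.0 * counts[1] / n,  # % pigs moderately lame
        "lameness_2": 100.0 * counts[2] / n,  # % pigs severely lame
    }


# Hypothetical sample of 20 observed pigs
sample = [0] * 17 + [1] * 2 + [2] * 1
print(lameness_percentages(sample))
```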

Table 1. Quantitative animal-based measures with their scoring scales (Welfare Quality, 2009).

Welfare measure | Scale
Body condition 2 | % lean pigs
Bursitis 1 | % pigs affected with moderate bursitis
Bursitis 2 | % pigs affected with severe bursitis
Manure on the body 1 | % pigs with 20-50% of body surface soiled with faeces
Manure on the body 2 | % pigs with >50% of body surface soiled with faeces
Space allowance | m²/100 kg pig
Lameness 1 | % pigs moderately lame
Lameness 2 | % pigs severely lame
Wounds on the body 1 | % pigs with moderate wounds on the body
Wounds on the body 2 | % pigs with severe wounds on the body
Tail biting 2 | % pigs with evidence of tail biting
Twisted snouts 2 | % pigs with evidence of twisted snout
Pumping 2 | % pigs with laboured breathing
Pneumonia | % slaughter pigs with pneumonia
Pericarditis | % slaughter pigs with pericarditis
Pleuritis | % slaughter pigs with pleuritis
Coughing | number of coughs per animal in 5 minutes
Sneezing | number of sneezes per animal in 5 minutes
Scouring | % pens with liquid faeces
Rectal prolapse 2 | % pigs with evidence of rectal prolapse
Skin condition 2 | % pigs with ≥10% of skin inflamed
Milkspots | % slaughter pigs with milk spots on the liver
Hernia 1 | % pigs with hernia/rupture not bleeding or touching the floor
Hernia 2 | % pigs with hernia/rupture bleeding or touching the floor
Mortality | % mortality on farm during the last year
Negative behaviour | % negative behaviour out of all social behaviour
Exploratory behaviour | % pen investigation out of exploration behaviours; % enrichment investigation out of exploration behaviours
Human-animal relationship | % pens showing panic response
QBA descriptors | 0-125 mm scale

2.2 Aggregation of welfare measures into criteria and principles

WQ proposes a three-step aggregation process (Welfare Quality, 2009): welfare measures are aggregated into 12 criteria, these criteria are in turn aggregated into four principles, and finally the four principles are combined into an overall assessment. In this study we focused on the first two steps of the aggregation process (Figure 1).

Figure 1. Welfare Quality® bottom-up approach for integrating the data of the different welfare measures into an overall assessment.

[Figure 1 (graphic not reproduced): Measures → Criteria → Principles → Overall value. Good feeding: absence of prolonged hunger, absence of prolonged thirst. Good housing: comfort around resting, thermal comfort, ease of movement. Good health: absence of injuries, absence of disease, absence of pain induced by management procedures. Appropriate behaviour: social behaviour, other behaviours, good human-animal relationship, positive emotional state.]

In the present study, two methodologies were used to produce criterion and principle values from the welfare measure data collected on the farms: first, the WQ assessment protocol for growing pigs (Welfare Quality, 2009), and second, an alternative methodology based on MACBETH and the Choquet integral in the context of multi-attribute utility theory (MAUT).

Details of the aggregation of the measures into criteria and principles following both

methodologies are given in the annexed document.

2.2.1 Welfare Quality® (WQ)

-Aggregation of measures into criteria

In the first step, welfare measures were aggregated into the 12 corresponding criteria. WQ used different types of aggregation of measures into criteria. For some criteria, the numbers of moderate and severe problems were first combined with a weighted sum, producing a measure index on a scale from 0 (worst) to 100 (best). These index values were then converted into measure scores (expressed on the same 0-100 scale) using spline functions (Ramsay, 1988) fitted by least-squares methods. Finally, the Choquet integral (CI) was used to combine the scores of the different measures into a score for the criterion. For other criteria, the measures were first transformed into an ordinal scale by assigning warnings or alarms depending on the value of each measure. The numbers of warnings and alarms were then combined into an index for the criterion, and this index was converted into a criterion score using I-spline functions. For the remaining measures, decision trees were used to produce the criterion score. Further information on the development and use of these operators can be found in Botreau et al. (2008, 2009) and Veissier et al. (2011).

-Aggregation of criteria into principles

In the second step, WQ used the CI to aggregate the 12 criteria into four principles. This

integral uses weights to combine the different criterion scores into one principle score

(expressed on the 0-100 scale), while limiting the possibility that a poor score of a

criterion is compensated by other excellent scores (Botreau et al., 2007; Veissier et al.,

2011).

2.2.2 Multi-attribute Utility Theory (MAUT)

We developed a multi-criteria evaluation system that aimed to produce results comparable to those of the methodology proposed in the WQ protocol while being more transparent and flexible.

-Aggregation of measures into criteria

In the first step of the aggregation, MAUT (Keeney and Raiffa, 1976) was used to produce a value of welfare for each criterion. The application of MAUT consisted of two separate steps: determination of the utility functions and determination of the aggregation function.

Utility function determination (MACBETH)

The utility function expresses the value of a measure in terms of welfare; it represents the preferences of the decision-maker (DM) over the measures and their different values. For example, 5% of lame animals on a farm may be interpreted as a worse situation than 5% of animals with wounds on the body. There are different methods for determining utility functions. MACBETH (Measuring Attractiveness by a Categorical Based Evaluation Technique) was chosen for several reasons. First, information is available on how to use this method to facilitate consensus among stakeholders (Parnell et al., 2013; Bana e Costa et al., 2014), which is one of the main difficulties faced by a multi-criteria evaluation system for animal welfare. Second, the method makes it easier to judge differences in attractiveness between options as the number of criteria increases, because it relies on qualitative judgments expressed on a scale of difference categories (‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’) (Bana e Costa et al., 2004). Third, MACBETH allows the comparison of not only qualitative but also quantitative performance levels, with no need to convert quantitative scales into a qualitative scale beforehand, which solves one of the problems presented by Botreau et al. (2007). Fourth, the process of determining the utilities remains transparent, as it is extensively documented (Bana e Costa et al., 1999, 2004), and it is easier to explain to stakeholders thanks to the interactive software provided (M-MACBETH).
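The basic consistency conditions behind MACBETH judgments can be checked with a short sketch: a more attractive level must receive a higher value, and a difference judged in a stronger category must correspond to a larger value difference. This covers only the check of a given scale; M-MACBETH additionally proposes a scale from the judgments, which is not reproduced here. The levels, judgments and scales are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

CATEGORY_RANK = {"very weak": 1, "weak": 2, "moderate": 3,
                 "strong": 4, "very strong": 5, "extreme": 6}

Judgments = Dict[Tuple[str, str], str]  # (more attractive, less attractive) -> category


def is_consistent(judgments: Judgments, scale: Dict[str, float]) -> bool:
    """Check a numerical scale against qualitative difference judgments.

    1. If a is judged more attractive than b, then v(a) > v(b).
    2. If the difference (a, b) is in a stronger category than (c, d),
       then v(a) - v(b) > v(c) - v(d).
    """
    if any(scale[a] <= scale[b] for a, b in judgments):
        return False
    for (p, cat_p), (q, cat_q) in combinations(judgments.items(), 2):
        diff_p = scale[p[0]] - scale[p[1]]
        diff_q = scale[q[0]] - scale[q[1]]
        if CATEGORY_RANK[cat_p] > CATEGORY_RANK[cat_q] and not diff_p > diff_q:
            return False
        if CATEGORY_RANK[cat_q] > CATEGORY_RANK[cat_p] and not diff_q > diff_p:
            return False
    return True


# Hypothetical judgments over lameness levels (% lame pigs)
judgments = {
    ("0%", "2%"): "weak",
    ("2%", "5%"): "moderate",
    ("5%", "15%"): "strong",
    ("0%", "15%"): "extreme",
}
good = {"0%": 100.0, "2%": 85.0, "5%": 60.0, "15%": 20.0}
bad = {"0%": 100.0, "2%": 70.0, "5%": 60.0, "15%": 20.0}
print(is_consistent(judgments, good), is_consistent(judgments, bad))
```

In the second scale, the "weak" difference between 0% and 2% (30 points) is larger than the "moderate" difference between 2% and 5% (10 points), so the scale is rejected.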