From the Institute of Animal Breeding and Husbandry
of the Faculty of Agricultural and Nutritional Sciences
of the Christian-Albrechts-Universität zu Kiel
___________________________________________________________________
DEVELOPMENT OF A MULTI-CRITERIA EVALUATION
SYSTEM TO ASSESS ANIMAL WELFARE
Dissertation
submitted for the doctoral degree
of the Faculty of Agricultural and Nutritional Sciences
of the Christian-Albrechts-Universität zu Kiel
presented by Ing. agr. Paula Martín Fernández
from Madrid, Spain
Dean: Prof. Dr. Eberhard Hartung
First examiner: Prof. Dr. Joachim Krieter
Second examiner: Prof. Dr. Eberhard Hartung
Date of oral examination: 23.01.2015
___________________________________________________________________
This dissertation was prepared with the gratefully acknowledged financial support of the
Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung)
within the competence network for agricultural and nutrition research PHÄNOMICS.
To My Parents
TABLE OF CONTENTS
GENERAL INTRODUCTION
CHAPTER ONE
Comparison of methods to develop a multi-criteria evaluation system to assess animal welfare
CHAPTER TWO
Development of a multi-criteria evaluation system to assess growing pig welfare
CHAPTER THREE
Validation of a multi-criteria evaluation model for animal welfare
Annex
GENERAL DISCUSSION
GENERAL SUMMARY
ZUSAMMENFASSUNG
ACKNOWLEDGMENTS
CURRICULUM VITAE
GENERAL INTRODUCTION
Concern about livestock living conditions has increased considerably in the last few
years. Consumers are increasingly linking animal welfare indicators with food safety
and quality. These consumer preferences create economic incentives for stakeholders
to meet animal welfare standards, as established by legislation or voluntary certification
schemes (Vapnek and Chapman, 2010). It is generally accepted that animal
welfare is a multi-dimensional concept which comprises several aspects such as the
absence of thirst, hunger, discomfort, disease, pain, injuries and stress, and the presence
of normal behavioural expressions (the classical five freedoms (Farm Animal Welfare
Council (FAWC), 1992)). The EU Welfare Quality® (WQ) project developed several
protocols for the assessment of welfare of cattle, pigs and poultry (Botreau et al., 2009).
The inputs for the WQ protocols are on farm welfare measures described in the
protocols. Information at measure level may be useful for farm management purposes;
however, labelling purposes require a certain level of aggregation of the measures into
overall scores. Due to this fact, a multi-criteria evaluation model is required for the
evaluation of an animal unit (farm, slaughterhouse). The WQ protocols proposed a
multi-criteria evaluation system to aggregate the information of the welfare measures
into an overall assessment. Different operators (e.g., I-spline functions, decision trees,
weighted sums or Choquet integrals) were used for this purpose (Botreau et al., 2008).
The main drawback of the multi-criteria evaluation system proposed in the WQ
protocols is that it lacks transparency and flexibility with respect to the I-spline
functions and the different aggregation operators used. There are other ways of
approaching the multi-criteria evaluation problem that differ from the ones used by the
WQ multi-criteria evaluation model, e.g., the multi-attribute utility theory (MAUT),
ELECTRE or the Analytic Hierarchy Process (AHP). In the MAUT, uni-dimensional
utility functions, which correspond to each criterion, are aggregated into a single global
utility function combining the whole of the criteria (Keeney and Raiffa, 1976), whereas
by using ELECTRE (outranking procedure) only the preference relations of pairs of
alternatives are aggregated (Roy, 1971); whilst in the Analytic Hierarchy Process
‘children’ nodes of a common ‘parent’ are aggregated using pair-wise comparisons
(Saaty, 1980). This thesis focuses on the MAUT. The application of MAUT consists of
two separate steps, the utility function determination and the aggregation function
determination. A large number of methods have been proposed to determine the utility
function in MAUT, for instance the standard sequences method described by Bouyssou
et al. (2000) and the MACBETH method described by Bana e Costa et al. (1999).
Examples of aggregation functions in MAUT are the weighted sum, the ordered
weighted average (Yager, 1988) and the Choquet integral (Choquet, 1953; Murofushi
and Sugeno, 1989; Grabisch, 1997).
Chapter One contains a comparison of different MAUT methods which can be applied
to produce an overall evaluation of animal welfare in the context of certification
schemes. This was performed with regard to the potential of these methodologies to
solve the main difficulties found in the literature faced by such a model, which are that
criteria may have different importance, and interactions may exist between them. This is
a key aspect since the welfare criteria may not fully compensate for each other (Botreau
et al., 2007). Two utility function determination methods (the standard sequences
method and the MACBETH method), and two aggregation functions (the weighted sum
and the Choquet integral (CI)) were compared. In the framework of MAUT, the use of
the MACBETH method together with the CI seemed to be the model which best
addressed these difficulties.
In order to compare the different methodologies which could be used in the context of
MAUT, a theoretical model of a welfare assessment for growing pigs was used
considering only four criteria, good feeding, good housing, good health and appropriate
behaviour. Building on this comparison, Chapter Two presents, by means of examples, the
application of the MACBETH method together with the CI to a real welfare assessment,
the WQ protocol for growing pigs (Welfare Quality, 2009).
Throughout this study, the different multi-criteria methods used in the WQ protocol
were also compared with the single methodology proposed in this study.
After the development of any multi-criteria evaluation system, a validation of the model
must be carried out in order to prove that it works as intended in practical conditions
(Qureshi et al., 1999). In Chapter Three, the MAUT methodology proposed in
Chapter Two was implemented to aggregate welfare data which were collected on
different growing pig farms in Schleswig-Holstein, Germany. In total, 44 observations
were carried out. The whole WQ assessment protocol for growing pig farms was
implemented in each observation. The results obtained for each observation were
compared with the results obtained by implementing the multi-criteria methodology
proposed in the WQ protocol. Also, the influence of variations in the welfare measure
values was estimated in order to assess the sensitivity of the model.
Overall, the thesis provides a multi-criteria evaluation model for animal welfare, the use
of which has been implemented in the context of the Welfare Quality® protocol for
growing pigs.
References
Bana e Costa CA, de Corte JM and Vansnick JC 1999. The MACBETH approach:
Basic ideas, software, and an application, in: Meskens, N., Roubens, M., (Eds.),
Advances in Decision Analysis. Kluwer Academic Publishers, Book Series:
Mathematical Modelling: Theory and Applications, vol. 4, pp.131-157.
Botreau R, Bracke MBM, Perny P, Butterworth A, Capdeville J, van Reenen CG and
Veissier I 2007. Aggregation of measures to produce an overall assessment of
animal welfare. Part 2: Analysis of constraints. Animal 1, 1188-1197.
Botreau R, Capdeville J, Perny P and Veissier I 2008. Multi-criteria evaluation of
animal welfare at farm level: an application of MCDA methodologies.
Foundations of Computing and Decision Science. 33, 1-18.
Botreau R, Veissier I and Perny P 2009. Overall assessment of animal welfare: Strategy
adopted in Welfare Quality. Animal Welfare. 18, 363-370.
Bouyssou D, Marchant T, Perny P, Pirlot M, Tsoukias A and Vincke P 2000. Evaluation
and decision models: A critical perspective. Kluwer, Dordrecht.
Choquet G 1953. Theory of capacities. Annales de l’Institut Fourier. 5, 131-295.
Farm Animal Welfare Council 1992. FAWC updates the five freedoms. The Veterinary
Record, 17, 357.
Grabisch M 1997. K-Order additive discrete fuzzy measures and their interpretation.
Fuzzy sets and systems. 92, 167-189.
Keeney RL and Raiffa H 1976. Decisions with multiple objectives: Preferences and
value tradeoffs. Wiley, New York.
Murofushi T and Sugeno M 1989. An interpretation of fuzzy measure and the Choquet
integral as an integral with respect to a fuzzy measure. Fuzzy sets and systems.
29, 201-227.
Qureshi ME, Harrison SR and Wegener MK 1999. Validation of multi-criteria analysis
models. Agricultural Systems. 62, 105-116.
Roy B 1971. Problems and methods with multiple objective functions. Mathematical
Programming. 1, 239-266.
Saaty TL 1980. The Analytic Hierarchy Process: Planning, priority setting, resource
allocation. McGraw-Hill, New York.
Vapnek J and Chapman M 2010. Legislative and regulatory options for animal welfare.
FAO Legislative study 104, FAO, Rome.
Welfare Quality 2009. Welfare Quality® Assessment Protocol for Growing Pigs.
Lelystad: Welfare Quality® Consortium.
Yager R 1988. On ordered weighted averaging operators in multi-criteria decision
making. IEEE Transactions on Systems, Man and Cybernetics. 18, 183-190.
CHAPTER ONE
Comparison of methods to develop a multi-criteria
evaluation system to assess animal welfare
P. Martín 1, I. Traulsen 1, C. Buxadé 2 and J. Krieter 1
1 Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Kiel, Germany
2 Animal Production Department, Polytechnic University, Madrid, Spain
Abstract
The aim of this paper was to create a model to review different methodologies which
can be applied to produce an overall evaluation of animal welfare in the context of
certification schemes. This was performed with regard to the potential of these
methodologies to solve the main difficulties found in the literature faced by such a
model. Welfare Quality® distinguishes four welfare criteria (good feeding, good
housing, good health and appropriate behaviour). Data for growing pig farms were
generated, with each farm receiving one score for each welfare criterion. Ten farms were
used as learning data and the complete dataset generated was used to exemplify the
differences between the methods. The multi-attribute utility theory (MAUT) was used
to produce an overall value of welfare. The utility functions and the aggregation
function were constructed in two separate steps. First, utility functions for each
criterion were determined in two different ways, using the standard sequences method
(SS) and the MACBETH software. In the second step, the weighted sum (WS) and the
Choquet integral (CI) were used as aggregation functions. The utilities derived from
MACBETH allowed us to model the preferences of the decision-maker regarding the
different importance of the criteria and the interaction between them more adequately
than those derived from the SS method. A comparison of the WS and the CI results
obtained from each method was carried out. The results showed that there were
interactions between the criteria; assuming independence among the criteria led to
important differences in the classification of the farms.
Keywords: Animal Welfare, assessment, methods, pigs.
1 Introduction
Concern about livestock living conditions has increased considerably in the last few
years and consumers have also been increasingly linking animal welfare indicators with
food safety and quality. These consumer preferences create economic incentives for
stakeholders to meet animal welfare standards, as established by legislation or voluntary
certification schemes (Vapnek and Chapman, 2010). It is generally accepted that
animal welfare is a multidimensional concept which comprises several aspects such
as the absence of thirst, hunger, discomfort, disease, pain, injuries and stress, and the
presence of normal behavioural expressions (the classical five freedoms (Farm Animal
Welfare Council (FAWC), 1992)). Consequently, the assessment of animal welfare
must be based on several measures. Information at measure level may be useful for farm
management purposes; however, labelling purposes require a certain level of
aggregation into overall scores (Blokhuis et al., 2010). To determine an overall level of
animal welfare, measures need to be combined. Although it has been argued that
science should not attempt to perform overall welfare assessments because value
judgements are inherently involved (Fraser, 1995), others state that an overall welfare
assessment is not arbitrary and a high level of accuracy can be achieved (Bracke et al.,
1999). In spite of the different viewpoints, various models have been developed to
assess overall levels of animal welfare. More recently, Welfare Quality (WQ) has
developed several protocols for the overall assessment of the welfare of cattle, pigs and
poultry (Welfare Quality, 2009).
A common feature of all the approaches in multi-criteria decision-making is the need
for an aggregation operator. In the multi-attribute utility theory (MAUT), uni-
dimensional utility functions which correspond to each criterion are aggregated into a
single global utility function combining all the criteria (Keeney and Raiffa, 1976),
whereas in ELECTRE (outranking procedure) the preference relations on pairs of
alternatives are aggregated (Roy, 1971) and in the Analytic Hierarchy Process (AHP)
‘children’ nodes of a common ‘parent’ are aggregated using pair-wise comparisons
(Saaty, 1980). Examples of aggregation functions in MAUT are the weighted sum
(WS), the ordered weighted average (Yager, 1988) and the Choquet integral (CI)
(Murofushi and Sugeno, 1989). The most common aggregation tool still used today is
the WS, despite its well-known drawbacks. The WS can be used as an aggregator when
mutual preferential independence among criteria is assumed. However, in practice, this
mutual preferential independence is rarely verified. In order to be able to take into
account the interaction between the criteria, Sugeno (1974) proposed replacing the
weight vector involved in the calculation of the WS with a fuzzy measure (also called
capacity). The fuzzy integrals, such as the CI, are defined from the concept of a fuzzy
measure. The capacity with respect to the CI can be seen as an extension of the weight
vector with respect to the WS (Grabisch et al., 2008). The distinguishing feature of a CI
is that it is able to represent a certain kind of interaction, ranging from redundancy
(negative interaction) to synergy (positive interaction) (Grabisch, 1996).
The aim of this study was to create a model to compare different methodologies which
can be used in the context of the MAUT. These could then be applied to develop a
multidimensional estimation system in order to produce an overall evaluation of animal
welfare in the context of certification schemes. In the framework of MAUT, a
comparison was undertaken between two methods of utility function determination (the
standard sequences method and the MACBETH method) and two aggregation methods
(the WS and the CI). These different methods were used with the objective of finding
the method which best solves the main difficulties found in the literature faced by
such a model. The main difficulties the model faces are that criteria may have different
levels of importance and that interactions may exist between them. The latter is a key
aspect, since the welfare criteria may not fully compensate for each other (Botreau et al., 2007b).
2 Material and methods
2.1 Data
In order to compare the different methodologies which can be used in the context of
MAUT, a theoretical model of a welfare assessment for growing pigs was used
considering four criteria, good feeding (F), good housing (Ho), good health (He) and
appropriate behaviour (B), corresponding to the four main WQ principles. Each of these
criteria was assessed by a different number of measures. Each measure was scored as
0 or 1 (absence or presence). In this way, and taking the sum of the measure values as
the criterion value, a criterion assessed by three measures can take four different
values: 0, 1, 2 and 3. Good feeding was defined by 4 measures and could thus vary
between 0 (worst) and 4 (best), good housing by 7 measures, varying between 0 and 7,
good health by 13 measures, varying between 0 and 13, and appropriate behaviour by
4 measures, varying between 0 and 4. These scales were used instead of intervals
between 0 and 100 so that they represent raw data which have not yet been interpreted
in terms of welfare. This allows the potential of the different methods to be studied
with a view to a future step of the project, in which measures collected in different
units or scales will have to be handled.
Data from ten farms regarding the four criteria were selected as learning data (Table 1)
from which the decision-maker (DM) had to express his preferences. These consisted of
giving a partial weak order over the set of weights related to each criterion (W in Table
1), the sign of interaction between the 6 pairs of criteria ((F, Ho), (F, He), (F,B), (Ho,
He), (Ho, B), (He, B)) and a partial weak order (R) over the farms (Table 1) taking into
account both the different importance of the criteria and the interactions between them.
Farms a, b, c, d and e were selected to assess how the DM perceived the different
importance of the criteria. For these 5 farms, 3 of the criteria were assigned a good
value and only one of the criteria corresponded to a medium value. Farms g, h, i and j
were selected to assess how the DM perceived the interaction between a bad grade in
one criterion and medium values in the other criteria.
A second dataset consisting of 2,800 farms from the combination of all the possible
values for the four criteria was generated in order to obtain an absolute impression of
the influence of using the different methods not limited to the relative comparison of a
small dataset (learning data).
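The size of this second dataset follows directly from the criterion scales described above: 5 possible Feeding values, 8 for Housing, 14 for Health and 5 for Behaviour. The following minimal Python sketch enumerates the combinations; the variable names are illustrative and are not taken from the original study.

```python
from itertools import product

# Number of binary measures per criterion, as stated in the text.
MEASURES = {"feeding": 4, "housing": 7, "health": 13, "behaviour": 4}

# A criterion assessed by m binary measures can take the values 0..m.
levels = [range(m + 1) for m in MEASURES.values()]

# Every combination of criterion values is treated as one hypothetical farm,
# stored as a tuple (feeding, housing, health, behaviour).
farms = list(product(*levels))

print(len(farms))           # 5 * 8 * 14 * 5 combinations
print(farms[0], farms[-1])  # worst and best farm on all four criteria
```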
Table 1. Criteria values for each selected farm (learning data) and initial preferences of
the decision-maker.
Farm   Feeding¹   Housing²   Health³   Behaviour⁴   R
a      2          5          10        3            1
b      3          3          10        3            2
c      3          5          7         3            3
d      3          5          10        2            4
e      3          5          6         3            5
f      2          3          6         2            6
g      0          3          6         2            7
h      2          1          6         2            8
i      2          3          4         2            9
j      2          3          6         1            10
W      +          ++         +++       +++
¹ Feeding values can vary between 0 (worst) and 4 (best).
² Housing values can vary between 0 (worst) and 7 (best).
³ Health values can vary between 0 (worst) and 13 (best).
⁴ Behaviour values can vary between 0 (worst) and 4 (best).
R: DM’s ranking over the farms.
W: Initial notions of the DM about the importance of the weights.
Bad grade; medium grade; good grade.
2.2 General methodology
The MAUT was used to produce an overall value of welfare starting from the data
regarding the four main criteria. The utility functions and the aggregation functions
were constructed in two separate steps (Figure 1). A comparison was made between
the two methods of utility function determination, i.e. the standard sequences method
(SS) described by Bouyssou et al. (2000) and the MACBETH method described by
Bana e Costa et al. (1999), and also two aggregation methods, i.e. the WS and the CI.
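Figure 1. General methodology followed in the study.
[Figure not reproduced. Step 1, individual utilities determination: the Feeding, Housing, Health and Behaviour utilities are each determined with SS and with MACBETH. Step 2, aggregation into an overall utility: Choquet integral or weighted sum.]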
The results obtained via the different utility function determination methods and the
aggregation operators were also compared. The rankings of the overall utilities obtained
for the 10 farms selected as learning data were compared. However, in order to obtain
an absolute impression - not limited to the relative comparison of a small dataset - of the
influence of taking the interactions between the criteria into account, four welfare
categories were defined which match the ones proposed by Welfare Quality (2009):
unacceptable (overall utility < 20), acceptable (overall utility > 20 but < 55), enhanced
(overall utility > 55 and < 80) and excellent (overall utility > 80). The MACBETH
overall utilities obtained for the complete dataset (2,800 farms) through the WS and the
CI were each classified into one of the four categories, and the number of farms assigned
to each welfare category was compared between aggregation methods.
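The classification step can be sketched in Python as follows. This is only an illustration: the text does not say to which category the exact boundary values 20, 55 and 80 belong, so the sketch assumes that each lower bound is inclusive, and the scores used are hypothetical rather than results of the study.

```python
from collections import Counter

def welfare_category(overall_utility: float) -> str:
    """Map an overall utility on the 0-100 scale to one of the four categories."""
    # Assumption: the lower bounds 20, 55 and 80 are inclusive; the text
    # leaves the treatment of the exact boundary values open.
    if overall_utility < 20:
        return "unacceptable"
    if overall_utility < 55:
        return "acceptable"
    if overall_utility < 80:
        return "enhanced"
    return "excellent"

def category_counts(utilities):
    """Count how many farms fall into each welfare category."""
    return Counter(welfare_category(u) for u in utilities)

# Hypothetical overall utilities of the same five farms under two aggregators.
ws_scores = [18.0, 42.5, 56.0, 79.0, 91.0]
ci_scores = [15.0, 40.0, 52.0, 81.0, 90.0]

ws_counts = category_counts(ws_scores)
ci_counts = category_counts(ci_scores)
for category in ("unacceptable", "acceptable", "enhanced", "excellent"):
    print(category, ws_counts[category], ci_counts[category])
```

Comparing the two count tables per category mirrors the comparison between aggregation methods described in the text.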
2.3 MAUT - Utilities determination
For the utility function determination, each criterion was considered separately. The
utility function ui represents the preferences of the DM over the criterion Xi. The utilities
can be seen as providing a numerical representation of the attractiveness of the different
values of the criteria for the DM. A large number of methods have been proposed to
determine the utility functions in an additive multi-attribute utility model; see von
Winterfeldt and Edwards (1986) for an accessible account of such methods. There are
essentially two families of methods, one based on direct numerical estimations and the
other on indifference judgements. We chose two methods from the latter family, the
MACBETH method (Bana e Costa et al., 1999) and the SS method (Krantz et al., 1971;
von Winterfeldt and Edwards, 1986; Wakker, 1989; Bouyssou et al., 2000),
since utilities which are stated spontaneously might not be as reliable as utilities which
are constructed following a methodology; see Bouyssou et al. (2006) and Bana e Costa
et al. (2004) for a deeper review. These two methods were chosen for two reasons: first, we
wanted to compare a methodology based on qualitative judgements (MACBETH) with
a method based on quantitative judgements (SS), and second, extensive
literature is available on these two methods.
2.3.1 Standard sequences method
To elicit a utility function (ui), for example uHo corresponding to Housing, the SS
method starts by considering two hypothetical farms which differ only in the Feeding
and Housing criteria. The performance levels of the other criteria are held equal
(ceteris paribus). Then, it is assumed that the two farms differ in Feeding by a noticeable
amount (1 point, for instance). An interval of this amplitude is located in the middle of
the range for Feeding, say 1-2. Then, a value for Housing is also set in the
middle of the range, say 3. The DM is then asked to assess a value of Housing (XHo)
such that he would be indifferent between the two farms (1, 3) and (2, XHo). The second
question to the DM uses his answer to the first question: he is asked to assess the value
X’Ho of Housing that would leave him indifferent between the two farms (1, 2) and (2,
X’Ho). Continuing along the same line would lead, for instance, to the following
sequence of indifferences:
(1, 3) ~ (2, 2)
(1, 2) ~ (2, 0)
Then, similar questions are asked for the upper half of the range of the Housing, which
may lead to the following sequences of indifference.
(1, 5) ~ (2, 3)
(1, 7) ~ (2, 5)
In other words, the DM considers that a farm with a score of 1 in Feeding and a score of
3 in Housing (considering ceteris paribus in the other two criteria) is equal in terms of
preference to a farm with a score of 2 in Feeding and a score of 2 in Housing. A farm
with a score of 1 in Feeding and a score of 2 in Housing is thus considered equal to a
farm with a score of 2 in Feeding and a score of 0 in Housing. A farm with a score of 1
in Feeding and a score of 5 in Housing is considered equal to a farm with a score of 2 in
Feeding and a score of 3 in Housing, and finally a farm with a score of 1 in Feeding and
a score of 7 in Housing is considered equal to a farm with a score of 2 in Feeding and a
score of 5 in Housing. Such a sequence gives the analyst an approximation of the
single-attribute utility function for Housing uHo. The final step is to normalise the
individual utility function of each criterion in a (0-100) interval in order to be able to
aggregate the marginal utility functions for the different criteria.
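The indifferences above imply that the Housing levels 0, 2, 3, 5 and 7 are separated by equal utility steps, each step compensating the Feeding interval 1-2. A minimal Python sketch of how such a sequence could be turned into a normalised utility function is given below. The linear interpolation used for the levels not in the sequence (1, 4 and 6) is an assumption made for illustration; the text does not state how intermediate levels were handled.

```python
def utility_from_standard_sequence(sequence, levels):
    """Build a 0-100 utility scale from a standard sequence.

    sequence: increasing criterion values separated by equal utility steps;
              it must start at the worst level and end at the best level.
    levels:   all criterion values for which a utility is required.
    """
    step = 100.0 / (len(sequence) - 1)
    anchors = {x: i * step for i, x in enumerate(sequence)}
    utilities = {}
    for level in levels:
        if level in anchors:
            utilities[level] = anchors[level]
            continue
        # Assumption: linear interpolation between the neighbouring anchors.
        lower = max(x for x in sequence if x < level)
        upper = min(x for x in sequence if x > level)
        fraction = (level - lower) / (upper - lower)
        utilities[level] = anchors[lower] + fraction * (anchors[upper] - anchors[lower])
    return utilities

# Standard sequence for Housing implied by the four indifferences in the text.
housing_sequence = [0, 2, 3, 5, 7]
u_housing = utility_from_standard_sequence(housing_sequence, range(8))
for level, utility in sorted(u_housing.items()):
    print(level, utility)
```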
To determine uHe (Health) and uB (Behaviour) in the same way as for Housing, a
successive search was carried out for intervals on the Health and Behaviour scales
which would exactly compensate the Feeding interval 1-2 in terms of preference.
Finally, the same procedure was applied to Feeding itself (uF), fixing an interval on
another criterion, for instance 2-3 on Housing.
2.3.2 MACBETH
MACBETH (Measuring Attractiveness by a Categorical Based Evaluation Technique)
is a methodology described by Bana e Costa et al. (1999), which requires only
qualitative judgements to quantify the relative attractiveness (utilities) of options
(farms). To elicit a marginal utility function (ui) using the MACBETH software, for
example uHo corresponding to Housing, the first step is to fill in a matrix, giving
qualitative judgements regarding the difference of attractiveness between the different
quantitative performance levels of the criterion. For instance, for Housing, the
quantitative performance levels vary between 0 and 7. The qualitative judgements of
difference can be rated as ‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or
‘extreme’, (Figure 2a).
Figure 2a. MACBETH matrix of qualitative judgements. Quantitative performance
levels for Housing.
As each judgement is given, the software automatically verifies the matrix’s consistency
(Figure 2b), and suggests judgement modifications which can be made to fix any
detected inconsistency (Figure 2c).
Figure 2b. MACBETH matrix of qualitative judgements. Example of building a
consistent matrix.
Figure 2c. MACBETH matrix of qualitative judgements. Example of inconsistency.
From the complete and consistent matrix of judgements, MACBETH creates a
numerical scale (Figure 2d). With the numerical scale, MACBETH produces the
marginal utility function (u) for each criterion. The range in which the utilities vary was
defined in this study as 0-100 in order to be in accordance with the SS method.
Figure 2d. MACBETH matrix of qualitative judgements. Complete matrix of
judgments and numerical scale.
2.4 MAUT - Aggregation methods
In the second step, all the criteria were considered together. Here, the weighted sum
(WS) and the Choquet integral (CI) were used as aggregation functions in order to
evaluate how the output differs between taking the interactions between criteria into
account (CI) and treating the welfare criteria as independent (WS).
2.4.1 Weighted sum
After applying the SS technique and the MACBETH method, 4 utility functions were
available, one per criterion, where 0 was the worst performance and 100 was the best
performance for each criterion. Weights are needed to combine these values additively
using the WS. The DM was asked to provide some initial notions on the importance of the
weights (W in Table 1). A test was then performed to determine whether the same
weighting vector was obtained when two different methods were implemented to elicit
it:
Firstly, the weights were elicited following a method suggested by Bouyssou et al. (2006)
and first described by Keeney and Raiffa (1976). The interest of this technique is that the
weights are not obtained by asking the DM to give the value of the parameters (direct
rating procedure). Instead, the DM is asked to rank alternatives, and the different
importance values of the criteria are determined from this ranking, following a set
procedure which uses the utility functions previously determined.
Secondly, the weighting of the criteria was performed within the MACBETH software
following the same procedure as described for the elicitation of the utilities, in other
words, giving qualitative judgements regarding the difference of attractiveness between
criteria.
The same weighting vector was obtained using the Keeney and Raiffa (1976) technique
and the MACBETH methodology. The utilities calculated by the SS method and the
MACBETH methodology were aggregated with the WS using the weighting vector
obtained.
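The aggregation by weighted sum can be sketched in Python as follows. The weighting vector and the farm utilities below are hypothetical: the weights are only chosen to be compatible with the DM's initial notions in Table 1 (Feeding least important, Health and Behaviour most important) and are not the values elicited in the study.

```python
CRITERIA = ("F", "Ho", "He", "B")

def weighted_sum(utilities, weights):
    """Aggregate 0-100 criterion utilities with a weighting vector summing to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[c] * utilities[c] for c in CRITERIA)

# Hypothetical weights, compatible with W in Table 1 (+, ++, +++, +++).
weights = {"F": 0.10, "Ho": 0.20, "He": 0.35, "B": 0.35}

# Hypothetical utilities of one farm on the four criteria.
farm = {"F": 50.0, "Ho": 75.0, "He": 80.0, "B": 60.0}

overall = weighted_sum(farm, weights)
print(round(overall, 2))
```

Because the WS is additive, a low utility on one criterion can always be offset by high utilities on the others, which is the full-compensation behaviour that the CI is meant to relax.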
2.4.2 Choquet integral
In order to combine the 4 utility functions calculated by the SS technique or by the
MACBETH method using the CI, the first step was the capacity identification.
A capacity can be regarded as a generalisation of the weighting vector involved in the
calculation of weighted sums. Seen as an aggregation operator, the CI with respect to the capacity can be
considered as taking into account the different importance of the criteria and the
interaction between criteria. The overall importance of a criterion can be measured by
its Shapley value and the interaction between criteria can be measured by the interaction
indices. The interaction phenomena among criteria can be very complex and difficult to
identify. Different forms of dependence exist, for instance, correlation,
substitutive/complementary, and preferential dependence (Marichal, 2000). In this
study, the DM regarded the criteria as complementary (positive interaction) or
substitutive (negative interaction). According to the definition of Marichal (2000),
substitutiveness between criteria can be understood as the situation in which the
decision-maker considers that the satisfaction of only one criterion produces almost the
same effect as the satisfaction of both. Of course, it is better that farms are good on both
criteria, but this is less important. For instance, in this study, two criteria i and j
would be regarded as substitutive when it is important that farms are good at criterion i
or j, in other words, when compensation is allowed between them. They would be considered
complementary when, for the DM, the satisfaction of only one criterion produces a very
weak effect compared with the satisfaction of both.
The number of coefficients which define a capacity, and hence the number of variables
involved in the CI, increases exponentially with the number of criteria. For reasons of
simplicity, it may be preferable to restrict the model to 2-additive or 3-additive
solutions (Grabisch et al., 2008), which in this study corresponded to the definition of
10 or 14 coefficients respectively. We proposed restricting the model to the 2nd order,
thus assuming that interaction between more than 2 criteria does not exist. Although
only 4 criteria were considered in this example, so that the difference in the number of
coefficients to be determined between a 2nd-order and a 3rd-order model was small, more
criteria may have to be considered in a further step of the project. If, for instance,
6 health criteria are aggregated, 21 coefficients will be needed with a 2-additive model
and 41 with a 3-additive one.
Let us consider a decision problem involving a set X of n elements, here $n = 4$
(criteria). Defining a capacity on X requires the definition of $2^n$ coefficients. This
could be too complex to handle if n goes beyond, say, 8 (Grabisch, 1997). As a
consequence, it is frequently assumed that the capacity is additive, which identifies the
Choquet integral with the weighted arithmetic mean (Marichal, 2000) and can be
defined with only n coefficients. This avoids the complexity of using non-additive
capacities, but at the price of a very poor modelling tool which loses their richness
(Kojadinovic, 2007). The fundamental notion of k-additivity proposed by Grabisch
(1997) makes it possible to find an intermediate solution between the complexity of
representation and the richness of the model. K-additive measures for $k < n$ need fewer
than $2^n$ coefficients to be defined. Only n coefficients are needed for $k = 1$ (additive
capacity), $n(n+1)/2$ for $k = 2$, and in general $\sum_{i=1}^{k} \binom{n}{i}$ for k-additive measures.
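The coefficient counts quoted above (10 and 14 for four criteria, 21 and 41 for six) can be checked with a short Python sketch. It assumes that a k-additive capacity needs one coefficient per non-empty subset of at most k criteria; the function name is illustrative.

```python
from math import comb

def n_coefficients(n: int, k: int) -> int:
    """Coefficients needed to define a k-additive capacity on n criteria.

    Assumption: one coefficient per non-empty subset of at most k criteria.
    """
    return sum(comb(n, i) for i in range(1, k + 1))

for n in (4, 6):
    for k in (1, 2, 3):
        print(f"n={n}, k={k}: {n_coefficients(n, k)} coefficients")

# For k = n every non-empty subset is counted, i.e. 2**n - 1 coefficients,
# which shows the exponential growth with the number of criteria.
print(n_coefficients(8, 8))
```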
According to Mayag et al. (2011), given the individual utilities (x1,…, xn) for the
criteria, in this study (xF, xHo, xHe, xB) for Feeding, Housing, Health and Behaviour
respectively, the CI with respect to a 2-additive capacity can be written as follows:
$$C_\mu(x_1, \ldots, x_n) = \sum_{i=1}^{n} v_i x_i - \frac{1}{2} \sum_{1 \le i < j \le n} I_{ij} \, |x_i - x_j|$$
where vi represents the importance of criterion i (its Shapley value) and Iij represents
the interaction between criteria i and j.
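As an illustration, the 2-additive CI can be sketched in a few lines of Python. The sketch assumes the usual form of the 2-additive Choquet integral, i.e. the sum of vi·xi minus one half of the sum over all pairs of Iij·|xi − xj|. The function name is hypothetical. The coefficients are the MV’ values for the MACBETH method reported in Table 4, and the utilities are those of farm a in Table 3.

```python
from itertools import combinations


def choquet_2additive(x, v, interactions):
    """2-additive Choquet integral (assumed standard form).

    x: criterion -> utility, v: criterion -> Shapley value,
    interactions: (criterion_i, criterion_j) -> interaction index.
    """
    linear = sum(v[c] * x[c] for c in x)
    penalty = 0.5 * sum(
        interactions[(i, j)] * abs(x[i] - x[j]) for i, j in combinations(x, 2)
    )
    return linear - penalty


criteria = ("F", "Ho", "He", "B")
# MV' coefficients for MACBETH (Table 4)
v = dict(zip(criteria, (0.139, 0.241, 0.309, 0.311)))
interactions = {pair: 0.05 for pair in combinations(criteria, 2)}
# MACBETH utilities of farm a (Table 3)
farm_a = dict(zip(criteria, (55, 75, 65, 65)))

ci_a = choquet_2additive(farm_a, v, interactions)
print(round(ci_a, 2))
```

Up to the rounding of the published coefficients, the result agrees with the overall utility of 64.52 reported for farm a under the MV’ approach in Table 3.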
Different methods for capacity identification have been proposed in the literature. Most
of them can be stated as optimisation problems. The main differences between them are
the objective function and the preferential information they require as input. The
minimum variance approach was used, which requires only a partial order over the
farms as preference information. Capacity identification was implemented within the
Kappalab R package following the method described by Grabisch et al. (2008). The
utilities calculated using MACBETH and the SS method corresponding to the criteria
data for the 10 farms were used as the learning set from which the capacity was
identified, so that the CI with respect to this capacity numerically represented the
preferences of the DM. The partial weak order over the farms (R in Table 1) given by
the DM was used for the implementation of the minimum variance approach (MV). A
non-negative indifference threshold for the ranking over the farms was defined, so the
partial weak orders previously mentioned were translated into partial semi-orders with
fixed indifference thresholds (see Grabisch et al. (2008) for a deeper review). The values
of the thresholds had to be chosen carefully, since a very large indifference threshold
could have made the program infeasible (see Marichal and Roubens (2000) for a deeper
review). The indifference threshold for the ranking of the alternatives was set at
0.05.
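The role of the indifference threshold can be illustrated with a small check. This is a sketch in Python and not the Kappalab implementation. It assumes that a farm preferred to another must receive an overall utility that is higher by at least the threshold, and, for illustration only, that the farms are listed from most to least preferred. The function name is hypothetical. The values are the CI (MV) overall utilities for the SS method from Table 2.

```python
def respects_ranking(utilities_best_to_worst, delta=0.05, tol=1e-9):
    """True if each utility exceeds the next one by at least delta.

    tol absorbs floating-point error when a difference equals delta exactly.
    """
    pairs = zip(utilities_best_to_worst, utilities_best_to_worst[1:])
    return all(better - worse >= delta - tol for better, worse in pairs)


# CI (MV) overall utilities for farms a..j, SS method (Table 2)
ci_mv_ss = [67.70, 66.38, 61.86, 59.02, 58.97, 38.18, 32.50, 32.45, 32.40, 32.35]
print(respects_ranking(ci_mv_ss))
```

Several consecutive differences in this list are exactly 0.05, which is why a tolerance is used in the comparison.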
After an initial calculation of the CI with the MV approach, a progressive interactive
approach (MV’) was followed so that the solution was also in accordance with the DM’s
initial preferences regarding the importance of the criteria (Shapley values) and the
interaction indices. Non-negative indifference thresholds for the Shapley values and for
the interaction indices were defined. The indifference threshold above which two
criteria were regarded as differing in importance was 0.05, and the minimal absolute
value for an interaction index to be considered significantly different from zero was
also 0.05.
Additional constraints on the Shapley values were imposed so that the importance of the
criteria followed the order previously determined from the DM preferences (W in
Table 1), and additional constraints on the interaction indices were imposed so that the
criteria were regarded as complementary and compensation between them was limited
(positive interaction between the 6 pairs of criteria (F, Ho), (F, He), (F, B), (Ho, He),
(Ho, B) and (He, B)).
2.5 Estimation of the importance of the interactions between criteria
The utility functions previously determined with MACBETH were used to produce a utility
value for each criterion for the 2,800 farms. In order to demonstrate the importance of
taking into account the interaction between the criteria when producing an overall
assessment of farm animal welfare, the individual utilities for each criterion were
aggregated additively (with a weighting vector, WS) and non-additively (with the CI).
For the CI aggregation, the coefficients (Shapley values and interaction indices)
obtained by the MV’ approach for the MACBETH method were used, whereas for the WS
aggregation only the Shapley values of the MV’ approach were used as weights (WSMV’).
The objective was to estimate the number of farms that changed their welfare category
due to the inclusion in the model of the interactions between the criteria and the
limitation of the compensation between them. Each of the 2,800 farms was assigned to a
welfare category (unacceptable, acceptable, enhanced or excellent). The numbers of farms
assigned to each welfare category were compared between the case in which the criteria
were considered independent and the case in which the interactions between the criteria
were taken into account, limiting the compensation between them.
3 Results
3.1 Utility function determination methods
The differences between the utility functions calculated using the SS method and the
MACBETH method were in general minor, except for the lowest value of Behaviour,
where a difference between the utilities of both methods greater than 10 was found
(Figure 3).
Figure 3. Utility functions calculated using the SS method (−−−) and the MACBETH
method (───) for Feeding, Housing, Health and Behaviour
3.2 Aggregation methods - Weighted sum
The weighting vectors obtained with the Keeney and Raiffa technique and with the
MACBETH method matched well and were in accordance with the initial preferences
of the DM (W in Table 1). For both methods, the importance of the criteria conformed
to the following sequence:
Health (0.3333) = Behaviour (0.3333) > Housing (0.2223) > Feeding (0.1111)
These weights were used for the aggregation of the individual utilities calculated using
the SS and the MACBETH methods.
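The weighted sum aggregation can be reproduced with a minimal sketch. The Python code below is for illustration only and its names are hypothetical. It uses the weights listed above and the MACBETH partial utilities of farms a, c and d as reported in Table 3.

```python
# Weights of the criteria as reported in the text
WEIGHTS = {"F": 0.1111, "Ho": 0.2223, "He": 0.3333, "B": 0.3333}

# MACBETH partial utilities of three farms (Table 3)
MACBETH_UTILITIES = {
    "a": {"F": 55, "Ho": 75, "He": 65, "B": 65},
    "c": {"F": 80, "Ho": 75, "He": 40, "B": 65},
    "d": {"F": 80, "Ho": 75, "He": 65, "B": 40},
}


def weighted_sum(utilities):
    """Additive aggregation: sum of weight * partial utility over the criteria."""
    return sum(WEIGHTS[c] * u for c, u in utilities.items())


for farm, utilities in MACBETH_UTILITIES.items():
    print(farm, round(weighted_sum(utilities), 2))
```

Because Health and Behaviour carry the same weight, farms c and d, which differ only by swapping their Health and Behaviour utilities, receive the same overall utility. The weighted sum therefore cannot rank one above the other.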
3.2.1 Standard sequences
The ranking of the 10 farms obtained after aggregating, with the WS, the individual
utilities calculated with the SS method (Table 2) differed from the ranking over the
farms given by the DM as initial preferences (Table 1). For farms a, b, c, d, e and f,
the ranking of the utilities coincided with the initial DM preferences, but it was
completely different for farms g, h, i and j.
Table 2. Partial utilities calculated with the standard sequences method and overall
utilities and rankings (R) computed using the weighted sum (WS) and the Choquet
integral (CI) with the different approaches implemented, the minimum variance (MV)
and the minimum variance with Shapley value and interaction indices constraints (MV’).
        Partial utilities           WS                    CI (MV)               CI (MV’)
Farm    F     Ho     He    B        Overall utility  R    Overall utility  R    Overall utility
a       50    75     70    66.66    67.78            1    67.70            1    NS
b       75    50     70    66.66    65               2    66.38            2    NS
c       75    75     40    66.66    60.55            3    61.86            3    NS
d       75    75     70    33.33    59.44            4    59.02            4    NS
e       75    75     30    66.66    57.22            5    58.97            5    NS
f       50    50     30    33.33    37.78            6    38.18            6    NS
g       0     50     30    33.33    32.22            7=8  32.50            7    NS
h       50    12.5   30    33.33    29.44            10   32.45            8    NS
i       50    50     10    33.33    31.11            9    32.40            9    NS
j       50    50     30    16.66    32.22            7=8  32.35            10   NS
F: Feeding; Ho: Housing; He: Health; B: Behaviour. NS: No solution.
3.2.2 MACBETH
The ranking obtained after aggregating, with the WS, the individual utilities calculated
with the MACBETH method (Table 3) and the ranking over the farms provided by the
DM as initial preferences (Table 1) were equal except for farms c and d. The aggregated
MACBETH utilities did not distinguish between them, whereas the DM preferred farm c
to farm d.
Table 3. Partial utilities calculated with MACBETH and overall utilities and rankings
(R) computed using the weighted sum (WS) and the Choquet integral (CI) with the
different approaches implemented, the minimum variance (MV) and the minimum
variance with Shapley values and interaction indices constraints (MV’).
        Partial utilities           WS                    CI (MV)               CI (MV’)
Farm    F     Ho     He    B        Overall utility  R    Overall utility  R    Overall utility  R
a       55    75     65    65       66.11            1    65.21            1    64.52            1
b       80    50     65    65       63.33            2    65.16            2    61.22            2
c       80    75     40    65       60.56            3=4  63.07            3    58.51            3
d       80    75     65    40       60.56            3=4  63.02            4    58.46            4
e       80    75     30    65       57.22            5    60.12            5    54.67            5
f       55    50     30    40       40.56            6    42.49            6    39.27            6
g       0     50     30    40       34.45            7    35.22            7    29.77            7
h       55    15     30    40       32.78            8    35.17            8    29.72            8
i       55    50     5     40       32.22            9    35.12            9    29.67            9
j       55    50     30    5        28.89            10   32.31            10   26.26            10
F: Feeding; Ho: Housing; He: Health; B: Behaviour.
3.3 Aggregation methods - Choquet integral
3.3.1 Standard sequences
The overall utilities for the 10 farms computed using the CI with respect to the 2-
additive solutions are given in Table 2. For the MV approach, the results follow the
partial weak order provided by the DM at the beginning and comply with the
indifference threshold established by the DM (0.05). Note that the differences between
the overall utilities of consecutively ranked farms among g, h, i and j are exactly equal
to 0.05, i.e. the indifference threshold. The Shapley values and the interaction indices
of the 2-additive solution obtained by means of the MV approach are given in Table 4.
Table 4. Coefficients of the weighted sum (WS) and of the Choquet integral obtained by
the minimum variance approach (MV) and by the minimum variance approach with
constraints on the Shapley values and on the interaction indices (MV’), to aggregate
individual utilities calculated using the SS method and the MACBETH method.
          Shapley values                  Interaction indices
          F*      Ho*     He*     B*      F,Ho    F,He    F,B     Ho,He   Ho,B    He,B
SS
WS        0.111   0.222   0.333   0.333   -       -       -       -       -       -
MV        0.183   0.226   0.278   0.312   -0.151  0.008   -0.029  0.027   0.054   -0.014
MV’       NS      NS      NS      NS      NS      NS      NS      NS      NS      NS
MACBETH
WS        0.111   0.222   0.333   0.333   -       -       -       -       -       -
MV        0.228   0.233   0.266   0.273   -0.048  0.019   0.019   0.018   0.007   0.022
MV’       0.139   0.241   0.309   0.311   0.05    0.05    0.05    0.05    0.05    0.05
F: Feeding; Ho: Housing; He: Health; B: Behaviour. NS: No solution.
The importance of the criteria followed the order Behaviour > Health > Housing >
Feeding. This order was not completely in accordance with the initial preferences of the
DM. Regarding the interaction indices, it should be noted that there was a strong
negative interaction between Feeding and Housing (-0.151). Feeding also interacted
negatively with Behaviour, and Health interacted negatively with Behaviour. There was
no solution for the MV’ approach because the model was not compatible with the three
constraints imposed: the ranking over the farms, Behaviour = Health > Housing >
Feeding, and all criteria regarded as complementary (with indifference thresholds of
0.05, 0.05 and 0.05 respectively).
3.3.2 MACBETH
The overall utilities computed using the CI with respect to the 2-additive solutions for
the 10 farms are given in Table 3. Note that, as expected, for the MV approach the
results follow the partial weak order provided by the DM as an initial preference. It
should also be noted that the differences between the overall utilities of farms a and b, c
and d, g and h, and h and i are exactly equal to 0.05, which is the indifference
threshold. The Shapley values and the interaction indices of the 2-additive solutions for
MACBETH are given in Table 4. For the MV approach, the importance of the criteria
followed the order Behaviour > Health > Housing > Feeding, which was not
completely in accordance with the initial preferences of the DM. All pairs of criteria
interacted positively except for Feeding and Housing, which interacted negatively.
For the MV’ approach, the constraints imposed by the DM on both the interaction
indices (indifference threshold 0.05) and the Shapley values (indifference threshold
0.05) were satisfied: all the criteria were complementary (positive interaction) and the
Shapley values followed the order Health = Behaviour > Housing > Feeding
(Table 4). If these utilities (MV’) are compared with the initial ones obtained without
any additional constraint (MV), three main facts can be noticed: first, the ranking over
the farms remained equal; second, the farms had lower overall utilities; and third, this
decrease, which resulted from limiting the compensation between the criteria (MV’),
was most marked for farms g, h, i and j, which are the farms that were elicited to
evaluate compensation between good and bad grades.
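The way in which positive interaction indices limit compensation can be made explicit with a small calculation. The following Python sketch assumes the usual 2-additive form of the CI, in which one half of the sum of Iij·|xi − xj| over all pairs of criteria is subtracted from the Shapley-weighted sum. With all interaction indices equal to 0.05 (MV’ for MACBETH, Table 4), this subtracted amount depends only on how far apart the partial utilities of a farm are. The function name is hypothetical. The utilities are those of farms a and j in Table 3.

```python
from itertools import combinations


def compensation_penalty(utilities, interaction=0.05):
    """Amount subtracted from the weighted sum when all pairs share one index."""
    spread = sum(abs(a - b) for a, b in combinations(utilities, 2))
    return 0.5 * interaction * spread


farm_a = (55, 75, 65, 65)  # fairly balanced partial utilities (Table 3)
farm_j = (55, 50, 30, 5)   # one very low partial utility (Table 3)

penalty_a = compensation_penalty(farm_a)
penalty_j = compensation_penalty(farm_j)
print(round(penalty_a, 2), round(penalty_j, 2))
```

Under these assumptions the unbalanced farm j loses almost three times as much as the balanced farm a, which illustrates why a low grade on one criterion is compensated less by good grades on the others.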
3.4 General dataset
When the numbers of farms assigned to each welfare category were compared between the
CI using the MV’ Shapley values and interaction indices and the weighted sum using the
MV’ Shapley values as coefficients (WSMV’), it was noted that in the first case 485
farms were classified as unacceptable, 1,788 as acceptable, 475 as enhanced and 52 as
excellent, whereas in the second case 407 farms were classified as unacceptable, 1,574
as acceptable, 697 as enhanced and 122 as excellent. The numbers of farms which
changed to a higher or lower classification when the interaction indices were not used
in the aggregation are shown in Table 5.
Table 5. Number of farms changing to a higher or lower classification when the Shapley
values of the MV’ approach were used as the coefficients of a weighted sum (WSMV’)
instead of the Choquet integral obtained by the minimum variance approach with
constraints on the Shapley values and on the interaction indices (MV’).
Original class          Farms changed to class:
                        Unacceptable  Acceptable  Enhanced  Excellent
Unacceptable (n=485)    373           112         0         0
Acceptable (n=1788)     34            1455        299       0
Enhanced (n=475)        0             7           398       70
Excellent (n=52)        0             0           0         52
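The marginal totals of Table 5 can be recomputed from its cells. The short Python sketch below is for illustration only and its variable names are hypothetical. The rows are the classes obtained with the CI (MV’) and the columns are the classes obtained with the WSMV’.

```python
CLASSES = ("Unacceptable", "Acceptable", "Enhanced", "Excellent")

# Rows: class with the CI (MV'); columns: class with the WSMV' (Table 5)
transitions = [
    [373, 112, 0, 0],
    [34, 1455, 299, 0],
    [0, 7, 398, 70],
    [0, 0, 0, 52],
]

ci_totals = [sum(row) for row in transitions]
ws_totals = [sum(col) for col in zip(*transitions)]
total = sum(ci_totals)

# Farms above the diagonal moved to a higher class, below it to a lower class
size = len(CLASSES)
moved_up = sum(transitions[i][j] for i in range(size) for j in range(size) if j > i)
moved_down = sum(transitions[i][j] for i in range(size) for j in range(size) if j < i)

for name, ci, ws in zip(CLASSES, ci_totals, ws_totals):
    print(f"{name}: {100 * ci / total:.1f}% (CI) vs {100 * ws / total:.1f}% (WSMV')")
print(moved_up, moved_down)
```

The row totals reproduce the 485, 1,788, 475 and 52 farms of the CI classification and the column totals the 407, 1,574, 697 and 122 farms of the weighted sum classification.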
4 Discussion
The animal welfare multi-criteria evaluation was constructed in two separate steps.
First, utility functions for each criterion were determined in two different ways, using
the SS method and the MACBETH software. In the second step, the WS and the CI
were used as aggregation functions. For the CI capacity identification, minimum
variance (MV) and minimum variance with constraints (MV’) approaches were used.
The main problem found in the determination of the utility functions with the SS method
was that they are determined on the basis of a linear transformation. For the utility
function of Behaviour (Figure 3), an increase in Behaviour from a score of 2 to a score
of 3 had a utility for the DM of one unit, an increase from 3 to 4 also had a utility of
one unit, and an increase from a score of 0 to a score of 2 corresponded to a utility of
one unit. Due to the linear transformation on which the model relies, an increase in
Behaviour from 0 to 1 automatically corresponds to an increase of half a unit. There is
no opportunity to assign a lower or a higher value, which can lead to overestimating or
underestimating the utility values a DM would like to assign to a given performance
level of a criterion. It must be pointed out that when the number of performance levels
of a criterion decreases, this under- or overestimation can become larger, even making
the model infeasible.
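The effect of the linear transformation can be illustrated with a short Python sketch. It is not the procedure used in the study. The breakpoints are an assumption inferred from the description above and from the Behaviour utilities in Table 2: scores 0, 2, 3 and 4 are taken to be one unit apart, with one unit equal to one third of a 0 to 100 utility scale. The function name is hypothetical.

```python
def interpolate_utility(score, breakpoints):
    """Piecewise-linear utility through (score, utility) points sorted by score."""
    for (s0, u0), (s1, u1) in zip(breakpoints, breakpoints[1:]):
        if s0 <= score <= s1:
            return u0 + (u1 - u0) * (score - s0) / (s1 - s0)
    raise ValueError("score outside the range covered by the breakpoints")


# Assumed breakpoints for Behaviour: equal utility steps between scores 0, 2, 3 and 4
behaviour = [(0, 0.0), (2, 100 / 3), (3, 200 / 3), (4, 100.0)]

# The utility of score 1 is fixed by interpolation and cannot be chosen by the DM
print(round(interpolate_utility(1, behaviour), 2))
```

Under these assumptions a score of 1 is forced to half a unit, close to the value of 16.66 that appears for Behaviour in Table 2, whereas MACBETH assigned a utility of 5 to the corresponding level (Table 3).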
The rankings obtained after aggregating, with the WS, the individual utilities
calculated using the SS method (Table 2) and the MACBETH method (Table 3) were
very different. Compared to the ranking over the farms given by the DM as initial
preferences, MACBETH was the method that better fitted the DM preferences, with
a different ranking only for farms c and d, whereas the SS method presented several
rank reversals between farms g, h, i and j. These farms were elicited to assess how the
methods behave when one criterion has a very low value and the other criteria have
medium-high values. In other words, they were elicited to study the preferences of the
DM regarding the compensation between good grades and bad grades. This difference
between the rankings appeared to be related to the problem presented above, i.e. the SS
method did not allow the DM to assign lower values for Housing and Behaviour, and
this led to an inaccurate interpretation of the DM’s preferences, so that the ranking of
the overall utilities differed from the ranking given by the DM as initial preferences.
The results of the MV approach, both for the SS method (Table 2) and for the MACBETH
method (Table 3), followed the partial weak order provided at the beginning. The
Shapley values obtained using both methods conformed to the same sequence, which
was not completely in accordance with the DM preferences, although the differences
were minor. The major difference between the methods and the DM preferences lay in
the values of the interaction indices. The DM considered all the criteria as
complementary; however, there was a negative interaction between Feeding and
Housing for the MACBETH method and a strong negative interaction between Feeding
and Housing for the SS method. For the SS method, there were also negative
interactions between Feeding and Behaviour, and between Health and Behaviour. In an
initial calculation of the capacity with no additional constraints imposed on the model,
it is usual that the results do not completely fit the preferences of the DM due to the
small dataset from which the capacity is determined. This issue can be addressed by
imposing additional constraints on the Shapley values and on the interaction indices.
However, for the SS method there was no solution compatible with the constraints
(MV’), whereas for the MACBETH method there was a compatible solution. In the case
of the SS method, both the poor fit to the DM preferences in the MV approach and the
infeasibility of the MV’ model appear to be related to the problem with the SS utility
function determination method. In the case of the MACBETH method, the fact that the
Shapley values and the interaction indices obtained in the first approach (MV) did not
completely satisfy the preferences of the DM appeared to be related more to the limited
learning data than to a poor interpretation of the DM preferences, since there was a
compatible solution after imposing the constraints.
In summary, the problem in the determination of the utility functions with the SS method
lay in the quantitative performances of the criteria. These performances were a mere
simulation. Real welfare measures, as proposed in Welfare Quality® (2009), may be
used in a further step of the project. The quantitative performances of WQ measures
vary, for instance, between 0 and 100% of animals showing the measure. In this
scenario, it could be assumed that the utility functions determined using the SS method
would fit the DM preferences as well as the MACBETH method would.
However, we prefer the use of MACBETH to the use of the SS method for several
reasons. First, information is available on how to use this method to facilitate a
consensus between stakeholders (Parnell et al., 2013; Bana e Costa et al., 2014), which
may be one of the difficulties when a panel of different DMs is consulted to determine
the utility functions and the aggregation parameters in a further step of the project.
Second, this method makes it easier to judge the difference in attractiveness between
options with an increasing number of criteria, owing to its interactive software and to
the use of qualitative judgments on a scale of semantic categories (‘very weak’, ‘weak’,
‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’) (Bana e Costa et al., 2004). Third, the
process of determining the utilities remained more transparent with the MACBETH
method, and it is easier to explain to the stakeholders than the SS method due to its
interactive software. Fourth, MACBETH allows for a comparison not only of qualitative
performance levels but also of quantitative performances, with no need for a previous
conversion of the quantitative scales into a qualitative scale, offering a solution to one
of the problems presented by Botreau et al. (2007b).
What the results of the MV and MV’ approaches corroborated is that using MAUT with an
aggregation process based on the WS is not a valid method to develop an overall
assessment of animal welfare, because the criteria do not behave as independent
criteria, which is an assumption when using this aggregator (Vincke, 1992). The
estimation of the different classification of the farms that would be obtained if the DM
decided to use an additive value model (WSMV’), in spite of all its well-known
drawbacks, showed that the main differences occurred in the number of farms classified
as unacceptable and enhanced. 112 of the 485 farms classified as unacceptable with the
MV’ approach were classified as acceptable with the WSMV’, and 299 of the 1,788
farms classified as acceptable with the MV’ approach were classified as enhanced with
the WSMV’. In other words, not taking the interaction between the criteria into account
led to a considerable decrease in the number of farms classified as unacceptable (from
17.3% of the farms to 14.5%) and acceptable (from 63.9% to 56.2%) and a noticeable
increase in the number of farms classified as enhanced and excellent (from 17% to
24.9% and from 1.9% to 4.4% respectively). Note that the percentage of farms in each
welfare category may vary if the thresholds established for each category are modified.
The large difference in the number of farms classified as unacceptable appeared to be
related to the limitation of compensation between bad and good grades. This revealed
the potential impact of not taking into account the interactions between the criteria
when producing an overall assessment of animal welfare in the context of certification
schemes, an impact which might have gone unnoticed had only the differences between
the aggregation methods for a small subset of farms, such as the initial dataset, been
considered.
5 Conclusions
In summary, in the aggregation of animal welfare criteria it is of major importance to
choose an aggregation method, such as the CI, which allows the interactions between the
criteria to be taken into account and allows the compensation between them to be
limited when the criteria are considered complementary by the DMs. Choosing a simpler
aggregation method, such as the WS, which allows compensation between the criteria
would lead to an important misclassification of farms in the context of certification
schemes, as demonstrated here. In this study, it was concluded that the MACBETH
method represented the preferences of the DM better than the SS method. The
interpretation of the DM preferences through the utility functions was found to be
crucial in the determination of the CI aggregation coefficients. A utility function which
does not reflect the preferences of the DM adequately would leave no compatible
solution when additional constraints are imposed on the capacity determination model.
6 Acknowledgements
The present study is part of the PHENOMICS research project, which is funded by the
German Federal Ministry of Education and Research.
7 References
Bana e Costa CA, de Corte JM and Vansnick JC 1999. The MACBETH approach:
Basic ideas, software, and an application. In Advances in Decision Analysis (eds
N Meskens and M Roubens), vol. 4, pp.131-157. Kluwer Academic Publishers,
Dordrecht, Netherlands.
Bana e Costa CA, de Corte JM and Vansnick JC 2004. On the mathematical
foundations of MACBETH. In MCDA, Multiple Criteria Decision Analysis (eds J
Figueira, S Greco and M Ehrgott), pp. 409 - 442. Kluwer Academic Publishers,
Dordrecht, Netherlands.
Bana e Costa CA, Lourenço JC, Oliveira MD and Bana e Costa JC 2014. A socio-
technical approach for group decision support in public strategic planning: The
Pernambuco PPA case. Group decision and negotiation 23, 5-29.
Blokhuis HJ, Veissier I, Miele M and Jones B 2010. The Welfare Quality® project and
beyond: Safeguarding farm animal well-being. Acta Agriculturae Scandinavica,
Section A, Animal Science 60, 129-140.
Botreau R, Bonde M, Butterworth A, Perny P, Bracke MBM, Capdeville J and Veissier
I 2007a. Aggregation of measures to produce an overall assessment of animal
welfare. Part 1: A review of existing methods. Animal 1, 1179-1187.
Botreau R, Bracke MBM, Perny P, Butterworth A, Capdeville J, van Reenen CG and
Veissier I 2007b. Aggregation of measures to produce an overall assessment of
animal welfare. Part 2: Analysis of constraints. Animal 1, 1188-1197.
Botreau R, Butterworth A, Engel B, Forkman B, Jones B, Keeling L, Kjærnes U,
Manteca X, Miele M, Perny P, van Reenen CG and Veissier I 2009. An Overview
of the Development of the Welfare Quality® Assessment Systems. In Welfare
Quality Reports® no. 12 (ed. L Keeling). Cardiff University, UK.
Botreau R, Capdeville J, Perny P and Veissier I 2008. Multi-criteria evaluation of
animal welfare at farm level: an application of MCDA methodologies.
Foundations of Computing and Decision Science 33, 1-18.
Bracke MBM, Spruijt BM and Metz JHM 1999. Overall animal welfare assessment
reviewed. Part 1: Is it possible? Netherlands Journal of Agricultural Science 47,
279-291.
Bouyssou D, Marchant T, Perny P, Pirlot M, Tsoukias A and Vincke P 2000. Evaluation
and decision models: A critical perspective. Kluwer Academic Publishers,
Dordrecht, Netherlands.
Bouyssou D, Marchant T, Perny P, Pirlot M, Tsoukias A and Vincke P 2006. Evaluation
and decision models with multiple criteria: Stepping stones for the analyst.
Springer, New York, USA.
Fraser D 1995. Science, values and animal welfare: Exploring the ‘inextricable
connection’. Animal Welfare 4, 103-117.
Farm Animal Welfare Council 1992. FAWC updates the five freedoms. The Veterinary
Record 17, 357.
Grabisch M 1996. The application of fuzzy integrals in multi-criteria decision making.
European Journal of Operational Research 89, 445-456.
Grabisch M 1997. k-order additive discrete fuzzy measures and their representation.
Fuzzy Sets and Systems 92, 167-189.
Grabisch M, Kojadinovic I and Meyer M 2008. A review of capacity identification
methods for Choquet Integral based multi-attribute utility theory. Applications of
the Kappalab R package. European Journal of Operational Research 186, 766-
785.
Keeney RL and Raiffa H 1976. Decisions with multiple objectives: Preferences and
value tradeoffs. Wiley, New York, USA.
Krantz DH, Luce RD, Suppes P and Tversky A 1971. Foundations of measurement, vol.
1: Additive and polynomial representations. Academic Press, New York, USA.
Kojadinovic I 2007. Minimum variance capacity identification. European Journal of
Operational Research 177, 498-514.
Labreuche C and Grabisch M 2003. The Choquet integral for the aggregation of interval
scales in multi-criteria decision making. Fuzzy sets and Systems 137, 11-16.
Marichal JL 2000. An axiomatic approach of the discrete Choquet integral as a tool to
aggregate interacting criteria. IEEE Transactions on Fuzzy Systems, vol. 8, no 6.
Marichal JL and Roubens M 2000. Determination of weights of interacting criteria from
a reference set. European Journal of Operational Research, vol. 124, no 3, 641-
650.
Mayag B, Grabisch M and Labreuche C 2011. A characterization of the 2-additive
Choquet integral through cardinal information. Fuzzy sets and Systems 184, 84-
105.
Merad M, Dechy N, Serir L, Grabisch M and Marcel F 2013. Using a multi-criteria
decision aid methodology to implement sustainable development principles within
an organization. European Journal of Operational Research 224, 603-613.
Murofushi T and Sugeno M 1989. An interpretation of fuzzy measure and the Choquet
integral as an integral with respect to a fuzzy measure. Fuzzy sets and systems 29,
201-227.
Parnell GS, Bresnick TA, Tani SN and Johnson ER 2013. Handbook of decision
analysis. John Wiley and Sons, New York, USA.
Roy B 1971. Problems and methods with multiple objective functions. Mathematical
Programming 1, 239-266.
Saaty TL 1980. The Analytic Hierarchy Process: Planning, priority setting, resource
allocation. McGraw-Hill, New York, USA.
Sugeno M 1974. Theory of fuzzy integrals and its applications. PhD thesis, Tokyo
Institute of Technology. Tokyo, Japan.
Yager R 1988. On ordered weighted averaging operators in multi-criteria decision
making. IEEE Transactions on Systems, Man and Cybernetics 18, 183-190.
Vapnek J and Chapman M 2010. Legislative and regulatory options for animal welfare.
FAO Legislative study 104. FAO, Rome, Italy.
Vincke P 1992. Multi-criteria Decision-aid. Wiley, New York, USA.
von Winterfeldt D and Edwards W 1986. Decision analysis and behavioral research.
Cambridge University Press, Cambridge, UK.
Wakker PP 1989. Additive representations of preferences: A new foundation of
decision analysis. Kluwer Academic Publishers, Dordrecht, Netherlands.
Welfare Quality 2009. Welfare Quality® Assessment Protocol for Growing Pigs. Welfare
Quality® Consortium, Lelystad, Netherlands.
CHAPTER TWO
Development of a multi-criteria evaluation system to assess
growing pig welfare
P. Martín 1, I. Traulsen 1, C. Buxadé 2 and J. Krieter 1
1 Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Kiel, Germany
2 Animal Production Department, Polytechnic University, Madrid, Spain
Abstract
The aim of this paper was to present an alternative multi-criteria evaluation model to
assess animal welfare on farms based on the Welfare Quality® project, using an
example of welfare assessment of growing pigs. The WQ assessment protocol follows a
three-step aggregation process. Measures are aggregated into criteria, criteria into
principles, and principles into an overall assessment. This study focused on the first step
of the aggregation. Multi-attribute utility theory (MAUT) was used to produce a value
of welfare for each criterion. The utility functions and the aggregation function were
constructed in two separate steps. The MACBETH method was used for utility
function determination and the Choquet integral (CI) was used as an aggregation
operator. The WQ decision-makers’ preferences were fitted in order to construct the
utility functions and to determine the CI parameters. The methods were tested with
generated datasets for farms of growing pigs. Using the MAUT, results similar to those
of the WQ protocol aggregation methods were obtained. It can be concluded that, due to
the use of an interactive approach such as MACBETH, this alternative methodology is
more transparent for stakeholders and more flexible than the methodology proposed by
WQ, which makes it possible to modify the model according, for instance, to new
scientific knowledge.
Keywords: Growing pigs, Welfare Quality, multi-criteria evaluation.
1 Introduction
Concern about livestock living conditions has increased considerably in the last few
years. Also, consumers have been increasingly linking animal welfare indicators with
food safety and quality. These consumer preferences create economic incentives for
stakeholders to meet animal welfare standards, as established by legislation or voluntary
certification schemes (Vapnek and Chapman, 2010). Due to the lack of a standard
assessment of animal welfare, these standards vary from one certification scheme to
another. This situation was the origin of the EU Welfare Quality® project (WQ),
which aimed at proposing an overall assessment system to assess the welfare of cattle,
pigs and poultry (Botreau et al., 2008).
Animal welfare is a multi-dimensional concept, and its assessment should be based on a
variety of measures related to several aspects such as the absence of thirst, hunger,
discomfort, disease, pain, injuries and stress, and the presence of normal behavioural
expressions (Farm Animal Welfare Council (FAWC), 1992). Due to this fact, a multi-
criteria evaluation model is required for the evaluation of an animal unit (farm,
slaughterhouse). Multi-criteria decision-making approaches all share the need for
an aggregation operator. Information at the level of the measures may be useful
for farm management purposes; however, labelling purposes require a certain level of
aggregation of the measures into overall scores. Considerable efforts continue to be
made in order to develop overall assessment systems for different farm animal species
(e.g. WQ project, Bristol Welfare Assurance Programme and Animal Welfare Indicators
project, AWIN). WQ developed animal welfare multi-criteria evaluation models for
different livestock species (Botreau et al., 2009). The inputs for the WQ animal welfare
multi-criteria evaluation model are on-farm welfare measures described in the WQ
assessment protocol (Welfare Quality, 2009). The WQ multi-criteria evaluation model
uses different aggregation methods (e.g., decision tree, weighted sum or Choquet
integral) to aggregate measures into an overall assessment (Botreau et al., 2008). There
are other ways of approaching the aggregation problem that differ from the ones used by
the WQ multi-criteria evaluation model, e.g., the multi-attribute utility theory (MAUT),
ELECTRE or the Analytic Hierarchy Process (AHP). In the MAUT, uni-dimensional
utility functions which correspond to each criterion are aggregated into a single global
utility function combining the whole of the criteria (Keeney and Raiffa, 1976), whereas
by using ELECTRE (outranking procedure) only the preference relations on pairs of
alternatives are aggregated (Roy, 1971), whilst in the Analytic Hierarchy Process
‘children’ nodes of a common ‘parent’ are aggregated using pair-wise comparisons
(Saaty, 1980).
In the present study, we focused on the MAUT. A large number of methods have been
proposed to determine the utility functions in MAUT, for instance the standard
sequences method described by Bouyssou et al. (2000) and the MACBETH method,
described by Bana e Costa et al. (1999). Examples of aggregation functions in MAUT
are the weighted sum, the ordered weighted average (Yager, 1988) and the Choquet
integral (CI) (Murofushi and Sugeno, 1989). The most common aggregation tool still
used today is the weighted sum, with all its well-known drawbacks. Using this
aggregator, different importance can be attached to the criteria, but no interaction
between the criteria is taken into account. The distinguishing feature of a CI is that it is
able to represent a certain interaction, ranging from redundancy (negative interaction) to
synergy (positive interaction) (Grabisch, 1996). In the framework of the MAUT, the
MACBETH method was used for utility function determination, and the CI as the
aggregation method.
The aim of this paper is to present an alternative multi-criteria evaluation model to
assess animal welfare on farms, within the WQ framework, using a welfare assessment
of growing pigs as an example. We sought a model which solves the
main difficulties that, according to Botreau et al. (2007b), a multi-criteria aggregation
model for animal welfare faces (for instance, that interactions may exist
between measures and that measures may differ in their importance for animal welfare),
while remaining more transparent and flexible than the model proposed in the WQ
protocol. In other words, we looked for a model which can be easily understood by the
stakeholders and which would allow the parameters to be changed according to new
scientific knowledge. The paper is organised as follows: Section 2 presents the general
methodology followed in the WQ protocol and the methodology we propose to
construct the multi-criteria evaluation model. Section 3 presents the construction of
criteria from the initial measures by means of examples. Finally, Section 4 discusses the
strengths and weaknesses of the model.
2 General methodology
2.1 Welfare Quality® (WQ)
The WQ assessment protocol for growing pigs consists of 27 welfare measures, which
were aggregated following a three-step aggregation process (Welfare Quality, 2009). 27
welfare measures were thus combined into 12 criteria, these were aggregated into 4
principles, and these 4 principles were aggregated into an overall assessment. Different
types of operators were used in this aggregation process, such as decision trees,
weighted sums, conversion to ordinal scores, least squares spline fitting, and CI. To
parameterise the operators used for the aggregation of the welfare measures and criteria,
datasets were presented to expert panels of 13 animal scientists, who individually
ranked farms and gave an absolute score on a scale of 0-100 for each of the farms
presented in each of the datasets (Botreau et al., 2008). Partners of the WQ project and
members of the Management Committee and Advisory Committee (i.e. stakeholder
representatives), were consulted to agree upon parameters for the aggregation of
principles into an overall classification (Botreau et al., 2009).
2.1.1 First step of the aggregation process
In the first step, welfare measures were aggregated into the 12 corresponding criteria.
WQ used different types of aggregation of measures into criteria (Figure 1). For some
criteria, the numbers of moderate and severe problems were first combined with a
weighted sum, producing a measure index, on a scale from 0 (worst) to 100 (best).
Afterwards, these index values were converted into measure scores (expressed on the
same 0-100 scale), using spline functions (Ramsay, 1988) that were fitted by
least-squares methods. Finally, the CI was used to combine the scores for the different
measures into a score for the criterion (a in Figure 1). For some other criteria, the
measures were first transformed into an ordinal scale, which consisted of assigning
warnings or alarms depending on the value of the measures. The numbers of warnings and
alarms were then combined into an index for the criterion, and afterwards this index was
converted into a criterion score using I-spline functions (b in Figure 1). For other
measures, decision trees were used to produce the criterion score (c in Figure 1). Further
information on the development and employment of these operators can be found in
Botreau et al. (2008, 2009) and Veissier et al. (2011).
Figure 1. Outline of the three different methodologies followed in the Welfare Quality®
project to aggregate the measures into criteria (adapted from Welfare Quality, 2009).
2.1.2 Second step of the aggregation process
In the second step, a CI was used to aggregate the 12 criteria into four principles. This
integral uses weights to combine the different criterion scores into one principle score
(expressed on the 0-100 scale), while limiting the possibility that a poor score of a
criterion is compensated by other excellent scores (Botreau et al., 2007b; Veissier et al.,
2011).
[Figure 1, panels: (a) measures 1 to n → previous calculations → I-spline curve fitting → scores 1 to n → aggregation (Choquet integral) → criterion score; (b) measures 1 to n → previous calculations → ordinal measures 1 to n → weighted sum → criterion index → I-spline curve fitting → criterion score; (c) measures 1 to n → decision tree → criterion score]
2.1.3 Third step of the aggregation process
In the third and final step, the four principles were combined into one overall
assessment. The herds were classified in four different welfare categories:
‘unacceptable’, ‘acceptable’, ‘enhanced’, or ‘excellent’, based on reference profiles for
these four principles (Botreau et al., 2009). To be classified as ‘excellent’, a herd had to
score >55 for each principle and >80 for two principles; to be classified as ‘enhanced’,
each principle had to be >20 and at least two principles had to be >55; to be classified as
‘acceptable’, each principle had to be >10 and at least three principles had to be >20.
Herds which did not comply with the minimum scores were classified as
‘unacceptable’, which means that at least one principle was ≤ 10 or at least two
principles were ≤ 20.
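The classification rules above can be sketched in a few lines of Python. This is only an illustration of the thresholds as stated in the text, not the WQ implementation: the function name `classify_herd` and the helper `meets` are hypothetical, and the second threshold for ‘acceptable’ (20) is taken from the complementary ‘unacceptable’ rule (at least two principles ≤ 20).

```python
def classify_herd(principle_scores):
    """Return the welfare category for four principle scores (0-100 scale)."""
    if len(principle_scores) != 4:
        raise ValueError("expected exactly four principle scores")

    def meets(floor, target, n_target):
        # Every principle must exceed `floor`, and at least `n_target`
        # principles must exceed `target`.
        all_above_floor = all(s > floor for s in principle_scores)
        enough_high = sum(s > target for s in principle_scores) >= n_target
        return all_above_floor and enough_high

    if meets(55, 80, 2):
        return "excellent"
    if meets(20, 55, 2):
        return "enhanced"
    if meets(10, 20, 3):
        return "acceptable"
    return "unacceptable"


# Hypothetical herd: all principles above 20, two of them above 55.
print(classify_herd([56, 56, 21, 21]))
```

The categories are tested from best to worst, so a herd is assigned the highest category whose conditions it satisfies.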
2.2 Multi-attribute utility theory (MAUT)
As presented before, the WQ assessment protocol follows a three-step aggregation
process. Measures are aggregated into criteria, criteria into principles, and principles
into an overall assessment (Welfare Quality, 2009). This study focused on the first step
of the aggregation to introduce an alternative methodology to the one proposed in the
WQ protocol by means of examples illustrated using growing pigs. MAUT was used to
produce a value of welfare for each criterion. The application of the MAUT consisted of
two separate steps: the determination of the utility functions and the determination of
the aggregation function. The MACBETH method was used for the determination of the utilities
and the CI was used as the aggregator method (Figure 2).
Figure 2. Outline of the alternative methodology proposed in this study to aggregate the
Welfare Quality® measures into criteria.
[Figure 2: measures 1 to n → previous calculations → utility function determination (MACBETH) → utilities 1 to n → aggregation (Choquet integral) → criterion utility]
2.2.1 Utility function determination (MACBETH)
The utility function gives value to the measure in terms of welfare; it represents the
preferences of the decision-maker (DM) for the measures and their different values. For
example, 5% of lameness in a farm may be interpreted as a worse situation than 5% of
wounds on the body. There are different methods for utility function determination; we
chose MACBETH (Measuring Attractiveness by a Categorical Based Evaluation
Technique) for several reasons:
First, due to the available information on how to use this method to facilitate a
consensus between stakeholders (Parnell et al., 2013, Bana e Costa et al., 2014), which
may be one of the main difficulties which arise when a panel of different DMs is
consulted to determine the utility functions and the aggregation parameters in a further
stage of the project. Second, because this method makes it easier to judge the
different attractiveness of options with an increasing number of criteria, owing to the use
of qualitative judgements on a scale of difference categories (‘very weak’,
‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’) (Bana e Costa et al., 2004).
Third, the determination of the utilities process remains transparent due to the extensive
bibliography on it (Bana e Costa et al., 1999, 2004) and it is easier to explain to the
stakeholders due to the interactive software provided (M-MACBETH). Fourth,
MACBETH allows for a comparison of not only quantitative performance levels but
qualitative performances too, with no need for a previous conversion of the qualitative
scales into a quantitative scale, allowing a solution to one of the problems presented by
Botreau et al. (2007b).
MACBETH is a methodology which requires only qualitative judgements to quantify
the relative attractiveness (utilities) of options (farms). In order to elicit a marginal
utility function with MACBETH, the first step is to define whether the measure
performs as a quantitative measure or as a qualitative one and which are the
quantitative/qualitative performance levels of the measure. The next step is to fill in a
matrix, giving qualitative judgements regarding the difference of attractiveness between
the different quantitative performance levels of the measure. The qualitative judgements
can be rated as ‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’. As
each judgement was given, the matrix’s consistency was automatically verified with an
interactive algorithm based on linear programming (Mayag et al., 2010), and judgment
modifications were suggested which could be made to fix any detected inconsistency.
From the complete and consistent matrix of judgements, MACBETH creates a
numerical scale. With the numerical scale, MACBETH produces the marginal utility
function (u) for each measure. In order to be able to aggregate the different measures
into criteria, this method also allows the user to normalise the raw data expressed in
different scales into an absolute value scale, ranging, for example, from 0 to 100,
where 0 is the worst situation one can find on a farm and 100 the best situation.
After the initial calculation of the MACBETH scale, a check was performed to ensure
that it adequately represented the relative magnitude of the WQ DMs’ judgements; if
not, the scores were adjusted.
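Once a numerical scale exists for a set of quantitative performance levels, a utility for any intermediate value is needed. The sketch below assumes that utilities between two consecutive performance levels are obtained by linear interpolation; this is an assumption made here for illustration, and it does not reproduce the linear-programming step by which MACBETH derives the scale itself. The levels, the utilities and the function name `marginal_utility` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical performance levels (% of affected animals) and the
# utilities attached to them on a 0 (worst) to 100 (best) scale.
LEVELS = [0.0, 5.0, 15.0, 100.0]
UTILITIES = [100.0, 50.0, 25.0, 0.0]


def marginal_utility(value, levels=LEVELS, utilities=UTILITIES):
    """Piecewise-linear utility between consecutive performance levels."""
    if not levels[0] <= value <= levels[-1]:
        raise ValueError("value outside the defined performance levels")
    if value == levels[-1]:
        return utilities[-1]
    k = bisect_right(levels, value) - 1  # index of the level at or below value
    x0, x1 = levels[k], levels[k + 1]
    u0, u1 = utilities[k], utilities[k + 1]
    return u0 + (value - x0) / (x1 - x0) * (u1 - u0)


print(marginal_utility(10.0))  # halfway between the levels 5 and 15
```

Placing more performance levels where the utility changes quickly gives a closer fit in that part of the range.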
2.2.2 Aggregation with the Choquet integral
In a second step, the CI was used to aggregate the different measures into the
corresponding criteria. In order to combine the measures (individual utilities calculated
with MACBETH) into the corresponding criteria using the CI, the first step is
capacity identification. A capacity can be regarded as a generalisation of the weighting
vector used in the calculation of weighted sums. Seen as an aggregation operator, the CI
takes into account the different importance of the measures and the interaction between
them. These interactions can be complementary (positive) or substitutive (negative). The
number of coefficients which define a capacity increases exponentially with the number of
measures. To keep things simple, it may be preferable to restrict the capacity to two-additive
solutions.
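To give a sense of scale, the count below compares a general capacity with a two-additive one for n measures. It assumes the usual normalisation (the empty set valued 0 and the full set valued 1) and that a two-additive capacity is fully described by n Shapley values summing to 1 plus one interaction index per pair of measures; the symbol n is introduced here for illustration only.

$$\underbrace{2^{n} - 2}_{\text{general capacity}} \quad \text{versus} \quad \underbrace{n + \frac{n(n-1)}{2} - 1}_{\text{two-additive capacity}}$$

Under these assumptions, six measures (as in the six disease areas of Example 2) would require 62 free coefficients for a general capacity against 20 for a two-additive one.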
In this study, capacity identification, based on the least squares (LS) approach, was
implemented using the Kappalab R package following the method described by
Grabisch et al. (2008). In order to use the LS identification method, the utilities
calculated with MACBETH for the examples’ data were used as the subsets
against which the initial preferences of the WQ DMs were expressed.
The results of the aggregation of the examples’ data following the WQ protocol were
used as initial preferences in order to fit the model to the WQ DMs’ preferences.
With this methodology, a progressive interactive approach can be developed after an
initial calculation of the CI, where additional constraints on the Shapley values, which
measure the overall importance of a measure (criterion), and the interaction indices can
be imposed in order to fit the WQ DMs’ preferences more precisely.
According to Mayag et al. (2011), given the individual utilities $(x_1, x_2, \dots, x_n)$ for the
different measures, the CI with respect to a two-additive capacity $\mu$ can be written as
follows:
$$C_\mu(x_1, \dots, x_n) = \sum_{i=1}^{n} v_i x_i - \frac{1}{2} \sum_{1 \le i < j \le n} I_{ij} \lvert x_i - x_j \rvert$$
where $v_i$ represents the importance of the measure i and corresponds to the Shapley
value of $\mu$ (capacity) and $I_{ij}$ represents the interaction between measures i and j.
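A minimal sketch of this aggregation is given below. It assumes the usual expression of the Choquet integral for a two-additive capacity, that is, the sum of the utilities weighted by their Shapley values minus half the sum, over all pairs of measures, of the interaction index multiplied by the absolute difference between the two utilities. The function name `choquet_2additive` and the dictionary layout for the interaction indices are illustrative choices, not part of the Kappalab package used in this study.

```python
from itertools import combinations


def choquet_2additive(utilities, shapley, interaction):
    """Choquet integral with respect to a two-additive capacity.

    utilities   -- individual utilities x_i, one per measure
    shapley     -- Shapley values v_i, one per measure
    interaction -- dict mapping index pairs (i, j), i < j, to I_ij
    """
    n = len(utilities)
    if len(shapley) != n:
        raise ValueError("one Shapley value per measure is required")
    weighted = sum(v * x for v, x in zip(shapley, utilities))
    penalty = sum(
        interaction.get((i, j), 0.0) * abs(utilities[i] - utilities[j])
        for i, j in combinations(range(n), 2)
    )
    return weighted - 0.5 * penalty


# Farm b of Example 1: utilities for lameness, wounds and tail biting
# (Table 1), with the rounded parameters reported in Table 2.
score = choquet_2additive(
    [90.37, 90.16, 93.84],
    [0.500, 0.174, 0.326],
    {(0, 1): 0.347, (0, 2): 0.652, (1, 2): 0.000},
)
print(round(score, 2))  # close to the overall utility listed for farm b
```

With all interaction indices set to zero the expression reduces to a plain weighted sum, which makes the role of the interaction terms easy to inspect.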
3 Examples of the aggregation of measures into criteria
In order to illustrate the methodology proposed for the construction of the criteria, three
examples are given: absence of injuries, absence of disease and absence of pain induced
by management procedures. The WQ protocol distinguishes three types of aggregation
of measures into criteria. Each one of these three criteria is calculated in a different
way in the WQ protocol (Figure 1), whereas this study proposes a unique methodology
for all the criteria (Figure 2).
3.1 Example 1: Criterion ‘Absence of injuries’
Absence of injuries is assessed by three measures: lameness, wounds on the body and
tail-biting. The measures which form this criterion have in common that they are
recorded at individual level on a scale which generally represents the severity of the
problem, so that the percentage of animals in each category can be easily calculated
(e.g. percentage of animals walking normally, percentage of moderately lame animals,
and percentage of severely lame animals).
3.1.1 Welfare Quality®
Briefly, the WQ protocol first produces an ‘Index’ ($I_n$) by combining the percentages
of animals in each severity category, particularly for lameness and wounds on the body.
It consists of a weighted sum, where n can be substituted by lameness or wounds on the
body. For instance, for lameness (l):
$$I_l = 100 - \frac{2 \times \text{lameness}_1 + 5 \times \text{lameness}_2}{5}$$
For example, a farm with 10% moderately lame animals (lameness1) and 1%
severely lame animals (lameness2) will achieve an Index for lameness ($I_l$) of 95.
Afterwards this ‘Index’ is transformed by a non-linear function (I-spline function),
producing a ‘Score’ ($S_n$). For lameness, the function is defined piecewise, with
separate expressions for $I_l \le 85$ and $I_l \ge 85$.
For example, the farm presented before, which was assigned $I_l$ = 95, will achieve
a Score for lameness ($S_l$) of 51.35.
Figure 3 shows an example of the WQ I-spline function for lameness.
Figure 3. Scores for lameness according to the Index calculated for the % of lame pigs.
For tail biting the I-spline function is calculated directly. The mere absence or presence
of it is recorded, and thus there is no need for a weighted sum to combine the scores
regarding the severity of the problem.
To produce the criterion score, the partial scores previously obtained with the I-spline
functions are combined with the CI (Welfare Quality, 2009).
3.1.2 MAUT
Before determining the utility functions of lameness and wounds on the body, we
produced an Index as was carried out in the WQ protocol, in order to combine the
percentage of animals with a moderate problem and the percentage of animals with a
severe problem ($I'_n$, where n can be lameness or wounds on the body). We
implemented the same weights as those used in the WQ protocol. For instance, for
lameness:
$$I'_l = \frac{2 \times \text{lameness}_1 + 5 \times \text{lameness}_2}{5}$$
For example, a farm with 10% moderately lame animals and 1% severely lame
animals will achieve an Index for lameness ($I'_l$) of 5.
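The index can be sketched as a small weighted sum. The weights used below (0.4 for moderately lame animals and 1 for severely lame animals) are inferred from the worked examples in the text: 10% moderate and 1% severe give an index of 5 here and of 95 on the WQ scale, and Section 4.2.1 states that 0% moderate with 10% severe is regarded as equivalent to 10% moderate with 6% severe. They are not quoted from the WQ protocol, and the function names are hypothetical.

```python
def severity_index(pct_moderate, pct_severe, w_moderate=0.4, w_severe=1.0):
    """Weighted sum of the % of animals scored 1 (moderate) and 2 (severe)."""
    for pct in (pct_moderate, pct_severe):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must lie between 0 and 100")
    return w_moderate * pct_moderate + w_severe * pct_severe


def wq_index(pct_moderate, pct_severe, **weights):
    """Same weighted sum expressed on the 0 (worst) to 100 (best) scale."""
    return 100 - severity_index(pct_moderate, pct_severe, **weights)


print(severity_index(10, 1))  # index used as input to the utility function
print(wq_index(10, 1))        # index on the WQ 0-100 scale
```

Because the combination is linear, different mixes of moderate and severe cases can yield the same index, which is the compensation effect discussed in Section 4.2.1.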
The utility function for the percentage of animals with tail biting was calculated
directly.
-Utility function determination (MACBETH)
The measures which form this criterion were defined as quantitative measures in
MACBETH. The quantitative levels of these measures were defined according to the
WQ protocol. Figure 3 shows how the scores assigned by the WQ DMs corresponding
to the percentage of lame animals decreased rapidly for the 100 to 85 range (reflecting
0 to 15% lame animals, respectively), the rate gradually slowing down after this point.
Performance levels at intervals of one unit were established between 0 and 15% animals
with lameness, and when the slope of the I-spline function became homogeneous,
intervals of 10 units were established, as can be seen in Figure 4 with an example of the
utility function for lameness calculated with MACBETH.
Figure 4. Utility function for lameness calculated with MACBETH
For example, the farm presented before, which was assigned an Index for lameness of 5,
will achieve a utility for lameness of 51.35.
-Aggregation with the Choquet integral
Ten farms were used as learning data to determine the CI aggregation parameters (Data
in Table 1). The utilities calculated with MACBETH for these ten farms were used as a
subset to express the WQ DMs’ preferences (Utilities in Table 1). The results of the
aggregation of the ten farms’ data following the WQ protocol were used as the WQ
DMs’ initial preferences in order to identify the capacity using the LS-based approach
(WQ overall scores in Table 1).
Table 1. Absence of injuries measures data for selected farms. Measures’ values, individual utilities and overall utilities for each selected farm.

Farm  Measures data                      Index                Utilities            WQ score    MAUT utility
      L¹     L²     W¹     W²     BT²    L      W      BT     L      W      BT     (criteria)  (criteria)
a     0      0      0      0      0      0      0      0      100    100    100    99          100
b     1      0.31   4      1      1      0.72   3.67   1      90.37  90.16  93.84  89.99       90.30
c     5.10   0.03   7.46   2      3      2.08   6.97   3      75.04  81.95  82.46  75.01       75.04
d     4.67   1      12.71  5      5      2.87   13.47  5      67.59  67.64  72.41  67.49       67.59
e     1      3.77   25.37  1      7      4.17   17.91  7      57.08  59.29  63.61  57          57.08
f     5.53   3      5      5.6    10     5.21   8.98   10     50.07  77.21  52.52  50.01       50.07
g     1.92   7      36     10     19     7.77   33.33  19     37.69  37.7   31.58  33.81       33.70
h     40.37  1      37.31  5      33     17.15  29.87  33     24.57  41.52  19.57  21.17       21.31
i     33.25  30     89.55  20     55     43.3   79.70  55     10.12  14.40  10.49  10          10.12
j     0      100    0      100    100    100    100    100    0      0      0      0           0

¹ Percentage of animals affected with lameness (L)/wounds on the body (W) scored 1
² Percentage of animals affected with lameness/wounds on the body/bitten tails (BT) scored 2
Comparing the results for lameness, wounds on the body and tail-biting obtained with
the WQ method and the MAUT (Table 1), the overall utilities from different farms,
calculated with MACBETH, fit the scores (at criterion level) obtained with the WQ
I-spline functions. The Shapley values for each measure are shown in Table 2. As we can
see, lameness was considered more important than tail-biting, which was in turn
considered more important than wounds on the body. Furthermore, Table 2 shows that
none of the interactions between the measures was negative; thus, the measures were
defined as complementary, in accordance with the WQ protocol.
Table 2. Shapley value and interaction indices to aggregate the measures’ utilities into
the criteria with the Choquet integral.

                    Shapley value  Interaction indices
                                   Lameness  Wounds  Tail biting
Lameness            0.500          -         0.347   0.652
Wounds on the body  0.174          0.347     -       0.000
Tail biting         0.326          0.652     0.000   -
3.2 Example 2: Criterion ‘Absence of disease’
Absence of disease is assessed by 13 measures. The measures used to check this
criterion lead to data expressed on different scales.
3.2.1 Welfare Quality®
Due to the different nature of the measures (for instance, mortality is recorded as the
percentage of mortality on farm during the last 12 months, whilst coughing and
sneezing are assessed as the average frequency of coughs/sneezes per animal over 5
minutes), WQ decided to compare the data to alarm thresholds which represent the limit
between what is considered abnormal and what is considered normal. When the
incidence observed on a measure reaches approximately half the alarm threshold, a
warning is attributed. The measures are grouped into six areas: mortality, respiratory,
digestive, liver, skin and hernias. The severity of the problem is estimated per area: if
within an area the frequency of one symptom is above the warning threshold and the
other is below, a warning is attributed to the area. On the other hand, if within an area
the frequency of one symptom is above the alarm threshold, an alarm is attributed to
the area; if neither occurs, no problem is recorded. The numbers of alarms and warnings
detected on a farm are counted and used to calculate an ‘Index’ for the absence of
disease criterion ($I_{ad}$) with a weighted sum.
For instance, a farm with a warning in 2 areas and an alarm in another will achieve an
index for absence of disease ($I_{ad}$) of 63.33.
Finally the ‘Index’ is transformed into a score using I-spline functions, defined piecewise
with separate expressions for $I_{ad} \le 10$ and $I_{ad} \ge 10$.
For instance, the farm presented before, which was assigned an $I_{ad}$ of 63.33, will
achieve a score of 48.42.
3.2.2 MAUT
The measures employed to check this criterion were transformed, in a first step, into an
ordinal scale before determining the utility functions. The data were compared to the
warning and alarm thresholds defined in the WQ protocol. The measures were grouped
into the six areas defined in the WQ protocol. The area was attributed with a warning or
an alarm when one of its measures was above the warning or the alarm threshold.
-Utility function determination (MACBETH)
The utility function was calculated per area. We defined the six disease areas as
qualitative measures where the performance levels could be recorded using the terms
‘no problem’, a ‘warning’ attributed to the area and an ‘alarm’ attributed to the area. In
MACBETH, when the area was attributed a warning, a utility of 40 was assigned to it.
When the area was assigned with an alarm, a utility of 0 was assigned, and when there
was no problem recorded the utility assigned to the area was 100 (Figure 5).
Figure 5. Utilities assigned to the performance levels of the absence of disease areas
For instance, the farm presented before will achieve a utility of 0 in the area which was
assigned an alarm, a utility of 40 for both areas which were assigned a
warning, and utilities of 100 for the remaining areas.
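The mapping from area status to utility can be sketched as follows. The three utilities (100, 40 and 0) are those stated in the text; the threshold values in the usage lines and the names `area_status` and `area_utilities` are hypothetical and serve only to illustrate the rule that an area takes the most severe status reached by any of its measures.

```python
# Utilities attached to the three performance levels of a disease area.
AREA_UTILITY = {"no problem": 100, "warning": 40, "alarm": 0}


def area_status(values, warning_thresholds, alarm_thresholds):
    """Classify one disease area from its measures and their thresholds."""
    if any(v > a for v, a in zip(values, alarm_thresholds)):
        return "alarm"
    if any(v > w for v, w in zip(values, warning_thresholds)):
        return "warning"
    return "no problem"


def area_utilities(statuses):
    """Translate a list of area statuses into utilities on the 0-100 scale."""
    return [AREA_UTILITY[s] for s in statuses]


# Hypothetical area with two measures; only the first exceeds its
# warning threshold, so the area receives a warning.
print(area_status([2.0, 30.0], [1.5, 40.0], [3.0, 80.0]))

# The farm of the running example: one alarm, two warnings, three areas
# without problems.
print(area_utilities(["alarm", "warning", "warning",
                      "no problem", "no problem", "no problem"]))
```

The six resulting utilities are the inputs that are subsequently aggregated with the Choquet integral.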
-Aggregation with the Choquet integral
Again, ten farms were used as learning data to determine the CI aggregation parameters
(Data in Table 3; the data were highlighted in grey or dark grey when they were above
the corresponding WQ warning or alarm thresholds respectively). The utilities obtained
for the ten farms with MACBETH were used as subsets to express the WQ DMs’
preferences (Utilities in Table 3). The results of the aggregation of the ten farms
following the WQ protocol (WQ overall scores) were used as initial preferences in order
to use the least squares-based approach for capacity identification (WQ in Table 3).
Table 3. Absence of disease. Measures’ values, individual utilities and overall utilities for each selected farm.

Farm  Mortality  Respiratory condition   Digestive condition  Parasites  Skin condition  Hernias     WQ score    MAUT utility
      M¹         C²    S²    LB³   TS³   RP³       LF⁴        P          SC⁵             H⁵    H⁶    (criteria)  (criteria)
a     0.3        5     2     0.2   0.1   0.1       2          0          0.4             0.5   0.1   99.99       100.00
b     0.7        12    5     0.3   0.2   0.8       3          0          1               1     0.3   83.97       83.80
c     1          14    24    1.4   1     0.6       20         0          3               2.3   0.3   74.13       73.00
d     1.3        16    10    0.5   0.3   0.3       6          0          1.3             1.5   0.5   69.46       69.46
e     1.8        20    16    1     0.7   0.5       10         0          2.4             2     0.8   56.38       58.30
f     2          6     24    1.4   1     0.7       12         0          9               2.4   0.9   48.42       48.42
g     3          30    38    1.8   1.3   1         10         0          3.6             3     1     34.23       41.81
h     2.6        33    42    2     1.6   1.2       16         0          4               3.2   1.1   27.94       31.00
i     3          37    44    6.1   2     1.5       17         0          4.3             7     1.2   16.88       14.00
j     5.3        50    46    3     2.4   1.7       18         0          9.7             3.8   1.7   7.67        3.01

¹ Percentage of mortality (M) on farm during the last 12 months.
² Average frequency of coughs (C)/sneezes (S) per animal during 5 minutes.
³ Percentage of pigs with evidence of laboured breathing (LB)/twisted snouts (TS)/rectal prolapse (RP).
⁴ Percentage of pigs in herd with liquid faeces (LF).
⁵ Percentage of pigs scored as 2 in skin condition (SC)/hernias (H).
⁶ Percentage of pigs scored as 1 in hernias (H).
Grey: data over the warning threshold; dark grey: data over the alarm threshold.
For instance, the farm presented before, which corresponds to Farm F, is
assigned an overall utility of 48.42 after the aggregation of the individual utilities for
each area with the CI.
We found that the initial Shapley values resulting from aggregating the utilities with the
CI varied slightly between areas, whereas in the WQ protocol all the areas were considered
equally important. After imposing additional constraints on the Shapley values, the
importance attached to all the areas was the same, while the overall utility
remained unchanged. The interaction indices (Table 4) differed between the initial
calculation of the CI and the second, constrained calculation, but in both cases all the
areas performed as complementary measures.
Table 4. Shapley value and interaction indices to aggregate the measures’ utilities into
the criteria with the Choquet integral.

             Shapley value  Interaction indices
                            Mortality  Respiratory  Digestive  Liver  Skin   Hernias
Mortality    0.165          -          0.024        0.046      0.029  0.018  0.024
Respiratory  0.167          0.024      -            0.017      0.055  0.046  0.035
Digestive    0.168          0.046      0.017        -          0.077  0.037  0.025
Liver        0.163          0.029      0.055        0.077      -      0.056  0.049
Skin         0.166          0.018      0.046        0.037      0.056  -      0.021
Hernias      0.168          0.0214     0.035        0.025      0.049  0.021  -
3.3 Example 3: Criterion ‘Absence of pain induced by management
procedures’
Absence of pain induced by management procedures is assessed by two qualitative
measures: castration and tail docking. These measures are taken at farm level. The
farms are classified according to the presence or absence of these mutilation procedures
and, where they are performed, according to whether anaesthetics are used.
3.3.1 Welfare Quality®
WQ used a lexicographic valuation tree for these types of measures (Figure 6).
Figure 6. Tree created in the MACBETH decision support system for the criterion
Absence of pain induced by management procedures
For instance, a farm on which pigs were castrated using anaesthetics and tail docking
was performed without anaesthetics will achieve an index of 35 for the absence of pain
induced by management procedures.
3.3.2 MAUT
-Utility function determination (MACBETH)
Castration and tail docking were defined in MACBETH as qualitative measures in this
study. Following the WQ protocol, their performance levels were established as no
castration/no tail docking, castration/tail docking with anaesthetics and castration/tail
docking without anaesthetics. Figure 7 shows the MACBETH scales for each measure.
Figure 7. Utilities assigned to the performance levels of the Absence of pain induced by
management procedures
For instance, the farm we presented before will achieve a utility of 60 for castration and
a utility of 0 for tail docking.
-Aggregation with the Choquet integral
Nine farms were used as learning data to determine the CI aggregation parameters (Data
in Table 5). The utilities calculated with MACBETH corresponding to these farms were
used as subsets to express the WQ DMs’ preferences (Utilities in Table 5). To
enable the use of the LS-based approach for capacity identification, the results from
aggregating the nine farms’ data following the WQ protocol were used as the WQ DMs’
initial preferences (WQ overall scores in Table 5). Considering that WQ DMs were satisfied,
we decided not to impose any additional constraint when aggregating the absence of
injuries criterion. Table 5 shows how the utilities concerning castration and tail
docking obtained for the nine possible farm situations were adjusted as closely as possible
to the WQ scores for this criterion. When adjusting the utilities to the WQ DMs’
preferences, the CI parameters obtained indicated that tail docking was considered more
important than castration, with Shapley values of 0.539 and 0.461, respectively. We
also found that both measures performed in a complementary way, with an
interaction index of 0.109.
For instance, we can notice that the farm presented before (Farm F) is assigned an
overall utility of 24.37 after the aggregation of the individual utilities for castration and
tail docking with the CI.
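As a rough check of these figures, the aggregation for Farm F can be worked by hand. The calculation below assumes the usual two-additive form of the Choquet integral (the utilities weighted by their Shapley values, minus half the interaction index multiplied by the absolute difference between the two utilities) and uses the rounded parameters reported above, with castration as the first measure and tail docking as the second:

$$C_\mu(60, 0) = 0.461 \times 60 + 0.539 \times 0 - \tfrac{1}{2} \times 0.109 \times \lvert 60 - 0 \rvert = 27.66 - 3.27 = 24.39$$

The small difference from the reported value of 24.37 is presumably due to the rounding of the published parameters.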
Table 5. Absence of pain induced by management procedures. Measures’ values, individual utilities and overall utilities for each selected farm.

Farm  Measures data                                           Utilities                 WQ score    MAUT utility
      Castration                  Tail docking                Castration  Tail docking  (criteria)  (criteria)
a     No                          No                          100         100           100         100
b     No                          Yes (with anaesthetics)     100         45            60          67.34
c     No                          Yes (without anaesthetics)  100         0             38          40.62
d     Yes (with anaesthetics)     No                          60          100           77          79.36
e     Yes (with anaesthetics)     Yes (with anaesthetics)     60          45            53          51.09
f     Yes (with anaesthetics)     Yes (without anaesthetics)  60          0             35          24.37
g     Yes (without anaesthetics)  No                          0           100           47          48.40
h     Yes (without anaesthetics)  Yes (with anaesthetics)     0           45            27          21.78
i     Yes (without anaesthetics)  Yes (without anaesthetics)  0           0             8           0
4 Discussion and conclusions
4.1 General methodology
By using the MAUT, the main difficulties described by Botreau
et al. (2007b) for a multi-criteria aggregation model were addressed: the
method assigns different importance to the measures, limits the compensation
between them and works with data collected on different types of scales.
Furthermore, the model’s flexibility allowed us to fit the WQ assessment, obtaining
results that were comparable to the ones obtained by implementing the WQ protocol.
Compared to the I-spline functions used in the WQ protocol to interpret the measures in
terms of welfare, the use of MACBETH presented several advantages:
First, by using MACBETH the assessment remained more transparent, which could help
to explain to the stakeholders the results and to identify the causes of poor welfare while
encouraging them to take efficient remedial measures which would affect the results.
Second, the assessment remains more flexible. With this method all the
parameters can be changed according to new scientific knowledge (inclusion or
exclusion of measures based on new studies on their influence on animal welfare) or
changes in societal expectations (if the welfare of animals improves significantly on
all farms, stakeholders may want to be more selective when considering a farm as
excellent), etc. The main drawback of using MACBETH was related to the
M-MACBETH software implementation, as it does not allow the utility functions’
formulae to be exported to other environments, while typing the information into
the software can be extremely tedious when working with large amounts of
data.
With regard to other methods proposed for the overall evaluation of animal welfare,
such as sum of ranks and sum of scores (Botreau et al., 2007a), the use of the CI as an
aggregator presented an important advantage, since it allowed interaction between
measures to be taken into account and made it possible to limit the compensation
between them, in this way solving one of the main problems described by Botreau
et al. (2007b). The CI was also used in the WQ protocol for the aggregation of some
measures into criteria and for the aggregation of criteria into principles (Welfare
Quality, 2009).
The main difficulty in implementing the least squares-based approach for CI capacity
identification is that it depends on information which the DM cannot always provide,
such as the overall scores for each criterion (Grabisch et al., 2008). Because our aim
was to fit our results to the WQ DMs’ preferences, the results obtained from the WQ
model were used as initial preferences, thus avoiding this issue. However, as noted in the
study of Merad et al. (2013), in other circumstances it may be difficult for the DMs to
provide overall scores. Nevertheless, there are easier methods for capacity identification
proposed in the literature, such as the minimum variance approach, which requires only
a partial order over the farms as preference information. See Grabisch et al. (2008) for a
review of different methods for capacity identification.
4.2 Examples
In order to apply this methodology to the particular case of an Animal Welfare
assessment we have found some key points to take into account:
4.2.1 Absence of injuries
Defining the performance levels in MACBETH to which the DM will have to react is
extremely important for these sorts of measures. Although, theoretically, these measures
can vary between 0 and 100%, in real conditions their values usually
vary within a much narrower interval. For instance, Temple et al. (2011) found values which varied
between 0 and 5.8% animals affected with wounds on the body, between 0 and 8.1% for
tail-biting and between 0 and 1.8% for severe lameness. Thus, it is in the lower
intervals of the measures that the utility functions have to fit the
DMs’ preferences most closely. For instance, for lameness (Figure 4), we established that its
performance levels varied in intervals of one unit between 0 and 15% lame animals.
After this point, we established intervals of ten units. In this way, we were able to fit
the preferences of the DM more precisely in the lower interval of the measure.
The use of linear combinations (weighted sums) is also a key feature which can be
reviewed and modified in further stages of the study. They were employed to combine measures
which are defined in two severity categories: lameness and wounds on the body in this
study. By using a linear combination we assume that the measures can compensate each
other, and thus, by using the WQ weights, a farm which has, for example, 0% moderately
lame animals and 10% severely lame animals will be regarded, in terms of welfare, as equivalent to a
farm with 10% moderately lame animals and 6% severely lame animals.
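The compensation implied by the weighted sum can be checked with a few lines of arithmetic. The sketch below is illustrative only: the weights 0.4 (moderate) and 1.0 (severe) are an assumption, chosen because they reproduce the equivalence between the two farms described above, and the function name is hypothetical.

```python
import math

# Assumed weights (not stated in this section): chosen so that the two farms
# described in the text receive the same weighted sum.
WEIGHT_MODERATE = 0.4
WEIGHT_SEVERE = 1.0


def weighted_lameness(pct_moderate: float, pct_severe: float) -> float:
    """Linear combination of the two severity categories (hypothetical helper)."""
    return WEIGHT_MODERATE * pct_moderate + WEIGHT_SEVERE * pct_severe


farm_a = weighted_lameness(pct_moderate=0.0, pct_severe=10.0)
farm_b = weighted_lameness(pct_moderate=10.0, pct_severe=6.0)

# Both farms collapse to the same value, so any later step that only sees the
# weighted sum cannot tell them apart.
print(farm_a, farm_b, math.isclose(farm_a, farm_b))
```

Under these assumed weights, four additional percentage points of severely lame animals are exactly offset by ten fewer percentage points of moderately lame animals.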
Although it was emphasised throughout the development of the WQ model that welfare
scores should not compensate each other (Botreau et al., 2007b and Veissier et al.,
2011), compensation occurred in the first stages by using linear combinations.
Providing an individual utility function for each severity measure and afterwards
aggregating them by using the CI could prove to be an alternative solution. On one hand,
the model accuracy would increase, but on the other hand, so would the complexity of
the decision process, demanding from the DMs that they interpret a higher number of
measures in terms of welfare.
4.2.2 Absence of disease
In order to simulate the WQ DMs’ preferences, we compared the data for the absence of
disease measures with the warning and alarm thresholds established in the protocol.
However, in the development of the methodology we showed that by converting the
original, quantitative data into an ordinal scale (3 qualitative levels: no problem
recorded, a warning or an alarm), it was impossible for the model to distinguish
between herds which slightly or greatly exceeded the thresholds. Therefore, conversion
into an ordinal scale might be reconsidered, and the measures could instead be treated as
quantitative ones, using the warning and alarm thresholds as references for the DM to
build the utility functions.
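The loss of information caused by the ordinal conversion can be illustrated with a minimal sketch. The thresholds and herd values below are hypothetical and serve only to show the effect; the actual warning and alarm thresholds are those defined in the WQ protocol.

```python
# Hypothetical thresholds for one disease measure (% of animals affected);
# the real warning and alarm thresholds are defined in the WQ protocol.
WARNING_THRESHOLD = 5.0
ALARM_THRESHOLD = 10.0


def to_ordinal(value: float) -> str:
    """Map a quantitative measure onto the three ordinal levels."""
    if value >= ALARM_THRESHOLD:
        return "alarm"
    if value >= WARNING_THRESHOLD:
        return "warning"
    return "no problem"


herd_slightly_over = 10.5  # just above the assumed alarm threshold
herd_greatly_over = 40.0   # far above the assumed alarm threshold

# Both herds end up in the same category, so the difference between them
# is no longer visible to the aggregation steps that follow.
levels = [to_ordinal(herd_slightly_over), to_ordinal(herd_greatly_over)]
print(levels)
```

Treating the measure as quantitative would keep the two herds apart, with the thresholds serving only as reference points for the utility function.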
To stay in line with the WQ protocol preferences, we decided to create a utility function
per area rather than calculate a utility per measure. Following this method, a large
compensation between the measures of a disease area is allowed. For instance, looking in
Table 3 at the warnings and alarms attributed to the measures gathered in the respiratory
area, a warning is attributed to the respiratory area both on a farm which only has one of
the measures classified with a warning (Farm E) and on a farm which has the fourth
measure classified with a warning (Farm G). The compensation of measures between
disease areas is a crucial point which must be further studied.
4.2.3 Absence of pain induced by management procedures
A decision tree was used for these types of measures in the WQ protocol. By employing
this method, the two measures were considered together, and a score for each one of the
possible scenarios is given directly by the DMs. This methodology can be considered as
a direct rating. Although our methodology provided us with similar results, according to
Bouyssou et al. (2006) it can be concluded that the use of a direct rating method (for
example by using decision trees) makes the methodology less intuitive as opposed to
considering each measure separately and using an aggregation method based on an
intuitive process, which can be easily revised.
5 Acknowledgements
The present study is part of the PHENOMICS research project which is funded by the
German Federal Ministry of Education and Research.
6 References
Bana e Costa CA, de Corte JM and Vansnick JC 1999. The MACBETH approach:
Basic ideas, software, and an application, in: Meskens, N., Roubens, M., (Eds.),
Advances in Decision Analysis. Kluwer Academic Publishers, Book Series:
Mathematical Modelling: Theory and Applications, vol. 4, pp.131-157.
Bana e Costa CA, de Corte JM and Vansnick JC 2004. On the mathematical
foundations of MACBETH. In MCDA, Multiple Criteria Decision Analysis (eds J
Figueira, S Greco and M Ehrgott), pp. 409 - 442. Kluwer Academic Publishers,
Dordrecht, Netherlands.
Bana e Costa CA, Lourenço JC, Oliveira MD and Bana e Costa JC 2014. A socio-
technical approach for group decision support in public strategic planning: The
Pernambuco PPA case. Group decision and negotiation 23, 5-29.
Botreau R, Bonde M, Butterworth A, Perny P, Bracke MBM, Capdeville J and Veissier
I 2007a. Aggregation of measures to produce an overall assessment of animal
welfare. Part 1: A review of existing methods. Animal 1, 1179-1187.
Botreau R, Bracke MBM, Perny P, Butterworth A, Capdeville J, van Reenen CG and
Veissier I 2007b. Aggregation of measures to produce an overall assessment of
animal welfare. Part 2: Analysis of constraints. Animal 1, 1188-1197.
Botreau R, Capdeville J, Perny P and Veissier I 2008. Multi-criteria evaluation of
animal welfare at farm level: an application of MCDA methodologies.
Foundations of Computing and Decision Science 33, 1-18.
Botreau R, Veissier I and Perny P 2009. Overall assessment of animal welfare: Strategy
adopted in Welfare Quality. Animal Welfare 18, 363-370.
Bouyssou D, Marchant T, Perny P, Pirlot M, Tsoukias A and Vincke P 2000. Evaluation
and decision models: A critical perspective. Kluwer, Dordrecht.
Bouyssou D, Marchant T, Perny P, Pirlot M, Tsoukias A and Vincke P 2006. Evaluation
and decision models with multiple criteria: Stepping stones for the analyst.
Springer, New York, USA.
Farm Animal Welfare Council 1992. FAWC updates the five freedoms. The Veterinary
Record 17, 357.
Grabisch M 1996. The application of fuzzy integrals in multi-criteria decision making.
European Journal of Operational Research 89, 445-456.
Grabisch M, Kojadinovic I and Meyer M, 2008. A review of capacity identification
methods for Choquet Integral based multi-attribute utility theory, Applications of
the Kappalab R package. European Journal of Operational Research 186, 766-
785.
Keeney RL and Raiffa H 1976. Decisions with multiple objectives: Preferences and
value tradeoffs. Wiley, New York.
Mayag B, Grabisch M and Labreuche C 2010. An interactive algorithm to deal with
inconsistencies in the representation of cardinal information, in: Hüllermeier E,
Kruse R and Hoffmann F (Eds), Information processing and management of
uncertainty in knowledge-based systems. Theory and Methods. Springer, Book
Series: Communication in computer and information science, vol.80, pp. 148-157.
Mayag B, Grabisch M and Labreuche C 2011. A characterization of the 2-additive
Choquet integral through cardinal information. Fuzzy sets and Systems 184, 84-
105.
Merad M, Dechy N, Serir L, Grabisch M and Marcel F 2013. Using a multi-criteria
decision aid methodology to implement sustainable development principles within
an organization. European Journal of Operational Research 224, 603-613.
Murofushi T and Sugeno M 1989. An interpretation of fuzzy measure and the Choquet
integral as an integral with respect to a fuzzy measure. Fuzzy sets and systems 29,
201-227.
Parnell GS, Brensik TA, Tani SN and Johnson ER 2013. Handbook of decision
analysis. New York: John Wiley and sons.
Ramsay JO 1988. Monotone regression splines in action. Statistical Science 3, 425-442.
Roy B 1971. Problems and methods with multiple objective functions. Mathematical
Programming 1, 239-266.
Saaty TL 1980. The Analytic Hierarchy Process: Planning, priority setting, resource
allocation. McGraw-Hill, New York.
Temple D, Dalmau A, Ruiz de la Torre JL, Manteca X and Velarde A 2011. Application of
the Welfare Quality® protocol to assess growing pigs kept under intensive
conditions in Spain. Journal of Veterinary Behaviour 6, 138-149.
Yager R 1988. On ordered weighted averaging operators in multi-criteria decision
making. IEEE Transactions on Systems, Man and Cybernetics 18, 183-190.
Vapnek J and Chapman M 2010. Legislative and regulatory options for animal welfare.
FAO Legislative study 104, FAO, Rome.
Veissier I, Jensen KK, Botreau R and Sandøe P 2011. Highlighting ethical
decisions underlying the scoring of animal welfare in the Welfare Quality scheme.
Animal Welfare 20, 89-101.
Welfare Quality 2009. Welfare Quality® Assessment Protocol for Growing Pigs.
Lelystad: Welfare Quality® Consortium.
Winckler C 2013. Progress in, the present state of, and challenges for on-farm animal
welfare assessments in Europe. UFAW International Animal Welfare Science
Symposium, 4-5 July 2013. Universitat Autónoma de Barcelona, Spain.
CHAPTER THREE
Validation of a multi-criteria evaluation model for animal
welfare
P. Martín 1, I. Czycholl 1, C. Buxadé 2 and J. Krieter 1
1 Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Kiel, Germany
2 Animal Production Department, Polytechnic University, Madrid, Spain
Abstract
The aim of this paper was to validate an alternative multi-criteria evaluation system to
assess animal welfare on farms based on the Welfare Quality® (WQ) project, using an
example of welfare assessment of growing pigs. This alternative methodology aimed to
be more transparent for stakeholders and more flexible than the methodology proposed
by WQ. The WQ assessment protocol for growing pigs was implemented to collect data
on different farms in Schleswig-Holstein, Germany. In total, 44 observations were
carried out. The aggregation system proposed in the WQ protocol follows a three-step
aggregation process. Measures are aggregated into criteria, criteria into principles, and
principles into an overall assessment. This study focused on the first two steps of the
aggregation. Multi-attribute utility theory (MAUT) was used to produce a value of
welfare for each criterion and principle. The utility functions and the aggregation
function were constructed in two separate steps. The MACBETH method was used for
utility function determination and the Choquet integral (CI) was used as an aggregation
operator. The WQ decision-makers’ preferences were fitted in order to construct the
utility functions and to determine the CI parameters. The validation of the MAUT
model was divided into two steps: first, the results of the model were compared with the
results of the WQ project at criteria and principle level, and second, a sensitivity
analysis of our model was carried out to demonstrate the relative importance of welfare
measures in the different steps of the multi-criteria aggregation process. Using the
MAUT, similar results were obtained to those obtained when applying the WQ protocol
aggregation methods, both at criteria and principle level. Thus, this model could be
implemented to produce an overall assessment of animal welfare in the context of the
WQ protocol for growing pigs. Furthermore, this methodology could also be used as a
framework in order to produce an overall assessment of welfare for other livestock
species. Two main findings were obtained from the sensitivity analysis: first, a limited
number of measures had a strong influence on improving or worsening the level of
welfare at criteria level, and second, the MAUT model was not very sensitive to an
improvement in or a worsening of single welfare measures at principle level. The use of
weighted sums and the conversion of disease measures into ordinal scores should be
reconsidered.
Keywords: Growing pigs, Welfare Quality, multi-criteria assessment, sensitivity
analysis
1 Introduction
Animal welfare is a multi-dimensional concept, and its assessment should be based on a
variety of measures related to several aspects such as the absence of thirst, hunger,
discomfort, disease, pain, injuries and stress, and the presence of normal behavioural
expressions (Farm Animal Welfare Council (FAWC), 1992). Due to this fact, a
multi-criteria evaluation model is required for the evaluation of an animal unit (farm,
slaughterhouse). In animal welfare, as well as in other areas, the development of a
multi-criteria evaluation system requires considerable efforts due to its complexity. The
complexity of this kind of model lies in the high number of measures involved, the
varied nature of these measures (qualitative, quantitative, measures recorded in different
scales, precision of the measures, different ranges of variation, etc.), the different
importance of the measures, the interaction between them, and last but not least the
number of stakeholder groups involved, which makes it difficult to arrive at decisions
which accommodate stakeholders’ wants and needs (Botreau et al., 2007).
Welfare Quality® (WQ) developed multi-criteria animal welfare evaluation models for
different livestock species (Botreau et al., 2009). The inputs for the WQ multi-criteria
animal welfare evaluation model are on-farm welfare measures described in the WQ
assessment protocol (Welfare Quality, 2009). The WQ multi-criteria evaluation model
uses different aggregation methods (e.g., decision tree, weighted sum or Choquet
integral) to aggregate measures into an overall assessment (Botreau et al., 2008).
Usually, it is in the development of the model where the greatest efforts are made and
less attention is paid to the credibility of the model. However, validation is a crucial
point in order to build sufficient confidence in the model for it to be used for practical
purposes. Model validation can be divided into three components – verification,
validation and sensitivity analysis – according to Qureshi et al. (1999) and Harrison
(1991). Verification refers to building the model correctly (O’Keefe et al., 1991). It
ensures that the model has been developed in a formally correct manner in accordance
with a specified methodology (Geissman and Schultz, 1991). In the case of a
mathematical model implemented as a computer program, verification establishes that
the program has been written correctly and that it behaves as intended. Validation refers
to building the correct model (O’Keefe et al., 1991). Most attempts at model validation
check agreement between the model and real system outputs or between the model and
expert opinions (Qureshi et al., 1999). Sensitivity analysis examines the extent of
variation in predicted performances when parameters are varied over some range of
interest. Sensitivity analysis provides information on the priority areas for refinement if
further versions of the model are to be developed (Qureshi et al., 1999).
The WQ multi-criteria evaluation model was tested on commercial European farms
during the WQ project and partly adjusted according to these results. Also,
classification of some of these farms was compared with the general impression of
observers who carried out audits of the farms (Botreau et al., 2009). Since publication
of the protocols, different studies on the validation of the measures used in the protocol
have been carried out (Temple et al., 2011a, b, 2012a, b, 2013), assessing whether the
measures included in the protocol are sensitive enough to distinguish between different
types of housing systems, and between farms. However, there are few studies which
have assessed whether the model is sensitive at criteria, principle or overall assessment
level, and whether it can distinguish between different farms (de Vries et al., 2013).
The aim of this paper was to validate an alternative multi-criteria evaluation model to
assess animal welfare on farms, within the WQ framework, employing, as an example, a
growing pigs’ welfare assessment. The objective was to compare the results obtained by
implementing our approach with the results obtained by using the approach proposed in
the WQ protocol, as well as assessing its sensitivity to distinguish between commercial
growing pigs’ farms and to demonstrate the relative importance of welfare measures in
the different steps of the multi-criteria aggregation process.
2 Material and methods
2.1 Data
Data collection took place between January 2013 and January 2014 on 8 German
growing pig farms in Schleswig-Holstein. All the farms were assessed by the same
observer, who was trained to use the WQ assessment protocol for growing pigs
(Welfare Quality, 2009) by members of the WQ project group. The pigs on the farms
were housed either conventionally or according to the guidelines of the German animal
welfare label “Tierwohllabel” of the German animal welfare organisation “Deutscher
Tierschutzbund e.V.” (Tierschutzbund, 2013). Each farm was visited six times over two
consecutive growing periods. During each of the two growing periods, three
assessments took place: the first two weeks after entry into the
growing stable at an average weight of the pigs of 40 kg (Farm Visit 1), the second in
the middle of the growing period at an average weight of 75 kg (Farm Visit 2) and the
third two weeks before the beginning of sales to the slaughterhouse at an
average weight of 100 kg (Farm Visit 3). Changes in management occurred on one of
the farms, and for this reason this farm was assessed only twice. In total, the
protocol was run 44 times. The entire WQ protocol for growing pigs was carried out at
each farm visit. Data were collected at pig and herd level, depending on the type of
measurement. After data collection, data were expressed as welfare measures at the herd
level. These welfare measures could be either quantitative or qualitative and were
expressed on different scales depending on the measure (e.g., percentage of lame
animals or coughs per animal in 5 minutes) following the WQ protocol (Welfare
Quality, 2009).
Table 1. Quantitative animal-based measures with scoring scale (Welfare Quality,
2009).
Welfare measure              Scale
Body condition 2             % lean pigs
Bursitis 1                   % pigs affected with moderate bursitis
Bursitis 2                   % pigs affected with severe bursitis
Manure on the body 1         % pigs with 20-50% of body surface soiled with faeces
Manure on the body 2         % pigs with >50% of body surface soiled with faeces
Space allowance              m²/100 kg pig
Lameness 1                   % pigs moderately lame
Lameness 2                   % pigs severely lame
Wounds on the body 1         % pigs with moderate wounds on the body
Wounds on the body 2         % pigs with severe wounds on the body
Tail biting 2                % pigs with evidence of tail biting
Twisted snouts 2             % pigs with evidence of twisted snout
Pumping 2                    % pigs with laboured breathing
Pneumonia                    % slaughter pigs with pneumonia
Pericarditis                 % slaughter pigs with pericarditis
Pleuritis                    % slaughter pigs with pleuritis
Coughing                     Number of coughs per animal in 5 minutes
Sneezing                     Number of sneezes per animal in 5 minutes
Scouring                     % pens with liquid faeces
Rectal prolapse 2            % pigs with evidence of rectal prolapse
Skin condition 2             % pigs with ≥ 10% of skin inflamed
Milkspots                    % slaughter pigs with milkspots on liver
Hernia 1                     % pigs with hernia/rupture not bleeding or touching the floor
Hernia 2                     % pigs with hernia/rupture bleeding or touching the floor
Mortality                    % mortality on farm during last year
Negative behaviour           % negative behaviour out of all social behaviour
Exploratory behaviour        % pen investigation out of exploration behaviours
                             % enrichment investigation out of exploration behaviours
Human-animal relationship    % pens showing panic response
QBA descriptors              0-125 mm scale
2.2 Aggregation of welfare measures into criteria and principles
WQ proposes a three-step aggregation process (Welfare Quality, 2009): welfare
measures are aggregated into 12 criteria, these criteria are in turn aggregated into four
principles, and finally these four principles are combined into an overall assessment. In
this study we focused on the first two steps of the aggregation process (Figure 1).
Figure 1. Welfare Quality® bottom-up approach for integrating the data of the different
welfare measures into an overall assessment.
In the present study, two methodologies were used to produce criteria and principle
values from the data of the welfare measures collected in the farms observed: first,
following the WQ assessment protocol for growing pigs (Welfare Quality, 2009) and
second, following an alternative methodology which consisted of the use of MACBETH
and the Choquet integral in the context of the multi-attribute utility theory (MAUT).
[Figure 1: measures are aggregated into criteria, criteria into principles, and principles
into an overall value. Good feeding: absence of prolonged hunger; absence of prolonged
thirst. Good housing: comfort around resting; thermal comfort; ease of movement. Good
health: absence of injuries; absence of disease; absence of pain induced by management
procedures. Appropriate behaviour: social behaviour; other behaviours; good human-animal
relationship; positive emotional state.]
Details of the aggregation of the measures into criteria and principles following both
methodologies are given in the annexed document.
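The two upper levels of the hierarchy in Figure 1 can be written down as a small data structure, which makes the grouping of the 12 criteria into the four principles explicit. This is only a sketch; the variable names are hypothetical and the measures feeding each criterion are omitted.

```python
# Criteria grouped by principle, as listed in Figure 1.
WQ_HIERARCHY = {
    "Good feeding": [
        "Absence of prolonged hunger",
        "Absence of prolonged thirst",
    ],
    "Good housing": [
        "Comfort around resting",
        "Thermal comfort",
        "Ease of movement",
    ],
    "Good health": [
        "Absence of injuries",
        "Absence of disease",
        "Absence of pain induced by management procedures",
    ],
    "Appropriate behaviour": [
        "Social behaviour",
        "Other behaviours",
        "Good human-animal relationship",
        "Positive emotional state",
    ],
}

# Count the principles and the criteria they contain.
n_principles = len(WQ_HIERARCHY)
n_criteria = sum(len(criteria) for criteria in WQ_HIERARCHY.values())
print(n_principles, n_criteria)
```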
2.2.1 Welfare Quality® (WQ)
-Aggregation of measures into criteria
In the first step, welfare measures were aggregated into the 12 corresponding criteria.
WQ used different types of aggregation of measures into criteria. For some criteria, the
numbers of moderate and severe problems were first combined with a weighted sum,
producing a measure index, on a scale from 0 (worst) to 100 (best). Afterwards, these
index values were converted into measure scores (expressed on the same 0-100 scale),
using spline functions (Ramsay, 1988) fitted by least-squares methods. Finally, the
Choquet integral (CI) was used to combine the scores for the different measures into a
score for the criterion. For other criteria, the measures were first transformed into an
ordinal scale, which consisted of assigning warnings or alarms, depending on the value
of the measures. The numbers of warnings and alarms were then combined into an index
for the criterion, and afterwards this index was converted into a criterion score using
I-spline functions. Decision trees were used to produce the criterion score for other
measures. Further information on the development and employment of these operators
can be found in Botreau et al. (2008, 2009) and Veissier et al. (2011).
-Aggregation of criteria into principles
In the second step, WQ used the CI to aggregate the 12 criteria into four principles. This
integral uses weights to combine the different criterion scores into one principle score
(expressed on the 0-100 scale), while limiting the possibility that a poor score of a
criterion is compensated by other excellent scores (Botreau et al., 2007; Veissier et al.,
2011).
2.2.2 Multiattribute Utility Theory (MAUT)
We developed a multi-criteria evaluation system which aimed to produce results
comparable to those of the methodology proposed in the WQ protocol while remaining
more transparent and flexible.
-Aggregation of measures into criteria
In the first step of the aggregation, MAUT (Keeney and Raiffa, 1976) was used to
produce a value of welfare for each criterion. The application of the MAUT consisted of
two separate steps, the utility function determination and the aggregation function
determination.
Utility function determination (MACBETH)
The utility function gives value to the measure in terms of welfare; it represents the
preferences of the decision-maker (DM) over the measures and their different values. For
example, 5% of lameness on a farm may be interpreted as a worse situation than 5% of
wounds on the body. There are different methods for utility function determination.
MACBETH (Measuring Attractiveness by a Categorical Based Evaluation Technique)
was chosen for several reasons: First, due to the available information on how to use
this method to facilitate a consensus among stakeholders (Parnell et al., 2013, Bana e
Costa et al., 2014), which is one of the main difficulties that a multi-criteria evaluation
system for animal welfare faces. Second, because this method makes it easier
to judge the different attractiveness of options with an increasing number of criteria, owing
to the use of qualitative judgements and, moreover, of a scale of semantic categories
(‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’) (Bana e Costa et
al., 2004). Third, MACBETH allows for a comparison of not only qualitative
performance levels but quantitative performances too, with no need for a previous
conversion of the quantitative scales into a qualitative scale, allowing a solution to one
of the problems presented by Botreau et al. (2007). Fourth, the determination of the
utilities process remains transparent due to the extensive bibliography on it (Bana e
Costa et al., 1999, 2004) and it is easier to explain to the stakeholders due to the
interactive software provided (M-MACBETH).
MACBETH is a methodology which requires only qualitative judgements to quantify
the relative attractiveness (utilities) of options (farms). In order to elicit a marginal
utility function with MACBETH, the first step is to define whether the measure
performs as a quantitative measure or as a qualitative one and which are the
quantitative/qualitative performance levels of the measure. The next step is to fill in a
matrix, giving qualitative judgements regarding the difference of attractiveness between
the different quantitative performance levels of the measure. The qualitative judgements
can be rated as ‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’. As
each judgement was given, the matrix’s consistency was automatically verified with an
interactive algorithm based on linear programming (Mayag et al., 2010), and judgment
modifications were suggested which could be made to fix any detected inconsistency.
From the complete and consistent matrix of judgements, MACBETH creates a
numerical scale. With the numerical scale, MACBETH produces the marginal utility
function (u) for each measure. In order to be able to aggregate the different measures
into criteria, this method also allows normalisation of the raw data expressed in
different scales into an absolute value scale, ranging, for example, between 0 and 100,
where 0 is the worst situation one can find on a farm and 100 the best situation.
After the initial calculation of the MACBETH scale, it was checked to ensure that it
adequately represented the relative magnitude of the WQ DMs’ judgements; if not, the
scores were adjusted.
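Once a numerical scale is available for the performance levels, a marginal utility function can be evaluated for any observed value. The sketch below assumes linear interpolation between neighbouring levels and uses hypothetical levels and utilities for the percentage of lame animals; the real values would come from the DMs' MACBETH judgements, and the function name is illustrative.

```python
from bisect import bisect_right

# Hypothetical performance levels (% lame animals) and utilities on a 0-100
# scale (100 = best situation, 0 = worst); not taken from the study.
LEVELS = [0.0, 1.0, 2.0, 5.0, 15.0, 100.0]
UTILITIES = [100.0, 85.0, 70.0, 40.0, 10.0, 0.0]


def marginal_utility(value: float) -> float:
    """Linearly interpolate the utility between neighbouring performance levels."""
    if value <= LEVELS[0]:
        return UTILITIES[0]
    if value >= LEVELS[-1]:
        return UTILITIES[-1]
    hi = bisect_right(LEVELS, value)  # first level strictly above the value
    lo = hi - 1
    share = (value - LEVELS[lo]) / (LEVELS[hi] - LEVELS[lo])
    return UTILITIES[lo] + share * (UTILITIES[hi] - UTILITIES[lo])


print(marginal_utility(3.5))
```

With denser levels in the lower range of the measure, the interpolation follows the DMs' preferences more closely where most farms are actually found.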
Aggregation with the Choquet integral (CI)
In a second step, the CI (Choquet, 1953, Murofushi and Sugeno, 1989, Grabisch, 1996)
was used to aggregate the different measures into the corresponding criteria. In order to
combine measures (individual utilities calculated with MACBETH) into criteria using
the CI, the first step was the capacity identification. Capacities can be regarded as a
generalisation of the weighting vector involved in the calculation of weighted sums. Seen as an
aggregation operator, the CI takes into account the different importance of the measures and the
interaction between them. These interactions can be complementary (positive) or
substitutive (negative). When the interaction between two measures is positive,
compensation between them is limited, whereas when the interaction is negative,
compensation between them is allowed. The number of coefficients which define a capacity
increases exponentially with the number of measures to be aggregated. To keep
things simple, it may be preferable to restrict the capacity to two-additive solutions.
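The saving obtained by restricting the capacity to a two-additive one can be made concrete by counting coefficients. The sketch below counts one coefficient per subset of measures for a general capacity, and one per measure plus one per pair for a two-additive capacity; normalisation constraints are ignored, and the function names are hypothetical.

```python
from math import comb


def general_capacity_size(n: int) -> int:
    """Number of subsets of n measures, i.e. coefficients of a general capacity."""
    return 2 ** n


def two_additive_size(n: int) -> int:
    """Coefficients of a two-additive capacity: n singletons plus all pairs."""
    return n + comb(n, 2)


for n in (3, 4, 12):
    print(n, general_capacity_size(n), two_additive_size(n))
```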
Capacity identification, based on the least squares (LS) approach, was implemented
within the Kappalab R package following the method described by Grabisch et al.
(2008). In order to use the LS identification method, the utilities calculated with
MACBETH corresponding to the examples’ data were used as subsets against which the
initial preferences of the WQ DMs are expressed.
The results of the aggregation of the examples’ data following the WQ protocol were
used as initial preferences in order to fit the model to the WQ DMs’ preferences.
With this methodology, a progressive interactive approach can be developed after an
initial calculation of the CI, where additional constraints to the Shapley values, which
measure the overall importance of a measure (criterion), and the interaction indices can
be imposed in order to fit the WQ DMs’ preferences more precisely.
According to Mayag et al. (2011), given the individual utilities (x_1, x_2, …, x_n) for the
different measures, the Choquet integral with respect to a two-additive capacity µ can be
written as follows:
$$C_\mu(x_1, \ldots, x_n) = \sum_{i=1}^{n} v_i x_i - \frac{1}{2} \sum_{\{i,j\} \subseteq N} I_{ij} \, \lvert x_i - x_j \rvert$$
where N = {1, …, n} is the set of measures, v_i represents the importance of measure i and
corresponds to the Shapley value of µ (capacity), and I_ij represents the interaction
between measures i and j.
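The effect of the interaction indices is easiest to see on a small numerical example. The sketch below uses the commonly cited two-additive form, C(x) = Σ v_i x_i − ½ Σ I_ij |x_i − x_j|, with hypothetical utilities, Shapley values and interaction indices; it does not check the monotonicity constraints that a valid capacity must satisfy, and it is not the Kappalab implementation used in the study.

```python
from itertools import combinations


def choquet_two_additive(x, shapley, interaction):
    """Two-additive Choquet integral.

    x           -- utilities of the measures (same scale, e.g. 0-100)
    shapley     -- importance v_i of each measure (should sum to 1)
    interaction -- dict mapping index pairs (i, j), i < j, to I_ij
    """
    weighted = sum(v * xi for v, xi in zip(shapley, x))
    penalty = sum(
        interaction.get((i, j), 0.0) * abs(x[i] - x[j])
        for i, j in combinations(range(len(x)), 2)
    )
    return weighted - 0.5 * penalty


# Hypothetical example: two equally important measures, one excellent, one poor.
x = [100.0, 20.0]
v = [0.5, 0.5]

no_interaction = choquet_two_additive(x, v, {})
positive = choquet_two_additive(x, v, {(0, 1): 0.4})
negative = choquet_two_additive(x, v, {(0, 1): -0.4})
print(no_interaction, positive, negative)
```

Without interaction the result equals the weighted sum; a positive interaction pulls the score towards the poorer measure (limited compensation), whereas a negative interaction pulls it towards the better one.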
-Aggregation of criteria into principles
Since criteria are already interpreted in terms of welfare in this step, there is no need for a
utility function determination. Again, for capacity identification in the context of the
Choquet integral, we implemented the least-squares-based approach. In this step, we
used the same aggregation operator as in the WQ protocol, and due to this fact, in order
to determine the CI parameters, we used the subsets used in the WQ protocol as
learning data and the values given by the WQ DMs for these subsets as preferences.
The utility functions for the different welfare measures as well as the different datasets
used to fit the CI parameters to the WQ DMs’ preferences, and the values of the
parameters obtained, can be found in the annex.
2.3 Model validation and sensitivity analysis
According to Harrison (1991), model validation is usually divided into three steps:
verification, validation and sensitivity analysis. Due to the fact that our model was
based on the WQ methodology, the different formulae proposed in the WQ protocol
were verified before determining our model to ensure that the model behaved as
intended. The different calculations of our model (MAUT), whether implemented in
MACBETH or in R, were checked by means of small datasets. The information of the
verification of the WQ and MAUT models can be found in the annex together with the
description of both methodologies. Thus, the model validation is divided into two steps
here, validation and sensitivity analysis.
2.3.1 Validation of the MAUT model
Due to the fact that the WQ model has been already tested for validity (Botreau et al.,
2009) we compared the results for the 44 observations both at criteria and principle
level, obtained with our methodology (MAUT) and with the WQ methodology, which
can be considered as a gold standard. The Euclidean distances for the 44 observations
between the WQ and the MAUT for each criterion and principle were calculated.
When the Euclidean distance between both methods for a criterion/principle was greater
than 0, the Wilcoxon Signed-Rank test confidence intervals for the difference between pairs
of means were calculated, because the assumption of normality was often not appropriate. A
confidence interval for the difference between two means specifies a range of values
within which the difference between the means of the two models may lie. The
confidence interval for the differences between two means contains all the values of µ1 -
µ2 (the difference between the models’ means) which would not be rejected in the two-
sided hypothesis of:
Ho: µ1 - µ2 = 0
Against:
H1: µ1 - µ2 ≠ 0
If the confidence interval includes 0, we can say that there is no significant difference
between the means of the two models, at a given level of confidence. In this study, a
level of confidence of 90%, (α=10%), was established.
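The two computations described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, the signed-rank interval is built from the sorted Walsh averages of the paired differences using the exact null distribution of the test statistic, and no correction for ties or zero differences is applied (an assumption that matters for real data, where many differences may be tied). Strictly, this construction gives an interval for the centre (pseudo-median) of the paired differences, which the text reports as the difference between means.

```python
from itertools import combinations_with_replacement
from math import sqrt


def euclidean_distance(a, b):
    """Euclidean distance between two equally long score vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def signed_rank_cdf(n):
    """Exact null CDF of the signed-rank statistic for n untied pairs."""
    # counts[s] = number of subsets of the ranks 1..n whose sum is s
    counts = [1] + [0] * (n * (n + 1) // 2)
    for rank in range(1, n + 1):
        for s in range(len(counts) - 1, rank - 1, -1):
            counts[s] += counts[s - rank]
    total, running, cdf = 2 ** n, 0, []
    for c in counts:
        running += c
        cdf.append(running / total)
    return cdf


def wilcoxon_interval(a, b, confidence=0.90):
    """Signed-rank confidence interval for the shift of the paired differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    # Walsh averages: (d_i + d_j) / 2 for all i <= j
    walsh = sorted((p + q) / 2 for p, q in combinations_with_replacement(diffs, 2))
    alpha = 1 - confidence
    cdf = signed_rank_cdf(len(diffs))
    # Smallest k with P(T <= k) >= alpha / 2, and at least 1
    k = max(next(i for i, p in enumerate(cdf) if p >= alpha / 2), 1)
    return walsh[k - 1], walsh[len(walsh) - k]
```

With the scores of the 44 observations for one criterion held in two lists, the distance would be checked first and the interval computed only when that distance is greater than 0. The order of the two arguments fixes the sign of the differences.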
2.3.2 Sensitivity analysis of the MAUT model
To assess how sensitive the model was to changes in the welfare measures recorded on
our farms, the values of single welfare measures were replaced with an improved and a
worsened value. These values corresponded to the first or the third quartile of the
data (Table 1). Generally, the first quartile corresponded to an improved situation
because the incidence of the problem was reduced. However, for some measures, such as
space allowance or exploratory behaviours, the improved situation corresponded to the
third quartile because an increase in the value of the measure led to improved
welfare. We compared the criteria and principle values obtained in the original
situation with those of the improved or worsened situation using Wilcoxon Signed-Rank
test confidence intervals for the difference between means.
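The choice of replacement values can be illustrated with a short sketch. The function name and the `higher_is_better` flag are hypothetical, and the quartile definition (the "inclusive" interpolation method of the Python standard library) is an assumption; other software may use a slightly different quartile convention.

```python
from statistics import quantiles


def improved_and_worsened(values, higher_is_better=False):
    """Return (improved, worsened) replacement values for one welfare measure."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    # For prevalences (e.g. manure, wounds) a lower value is better, so Q1 is the
    # improved value; for measures such as space allowance it is the other way round.
    return (q3, q1) if higher_is_better else (q1, q3)
```

For a problem prevalence the call returns the first quartile as the improved value, whereas for space allowance or exploratory behaviour `higher_is_better=True` swaps the two roles.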
-Sensitivity analysis at criteria level
Figure 2 below shows an example of how the sensitivity analysis was carried out for the
comfort around resting criterion. First, the original data for the 44 observations were
aggregated into the corresponding criteria following the MAUT methodology, giving a
total of 44 values for each criterion (a, in Figure 2). Second, the data for the 44
observations of only one measure, for instance Manure 1, were replaced by the improved
value (for Manure 1, the first quartile), and the data were again aggregated with the
MAUT into the corresponding criterion (b, in Figure 2). Third, using the Wilcoxon
Signed-Rank test, the confidence interval of the difference between the means of the
criterion values obtained with the original data and those obtained with the improved
data was calculated with a confidence level of 90% (α = 10%) (c, in Figure 2). Fourth,
the second and third steps were repeated (d and e, in Figure 2), but this time the
original data for the 44 observations of the same measure, Manure 1, were replaced by
the worsened value (the third quartile for Manure 1). These steps were repeated,
modifying one measure at a time, for all the criteria.
Figure 2. Outline of the methodology followed to perform the sensitivity analysis at
criteria level with the example of comfort around resting following the five steps (a, b,
c, d and e) previously described.
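Steps a) to e) amount to a one-at-a-time replacement, which can be sketched as follows. The aggregation function used here, `toy_criterion`, is a hypothetical stand-in and not the MAUT model: it only shows where the real aggregation of measures into a criterion would be plugged in. The farm values are invented for illustration; only the improved value 1.91 is taken from the quartiles of Manure 1.

```python
def one_at_a_time(farms, measure, new_value, aggregate):
    """Criterion scores before and after fixing one measure at new_value on every farm."""
    original = [aggregate(farm) for farm in farms]
    # Copy each farm so that the original records are left untouched
    modified = [aggregate({**farm, measure: new_value}) for farm in farms]
    return original, modified


def toy_criterion(farm):
    # Hypothetical stand-in, NOT the MAUT aggregation: 100 minus the average prevalence
    return 100 - (farm["manure_1"] + farm["manure_2"]) / 2


# Invented example farms (prevalences in %)
farms = [{"manure_1": 10.0, "manure_2": 4.0}, {"manure_1": 20.0, "manure_2": 0.0}]
original, improved = one_at_a_time(farms, "manure_1", 1.91, toy_criterion)
changes = [after - before for before, after in zip(original, improved)]
```

The paired lists `original` and `improved` are what would then be passed to the confidence-interval calculation in steps c) and e); repeating the call with the worsened value gives the second pair.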
-Sensitivity analysis at principle level
The same methodology was used as for the sensitivity analysis at criteria level, but the
results were compared at principle level. Figure 3 shows an example of how the
sensitivity analysis was carried out for the good housing principle.
[Diagram for Figure 2. Panels a) to e): the measures Manure 1, Manure 2, Wounds 1 and
Wounds 2 are aggregated into comfort around resting using original values (a), improved
values (b) and worsened values (d); panels c) and e) give the confidence intervals of
the difference between means (Original-Improved and Original-Worsened).]
Figure 3. Outline of the methodology followed to perform the sensitivity analysis of
good housing following the five steps (a, b, c, d and e) previously described.
3 Results
Five welfare measures (twisted snouts, rectal prolapse, shivering, panting and huddling)
did not occur in any of the 44 observations. The mean, standard deviation (SD), first
quartile (1Q) and third quartile (3Q) of the welfare measures that occurred in the 44
observations are listed in Table 2. The following measures were observed with a
prevalence at farm level lower than 1%: lean animals, bursitis 2, lameness 1,
lameness 2, scouring, skin discolouration, hernia 1 and hernia 2. The low values for
coughs and sneezes arose because the assessor could not identify the number of animals
coughing or sneezing; the number of coughs and sneezes was therefore divided by the
total number of animals in the pen.
Table 2. Means, standard deviation (SD), first quartile (1Q) and third quartile (3Q) of
welfare measures for the 44 observations.
Welfare measure                        Unit         Mean    SD      1Q      3Q
Body condition 2                       %            0.05    0.26    0       0
Number of drinkers places sufficient   no.          Yes (35), No (9)
Drinkers clean                         no.          Yes (44), No (0)
2 drinkers/animal                      no.          Yes (44), No (0)
Bursitis 1                             %            50.74   13.75   40.04   58.39
Bursitis 2                             %            0.96    1.32    0.00    1.48
Manure 1                               %            10.52   12.09   1.91    17.95
Manure 2                               %            3.53    6.35    0.00    4.06
Space allowance                        Sq m/100kg   1.56    1.61    0.96    1.77
Lameness 1                             %            0.29    0.44    0.00    0.69
Lameness 2                             %            0.24    0.46    0.00    0.00
Wounds 1                               %            8.71    7.24    2.73    14.18
Wounds 2                               %            1.03    1.98    0.00    1.33
Tail biting 2                          %            2.88    3.09    0.55    4.60
Pumping 2                              %            0.05    0.24    0.00    0.00
Pneumonia                              %            5.71    3.16    3.10    8.10
Pericarditis                           %            1.55    0.92    0.90    1.83
Pleuritis                              %            2.49    2.29    0.00    3.55
Coughing                               no.          0.18    0.24    0.01    0.26
Sneezing                               no.          0.04    0.04    0.01    0.05
Scouring                               %            0.23    1.51    0.00    0.00
Skin condition 2                       %            0.73    4.73    0.00    0.00
Milkspots                              %            9.79    15.19   1.20    9.60
Hernia 1                               %            0.29    0.43    0.00    0.70
Hernia 2                               %            0.01    0.10    0.00    0.00
Mortality                              %            2.5     0.79    2.00    3.00
Castration                             no.          No (44), With (0), Without (0)
Tail docking                           no.          No (38), With (0), Without (6)
Negative behaviour                     %            33.13   21.96   17.68   39.76
Pen investigation                      %            24.57   8.91    19.46   29.8
Enrichment investigation               %            5.78    3.23    3.60    7.01
Panic                                  %            8.3     17.34   0.00    2.5
QBA descriptors¹                       mm           -       -       -       -
¹ Median, range and quartiles of descriptors (active, relaxed, fearful, etc.) for the
Qualitative Behaviour Assessment not shown.
3.1 Validation of the MAUT model
Means and ranges of variation for the welfare criteria and principles obtained with the
WQ and MAUT methodologies are given in Table 3. The Euclidean distances (ED)
between the WQ and the MAUT methods for each criterion and principle are also
depicted in Table 3 along with the confidence intervals of the difference between the
means for each criterion.
Table 3. Means (range) of welfare criteria and principles obtained with the WQ
methodology and the MAUT, Euclidean distances (ED) between the methods for each
criterion and principle and confidence intervals for the differences of means.
                      WQ                     MAUT                   ED     Confidence interval
Welfare criteria
Absence of hunger     99.71 (90.14-100)      99.71 (90.14-100)      0.0    -
Absence of thirst     90.8 (55-100)          91.55 (59.1-99.9)      12.3   -0.1, -0.09
Resting comfort       60.31 (27.71-85.41)    60.31 (27.71-85.41)    0.0    -
Thermal comfort       100 (100-100)          100 (100-100)          0.0    -
Space allowance       67.95 (21.94-98.51)    67.95 (21.94-98.51)    0.0    -
Absence of pain       46.45 (38-100)         48.74 (40.65-100)      16.3   2.64, 2.65
Absence of injuries   82.18 (55.68-97.06)    82.68 (54.61-97.22)    10.3   0.17, 0.83
Absence of disease    72.36 (24.99-83.97)    72.42 (24.88-84.37)    9.3    0.21, 0.45
Social behaviour      54.70 (14.52-84.76)    54.70 (14.52-84.76)    0.0    -
Exp. Behaviour        30.96 (13.82-44.83)    30.96 (13.82-44.83)    0.0    -
HAR                   89.53 (15.75-99.99)    89.53 (15.75-99.99)    0.0    -
QBA                   30.84 (6.91-52.36)     30.84 (6.91-52.36)     0.0    -
Welfare principle
Feeding               91.13 (56.76-100)      91.93 (60.92-99.91)    12.7   -0.09, 0.01
Housing               62.63 (36.94-87.80)    61.81 (35.55-87.57)    18.6   -1.61, -0.29
Health                52.47 (27.63-85.41)    57.55 (27.82-86.06)    38.5   4.44, 6.08
Behaviour             39.57 (16.8-47.90)     34.14 (15.81-48.58)    39.9   -5.84, -4.84
Confidence intervals of the criteria absence of hunger, comfort around resting, thermal
comfort, space allowance, social behaviour, exploratory behaviour, good human-animal
relationship (HAR) and positive emotional state (QBA) not shown due to no differences
between the WQ and MAUT methods (Euclidean distance=0).
There were no differences between the methods for the following criteria: absence of
hunger, comfort around resting, thermal comfort, space allowance, social behaviour,
exploratory behaviour, human-animal relationship and QBA. The differences between the
methods for absence of thirst, absence of pain induced by management procedures,
absence of injuries and absence of disease were small: the Euclidean distances were
12.3, 16.3, 10.3 and 9.3 respectively, and the confidence intervals for the difference
between the means were very narrow and close to 0. Comparing the differences between
the methods for the four welfare principles, it can be seen that good feeding and good
housing showed smaller differences than good health and appropriate behaviour. The
confidence interval for the difference between the methods’ means for good feeding
included 0, which means that there was no significant difference between the methods
for this principle.
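The reading of the intervals can be made explicit with a small check on the principle-level values listed in Table 3. The function name is hypothetical; the intervals are copied from the table.

```python
def includes_zero(interval):
    """True when a confidence interval (lower, upper) contains 0."""
    lower, upper = interval
    return lower <= 0 <= upper


# 90% confidence intervals at principle level, as listed in Table 3
principle_intervals = {
    "Feeding": (-0.09, 0.01),
    "Housing": (-1.61, -0.29),
    "Health": (4.44, 6.08),
    "Behaviour": (-5.84, -4.84),
}

no_significant_difference = [
    name for name, interval in principle_intervals.items() if includes_zero(interval)
]
```

Only good feeding ends up in `no_significant_difference`, in line with the statement above.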
3.2 Sensitivity analysis of the MAUT model
For the sensitivity analysis, only the quantitative measures were considered, because
variations in the qualitative measures were not comparable in terms of sensitivity with
the rest of the measures. Thus, the qualitative measures related to the criteria
absence of thirst and absence of pain induced by management procedures were excluded
from the study. Five quantitative welfare measures (twisted snouts, rectal prolapse,
shivering, panting and huddling) were also excluded from the sensitivity analysis
because they showed no variability between the observations. For some measures,
improving or worsening their values had no influence on the results at either criteria
or principle level, and thus the confidence intervals could not be calculated because
the observations were tied. These measures all belong to the disease criterion: pumping,
pleuritis, coughs, sneezes, scouring, skin condition and hernias 1 and 2.
3.2.1 Sensitivity analysis at criteria level
Figure 4 shows the confidence intervals of the difference of means between the original
situation and the improved situation (grey) and between the original situation and the
worsened situation (black) for each criterion with respect to the welfare measure
modified.
Figure 4. Confidence intervals of the difference in means between the original situation
and the improved situation (grey) and between the original situation and the worsened
situation (black) for each criterion with respect to the modified welfare measure.
The most important welfare measure for worsening the level of comfort around resting
in our study was manure 1. An increase in the mean value of manure 1 from 10.52 to
17.95 resulted in a decrease in the mean value of comfort around resting, which lay
between -21.85 and -17.55 with a 90% confidence level. However, manure 1 had a low
influence on improving the level of welfare, although the difference between the
original and the improved value was large: with a mean of 10.52 in the original
situation and 1.91 in the improved situation, the confidence interval of the
difference of means was 1.63 to 3.78.
For absence of injuries there was no single measure which led to an important
difference between the original and the improved or worsened situation. The confidence
intervals of the differences of means indicated that improving or worsening the level
of each measure never led to an increase or decrease at criteria level greater than
10 units, which in this study was taken as the threshold above which a measure was
considered to influence the results at criteria level.
For absence of disease, only three measures (pneumonia, milkspots and mortality) had
an influence on improving or worsening the level of welfare. An increase in the mean
value of pneumonia from 5.71 to 8.10 resulted in a decrease in the mean value of
absence of disease, which lay between -15.13 and -10.22 with a 90% confidence level.
A decrease in the mean value of pneumonia from 5.71 to 3.10 resulted in an increase in
the mean value of absence of disease, which lay between -0.9 and 10.22 with a 90%
confidence level. For milkspots, substituting the original values with either the first
quartile or the third quartile resulted in an improved situation of welfare. The
decrease in the mean value of milkspots from 9.70 to 1.20 (first quartile) and from
9.70 to 9.6 (third quartile) resulted in an increase in the mean value of absence of
disease, which lay between 14.78 and 23.68 with a 90% confidence level. An increase in
the mean value of mortality from 2.5 to 3.00 resulted in a decrease in the mean value
of absence of disease, which lay between -12.69 and -12.54 with a 90% confidence level.
A decrease in the mean value of mortality from 2.50 to 2.00 resulted in an increase in
the mean value of absence of disease, which lay between 11.28 and 12.69 with a 90%
confidence level. For the rest of the measures the results were tied: they had no
influence at all, and thus the confidence interval of the differences of means could
not be calculated.
Pen and enrichment investigation had little influence on improving or worsening the
values of the exploratory behaviour criterion. The confidence intervals of the
differences in means indicated that improving or worsening the level of each measure
never led to an increase or decrease of more than 10 units at criteria level.
For the criteria composed of a single measure, such as absence of hunger (assessed by
percentage of lean animals), space allowance (sq m/100kg pig), social behaviour
(negative behaviour) and human-animal relationship (panic), these measures had a
greater influence on improving or worsening the level of welfare than measures which
were aggregated to form criteria, although the range of variation for some of these
measures was low, as is the case for lean animals.
3.2.2 Sensitivity analysis at principle level
Figure 5 shows the confidence intervals of the difference in means between the original
situation and the improved situation (grey) and between the original situation and the
worsened situation (black) for each principle with respect to the modified welfare
measure.
Figure 5. Confidence intervals of the difference of means between the original situation
and the improved situation (grey) and between the original situation and the worsened
situation (black) for each principle with respect to the modified welfare measure.
By aggregating the criteria into principles, the sensitivity of the model to an
improvement or worsening of the values of the measures was lower than at criteria
level. We found that only two measures which led to important differences in the
confidence intervals of the means at criteria level also led to important differences at
principle level (confidence intervals of the differences in means in which at least one of
the confidence limits reached 10 units). These measures were manure on the body 1
(worsened) and space allowance (improved and worsened). For some other measures, at
least one of the confidence limits of the confidence interval reached values higher than 5
units and lower than 10 units: these measures were manure 2 (improved), lameness 2
(improved/worsened), pneumonia (worsened), milkspots (improved/worsened) and
mortality (improved/worsened). The rest of the measures had little influence on
improving or worsening welfare at principle level, with confidence limits lower than
5 units.
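The thresholds used to judge influence (at least one confidence limit reaching 10 units, or lying between 5 and 10 units) can be written as a small classifier. This is a sketch: the function name and band labels are hypothetical, and the treatment of a limit of exactly 5 units is an assumption, since the text does not specify it. The first three example intervals are the criteria-level values reported in section 3.2.1 and serve only to exercise the function; the last one is invented.

```python
def influence_band(interval, high=10.0, moderate=5.0):
    """Classify a confidence interval by its largest absolute confidence limit."""
    largest = max(abs(limit) for limit in interval)
    if largest >= high:
        return "important"      # at least one limit reaches 10 units
    if largest > moderate:
        return "intermediate"   # between 5 and 10 units
    return "low"                # assumption: a limit of exactly 5 counts as low


examples = {
    "manure 1 (worsened)": (-21.85, -17.55),
    "manure 1 (improved)": (1.63, 3.78),
    "mortality (improved)": (11.28, 12.69),
    "invented interval": (-7.0, -5.5),
}
bands = {name: influence_band(ci) for name, ci in examples.items()}
```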
4 Discussion
4.1 Data
In the present study, real data instead of simulated data were used to perform the
validation and the sensitivity analysis of the model. The main advantage of using real
data was that the actual performance of the measures is known (prevalence, variation,
interactions between measures), whereas the use of simulated data, as carried out by
de Vries et al. (2013), would assess the performance of the model in extreme situations
which may not occur in practical conditions. On the other hand, with real data some
measures may have low variation or may not occur on farms, and thus it may be
difficult to assess the sensitivity of the model for these measures. However, we found
results comparable with those of Temple et al. (2011b), and thus we could assume
that our results may be representative of growing pigs. Running the WQ protocol on
a larger number of farms may be necessary to obtain more information on the actual
variation in the welfare measures, because few studies have been carried out so far.
4.2 Validation
4.2.1 Validation at criteria level
There were no differences between the WQ and the MAUT methods for the criteria
assessed by just one welfare measure, such as absence of hunger, space allowance,
social behaviour and positive emotional state, or for exploratory behaviour, which was
assessed by two measures that were combined using a weighted sum in both methodologies
before determining the I-spline (WQ) and the utility functions (MAUT). From this, it
can be concluded that the utility functions determined in MACBETH perfectly fitted the
I-spline functions proposed in the WQ protocol.
Slight differences were found for the criteria comfort around resting and absence of
injuries, which are assessed by several measures. These differences appear to be related
to the aggregation step rather than to the determination of the utility functions, since
the utility functions determined in MACBETH perfectly fitted the I-spline functions
proposed in the WQ protocol for the measures which form these criteria as well. The
differences in the CI parameters between the methods were minor and did not lead to
differences between the methods for the learning datasets; differences did occur,
however, when these parameters were applied to a larger dataset.
Although the differences were minor, this highlights the importance of the aggregation
parameters, since even slight variations in them can produce differences in the
results.
The differences between the methods for absence of disease are explained by the
different methodologies used in the WQ and MAUT models. WQ uses a weighted sum to
combine the number of warnings and alarms found in the different disease areas before
determining the I-spline function, whereas in this study a utility function was first
produced per disease area and the utilities were then aggregated using the CI.
There were small or almost no differences between the methods for the qualitative
criteria (absence of thirst, thermal comfort and absence of pain induced by
management procedures), although the methodologies used in the WQ and the MAUT
were very different. This was because the datasets used to determine the aggregation
parameters of the CI covered all the possible scenarios found on a farm; thus, once
the model was adjusted, there could be no further variations.
4.2.2 Validation at principle level
The differences in the results at principle level were related to two factors: first,
the differences between the methods at criteria level and, second, the parameters used
in the aggregation step. Good feeding, which is assessed by absence of hunger and
absence of thirst, was the principle with the smallest differences between methods,
and the two criteria that form it showed almost no differences between methods. Thus,
the differences between the methods at principle level were probably caused mainly by
the parameters used in the aggregation step. Comparing the parameters of the CI used in
the WQ protocol with those used in this study, only small differences can be seen, yet
differences between the methods were found. The Shapley values (which measure the
importance of the different criteria) used in WQ were 0.39 and 0.61 for absence of
hunger and absence of thirst, and the interaction index between both criteria was 0.66.
In this study, the Shapley values assigned to absence of hunger and absence of thirst
were 0.38 and 0.62 respectively, and the interaction index was 0.64.
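How such small parameter differences carry through to the principle score can be illustrated with the Choquet integral for two criteria. The sketch below assumes the standard two-additive form written in terms of Shapley values and the interaction index; the function name and the example scores (90 for absence of hunger, 100 for absence of thirst) are hypothetical, and only the parameter values are taken from the text.

```python
def choquet_two_criteria(x1, x2, shapley1, shapley2, interaction):
    """Two-additive Choquet integral of two scores (assumed standard form)."""
    half = abs(interaction) / 2
    linear = (shapley1 - half) * x1 + (shapley2 - half) * x2
    if interaction >= 0:
        # Positive interaction: the weaker of the two criteria is emphasised
        return linear + interaction * min(x1, x2)
    # Negative interaction: the stronger of the two criteria is emphasised
    return linear + abs(interaction) * max(x1, x2)


# Hypothetical farm: absence of hunger = 90, absence of thirst = 100
wq_score = choquet_two_criteria(90, 100, 0.39, 0.61, 0.66)     # WQ parameters
maut_score = choquet_two_criteria(90, 100, 0.38, 0.62, 0.64)   # parameters of this study
```

For this hypothetical farm the two parameter sets give roughly 92.8 and 93.0. When both criteria have the same score, either parameter set returns that score unchanged, because the weights sum to 1.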
The differences between methods were also small for good housing. There were no
differences at all between the methods for the criteria that form this principle; thus, it
can be concluded that the aggregation parameters were responsible for the differences at
principle level.
Larger differences between the methods occurred for good health and appropriate
behaviour than for good feeding and good housing. For good health, the effect of the
differences between the aggregation parameters was compounded by the differences
between the results at criteria level. WQ proposes three-additive capacities for the
aggregation of the criteria which form the appropriate behaviour principle, whereas we
decided to limit the capacity to two-additive solutions for simplicity. The
differences between methods appear to be related to the difference between
considering interactions between pairs of criteria (two-additive capacity) and
considering interactions between three criteria (three-additive capacity).
4.3 Sensitivity analysis
For the sensitivity analysis, the original values were replaced by improved or worsened
values which corresponded to the first and the third quartiles of the data. One of the
problems for the sensitivity analysis of an overall welfare assessment arises when the
range of variation of a measure is not known. The low variation of some measures could
explain their low influence on improving or worsening welfare at both the criteria and
principle levels, because the first and the third quartiles of these measures were not
representative of an improvement or a worsening in the level of welfare. The means and
standard deviations for the measures with low incidence were compared with those
presented in the study of Temple et al. (2011b), where the WQ protocol was run to assess
the welfare of growing pigs kept under intensive conditions in Spain. This comparison
aimed to establish whether the low influence of these measures might have occurred only
in our study, owing to the values chosen as an improved or a worsened situation of
welfare, or whether it can be generalised because of similar prevalence in other
studies.
4.3.1 Sensitivity analysis at criteria level
Comfort around resting
Two main conclusions can be drawn for the sensitivity analysis for comfort around
resting. First, the low influence of bursitis in this study could have been caused by its
low variation at farm level. For bursitis 1 and bursitis 2 respectively, mean values and
standard deviations of 50.74 ± 13.75 and 0.96 ± 1.32 were found, whereas Temple et al.
(2011b) presented values with higher variation for these measures, 45.06 ± 21.04 and
4.4 ± 5.6 respectively. Second, for manure on the body, similar values to Temple et al.
(2011b) were found. However, although manure 2 assessed a severe condition and manure 1
a moderate condition, the results indicated a greater influence of the moderate
condition than of the severe one. It can therefore be assumed that, because a weighted
sum was used to aggregate the moderate and the severe conditions before determining the
utility function, compensation occurred between both levels, and the model was not
sensitive to the severe condition because its values were smaller than those of the
moderate condition. Although it was emphasised throughout the development of the WQ
model that welfare scores should not compensate each other (Botreau et al., 2007 and
Veissier et al., 2011), compensation occurred in the first stages through the linear
combinations used both in the WQ protocol and in this alternative methodology. Providing
an individual utility function for each severity measure and aggregating them afterwards
using the CI could be an alternative solution. The accuracy of the model would
increase, but so would the complexity of the decision process, since the DMs would have
to interpret a higher number of measures in terms of welfare.
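The compensation effect can be illustrated numerically with the quartiles of manure 1 and manure 2 from Table 2. The weights below are hypothetical and chosen only for illustration (the severe condition is deliberately weighted twice as heavily as the moderate one); they are not the weights of the WQ protocol or of this study.

```python
def weighted_prevalence(moderate, severe, w_moderate=0.5, w_severe=1.0):
    """Weighted sum of moderate and severe prevalence (hypothetical weights)."""
    return w_moderate * moderate + w_severe * severe


# Swing of the weighted sum when one measure moves from its 1Q to its 3Q (Table 2),
# with the other measure held at its mean
swing_moderate = weighted_prevalence(17.95, 3.53) - weighted_prevalence(1.91, 3.53)
swing_severe = weighted_prevalence(10.52, 4.06) - weighted_prevalence(10.52, 0.00)
```

Even with the severe condition weighted twice as heavily, the swing produced by manure 1 (about 8.0) is almost double that produced by manure 2 (about 4.1), simply because the quartiles of the moderate condition are much further apart.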
Absence of injuries
For absence of injuries there was no single measure which led to an important
difference between the original and the improved or worsened situation. Nor were there
differences in the confidence intervals between the moderate and severe conditions of
lameness and wounds on the body. Low prevalences were found at farm level for the
measures which form this criterion. Comparing our data with the study of Temple et
al. (2011b), similar values for the absence of injuries measures were found. Temple et
al. (2011b) found means and standard deviations for lameness 1, lameness 2, wounds on
the body 2 and tail biting of 0.2 ± 0.43, 0.2 ± 0.45, 0.9 ± 1.38 and 0.9 ± 2.02
respectively, whereas the values found in this study for the same measures were 0.29 ±
0.44, 0.24 ± 0.46, 1.03 ± 1.98 and 2.88 ± 3.09. Thus, it can be concluded that, because
of their generally low variance on farms (comparable to other studies), these
measures have a low influence on improving or worsening the level of welfare.
Absence of disease
The sensitivity analysis for absence of disease shows that, because the original data
were converted into an ordinal scale (three qualitative levels: no problem recorded, a
warning or an alarm), the values at criteria level only changed when alarm or warning
thresholds were reached. The model was therefore only sensitive when the number of
warnings or alarms was changed by improving or worsening the measures’ values, and it
could not distinguish between situations where the thresholds were slightly or greatly
exceeded. The conversion into an ordinal scale might therefore be reconsidered: the
measures could be treated as quantitative ones, using the warning and alarm thresholds
as references for the DM to build the utility functions.
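The loss of information caused by the ordinal conversion can be shown with a minimal sketch. The threshold values (5% for a warning, 10% for an alarm) and the use of "greater than or equal" comparisons are hypothetical assumptions made for illustration; they are not the thresholds of the WQ protocol.

```python
def disease_level(prevalence, warning=5.0, alarm=10.0):
    """Map a prevalence (%) to 0 = no problem, 1 = warning, 2 = alarm (hypothetical thresholds)."""
    if prevalence >= alarm:
        return 2
    if prevalence >= warning:
        return 1
    return 0


# Two hypothetical farms: one just above the alarm threshold, one far above it
slightly_exceeded = disease_level(10.1)
greatly_exceeded = disease_level(40.0)
```

Both farms receive the same level, so any score built on the number of warnings and alarms cannot separate them, whereas a utility function defined on the prevalence itself could.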
Exploratory behaviour
Pen exploration and enrichment exploration had little influence on improving or
worsening the values of the exploratory behaviour criterion. The values obtained for
exploration of enrichment material were lower than the values obtained in the study of
Temple et al. (2011b). Thus, it can be concluded that the low influence of this measure
lay in its low variability. However, although the ranges of variation for pen
investigation were wider and similar to the values obtained by Temple et al. (2011b),
the influence of this measure was also low. It can be concluded that compensation
occurred when forming the criterion values, because a weighted sum was used to combine
pen investigation and enrichment investigation, and enrichment investigation is
considered more important than pen investigation. This compensation prevented the model
from being sensitive to pen investigation.
Absence of hunger, space allowance, social behaviour and human-animal relationship
For the criteria composed of a single measure, such as absence of hunger (assessed by
% of lean animals), space allowance (sq m/100kg pig), social behaviour (negative
behaviour) and human-animal relationship (panic), the welfare measures had a greater
influence on improving or worsening the level of welfare than measures aggregated to
form criteria, although the range of variation for some of these measures was low, as
is the case for lean animals. This suggests that by aggregating the
measures into criteria the sensitivity of the model for the measures was diluted,
although compensation between measures was always limited.
4.3.2 Sensitivity analysis at principle level
By aggregating the criteria into principles, the sensitivity of the model to an
improvement or worsening of the values of the measures was lower than at criteria
level. Only two measures which led to important differences in the confidence intervals
of the means at criteria level also led to important differences at principle level
(confidence intervals of the differences of means in which at least one of the
confidence limits reached 10 units). These measures were manure on the body 1
(worsened) and space allowance (improved and worsened). It can be concluded that
following a three-step aggregation process reduces the sensitivity of the model, and
thus it may be difficult to distinguish between farms with different levels of welfare
at principle level. This effect may be even more marked when the four welfare
principles are aggregated into an overall evaluation.
5 Conclusions
By using the MAUT, it has been shown that the main difficulties faced by a
multi-criteria aggregation model, as described by Botreau et al. (2007), can be solved:
the method can assign different importance to the measures, limit the compensation
between them and work with data collected on different types of scales.
Furthermore, the model’s flexibility allowed us to fit the WQ assessment, with only
slight differences between our results and those obtained by implementing the WQ
protocol, at both criteria and principle level. Thus, it can be concluded that this
model could be implemented to produce an overall assessment of animal welfare in the
context of the WQ protocol for growing pigs. Furthermore, this methodology could also be
used as a framework to produce an overall assessment of welfare for other livestock
species.
However, the sensitivity analysis carried out in this study revealed two main points
which may need to be reconsidered. First, the use of weighted sums to aggregate
moderate and severe conditions, as well as pen and enrichment investigation. Second,
the conversion of disease measures into ordinal scores, which makes it impossible to
distinguish between farms which slightly or largely
exceed thresholds. Finally, the suitability of the three-step aggregation process to
distinguish between farms may need to be studied further, because aggregating the
criteria into principles reduced the sensitivity of the model to an improvement or
worsening of the values of the measures. Running the model on a larger number of farms
may be needed to establish the actual variation in the measures on farms. If there is
no variation between the farms at principle level, as occurred in our observations, or
at overall assessment level, the three-step aggregation process should be reconsidered.
6 Acknowledgements
The present study is part of the PHENOMICS research project, which is funded by the
German Federal Ministry of Education and Research.
7 References
Bana e Costa CA, de Corte JM and Vansnick JC 1999. The MACBETH approach:
Basic ideas, software, and an application, in: Meskens, N., Roubens, M., (Eds.),
Advances in Decision Analysis. Kluwer Academic Publishers, Book Series:
Mathematical Modelling: Theory and Applications, vol. 4, pp.131-157.
Bana e Costa CA, de Corte JM and Vansnick JC 2004. On the mathematical
foundations of MACBETH. In MCDA, Multiple Criteria Decision Analysis (eds J
Figueira, S Greco and M Ehrgott), pp. 409 - 442. Kluwer Academic Publishers,
Dordrecht, Netherlands.
Bana e Costa CA, Lourenço JC, Oliveira MD and Bana e Costa JC 2014. A socio-
technical approach for group decision support in public strategic planning: The
Pernambuco PPA case. Group decision and negotiation 23, 5-29.
Botreau R, Bracke MBM, Perny P, Butterworth A, Capdeville J, van Reenen CG and
Veissier I 2007. Aggregation of measures to produce an overall assessment of
animal welfare. Part 2: Analysis of constraints. Animal 1, 1188-1197.
Botreau R, Capdeville J, Perny P and Veissier I 2008. Multicriteria evaluation of animal
welfare at farm level: an application of MCDA methodologies. Foundations of
Computing and Decision Science 33, 1-18.
Botreau R, Veissier I and Perny P 2009. Overall assessment of animal welfare: Strategy
adopted in Welfare Quality. Animal Welfare 18, 363-370.
87
Choquet G 1953. Theory of capacities. Annales de l’Institut Fourier 5, 131-295.
de Vries M, Bokkers EAM, van Schaik, G, Botreau R, Engel B, Dijkstra T and de Boer
M 2013. Evaluating results of the Welfare Quality multi-criteria evaluation model
for classification of dairy cattle welfare at herd level. Journal of Dairy Science 96,
1-10.
Farm Animal Welfare Council 1992. FAWC updates the five freedoms. The Veterinary
Record 17, 357.
Geissman JR and Schultz RD 1991. Verification and validation of expert system. In
Validating and Verifying Knowledge-Based Systems (Ed. UG Gupta), pp. 12 - 19.
IEEE Computer Society Press, Washington, USA.
Grabisch M 1996. The application of fuzzy integrals in multicriteria decision making.
European Journal of Operational Research 89, 445-456.
Grabisch M, Kojadinovic I and Meyer M, 2008. A review of capacity identification
methods for Choquet Integral based multi-attribute utility theory, Applications of
the Kappalab R package. European Journal of Operational Research 186, 766-
785.
Harrison SR 1991. Validation of agricultural expert systems. Agricultural Systems 35,
265-285.
O’Keefe RM, Osman B and Smith EP 1991. Validating expert system performance. In
Validating and Verifying Knowledge-Based Systems (Ed. UG Gupta), pp. 2 - 11.
IEEE Computer Society Press, Washington, USA.
Keeney LR and Raiffa H 1976. Decisions with multiple objectives: Preferences and
values tradeoffs. Wiley, New York.
Mayag B, Grabisch M and Labreuche C 2010. An interactive algorithm to deal with
inconsistencies in the representation of cardinal information, in: Hüllermeier E,
Kruse R and Hoffmann F (Eds), Information processing and management of
uncertainty in knowledge-based systems. Theory and Methods. Springer, Book
Series: Communication in computer and information science, vol.80, pp. 148-157.
Mayag B, Grabisch M and Labreuche C 2011. A characterization of the 2-additive Choquet
integral through cardinal information. Fuzzy sets and Systems 184, 84-105.
Murofushi T and Sugeno M 1989. An interpretation of fuzzy measure and the Choquet
integral as an integral with respect to a fuzzy measure. Fuzzy sets and systems 29,
201-227.
88
Parnell GS, Brensik TA, Tani SN and Johnson ER 2013. Handbook of decision
analysis. New York: John Wiley and sons.
Qureshi ME, Harrison SR and Wegener MK 1999. Validation of multicriteria analysis
models. Agricultural systems 62, 105-116.
Ramsay JO 1988. Monotone regression splines in action. Statistical Science. 3, 425-
442.
Temple D, Manteca X, Velarde A, and Dalmau A 2011a. Assessment of animal welfare
through behavioural parameters in Iberian pigs in intensive and extensive
conditions. Applied Animal Behaviour Science 131, 29-39.
Temple D, Dalmau A, Ruiz de la Torre JL, Manteca X & Velarde A 2011b. Application
of the welfare quality protocol to assess growing pigs kept under intensive
conditions in Spain. Journal of Veterinary Behavior: Clinical Applications and
Research 6, 138-149.
Temple D, Courboulay V, Manteca X, Velarde A and Dalmau A 2012a. The welfare of
growing pigs in five different production systems: Assessment of feeding and
housing. Animal 6, 656-667.
Temple D, Courboulay C, Velarde A, Dalmau A and Manteca X 2012b. The welfare of
growing pigs in five different production systems in France and Spain:
Assessment of health. Animal Welfare 21, 257-271.
Temple D, Manteca X, Dalmau A and Velarde A 2013. Assessment of test-retest
reliability of animal-based measures on growing pig farms. Livestock Science
151, 35-45.
Tierschutzbund, D. 2013. Kriterienkatalog für eine tiergerechte haltung und behandlung
von mastschweinen im rahmen des tierschutzlabels "Für mehr tierschutz".
Deutscher Tierschutzbund e.v., Bonn.
Veissier, I., K. K. Jensen, R. Botreau, and P. Sandoe. 2011. Highlighting ethical
decisions underlying the scoring of animal welfare in the Welfare Quality scheme.
Animal Welfare 20, 89–101.
Welfare Quality 2009. Welfare Quality® Assessment Protocol for Pigs. Lelystad:
Wefare Quality® Consortium.
ANNEX
1. Aggregation of growing pigs’ welfare measures into criteria
1.1 Criterion ‘Absence of prolonged hunger’
Absence of prolonged hunger is assessed by one quantitative measure: percentage of
lean animals.
1.1.1 Welfare Quality® (WQ)
In the WQ protocol, an ‘Index’ is first calculated from the percentage of lean animals.
This ‘Index’ is then transformed into a ‘Score’ with a non-linear function (I-spline
function).
When I ≤ 80 then:
When I ≥ 80 then:
1.1.2 Multi-attribute utility theory (MAUT)
In the present study, we calculated the utility function directly with MACBETH from
the percentage of lean animals. We established performance levels in steps of one unit
between 0 and 20% lean animals and in steps of 10 units between 20 and 100% lean
animals.
Figure 1. Utility function for lean animals calculated with MACBETH.
1.2 Criterion ‘Absence of prolonged thirst’
Absence of prolonged thirst is assessed by three qualitative measures: the number of
drinking places, the functioning of the drinkers and the cleanliness of the drinkers.
These measures are taken at group level.
1.2.1 Welfare Quality®
For this type of measure, WQ uses a lexicographic valuation tree (Figure 2). The
score attributed to the farm is equal to the worst score obtained at group level, on the
condition that this score represents at least 15% of the animals observed on the whole
farm.
Figure 2. Lexicographic valuation tree used in the WQ protocol for the Absence of
prolonged thirst criterion.
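The valuation tree can be read as a lookup from the three yes/no answers to a score. The following is a minimal sketch of that reading, not the protocol's own implementation: the leaf scores are copied from the WQ column of Table 1, while the names `WQ_THIRST_SCORES` and `thirst_score` are hypothetical and introduced only for illustration. The 15% rule for moving from group level to farm level is not covered here.

```python
# Leaf scores of the valuation tree, keyed by the three yes/no answers:
# (enough drinking places, drinkers clean, at least 2 drinkers per animal).
# The values are copied from the WQ column of Table 1; the names are hypothetical.
WQ_THIRST_SCORES = {
    (True, True, True): 100,
    (True, True, False): 80,
    (True, False, True): 60,
    (True, False, False): 45,
    (False, True, True): 55,
    (False, True, False): 40,
    (False, False, True): 35,
    (False, False, False): 20,
}


def thirst_score(enough_places: bool, clean: bool, two_per_animal: bool) -> int:
    """Return the tree score for one group of animals."""
    return WQ_THIRST_SCORES[(enough_places, clean, two_per_animal)]


if __name__ == "__main__":
    # Situation of farm c in Table 1: enough places, dirty drinkers, 2 per animal.
    print(thirst_score(True, False, True))
```

The answer order in the key follows the column order of Table 1 (number, clean, 2/animal).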
1.2.2 MAUT
In this study, the number of drinking places, the functioning of the drinkers and the
cleanliness of the drinkers were defined in MACBETH as three different qualitative
measures, with yes/no performance levels. Figure 3 shows the MACBETH scales for
each measure.
[Figure 2 content: decision tree with the nodes ‘Is the number of drinking places sufficient?’, ‘Are the drinkers clean?’ and ‘Are there at least 2 drinkers available for an animal?’, each with Yes/No branches. Leaf scores: 100, 80, 60, 45, 55, 40, 35 and 20.]
Figure 3. Utilities assigned to the performance levels of the Absence of prolonged thirst
criterion.
An example of eight farms was used as learning data to determine the CI aggregation
parameters (Data in Table 1). The utilities calculated with MACBETH for the example
data were used as subsets to express the WQ DMs’ preferences (Utilities in Table 1).
The results of aggregating the example data following the WQ protocol were used as
initial preferences for the least squares based approach to capacity identification
(WQ in Table 1). Table 2 shows the Shapley values and the interaction indices for the
measures.
Table 1. Absence of prolonged thirst: measures’ values, individual utilities and overall
utilities for each selected farm.
Farm  Data                       Utilities                  WQ    Overall utility
      number  clean  2/animal    number  clean  2/animal
a     Yes     Yes    Yes         100     100    100         100   100
b     Yes     Yes    No          100     100    0           80    84.17
c     Yes     No     Yes         100     0      100         60    64.17
d     Yes     No     No          100     0      0           45    40.83
e     No      Yes    Yes         0       100    100         55    59.17
f     No      Yes    No          0       100    0           40    35.83
g     No      No     Yes         0       0      100         35    30.83
h     No      No     No          0       0      0           20    0
Table 2. Shapley values and interaction indices to aggregate the measures’ utilities into
the criterion with the Choquet integral.
          Shapley value  Interaction indices
                         number  clean   2/animal
number    0.408          -       0.75    -0.75
clean     0.358          0.75    -       -0.75
2/animal  0.233          -0.75   -0.75   -
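The link between Table 2 and the overall utilities of Table 1 can be checked with the usual expression of a 2-additive Choquet integral in terms of Shapley values and interaction indices. The sketch below rests on an assumption that is ours and is not stated in the source: the interaction indices are printed as ±0.75, and with those values the overall utilities of Table 1 are not reproduced, whereas ±0.075 reproduces them to within rounding of the Shapley values. The code therefore uses ±0.075 and labels it as an assumed reading. The function name `choquet_2additive` is hypothetical.

```python
from itertools import combinations


def choquet_2additive(x, shapley, interaction):
    """2-additive Choquet integral from Shapley values and pairwise interactions.

    x           : list of utilities, one per measure
    shapley     : list of Shapley values, same order as x
    interaction : dict {(i, j): index} with i < j
    """
    total = 0.0
    for i, value in enumerate(x):
        # Each measure keeps its Shapley value minus half the absolute
        # interaction it shares with the other measures.
        shared = sum(abs(idx) for pair, idx in interaction.items() if i in pair)
        total += value * (shapley[i] - 0.5 * shared)
    for i, j in combinations(range(len(x)), 2):
        idx = interaction[(i, j)]
        if idx >= 0:
            total += idx * min(x[i], x[j])    # complementary pair
        else:
            total += -idx * max(x[i], x[j])   # redundant pair
    return total


# Order of measures: number, clean, 2/animal (Shapley values from Table 2).
SHAPLEY = [0.408, 0.358, 0.233]
# ASSUMPTION: +/-0.075 instead of the printed +/-0.75 (see the text above).
INTERACTION = {(0, 1): 0.075, (0, 2): -0.075, (1, 2): -0.075}

if __name__ == "__main__":
    farm_b = [100, 100, 0]  # utilities of farm b in Table 1
    print(round(choquet_2additive(farm_b, SHAPLEY, INTERACTION), 2))
```

Under this assumption, the function gives values close to the ‘Overall utility’ column of Table 1 for all eight farms; the remaining differences are of the order of the rounding in the printed Shapley values.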
1.3 Criterion ‘Comfort around resting’
Comfort around resting is assessed by two measures: bursitis and manure on the body.
Both measures are recorded at individual level.
1.3.1 Welfare Quality®
Briefly, for this type of measure the WQ protocol first produces an ‘Index’ by
combining the percentage of animals in each severity category with a weighted sum. For
instance, for bursitis:
This ‘Index’ is then transformed into a ‘Score’ with a non-linear function (I-spline
function).
When Index ≤ 50 then:
When Index ≥ 50 then:
For manure on the body:
This ‘Index’ is then transformed into a ‘Score’ with a non-linear function (I-spline
function).
In the verification step we found that the I-spline functions proposed in the WQ
protocol for manure on the body did not work properly. The formulae proposed for
this measure are the same as those proposed for space allowance, so we assumed that
there was an erratum in the protocol and replaced these formulae with an
approximation of the I-spline function derived from the figure given in the protocol
for manure on the body.
To produce the criterion score, the partial scores obtained with the I-spline
functions for the two measures are combined with the CI.
1.3.2 MAUT
In the present study, before determining the utility functions of bursitis and manure on
the body, we produced an Index, as in the WQ protocol, to combine the
percentage of animals with a moderate problem and the percentage of animals with a
severe problem. We implemented the same weights used in the WQ protocol. For
bursitis:
For manure on the body:
In MACBETH, the measures that form this criterion were defined as quantitative
measures. For manure on the body, we established performance levels in steps of five
units between 0 and 20% affected animals and in steps of 10 units between 20 and
100% affected animals. For bursitis, we established steps of 10 units between 0 and
100% affected animals.
Figure 4. Utility function for bursitis calculated with MACBETH.
Figure 5. Utility function for manure on the body calculated with MACBETH.
An example of four farms was used as learning data to determine the CI aggregation
parameters (Index in Table 3). The utilities calculated with MACBETH for the example
data were used as subsets to express the WQ DMs’ preferences (Utilities in Table 3).
The results of aggregating the example data following the WQ protocol were used as
the WQ DMs’ initial preferences for the least squares based approach to capacity
identification (WQ in Table 3). The Shapley values for each measure are shown in
Table 4. Manure on the body was considered more important than bursitis. As Table 4
also shows, the interaction between the measures was positive; thus, the measures
were treated as complementary.
Table 3. Comfort around resting: measures’ values, individual utilities and overall
utilities for each selected farm.
Farm  Index              Utility            Overall utility  WQ
      bursitis  manure   bursitis  manure
a     60        40       40        60       43.2             43.2
b     50        50       50        50       50               50
c     40        60       60        40       41.4             41.4
d     25        75       75        25       28.5             28.5
Table 4. Shapley values and interaction indices to aggregate the measures’ utilities into
the criterion with the Choquet integral.
                    Shapley value  Interaction indices
                                   bursitis  manure on the body
bursitis            0.455          -         0.77
manure on the body  0.545          0.77      -
1.4 Criterion ‘Thermal comfort’
Thermal comfort is assessed by three qualitative measures: huddling, shivering and
panting. These measures are taken at group level. If no pig is displaying
huddling/shivering/panting, a score of 0 is assigned to the group; if up to 20% of the
animals in the group are displaying huddling/shivering/panting, a score of 1 is assigned
to the group; and if more than 20% of the animals in the group are displaying
huddling/shivering/panting, a score of 2 is assigned to the group.
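The group-level scoring rule above can be written as a small function. This is an illustrative sketch only: the function name `group_score` and its arguments are hypothetical, and the rule is applied separately to huddling, shivering and panting.

```python
def group_score(n_displaying: int, n_animals: int) -> int:
    """Score one group for one behaviour (huddling, shivering or panting).

    0: no pig displays the behaviour
    1: up to 20% of the pigs in the group display it
    2: more than 20% of the pigs in the group display it
    """
    if n_animals <= 0:
        raise ValueError("the group must contain at least one animal")
    if not 0 <= n_displaying <= n_animals:
        raise ValueError("n_displaying must lie between 0 and n_animals")
    if n_displaying == 0:
        return 0
    return 1 if n_displaying / n_animals <= 0.20 else 2


if __name__ == "__main__":
    # A hypothetical group of 10 pigs with 0, 2 and 3 pigs huddling.
    print([group_score(k, 10) for k in (0, 2, 3)])
```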
1.4.1 Welfare Quality®
For this type of measure, WQ uses a lexicographic valuation tree (Figure 6). The
score attributed to the farm is equal to the worst score obtained at group level, on the
condition that this score represents at least 15% of the animals observed on the whole
farm.
Figure 6. Lexicographic valuation tree used in the WQ protocol for the Thermal comfort
criterion.
[Figure 6 content: decision tree with the nodes ‘Huddling?’, ‘Shivering?’ and ‘Panting?’, each branching on the group scores 0, 1 and 2. Leaf scores: 100, 59, 24, 26, 46, 20, 56, 35, 3, 34 and 18.]
1.4.2 MAUT
In this study, huddling, shivering and panting were defined in MACBETH as qualitative
measures, with the performance levels no huddling/shivering/panting,
<20% huddling/shivering/panting and >20% huddling/shivering/panting. Figure 7
shows the MACBETH scales for each measure. An example of 11 farms was used as
learning data to determine the CI aggregation parameters (Data in Table 5). The utilities
calculated with MACBETH for the example data were used to express
the WQ DMs’ preferences (Utilities in Table 5). The results of aggregating the
example data following the WQ protocol were used as initial preferences for the
least squares (LS) based approach to capacity identification (WQ in Table 5).
Table 6 shows the Shapley values and the interaction indices for the measures.
Figure 7. Utilities assigned to the performance levels of the Thermal comfort measures.
Table 5. Thermal comfort: measures’ values, individual utilities and overall utilities for
each selected farm.
Farm  Data                           Utilities                      WQ    Overall utility
      huddling  shivering  panting   huddling  shivering  panting
a     No        No         No        100       100        100       100   100
b     No        No         <20%      100       100        45        59    66.72
c     No        No         >20%      100       100        -20       24    27.39
d     No        <20%       No        100       45         100       46    55.35
e     No        >20%       No        100       14         100       26    30.18
f     <20%      No         No        35        100        100       56    62.10
g     <20%      <20%       No        35        45         100       35    39.17
h     <20%      >20%       No        35        14         100       20    17.95
i     >20%      No         No        -5        100        100       34    38.78
j     >20%      <20%       No        -5        45         100       18    15.85
k     >20%      >20%       No        -5        14         100       3     2.92
Table 6. Shapley values and interaction indices to aggregate the measures’ utilities into
the criterion ‘Thermal comfort’ with the Choquet integral.
           Shapley value  Interaction indices
                          huddling  shivering  panting
huddling   0.291          -         0.394      0.188
shivering  0.406          0.394     -          0.417
panting    0.303          0.188     0.417      -
1.5 Criterion ‘Ease of movement’
Ease of movement is assessed by one quantitative measure: space allowance. Space
allowance is expressed in m²/100 kg of animal.
1.5.1 Welfare Quality®
In the WQ protocol, an index is first calculated from the space allowance.
This ‘Index’ is then transformed into a ‘Score’ with a non-linear function (I-spline
function).
When Index ≤ 20 then:
When Index ≥ 20 then:
1.5.2 MAUT
In this study, we calculated the utility function with MACBETH. The performance
levels of this measure were defined according to the WQ protocol, where 0.3 m²/100
kg is considered the minimum space allowance and 10 m²/100 kg is considered the
maximum.
Figure 8. Utility function for space allowance.
Since this criterion is assessed by a single measure, no aggregation is needed.
1.6 Criterion ‘Absence of injuries’
Absence of injuries is assessed by three measures: lameness, wounds on the body and
tail biting. All three measures are recorded at individual level.
1.6.1 Welfare Quality®
Briefly, for this type of measure, particularly for lameness and wounds on the body,
the WQ protocol first produces an ‘Index’ by combining the percentage of
animals in each severity category with a weighted sum. For instance, for lameness:
For wounds on the body:
This ‘Index’ is then transformed into a ‘Score’ with a non-linear function (I-spline
function). For instance, for lameness:
When Index ≤ 85 then:
When Index ≥ 85 then:
For wounds on the body:
When Index ≤ 40 then:
When Index ≥ 40 then:
For tail biting, the I-spline function is calculated directly, because only the
absence or presence of the problem is scored and there is therefore no need for a
weighted sum to combine scores for different severities.
To produce the criterion score, the partial scores obtained with the I-spline
functions for the three measures are combined with the CI.
1.6.2 MAUT
In this study, before determining the utility functions of lameness and wounds on the
body, we produced an Index, as in the WQ protocol, to
combine the percentage of animals with a moderate problem and the percentage of
animals with a severe problem. We implemented the same weights used in the WQ
protocol. For instance, for lameness:
For wounds on the body:
For tail biting, the utility function was calculated directly from the percentage of
animals in which the problem was present.
The measures that form this criterion were defined as quantitative measures in
MACBETH. For lameness, we established performance levels in steps of one unit
between 0 and 10% lame animals and in steps of 10 units between 10 and 100%
lame animals (Figure 9). For wounds on the body, we established steps of 5 units
between 0 and 100% affected animals (Figure 10). For tail biting, we established steps
of 1 unit between 0 and 20% affected animals and steps of 10 units between 20 and
100% affected animals (Figure 11).
Figure 9. Utility function for lameness calculated with MACBETH.
Figure 10. Utility function for wounds on the body calculated with MACBETH.
Figure 11. Utility function for tail biting calculated with MACBETH.
An example of 10 farms was used as learning data to determine the CI aggregation
parameters (Data in Table 7). The utilities calculated with MACBETH for the example
data were used to express the WQ DMs’ preferences (Utilities in Table 7).
The results of aggregating the example data following the WQ protocol were
used as the WQ DMs’ initial preferences for the LS based approach to
capacity identification (WQ in Table 7).
Table 7. Absence of injuries: measures’ values, individual utilities and overall utilities
for each selected farm.
Farm  Index                           Utility                         WQ     Overall utility
      Lameness  Wounds  Tail biting   Lameness  Wounds  Tail biting
a     75        50      25            25        50      75            25     24.5
b     75        25      50            25        75      50            25     24.25
c     50        75      25            50        25      75            32.5   32
d     25        75      50            75        25      50            39.5   39.25
e     60        50      40            40        50      60            40     39.5
f     60        40      50            40        60      50            40     39.4
g     50        60      40            50        40      60            42.9   42.5
h     50        50      50            50        50      50            50     49.5
i     50        25      75            50        75      25            34.25  33.5
j     25        50      75            75        50      25            41.5   41
k     50        40      60            50        60      40            43.7   43.1
l     40        60      50            60        40      50            45.8   45.4
o     40        50      60            60        50      40            46.6   46.1
¹Percentage of animals affected with lameness (L)/wounds on the body (W) scored 1
²Percentage of animals affected with lameness/wounds on the body/bitten tails (BT) scored 2
The Shapley values for each measure are shown in Table 8. As Table 8 also shows,
all the interactions between measures were positive; thus, the measures were treated
as complementary.
Table 8. Shapley values and interaction indices to aggregate the measures’ utilities into
the criterion with the Choquet integral.
             Shapley value  Interaction indices
                            lameness  wounds  tail biting
lameness     0.54           -         0.395   0.315
wounds       0.24           0.395     -       0.315
tail biting  0.21           0.315     0.315   -
1.7 Criterion ‘Absence of disease’
Absence of disease is assessed by 13 measures. The measures used to check this
criterion lead to data expressed on different scales.
1.7.1 Welfare Quality®
Because the measures differ in nature (for instance, mortality is recorded as the
percentage of mortality on the farm during the last 12 months, whereas coughing and
sneezing are assessed as the average frequency of coughs/sneezes per animal during 5
minutes), WQ decided to compare the data with alarm thresholds that represent the limit
between what is considered normal and what is considered abnormal. When the
incidence observed for a measure reaches approximately half the alarm threshold, a
warning is attributed (Table 9). The measures are grouped into 6 areas. The severity of
the problem is estimated per area: if, in an area, the frequency of one symptom is above
the warning threshold and the others are below, a warning is attributed to the area; if,
in an area, the frequency of one symptom is above the alarm threshold, an alarm is
attributed to the area; otherwise, no problem is recorded.
Table 9. Warning and alarm thresholds for the absence of disease measures.
Area                  Symptom                                                            Warning threshold  Alarm threshold
Respiratory area      Coughing (frequency per pig and 5 min)                             15                 46
                      Sneezing (frequency per pig and 5 min)                             27                 55
                      % pigs with twisted snout                                          1.1                3.5
                      % pigs pumping                                                     1.8                5
                      % slaughter pigs with pleuritis                                    28                 55
                      % slaughter pigs with pericarditis                                 5                  20
                      % slaughter pigs with pneumonia                                    2.7                6
Digestive area        % pigs in herd with rectal prolapse                                0.7                2.5
                      % pens in herd with rectal faeces                                  6                  15
Liver                 % slaughter pigs with white spots on the liver (parasites)         10                 23
Skin                  % with 10% or more skin inflamed                                   3.1                8
Ruptures and hernias  % pigs with hernias/ruptures not bleeding, not touching the floor  2.4                5
                      % pigs with hernias/ruptures bleeding or touching the floor        0.6                1.5
Mortality             % mortality                                                        2.6                4.5
The numbers of alarms and warnings detected on a farm are counted and combined into
an ‘Index’ with a weighted sum.
Finally, the ‘Index’ is transformed into a score using I-spline functions.
When I ≤ 10 then:
When I ≥ 10 then:
1.7.2 MAUT
In this study, the measures used to check this criterion were first transformed into an
ordinal scale, before the utility functions were determined. The
data were compared with the warning and alarm thresholds defined in the WQ protocol.
The measures were grouped into 6 areas: mortality, respiratory, digestive, liver, skin and
hernias. An area was attributed a warning or an alarm when one of its measures
was above the warning or the alarm threshold. The utility function was calculated per
area. We defined the 6 disease areas as qualitative measures with three performance
levels: no problem recorded, a warning attributed to the area and an alarm
attributed to the area. In MACBETH, a utility of 40 was assigned to an area with a
warning, a utility of 0 to an area with an alarm, and a utility of 100 to an area with no
problem recorded (Figure 12).
Figure 12. Utilities assigned to the performance levels of the Absence of disease areas.
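The conversion of disease measures into ordinal levels and then into area utilities can be sketched as follows. The thresholds are copied from Table 9 and the utilities (100, 40, 0) from the paragraph above. The dictionary keys, the function names and the choice of a strict ‘greater than’ comparison for ‘above the threshold’ are assumptions made for illustration.

```python
# (warning, alarm) thresholds per symptom, copied from Table 9.
# The key names are hypothetical labels chosen for this sketch.
THRESHOLDS = {
    "respiratory": {
        "coughing": (15, 46),
        "sneezing": (27, 55),
        "twisted_snout": (1.1, 3.5),
        "pumping": (1.8, 5),
        "pleuritis": (28, 55),
        "pericarditis": (5, 20),
        "pneumonia": (2.7, 6),
    },
    "digestive": {"rectal_prolapse": (0.7, 2.5), "faeces": (6, 15)},
    "liver": {"white_spots": (10, 23)},
    "skin": {"skin_inflamed": (3.1, 8)},
    "hernias": {"not_bleeding": (2.4, 5), "bleeding": (0.6, 1.5)},
    "mortality": {"mortality": (2.6, 4.5)},
}

# Utilities assigned in MACBETH to the three performance levels of an area.
AREA_UTILITY = {"none": 100, "warning": 40, "alarm": 0}


def classify_area(area, observed):
    """Return 'none', 'warning' or 'alarm' for one area.

    observed maps symptom names to observed values. ASSUMPTION: 'above'
    a threshold is read as strictly greater than the threshold.
    """
    level = "none"
    for symptom, value in observed.items():
        warning, alarm = THRESHOLDS[area][symptom]
        if value > alarm:
            return "alarm"          # one symptom above the alarm threshold
        if value > warning:
            level = "warning"       # keep checking for a possible alarm
    return level


def area_utility(area, observed):
    """Utility of an area after the ordinal conversion."""
    return AREA_UTILITY[classify_area(area, observed)]


if __name__ == "__main__":
    # Mortality values of farms a, g and j in Table 10.
    for mortality in (0.3, 3, 5.3):
        print(mortality, area_utility("mortality", {"mortality": mortality}))
```

Because only the level is kept, a farm that exceeds the alarm threshold slightly and one that exceeds it greatly receive the same utility, which is the loss of information noted in the conclusions of the chapter.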
An example of 10 farms was used as learning data to determine the CI aggregation
parameters (Data in Table 10). The utilities calculated with MACBETH for the
example data were used as subsets to express the WQ DMs’ preferences (Utilities in
Table 10). The results of aggregating the example data following the WQ protocol
were used as initial preferences for the LS based approach to capacity
identification (WQ in Table 10).
We found that the initial Shapley values resulting from the aggregation of the utilities
with the CI varied slightly between areas, whereas in the WQ protocol all the areas are
considered equally important. After imposing an additional constraint on the Shapley
values, the importance attached to each area was the same and the overall utility
remained equal. The interaction indices (Table 11) differed between the initial
calculation of the CI and the second, constrained calculation, but in both cases all the
areas behaved as complementary measures.
Table 10. Absence of disease: measures’ values, WQ scores and overall utilities for
each selected farm.
Farm  Mortality  Respiratory condition     Digestive condition  Parasites  Skin condition  Ruptures and hernias  WQ      Overall utility
      M¹         C²   Sn²  P³    TS³       RP³   Sc⁴            P          Sk⁵             H⁵    H⁶
a     0.3        5    2    0.2   0.1       0.1   2              0          0.4             0.5   0.1             99.99   100
b     0.7        12   5    0.3   0.2       0.8   3              0          1               1     0.3             83.971  83.8
c     1          14   24   1.4   1         0.6   20             0          3               2.3   0.3             74.126  73
d     1.3        16   10   0.5   0.3       0.3   6              0          1.3             1.5   0.5             69.457  69.457
e     1.8        20   16   1     0.7       0.5   10             0          2.4             2     0.8             56.380  58.297
f     2          6    24   1.4   1         0.7   12             0          9               2.4   0.9             48.418  48.418
g     3          30   38   1.8   1.3       1     10             0          3.6             3     1               34.225  41.806
h     2.6        33   42   2     1.6       1.2   16             0          4               3.2   1.1             27.937  31
i     3          37   44   6.1   2         1.5   17             0          4.3             7     1.2             16.88   14.004
j     5.3        50   46   3     2.4       1.7   18             0          9.7             3.8   1.7             7.675   3.01
¹Percentage of mortality (M) on farm during the last 12 months.
²Average frequency of coughs (C)/sneezes (Sn) per animal during 5 minutes.
³Percentage of pigs with evidence of laboured breathing (P)/twisted snouts (TS)/rectal prolapse (RP).
⁴Percentage of pigs in herd with liquid faeces (Sc).
⁵Percentage of pigs scored as 2 in skin condition (Sk)/hernias (H).
⁶Percentage of pigs scored as 1 in hernias.
Table 11. Shapley values and interaction indices to aggregate the measures’ utilities into
the criterion with the Choquet integral.
             Shapley value  Interaction indices
                            Mortality  Respiratory  Digestive  Liver  Skin   Hernias
Mortality    0.165          -          0.024        0.046      0.029  0.018  0.024
Respiratory  0.167          0.024      -            0.017      0.055  0.046  0.035
Digestive    0.168          0.046      0.017        -          0.077  0.037  0.025
Liver        0.163          0.029      0.055        0.077      -      0.056  0.049
Skin         0.166          0.018      0.046        0.037      0.056  -      0.021
Hernias      0.168          0.0214     0.035        0.025      0.049  0.021  -
1.8 Criterion ‘Absence of pain induced by management procedures’
Absence of pain induced by management procedures is assessed by two qualitative
measures: castration and tail docking. These measures are taken at farm level. The
farms are classified according to the presence or absence of these mutilation
procedures and, where the procedures are carried out, according to whether or not
anaesthetics are used.
1.8.1 Welfare Quality®
For this type of measure, WQ uses a lexicographic valuation tree (Figure 13).
Figure 13. Tree created in the MACBETH decision support system for the criterion
Absence of pain induced by management procedures.
1.8.2 MAUT
In this study, castration and tail docking were defined in MACBETH as qualitative
measures, with the performance levels no castration/no tail docking,
castration/tail docking with anaesthetics and castration/tail docking without
anaesthetics, according to the WQ protocol. Figure 14 shows the MACBETH
scales for each measure.
Figure 14. Utilities assigned to the performance levels of the Absence of pain induced
by management procedures measures.
An example of 9 farms was used as learning data to determine the CI aggregation
parameters (Data in Table 12). The utilities calculated with MACBETH for the
example data were used to express the WQ DMs’ preferences (Utilities in Table
12). The results of aggregating the example data following the WQ protocol
were used as initial preferences for the LS based approach to capacity
identification (WQ in Table 12).
After an initial calculation of the CI, we decided not to impose any additional constraint
on the aggregation of the measures of this criterion, since the WQ DMs’ preferences
were satisfied. As shown in Table 12, the utilities were adjusted as closely as possible
to the scores defined in the WQ protocol for this criterion for the 9 possible situations
that can be found on a farm regarding castration and tail docking. When the utilities
were adjusted to the WQ DMs’ preferences, the CI parameters obtained indicated that
tail docking was considered more important than castration, and that the two measures
behaved in a complementary way (Table 13).
Table 12. Absence of pain induced by management procedures: measures’ values,
individual utilities and overall utilities for each selected farm.
Farm  Data                      Utilities                 WQ   Overall utility
      Castration  Tail docking  Castration  Tail docking
a     No          No            100         100           100  100
b     No          With¹         100         45            60   67.34
c     No          Without²      100         0             38   40.62
d     With¹       No            60          100           77   79.36
e     With¹       With¹         60          45            53   51.09
f     With¹       Without²      60          0             35   24.37
g     Without²    No            0           100           47   48.40
h     Without²    With¹         0           45            27   21.78
i     Without²    Without²      0           0             8    0
¹Castration/tail docking with anaesthesia
²Castration/tail docking without anaesthesia
Table 13. Shapley values and interaction indices to aggregate the measures’ utilities into
the criterion with the Choquet integral.
              Shapley value  Interaction indices
                             Castration  Tail docking
Castration    0.461          -           0.109
Tail docking  0.539          0.000       -
1.9 Criterion ‘Expression of social behaviours’
Expression of social behaviours is assessed by the proportion of negative behaviour out
of all social behaviour.
1.9.1 Welfare Quality®
In the WQ protocol, an index is first calculated:
This ‘Index’ is then transformed into a ‘Score’ using I-spline functions:
When Isb ≤ 70 then:
When Isb ≥ 70 then:
1.9.2 MAUT
Here, we calculated the utility function directly with MACBETH from the proportion of
negative social behaviour out of all social behaviour. We established performance levels
in steps of one unit between 0 and 10% negative behaviour and in steps of 10 units
between 10 and 100% negative behaviour.
Figure 15. Utility function for negative behaviour calculated with MACBETH.
1.10 Criterion ‘Expression of other behaviours’
Expression of other behaviours is assessed by the percentage of active behaviour spent
in exploration of the pen and by the percentage of active behaviour spent in exploration
of the enrichment material.
1.10.1 Welfare Quality®
In the WQ protocol, an Index is first calculated:
This ‘Index’ is then transformed into a ‘Score’ using I-spline functions:
When Iob ≤ 60 then:
When Iob ≥ 60 then:
1.10.2 MAUT
Here, before determining the utility function of expression of other behaviours, we
produced an Index, as in the WQ protocol, to combine the
percentage of active behaviour spent exploring the pen and the enrichment material.
We implemented the same weights used in the WQ protocol.
Afterwards we calculated the utility function with MACBETH. We established
performance levels in steps of one unit between 0 and 10% of other behaviours and
in steps of 10 units between 10 and 100% of other behaviours.
Figure 16. Utility function for other behaviours calculated with MACBETH.
1.11 Criterion ‘Good human-animal relationship’
Good human-animal relationship is assessed by the percentage of pens showing a panic
response (score 2).
1.11.1 Welfare Quality®
In the WQ protocol, an Index is first calculated:
This ‘Index’ is then transformed into a ‘Score’ using I-spline functions:
When Iob ≤ 10 then:
When Iob ≥ 10 then:
1.11.2 MAUT
Here, we calculated the utility function with MACBETH. We established performance
levels in steps of ten units between 0 and 100% of pens showing a panic response
scored 2.
Figure 17. Utility function for the percentage of pens showing a panic response calculated with MACBETH.
1.12 Criterion ‘Positive emotional state’
Positive emotional state is assessed by the 20 measures of the Qualitative Behaviour
Assessment.
1.12.1 Welfare Quality®
In the WQ protocol, the values (between 0 and 125) are turned into an index with a
weighted sum:
With Nk, the value obtained by a farm for a given term k, and Wk, the weight attributed
to a given term k (Table 14). In the verification of the WQ protocol, we found
difficulties in the calculation of the Positive emotional state, which were solved by
replacing the weight for the measure fearful with the same value but negative
(-0.00475) and by replacing the I-spline function for Index values greater than 0 with
the spline function proposed in the dairy cattle protocol for the same criterion:
When I ≤ 0 then:
Table 14. Weights used in the calculation of the Positive emotional state Index.
Measures             Weights
Active               0.01228
Relaxed              0.01087
Fearful              -0.00475
Agitated             -0.00711
Calm                 0.01122
Content              0.01184
Tense                -0.00971
Enjoying             0.01030
Frustrated           -0.01496
Sociable             0.00544
Bored                -0.01230
Playful              0.00463
Positively occupied  0.01193
Listless             -0.01448
Lively               0.01002
Indifferent          -0.00747
Irritable            -0.00883
Aimless              -0.01193
Happy                0.01193
Distressed           -0.00175
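The weighted sum of the 20 Qualitative Behaviour Assessment terms can be sketched as below, using the weights of Table 14. The sketch only computes the sum of Wk times Nk described in the text; the full index formula did not survive in this copy, so any additional term it may contain is not reproduced here. The names `QBA_WEIGHTS` and `weighted_sum_index` are hypothetical.

```python
# Weights Wk copied from Table 14 (the weight for 'Fearful' is the
# negative value adopted in this study).
QBA_WEIGHTS = {
    "Active": 0.01228, "Relaxed": 0.01087, "Fearful": -0.00475,
    "Agitated": -0.00711, "Calm": 0.01122, "Content": 0.01184,
    "Tense": -0.00971, "Enjoying": 0.01030, "Frustrated": -0.01496,
    "Sociable": 0.00544, "Bored": -0.01230, "Playful": 0.00463,
    "Positively occupied": 0.01193, "Listless": -0.01448,
    "Lively": 0.01002, "Indifferent": -0.00747, "Irritable": -0.00883,
    "Aimless": -0.01193, "Happy": 0.01193, "Distressed": -0.00175,
}


def weighted_sum_index(values):
    """Sum of Wk * Nk over the 20 terms.

    values maps each term k to the value Nk (between 0 and 125) obtained
    by the farm. ASSUMPTION: only the weighted sum is computed here.
    """
    missing = set(QBA_WEIGHTS) - set(values)
    if missing:
        raise KeyError(f"missing terms: {sorted(missing)}")
    return sum(QBA_WEIGHTS[term] * values[term] for term in QBA_WEIGHTS)


if __name__ == "__main__":
    # Hypothetical farm: every term scored 0 except 'Active' at 100.
    farm = {term: 0.0 for term in QBA_WEIGHTS}
    farm["Active"] = 100.0
    print(weighted_sum_index(farm))
```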
This ‘Index’ is then transformed into a ‘Score’ using I-spline functions:
When I ≤ 0 then:
When I ≥ 0 then:
1.12.2 MAUT
In the present study for the criterion positive emotional state the same methodology as
in the WQ protocol was implemented.
2. Aggregation of growing pigs’ welfare measures into criteria
As in the WQ protocol, Choquet integrals (CI) were used in this study to combine the
criteria into the corresponding principles. In WQ, different data sets combining different
criterion values were presented to panels of experts, who were asked to give absolute
scores at principle level for each combination. The parameters of the CI were elicited
from the mean of the experts’ answers. This section presents the data sets used in the
WQ protocol together with the CI parameters obtained there. We used the same data sets
and the overall scores reflecting the WQ decision-makers’ (DMs’) preferences to
determine our CI parameters with a least-squares-based approach. Both sets of
parameters are presented below.
2.1 Good feeding
Table 15. Examples of scores for ‘Good feeding’ according to combinations of
Criterion scores for absence of prolonged hunger and absence of prolonged thirst.
Criteria                                  Principle
Absence of hunger   Absence of thirst     Good feeding
40                  60                    46
50                  50                    50
60                  40                    41
75                  25                    28
2.1.1 Welfare Quality
Table 16. Choquet integral capacities, Shapley values and interaction indices for
absence of prolonged hunger and Absence of prolonged thirst.
                                        Capacity   Shapley values   Interaction indices
Absence of hunger                       0.05       0.39             -
Absence of thirst                       0.28       0.61             -
Absence of hunger & absence of thirst   -          -                0.66
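To make the role of the capacities concrete, the following Python sketch evaluates a discrete Choquet integral with the capacities of Table 16 and compares the result with the principle scores listed in Table 15. It assumes that the capacity of the full set of criteria is normalised to 1 (the table shows only a dash there) and that all scores are non-negative; the function and variable names are illustrative and not taken from the WQ protocol.

```python
def choquet_integral(scores, capacity):
    """Discrete Choquet integral of non-negative scores.

    scores:   dict mapping criterion name -> score
    capacity: dict mapping frozenset of criterion names -> capacity value
    """
    ordered = sorted(scores, key=scores.get)  # criteria from lowest to highest score
    total = 0.0
    previous = 0.0
    for position, criterion in enumerate(ordered):
        # Coalition of all criteria scoring at least as high as the current one.
        coalition = frozenset(ordered[position:])
        total += (scores[criterion] - previous) * capacity[coalition]
        previous = scores[criterion]
    return total


# Capacities from Table 16; the value 1.0 for the full set is an assumption.
GOOD_FEEDING_CAPACITY = {
    frozenset({"hunger"}): 0.05,
    frozenset({"thirst"}): 0.28,
    frozenset({"hunger", "thirst"}): 1.0,
}

# (absence of hunger, absence of thirst, published 'Good feeding' score), Table 15.
TABLE_15 = [(40, 60, 46), (50, 50, 50), (60, 40, 41), (75, 25, 28)]

for hunger, thirst, published in TABLE_15:
    value = choquet_integral({"hunger": hunger, "thirst": thirst}, GOOD_FEEDING_CAPACITY)
    print(hunger, thirst, round(value, 1), published)
```

With these capacities the integral gives 45.6, 50.0, 41.0 and 27.5 for the four rows, which agree with the published scores up to rounding. Because both single-criterion capacities are small, the lower of the two scores dominates the result, which is how the CI limits compensation.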
2.1.2 MAUT
Table 17. Choquet integral Shapley values and interaction indices for absence of
prolonged hunger and absence of prolonged thirst.
                    Shapley   Interaction indices
                    values    Absence of hunger   Absence of thirst
Absence of hunger   0.38      -                   0.64
Absence of thirst   0.62      0.64                -
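For two criteria, a least-squares fit of the capacities can be written in closed form, which gives a feel for how such parameters may be derived from the learning data in Table 15. The Python sketch below is a simplified illustration and only an assumption about the procedure: the study states that a least-squares-based approach was used but does not give the implementation. It relies on the fact that, with two criteria, the CI equals the lower score plus the score difference multiplied by the capacity of the higher-scoring criterion. All names are hypothetical.

```python
# (absence of hunger, absence of thirst, 'Good feeding' score) from Table 15.
LEARNING_DATA = [(40, 60, 46), (50, 50, 50), (60, 40, 41), (75, 25, 28)]


def fit_two_criteria_capacities(rows):
    """Least-squares estimate of mu({hunger}) and mu({thirst}).

    With two criteria: CI = low + (high - low) * mu({criterion with the high score}),
    so each capacity is the slope of a regression through the origin.
    """
    numerator = [0.0, 0.0]
    denominator = [0.0, 0.0]
    for hunger, thirst, score in rows:
        low, high = min(hunger, thirst), max(hunger, thirst)
        if high == low:
            continue  # ties carry no information about the capacities
        index = 0 if hunger > thirst else 1
        difference = high - low
        numerator[index] += difference * (score - low)
        denominator[index] += difference ** 2
    if 0.0 in denominator:
        raise ValueError("each criterion must be the higher one in at least one row")
    # Clip to [0, 1] so that the result is a valid capacity.
    return tuple(min(1.0, max(0.0, n / d)) for n, d in zip(numerator, denominator))


mu_hunger, mu_thirst = fit_two_criteria_capacities(LEARNING_DATA)

# Shapley values and interaction index for two criteria, with mu(full set) = 1.
shapley_hunger = 0.5 * (mu_hunger + 1.0 - mu_thirst)
shapley_thirst = 0.5 * (mu_thirst + 1.0 - mu_hunger)
interaction_index = 1.0 - mu_hunger - mu_thirst
```

Under these assumptions the fitted values round to the Shapley values (0.38 and 0.62) and the interaction index (0.64) reported in Table 17. Problems with more than two criteria need a constrained fit over all subsets and are not covered by this shortcut.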
2.2 Good housing
Table 18. Examples of scores for ‘Good housing’ according to combinations of
Criterion scores for comfort around resting, thermal comfort and ease of movement.
Criteria                                                        Principle
Comfort around resting   Thermal comfort   Ease of movement     Good housing
25                       50                75                   35
25                       75                50                   34
50                       25                75                   37
75                       25                50                   38
40                       50                60                   44
40                       60                50                   44
50                       40                60                   45
50                       50                50                   50
50                       75                25                   34
75                       50                25                   37
50                       60                40                   44
60                       40                50                   45
60                       50                40                   45
2.2.1 Welfare Quality
Table 19. Choquet integral capacities, Shapley values and interaction indices for
comfort around resting, thermal comfort and ease of movement.
                                                              Capacity   Shapley values   Interaction indices
Comfort around resting                                        0.20       0.37             -
Thermal comfort                                               0.11       0.28             -
Ease of movement                                              0.16       0.35             -
Comfort around resting & thermal comfort                      0.26       -                0.27
Comfort around resting & ease of movement                     0.33       -                0.28
Thermal comfort & ease of movement                            0.25       -                0.29
Comfort around resting & thermal comfort & ease of movement   -          -                0.62
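The Shapley values and pairwise interaction indices in Table 19 can be derived from the capacities with the standard definitions used for Choquet integral models. The Python sketch below applies these definitions to the ‘Good housing’ capacities; it assumes that the capacity of the empty set is 0 and that of the full set is 1, neither of which is printed in the table. The short criterion names and function names are illustrative only.

```python
from itertools import combinations
from math import factorial

CRITERIA = ("resting", "thermal", "movement")

# Capacities from Table 19; the values for the empty and full sets are assumptions.
HOUSING_CAPACITY = {
    frozenset(): 0.0,
    frozenset({"resting"}): 0.20,
    frozenset({"thermal"}): 0.11,
    frozenset({"movement"}): 0.16,
    frozenset({"resting", "thermal"}): 0.26,
    frozenset({"resting", "movement"}): 0.33,
    frozenset({"thermal", "movement"}): 0.25,
    frozenset(CRITERIA): 1.0,
}


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def shapley_value(criteria, mu, i):
    """Average marginal contribution of criterion i over all coalitions."""
    n = len(criteria)
    others = [c for c in criteria if c != i]
    total = 0.0
    for t in subsets(others):
        weight = factorial(len(t)) * factorial(n - len(t) - 1) / factorial(n)
        total += weight * (mu[t | {i}] - mu[t])
    return total


def interaction_index(criteria, mu, i, j):
    """Pairwise interaction index between criteria i and j."""
    n = len(criteria)
    others = [c for c in criteria if c not in (i, j)]
    total = 0.0
    for t in subsets(others):
        weight = factorial(len(t)) * factorial(n - len(t) - 2) / factorial(n - 1)
        total += weight * (mu[t | {i, j}] - mu[t | {i}] - mu[t | {j}] + mu[t])
    return total


shapley = {c: shapley_value(CRITERIA, HOUSING_CAPACITY, c) for c in CRITERIA}
resting_thermal = interaction_index(CRITERIA, HOUSING_CAPACITY, "resting", "thermal")
```

Under these assumptions the Shapley values come out as 0.370, 0.285 and 0.345, and the three pairwise indices as 0.265, 0.285 and 0.295, in line with the rounded entries of Table 19. Positive interaction indices indicate that two criteria reinforce each other, so a high score on one cannot fully make up for a low score on the other.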
2.2.2 MAUT
Table 20. Choquet integral Shapley values and interaction indices for Comfort around
resting, Thermal comfort and Ease of movement.
                         Shapley   Interaction indices
                         values    Comfort around resting   Thermal comfort   Ease of movement
Comfort around resting   0.372     -                        0.271             0.271
Thermal comfort          0.289     0.271                    -                 0.305
Ease of movement         0.339     0.271                    0.305             -
2.3 Good health
Table 21. Examples of scores for ‘Good health’ according to combinations of Criterion
scores for absence of injuries, absence of disease and absence of pain induced by
management procedures.
Criteria                                                                                       Principle
Absence of injuries   Absence of disease   Absence of pain induced by management procedures   Good health
25                    50                   75                                                 32
25                    75                   50                                                 35
50                    25                   75                                                 30
75                    25                   50                                                 28
40                    50                   60                                                 43
40                    60                   50                                                 44
50                    40                   60                                                 42
50                    50                   50                                                 50
50                    75                   25                                                 38
75                    50                   25                                                 34
50                    60                   40                                                 45
60                    40                   50                                                 41
60                    50                   40                                                 44
2.3.1 Welfare Quality
Table 22. Choquet integral capacities, Shapley values and interaction indices for
absence of injuries, absence of disease and absence of pain induced by management
procedures.
                                                                                              Capacity   Shapley values   Interaction indices
Absence of injuries                                                                           0.04       0.30             -
Absence of disease                                                                            0.20       0.43             -
Absence of pain induced by management procedure                                               0.09       0.27             -
Absence of injuries & absence of disease                                                      0.31       -                0.43
Absence of injuries & absence of pain induced by management procedures                        0.09       -                0.33
Absence of disease & absence of pain induced by management procedures                         0.20       -                0.28
Absence of injuries & absence of disease & absence of pain induced by management procedures   -          -                0.73
2.3.2 MAUT
Table 23. Choquet integral Shapley values and interaction indices for absence of
injuries, absence of disease and absence of pain induced by management procedures.
                                                   Shapley   Interaction indices
                                                   values    Absence of injuries   Absence of disease   Absence of pain induced by management procedures
Absence of injuries                                0.318     -                     0.448                0.188
Absence of disease                                 0.413     0.448                 -                    0.348
Absence of pain induced by management procedures   0.268     0.118                 0.348                -
2.4 Appropriate behaviour
Table 24. Examples of scores for ‘Appropriate behaviour’ according to combinations
of Criterion scores for Expression of social behaviours, Expression of other behaviours,
Good human-animal relationship and Positive emotional state.
Criteria                                                                                                             Principle
Expression of social behaviours   Expression of other behaviours   Good human-animal relationship   Positive emotional state   Appropriate behaviour
35                                35                               65                               65                         42
35                                50                               50                               65                         44
35                                50                               65                               50                         42
35                                65                               35                               65                         40
35                                65                               50                               50                         42
35                                65                               65                               35                         39
50                                35                               50                               65                         44
50                                35                               65                               50                         43
50                                50                               35                               65                         46
50                                50                               50                               50                         50
50                                50                               65                               35                         43
50                                65                               35                               50                         45
50                                65                               50                               35                         43
65                                35                               35                               65                         43
65                                35                               50                               50                         45
65                                35                               65                               35                         40
65                                50                               35                               50                         47
65                                50                               50                               35                         46
65                                65                               35                               35                         42
2.4.1 Welfare Quality
Table 25. Choquet integral capacities, Shapley values and interaction indices for
Expression of social behaviours, Expression of other behaviours, Good human-animal
relationship (HAR) and Positive emotional state.
                                                                        Capacity   Shapley values   Interaction indices
Social behaviours                                                       0.17       0.31             -
Other behaviours                                                        0.01       0.23             -
Human-animal relationship                                               0.01       0.19             -
Positive emotional state                                                0.10       0.27             -
Social behaviours & Other behaviours                                    0.22       -                0.14
Social behaviours & HAR                                                 0.17       -                0.06
Social behaviours & Positive emotional state                            0.27       -                0.09
Other behaviours & HAR                                                  0.13       -                0.14
Other behaviours & Positive emotional state                             0.18       -                0.14
HAR & Positive emotional state                                          0.22       -                0.12
Social behaviours & Other behaviours & HAR                              0.53       -                0.07
Social behaviours & Other behaviours & Positive emotional state         0.63       -                0.11
Social behaviours & Positive emotional state & HAR                      0.52       -                0.00
Other behaviours & HAR & Positive emotional state                       0.48       -                -0.05
Social behaviours & Other behaviours & HAR & Positive emotional state   -          -                -0.25
2.4.2 MAUT
Table 26. Choquet integral Shapley values and interaction indices for Expression of
social behaviours, Expression of other behaviours, Good human-animal relationship and
Positive emotional state.
                                 Shapley   Interaction indices
                                 values    Social behaviours   Other behaviours   Good human-animal relationship   Positive emotional state
Social behaviours                0.325     -                   0.182              0.112                            0.173
Other behaviours                 0.242     0.182               -                  0.154                            0.149
Good human-animal relationship   0.177     0.112               0.154              -                                0.089
Positive emotional state         0.254     0.173               0.149              0.089                            -
GENERAL DISCUSSION
The main aim of the present study was to develop a multi-criteria evaluation system to
assess animal welfare. Thereby, the welfare assessment of growing pigs proposed by
Welfare Quality® was used as a framework to develop the multi-criteria methodology.
A comparison of different multi-criteria methods indicated that MACBETH and the
Choquet integral (CI), in the context of multi-attribute utility theory (MAUT), were
the most suitable methodology to solve the main problems faced by a multi-criteria
evaluation system for animal welfare. Therefore, MACBETH and the CI were used
throughout this thesis.
General methodology
The main difficulties faced by a multi-criteria evaluation system for animal welfare are
that data are collected on different types of scales, that criteria may differ in
importance, and that interactions may exist between them; a key aspect is that
welfare criteria may not fully compensate for each other (Botreau et al., 2007b).
Accordingly, a comparison of different multi-criteria methods which could be applied to
animal welfare was carried out in Chapter One of this thesis. As a result, the use of
MACBETH together with the CI in the context of MAUT was identified as the most
suitable methodology to assess animal welfare. Compared to other techniques for
utility function determination, such as the standard sequences method or the I-spline
functions proposed in the WQ protocol, MACBETH presented several advantages. First,
by using MACBETH, the utility function determination process remained more
transparent, which can help the stakeholders gain confidence in the model. Second, the
use of MACBETH could help to facilitate consensus between stakeholders (Parnell et
al., 2013; Bana e Costa et al., 2014), which is one of the difficulties when panels of
different DMs are consulted to determine the utility functions and the aggregation
parameters. Third, by using MACBETH, it is easier to judge the difference in
attractiveness between options as the number of criteria increases, owing to its
interactive software and to its use of qualitative judgments on a scale of distinct
categories (‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’)
(Bana e Costa et al., 2004). Fourth, MACBETH allows for a comparison of not only quantitative
performance levels but qualitative performances too, with no need for a previous
conversion of the qualitative scales into a quantitative scale, allowing a solution to one
of the problems presented by Botreau et al. (2007b). Finally, the assessment remains
more flexible. With this method, all the parameters can be changed according to new
scientific knowledge (inclusion or exclusion of measures based on new studies on their
influence in animal welfare), due to changes in societal expectations (if the welfare of
animals improves significantly on all farms, stakeholders may want to be more selective
when considering a farm as excellent), etc. The main drawback of using MACBETH is
related to the M-MACBETH software: it does not allow the utility function formulae to
be exported to other environments, and typing the information into the software can be
extremely tedious when working with large amounts of data.
The use of the CI as an aggregator presented an important advantage over other
methods proposed for the overall evaluation of animal welfare, such as the sum of ranks
and the sum of scores (Botreau et al., 2007a), which only allow the user or investigator
to assign different importance to the measures/criteria. The CI also allowed interaction
between measures to be taken into account, making it possible to limit the compensation
between them and thereby solving one of the main problems described by Botreau
et al. (2007b). The CI was also used in the WQ protocol for the aggregation of some
measures into criteria and for the aggregation of criteria into principles (Welfare
Quality, 2009). The main difficulty in implementing the least-squares-based approach
for CI capacity identification is that it depends on information which the DM cannot
always provide, such as the overall scores for each criterion (Grabisch et al., 2008).
Because our model was fitted to the WQ DMs’ preferences, the results obtained from
the WQ model were used as initial preferences, which avoided this issue.
However, following the study of Merad et al. (2013), in other circumstances, it may be
difficult for the DMs to provide overall scores. Nevertheless, there are easier methods
for capacity identification proposed in the literature, such as the minimum variance
approach, which requires only a partial order over the farms as preference information.
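The difference between an additive aggregator and the CI can be illustrated with the ‘Good feeding’ principle from the annex. The Python sketch below compares a weighted sum with the two-criteria Choquet integral, using the capacities from Table 16. Using the Shapley values of Table 16 as the weights of the weighted sum is an assumption made purely for illustration, as are the function names.

```python
# Capacities of the single criteria from Table 16 ('Good feeding').
MU_HUNGER, MU_THIRST = 0.05, 0.28
# Shapley values from Table 16, used here as weighted-sum weights (assumption).
W_HUNGER, W_THIRST = 0.39, 0.61


def choquet_two_criteria(hunger, thirst):
    """Two-criteria Choquet integral, assuming mu(full set) = 1."""
    low, high = min(hunger, thirst), max(hunger, thirst)
    mu_of_higher = MU_HUNGER if hunger > thirst else MU_THIRST
    return low + (high - low) * mu_of_higher


def weighted_sum(hunger, thirst):
    """Additive aggregation: criteria are treated as independent."""
    return W_HUNGER * hunger + W_THIRST * thirst


# Farm with a good hunger score but a poor thirst score (a row of Table 15).
ci_score = choquet_two_criteria(75, 25)  # 27.5, close to the poor score
ws_score = weighted_sum(75, 25)          # 44.5, the good score compensates
```

For a balanced farm (50, 50) both aggregators return 50, so the two approaches only diverge when the criterion scores are unbalanced.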
In order to apply this methodology in the framework of the WQ protocol we found
some key points to be taken into account.
Use of weighted sums
According to the WQ protocol, weighted sums were used for some measures before
determining the utility functions. Weighted sums were used to aggregate the moderate
and severe conditions for bursitis, manure on the body, lameness and wounds on the
body. Pen investigation and enrichment investigation were also aggregated with a
weighted sum before determining the utility function for exploratory behaviour.
Although it was emphasised throughout the development of the WQ model that welfare
scores should not compensate for each other (Botreau et al., 2007b; Veissier et al.,
2011), compensation occurred in the first stages through the use of linear combinations,
as shown in Chapter Two by means of small examples. These linear combinations were
used both in the WQ protocol and in this alternative methodology. The extent of this
problem was estimated in Chapter Three, in which a sensitivity analysis was performed
in order to demonstrate the relative importance of welfare measures in the different
steps of the multi-criteria aggregation process. Although the severe conditions were
assigned a higher weight in the weighted sum than the moderate conditions, the severe
conditions were not found to have a greater influence on the level of welfare at criteria
or principle level.
This could be explained by two main facts. First, the severe conditions had low
variations between farms, and thus the first and the third quartiles were not
representative of an improvement in or a worsening of the level of welfare. Second,
compensation between severity measures occurred and did not allow the model to
distinguish between small variations in the severe conditions. An alternative solution
could be to provide an individual utility function for each severity measure and to
aggregate these afterwards using the CI instead of a weighted sum. On the one hand,
this would avoid the compensation issue and increase the sensitivity of the model; on
the other, it would increase the complexity of the decision process, since the DMs
would have to interpret a larger number of measures in terms of welfare.
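A small numerical sketch shows how compensation arises when moderate and severe conditions are combined linearly. The weights and prevalences below are purely hypothetical and are not the values used in the WQ protocol; the function name is illustrative.

```python
def severity_weighted_sum(pct_moderate, pct_severe, w_moderate=1.0, w_severe=2.0):
    """Linear combination of the percentages of moderate and severe cases.

    The default weights are hypothetical; they only express that severe
    cases count more than moderate ones.
    """
    return w_moderate * pct_moderate + w_severe * pct_severe


# Two hypothetical farms with very different problems.
farm_moderate_only = severity_weighted_sum(pct_moderate=20.0, pct_severe=0.0)
farm_severe_only = severity_weighted_sum(pct_moderate=0.0, pct_severe=10.0)
```

Both farms receive the same value (20.0), so any utility function applied afterwards cannot tell them apart, which is the kind of compensation discussed above.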
Conversion to ordinal scores
Due to the different nature of the disease measures (for instance, mortality is recorded
as the percentage of mortality on farm during the last 12 months, whilst coughing and
sneezing are assessed as the average frequency of coughs/sneezes per animal over 5
minutes), WQ decided to compare the disease data with alarm thresholds which
represent the limit between what is considered abnormal and what is considered normal.
When the incidence observed for a measure reaches approximately half the alarm
threshold, a warning is attributed. The measures are grouped into six areas: mortality,
respiratory, digestive, liver, skin and hernias. The severity of the problem is estimated
per area: if the frequency of one symptom within an area is above the warning threshold
and the other is below, a warning is attributed to the area. On the other hand, if the
frequency of one symptom within an area is above the alarm threshold, the alarm is
attributed to the area; if neither occurs, no problem is recorded. In order to simulate the
WQ DMs’ preferences, we compared the data for the absence of disease measures with
the warning and alarm thresholds established in the protocol. However, in the
development of the methodology in Chapter Two, by converting the original,
quantitative data into an ordinal scale (three qualitative levels: no problem recorded, a
warning or an alarm), it was impossible for the model to distinguish between herds
which slightly or greatly exceeded the thresholds. Furthermore, to stay in line with the
WQ protocol preferences, we decided to create a utility function per area rather than
calculate a utility per measure. This approach allows substantial compensation between
disease measures within an area. For instance, a farm with only one measure of the
respiratory area (for example pneumonia) at warning level is assigned a warning for
this area, and so is a farm on which all six measures of the area (pneumonia, pleuritis,
pericarditis, laboured breathing, coughing and sneezing) are at warning level. The
sensitivity analysis carried out in Chapter Three shows that, because the data were
compared with warning and alarm thresholds and because measures compensated for
each other within the disease areas, the values at criteria level for absence of disease
only changed when alarm or warning thresholds were reached. The model was
therefore only sensitive when the number of warnings or alarms changed as the measure
values improved or worsened. Thus, the model was only sensitive to large variations in
the measure data, which makes it difficult to distinguish between different levels of
welfare between farms. The conversion to an ordinal scale and the compensation of
measures within each disease area are therefore crucial points which should be
reconsidered: the measures could be treated as quantitative, using the warning and
alarm thresholds as references for the DMs to build utility functions per measure instead
of per area.
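The loss of information caused by the ordinal conversion can be sketched in a few lines of Python. The sketch follows the rules described above: a warning at half the alarm threshold (the text says "approximately half", so exactly half is an assumption) and the most severe level among the measures of an area determining the level of that area. The threshold values, measure values and names are hypothetical and are not taken from the WQ protocol.

```python
NO_PROBLEM, WARNING, ALARM = 0, 1, 2


def classify_measure(value, alarm_threshold):
    """Convert a quantitative disease measure into an ordinal level."""
    warning_threshold = alarm_threshold / 2  # assumption: exactly half the alarm threshold
    if value >= alarm_threshold:
        return ALARM
    if value >= warning_threshold:
        return WARNING
    return NO_PROBLEM


def classify_area(values, alarm_thresholds):
    """Level of a disease area: the most severe level among its measures."""
    return max(classify_measure(values[m], alarm_thresholds[m]) for m in values)


# Hypothetical alarm thresholds for two measures of one area.
thresholds = {"coughing": 10.0, "sneezing": 20.0}

slightly_over = classify_area({"coughing": 10.5, "sneezing": 2.0}, thresholds)
far_over = classify_area({"coughing": 90.0, "sneezing": 80.0}, thresholds)
one_warning = classify_area({"coughing": 6.0, "sneezing": 2.0}, thresholds)
two_warnings = classify_area({"coughing": 6.0, "sneezing": 12.0}, thresholds)
```

The first two farms both end up with an alarm and the last two both with a warning, although their underlying values differ considerably. A utility function built per measure on the quantitative values would keep these differences visible.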
Learning data
Small datasets were used as learning data to determine the CI aggregation parameters, to
aggregate the measures into criteria and the criteria into principles. In Chapter Two and
Chapter Three we show that our results perfectly fit the WQ results for the criteria
assessed by just one welfare measure, such as absence of hunger, space allowance,
social behaviour and positive emotional state. The results were also completely in
accordance with the WQ results for exploratory behaviour, which is assessed by two
measures that were combined using a weighted sum in both methodologies before
determining the utility and the I-spline functions. From this, it can be concluded that the
utility functions determined in MACBETH perfectly fitted the I-spline functions
proposed in the WQ protocol. For the qualitative criteria (absence of thirst, thermal
comfort and absence of pain induced by management procedures), the differences
between the methods were small to negligible, although the methodologies used in WQ
and in the MAUT were very different. This was because the datasets used to determine
the aggregation parameters of the CI covered all the possible scenarios found on a farm,
and thus, once the model was adjusted, there could be no further variations.
However, differences were found for the criteria comfort around resting and absence of
injuries, which are assessed by several measures. These differences appear to be related
to the aggregation step rather than to the utility function determination, since the utility
functions determined in MACBETH perfectly fitted the I-spline functions proposed in
the WQ protocol for the measures which form these criteria as well. Two key points
were identified in the aggregation step. First, the selection of the learning data was
found to be the most important step in the determination of the parameters of the CI. It
has to be representative of all possible scenarios found on farms; otherwise, when these parameters
are implemented in a large dataset, the results may not be in accordance with the DMs’
preferences. In Chapter Three, the learning data used to determine the CI parameters for
absence of injuries was modified since large differences between our method and the
WQ method were found when the parameters determined in Chapter Two were applied
in order to aggregate the absence of injuries data for the 44 observations. By selecting
the learning data more carefully we could better approach the WQ DMs’ preferences,
and the differences between the methods were minor. Second, although the differences
between the CI parameters derived from our learning data and the CI parameters used
in the WQ protocol for comfort around resting and absence of injuries were minor,
differences between the methods occurred when these parameters were implemented to
aggregate the data of the 44 observations. Although these differences were minor, this
highlights the importance of the aggregation parameters, since even slight variations in
them can produce differences in the results.
Further development and prospects
By using the MAUT, it has been shown that the main difficulties faced by a
multi-criteria aggregation model, as described by Botreau et al. (2007b), can be solved:
the method allows different importance to be assigned to the measures, limits the
compensation between them and works with data collected on different types of scales.
Furthermore, the model’s flexibility allowed us to fit the WQ assessment, obtaining
small differences between our results and the ones obtained by implementing the WQ
protocol, both at criteria and principle level. Thus, it can be concluded that this model
could be implemented to produce an overall assessment of animal welfare in the context
of the WQ protocol for growing pigs. Furthermore, this methodology could also be used
as a framework to produce an overall assessment of welfare for other livestock species.
However, from the sensitivity analysis carried out in this study, two main points were
observed which may need to be studied further. First, it was found that the model was
not sensitive to variations in some measures at criteria level. The low variation of some
measures could explain their low influence on improving or worsening welfare at both
the criteria and principle levels, since the first and the third quartiles of these measures
were not representative of an improvement in or a worsening of the level of welfare.
Comparable values for some measures were found in other studies, for instance that of
Temple et al. (2011); we could therefore assume that these measures have a low
influence on improving or worsening the level of welfare because of their generally low
variance across farms. However, it may be necessary to run observations on a larger
number of farms to obtain more information on the distribution of the measures. The
second key point was that the sensitivity of the model to an improvement or worsening
of the measure values was lower at principle level, after the criteria had been aggregated
into principles, than at criteria level. Thus, in order to establish whether the three-step
aggregation process is suitable to distinguish between farms, it may be necessary to run
the model on a larger number of farms to determine the actual variation in the measures
on the farms. If there is no variation between farms at principle level, as occurred in our
observations, or at overall assessment level, the three-step aggregation process should
be reconsidered.
References
Bana e Costa CA, de Corte JM and Vansnick JC 2004. On the mathematical
foundations of MACBETH. In MCDA, Multiple Criteria Decision Analysis (eds.
J Figueira, S Greco and M Ehrgott), pp. 409 - 442. Kluwer Academic Publishers,
Dordrecht, Netherlands.
Bana e Costa CA, Lourenço JC, Oliveira MD and Bana e Costa JC 2014. A
socio-technical approach for group decision support in public strategic planning: The
Pernambuco PPA case. Group Decision and Negotiation 23, 5-29.
Botreau R, Bonde M, Butterworth A, Perny P, Bracke MBM, Capdeville J and Veissier
I 2007a. Aggregation of measures to produce an overall assessment of animal
welfare. Part 1: A review of existing methods. Animal 1, 1179-1187.
Botreau R, Bracke MBM, Perny P, Butterworth A, Capdeville J, van Reenen CG and
Veissier I 2007b. Aggregation of measures to produce an overall assessment of
animal welfare. Part 2: Analysis of constraints. Animal 1, 1188-1197.
Grabisch M, Kojadinovic I and Meyer P 2008. A review of capacity identification
methods for Choquet Integral based multi-attribute utility theory: Applications of
the Kappalab R package. European Journal of Operational Research 186, 766-785.
Parnell GS, Bresnick TA, Tani SN and Johnson ER 2013. Handbook of decision
analysis. New York: John Wiley & Sons.
Temple D, Dalmau A, Ruiz de la Torre JL, Manteca X and Velarde A 2011. Application
of the Welfare Quality® protocol to assess growing pigs kept under intensive
conditions in Spain. Journal of Veterinary Behaviour 6, 138-149.
Veissier I, Jensen KK, Botreau R and Sandøe P 2011. Highlighting ethical
decisions underlying the scoring of animal welfare in the Welfare Quality scheme.
Animal Welfare 20, 89-101.
Welfare Quality 2009. Welfare Quality® Assessment Protocol for Fattening Pigs.
Lelystad: Welfare Quality® Consortium.
GENERAL SUMMARY
Consumers’ concern about livestock living conditions has increased considerably in the
last few years. These consumers’ preferences create economic incentives for
stakeholders to meet animal welfare standards, as established by legislation or voluntary
certification schemes. It is generally accepted that animal welfare is a
multi-dimensional concept, and a multi-criteria evaluation model is therefore required
for the assessment of an animal unit. Accordingly, the current study deals with the
development of a multi-criteria evaluation system to assess animal welfare on farms,
based on the Welfare Quality® (WQ) protocol, with an example of growing pigs’
welfare assessment. Its main objective was to find a more transparent and flexible
methodology than the one proposed in the WQ protocol while solving the main
difficulties that such a model faces, namely that criteria may differ in importance and
that interactions may exist between them; a key aspect is that the welfare criteria may
not fully compensate for each other.
Multi-attribute utility theory (MAUT) was applied in this study. A comparison of
different MAUT methods was provided in Chapter One. A theoretical model of a
welfare assessment for growing pigs was used, considering only four criteria: good
feeding, good housing, good health and appropriate behaviour. Data for growing pig
farms were generated, with each farm receiving one score for each welfare criterion. Ten
farms were used as learning data, and the complete generated dataset was used to
exemplify the differences between the methods. The utility functions and the
aggregation functions were constructed in two separate steps. Two utility function
determination methods (the standard sequences method and the MACBETH method)
and two aggregation functions (the weighted sum and the Choquet integral (CI)) were
compared. The utilities derived from MACBETH allowed us to model more adequately
the preferences of the decision-maker regarding the different importance of the criteria
and the interaction between them. A comparison of the weighted sum and the CI results
obtained from each method was carried out. The results showed that there were
interactions between the criteria; assuming independence among the criteria (weighted
sum) led to important differences in the classification of the farms. The use of the
MACBETH method together with the CI seemed to be the model which best solved
the difficulties presented above.
In Chapter Two, the application of the MACBETH method together with the CI to a
real welfare assessment, the WQ protocol for growing pigs, was presented by means of
examples. The utility functions were constructed and the CI parameters determined so
as to fit the WQ decision-makers’ preferences. Throughout this study, the different
multi-criteria methods used in the WQ protocol were compared with the single
methodology proposed here. The flexibility of the MAUT model allowed us to
fit the WQ assessment, obtaining results that were comparable to the ones obtained by
implementing the WQ protocol. Additionally, this flexibility makes it possible to
modify the model according to, for instance, new scientific knowledge. Due to the use
of an interactive approach like MACBETH, the model remained more transparent for
stakeholders than the model proposed by WQ.
After the development of any multi-criteria evaluation system, a validation of the model
must be carried out in order to prove that it works as intended in practical conditions. In
Chapter Three, the MAUT methodology proposed above was implemented to
aggregate welfare data which was collected in different growing pig farms in
Schleswig-Holstein, Germany. In total, 44 visits were carried out. The whole WQ
assessment protocol for growing pig farms was implemented in each visit. The results
obtained for each observation were compared with the results obtained by implementing
the multi-criteria methodology proposed in the WQ protocol. Also, the influence of
variations in the welfare measure values was estimated in order to assess the sensitivity
of the model. Using the MAUT, results similar to those of the WQ protocol aggregation
methods were obtained, both at criteria and principle level. Two main conclusions can
be drawn from the sensitivity analysis: first, a limited number of measures had a strong
influence on improving or worsening the level of welfare at criteria level; second, the
MAUT model was not very sensitive to an improvement or a worsening of single
welfare measures at principle level.
The findings of this study indicate that the MAUT model could be implemented to
produce an overall assessment of animal welfare in the context of the WQ protocol for
growing pigs. Furthermore this methodology could also be used as a framework to
produce an overall assessment of welfare for other livestock species. However, the use
of weighted sums and the conversion of disease measures into ordinal scores should be
reconsidered. Additionally, it may be necessary to run observations on a larger number
of farms to obtain more information about the distribution of the welfare measures and
the sensitivity of the model.
ZUSAMMENFASSUNG
In den letzten Jahren rückten die Haltungsbedingungen von Nutztieren und damit die
Frage nach dem Tierwohl vermehrt in den Fokus der Verbraucher. Dabei schaffen die
gesellschaftlichen Präferenzen ökonomische Anreize für Interessengruppen, die
gesetzlich vorgeschriebenen oder freiwillig angesetzten Qualitätsstandards in Bezug auf
das Tierwohl einzuhalten. Aufgrund der Vielzahl an Einflüssen, die das Tierwohl
bedingen, wird eine Multi-Criteria-Analyse zur Bewertung eines tierhaltenden Betriebes
notwendig. Daher beschäftigt sich die vorliegende Studie mit der Entwicklung eines
mehrfaktoriellen Bewertungssystems zur Einschätzung des tierischen Wohlbefindens
auf Betrieben. Der Ansatz basiert auf dem Welfare Quality® (WQ)-Protokoll für
Mastschweine. Das Hauptziel dieser Arbeit war es, eine transparentere und flexiblere
Methode als die dem WQ-Protokoll zugrunde liegende zu entwickeln. Dabei sollte in
erster Linie Beachtung finden, dass Kriterien innerhalb des Protokolls eine
unterschiedliche (kontrollierbare) Gewichtung annehmen können und eine
Kompensation zwischen einzelnen Kriterien begrenzt ist.
In dieser Studie wurde die Multi-Attribute Utility Theorie (MAUT) verwendet. Das
erste Kapitel beinhaltet den Vergleich verschiedener MAUT-Methoden. Unter
Einbeziehung der vier Kriterien Fütterung, Haltungsbedingungen, Gesundheit und
Verhalten wurde hierfür ein theoretisches Modell der Einschätzung des Tierwohls für
Mastschweine genutzt. Die Daten wurden für Schweinemastbetriebe generiert, wobei
jeder Betrieb eine Bewertung für die vier Kriterien erhielt. Zehn Betriebe dienten als
Lernstichprobe und der komplette Datensatz wurde dazu genutzt, die Unterschiede
zwischen den einzelnen Methoden herauszustellen.
The utility function and the aggregation function were constructed in two separate
steps. Two methods for determining the utility function, the standard sequences
method and the MACBETH method, and two aggregation functions, the weighted sum and
the Choquet integral (CI), were compared. The utility function derived from the
MACBETH method represents the decision-maker's preferences regarding the different
weighting of the criteria and their possible interactions more adequately. The
weighted sum was compared with the results of the CI, and the results confirmed
interactions between the criteria. Assuming independence between the criteria
(weighted sum) led to substantial differences in the evaluation of the farms. As a
consequence of these results, the MACBETH method was subsequently applied in
combination with the CI.
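The contrast between the two aggregation functions can be illustrated with a minimal Python sketch. It is not the model fitted in this thesis: the equal weights, the capacity obtained by squaring the additive weights, and the farm scores are hypothetical values chosen only to show how a non-additive capacity limits compensation between criteria.

```python
from itertools import combinations

CRITERIA = ("feeding", "housing", "health", "behaviour")


def choquet_integral(scores, capacity):
    """Discrete Choquet integral of `scores` with respect to `capacity`.

    scores:   dict mapping criterion -> utility in [0, 1]
    capacity: dict mapping frozenset of criteria -> value in [0, 1],
              monotone, with capacity[empty set] = 0 and capacity[all] = 1
    """
    ordered = sorted(scores, key=scores.get)  # criteria by ascending score
    total, previous = 0.0, 0.0
    for i, criterion in enumerate(ordered):
        # Coalition of all criteria scoring at least as high as this one.
        coalition = frozenset(ordered[i:])
        total += (scores[criterion] - previous) * capacity[coalition]
        previous = scores[criterion]
    return total


def weighted_sum(scores, weights):
    """Additive aggregation, which assumes independent criteria."""
    return sum(weights[c] * scores[c] for c in scores)


def all_subsets(items):
    """Yield every subset of `items` as a frozenset."""
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            yield frozenset(subset)


# Hypothetical equal weights (assumption, not taken from the thesis).
weights = {c: 0.25 for c in CRITERIA}

# Additive capacity: with it, the Choquet integral equals the weighted sum.
additive = {s: sum(weights[c] for c in s) for s in all_subsets(CRITERIA)}

# Hypothetical non-additive capacity: squaring penalises incomplete
# coalitions, so one poor criterion is compensated less by good ones.
limited = {s: value ** 2 for s, value in additive.items()}

# Hypothetical farm: good on three criteria, poor on behaviour.
farm = {"feeding": 0.9, "housing": 0.9, "health": 0.9, "behaviour": 0.1}

ws = weighted_sum(farm, weights)             # about 0.70
ci_add = choquet_integral(farm, additive)    # about 0.70, same as ws
ci_lim = choquet_integral(farm, limited)     # about 0.55
print(ws, ci_add, ci_lim)
```

With the additive capacity the two aggregations coincide, whereas the non-additive capacity scores the same farm lower because the poor behaviour score cannot be fully offset by the other three criteria.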
In the second chapter, the application of the MACBETH method in combination with
the CI, based on the WQ protocol for growing pigs, was examined using examples. The
preferences of the WQ decision-makers were fitted in order to construct the utility
function and determine the parameters of the CI. The different multi-criteria
methods of the WQ protocol were compared with the method presented in this work.
The flexibility of the MAUT method allowed it to be adapted to the WQ protocol,
which led to comparable results for the two methods. It also allows more flexible
adaptation to changing conditions. Because it uses an interactive approach, the
MACBETH method remains more transparent to stakeholders than the model proposed by
WQ.
Once a multi-criteria evaluation system has been developed, the model must be
validated to test its suitability for practical use. In the third chapter, the MAUT
method proposed in the preceding chapter is used to aggregate animal welfare data.
For this purpose, a total of 44 visits were made to growing pig farms in
Schleswig-Holstein, Germany, during which the complete WQ protocol for growing pigs
was applied. The results obtained were compared with those of the WQ multi-criteria
analysis. In addition, the influence of variation in the measures on the welfare
assessment was estimated in order to derive the sensitivity of the model. The use
of MAUT gave results similar to those of the WQ protocol's aggregation method at
both the criteria level and the principle level. Two main findings can be derived
from the sensitivity analysis. At the criteria level, only a few welfare indicators
had a clear influence on the welfare assessment, whereas at the principle level an
improvement or deterioration of individual indicators hardly affected the welfare
assessment.
The results of this study show that the use of weighted sums and the conversion of
disease-related measures into ordinal scales should be reconsidered. Furthermore,
this study should be repeated with a larger number of farms to obtain further
information on the distribution of the welfare indicators and the sensitivity of
the model. Nevertheless, the MAUT model proved suitable for obtaining a general
assessment of the welfare of growing pigs with respect to the WQ protocol. The
presented method can also be applied to assess animal welfare in other livestock
species.
ACKNOWLEDGMENTS
At this point I would like to thank all those who have contributed in various ways
to my research.
First of all I want to thank my supervisor Prof. Joachim Krieter for the
opportunity he gave me, for his support and his belief in me, for his time and
effort, thank you for everything.
I warmly thank Carlos Buxadé for embracing me in Madrid. Thanks to Antonio
Callejo, Martina Pérez and Andrea Luciana do Santos for all the coffees and
touching conversations we shared.
I am also indebted to my co-authors and colleagues from my working group for
the valuable contribution to my papers and all the lively discussions. Above all, I
would like to mention Imke Trauslen, Kathrin Büttner and Irena Czycholl.
My research was made possible through the financial support I received from
the German Federal Ministry of Education and Research (BMBF) within the
PHENOMICS research project.
I also want to thank my fellow PhD students for regular lunch times, for quality
times on courses and conferences, for their companionship and for their
friendship. My special thanks go to Julia Aulrich, Birte Tietgen, Christina Veit,
Anita Ehret and Karo Reckmann.
Finally, I would like to thank my family, who always encouraged me to pursue my
aims and supported me wherever they could. Last but not least, this PhD would
not have been possible without Julia Kreuer and Gloria Heredia. Thank you for
your deep friendship and your assistance in all situations of life. Finally, I want
to thank Ignacio Santa-Cruz Rubio for being my person, my partner in crime
and for giving me unconditional support during the whole PhD.
CURRICULUM VITAE
GENERAL INFORMATION
Name: Paula Martín Fernández
Date of Birth: 31 July 1986 in Madrid
Nationality: Spanish
EDUCATION
2004-2010: AGRONOMIC ENGINEERING
POLYTECHNIC UNIVERSITY OF MADRID.
RELEVANT WORK EXPERIENCE
Since 2011: PHD STUDENT
INSTITUT FÜR TIERZUCHT UND TIERHALTUNG, CAU