Humboldt-Universität
zu Berlin
Edda Klipp
Systembiologie6 –
Netzwerke
und Graphen
Sommersemester 2010
Humboldt-Universität
zu
BerlinInstitut
für
BiologieTheoretische
Biophysik
Humboldt-Universität
zu Berlin
Metabolic Networks
Barabasi
& Oltvai, Nature Rev
Gen 5, 101 (2004)
To study
the
network
characteristics
of the
metabolism
a graph
theoretic
description
needs
to be
established. (a) illustrates
the
graph
theoretic
description
for
a simple pathway
(catalysed
by
Mg2+-dependant enzymes).(b) In the
most
abstract
approach
all interacting
metabolites
are
considered
equally. The
links between
nodes
represent
reactions
that
interconvert
one
substrate
into
another. For many
biological
applications
it
is
useful
to ignore
co-factors, such as the
high-energy-phosphate
donor
ATP, which
results(c) in a second type
of mapping
that
connects
only
the
main
source
metabolites
to the
main
products.
Humboldt-Universität
zu Berlin
Metabolic Network
Human Glycolysis and GluconeogenesisAs taken from KEGG
Contains metabolites and enzymes
Humboldt-Universität
zu Berlin
Layers of Metabolic Regulation
Metabolite Metabolite
Enzyme
mRNA
Genes
Humboldt-Universität
zu Berlin
Signaling Networks
Bhalla
& Iyengar, 1999, Science
Humboldt-Universität
zu Berlin
Yeast Protein-Protein Interactions
A map of protein–protein
interactions in Saccharomyces cerevisiae, which is based on early yeast two-hybrid measurements, illustrates that a few highly connected nodes (which are also known as hubs) hold the network together.
The largest cluster, which contains 78% of all proteins, is shown. The color of a node indicates the phenotypic effect of removing the corresponding protein (red = lethal, green = non-lethal, orange = slow growth, yellow = unknown).
Barabasi
& Oltvai, Nature Rev
Gen 5, 101 (2004)
Humboldt-Universität
zu Berlin
Human Disease Network, 1
Humboldt-Universität
zu Berlin
Human Disease Network, 2
Humboldt-Universität
zu Berlin
Human Disease Network, 3
Humboldt-Universität
zu Berlin
Temporal protein interaction network of the yeast mitotic cell cycle.
Cell cycle proteins that
are part of complexes or other physical interactions are shown within the circle. For the dynamic proteins, the time of peak expression is shown by the node color; static proteins are represented by white nodes. Outside the circle, the dynamic proteins without interactions are both positioned and colored according to their peak time and thus also serve as a legend for the color scheme in the network. More detailed versions of this figure (including all proteinnames) and the underlying data are available online at www.cbs.dtu.dk/cellcycle.Lichtenberg et al., Science, 2005
Humboldt-Universität
zu Berlin
Textmining: Protein-Protein Interaction
(A) The known pheromone signalling
pathway [17]. (B) Thick lines indicate the ‘backbone’
linking a cell-surface receptor (Ste2) to a transcription factor (Cln1). The backbone follows the most reliable edges in a yeast interaction network based on statistical associations in Medline abstracts. The thin lines link ‘associated factors’
to the backbone. These nodes are generally connected to the backbone proteins. Lappe
et al., 2005, Biochem. Soc. Trans.
Humboldt-Universität
zu Berlin
A Protein Interaction Map of Drosophila melanogaster
Drosophila melanogaster is a proven model system for many aspects of human biology. Here we present a twohybrid–based protein-interaction map of the flyproteome. A total of 10,623 predicted transcripts were isolated and screened against standard and normalized complementary DNA libraries to produce a draft map of 7048 proteins and 20,405 nteractions. A computational method of rating two-hybrid interaction confidence was developed to refine this draft map to a higher confidence map of 4679 proteins and 4780 interactions. Statistical modeling of the network showed two levels of organization: a short-range organization, presumably corresponding to multiprotein
complexes, and a more global organization, presumably corresponding to intercomplex
connections. The network recapitulatedknown pathways, extended pathways, and uncovered previously unknown pathway components. This map serves as a starting point for a systems biology modeling of multicellular
organisms including humans.
Giot
et al, 2003, ScienceExpress
Humboldt-Universität
zu Berlin
Global views of the protein interaction map
(A) Protein family/human disease ortholog
view. Proteins are color-coded according to protein family as annotated by the Gene Ontology hierarchy. Proteins orthologous
to human disease proteins have a jagged starry border. Interactions were sorted according to interaction confidence score and the top 3000 interactions are shown with their corresponding3522 proteins. This corresponds roughly to a confidence score of 0.62 and higher. (B) Subcellular
localization view. This view shows the fly interaction map with each protein colored by its Gene Ontology Cellular Component annotation. This map has been filtered by only showing proteins with less than or equal to 20 interactions and with at least one Gene Ontology annotation (not necessarily a cellular component annotation). We show proteins for all interactions with a confidence score of 0.5 or higher. This results in a map with 2346 proteins and 2268 interactions.
Giot
et al, 2003, ScienceExpress
Humboldt-Universität
zu Berlin
PPI Local View
Splicing complex associated with sex determination. Giot
et al, 2003, ScienceExpress
Humboldt-Universität
zu Berlin
Transcriptional regulatory networks
RegulonDB: database
with
information
on transcriptional
regulation
and operon organization
in E.coli; 105 regulators
affecting
749 genes
7 regulatory
proteins
(CRP, FNR, IHF, FIS, ArcA, NarL
and Lrp) are
sufficient to directly
modulate
the
expression
of more
than
half of all E.coli
genes.
Out-going
connectivity
followsa power-law
distribution
In-coming
connectivity
followsexponential distribution
(Shen-Orr).
Martinez-Antonio, Collado-Vides, Curr
Opin
Microbiol
6, 482 (2003)
Humboldt-Universität
zu Berlin
Regulatory cascades
The
TF regulatory
network
in E.coli. When
more
than
one
TF regulates
a gene, the
order of their
binding
sites
is
as given
in the
figure. An arrowhead
is
used
to indicate
positive regulation
when
the
position
of the
binding
site
is
known.
Horizontal bars
indicates
negative regulation
when
the
position
of the
binding
site
is
known. In cases
where
only
the
nature of regulation
is
known, without
binding
site
information, + and –
are
used
to indicate
positive and negative regulation.
The
DBD families
are
indicated
by
circles
of different colours
as given
in the
key. The
names
of global regulators
are
in bold. Babu, Teichmann, Nucl. Acid
Res. 31, 1234 (2003)
Humboldt-Universität
zu Berlin
Gene Regulation Network Sea Urchin Embryo
Davidson, 2002,Dev Biol
Humboldt-Universität
zu Berlin
Basic Principles of Graph Theory
Literature:
J. Sedlácek
(1968) Einführung
in die Graphentheorie. Teubner Verlagsgesellschaft, Leipzig.
Albert & Barabási
(2002) Statistical mechanics of complex networks. Rev Mod Physics, 74, 47-97.
Barabási
& Oltvai
(2004) Network biology: understanding the cell’s functional organization, Nature Review Genetics, 5, 101-113.
Humboldt-Universität
zu Berlin
Classical ExamplesThe problem of “Fährmann, Ziege, Wolf und Heu”
(F,Z,W,H)
(W,H)
(F,W,H)
(W)
(H)
(F,Z,W)
(F,Z,H)
(F,Z)
(Z) (0)
Humboldt-Universität
zu Berlin
The Bridges of Königsberg
Die Brücken von KönigsbergIm
Zentrum
der
preussischen
Stadt
Königsberg
(heute
Kaliningrad) bildet
der
Fluss
Pregel
beim
Zusammenfluss
zweier
Arme
eine
Insel. Im
18. Jahrhundert
verbinden
7 Brücken
die Flussufer
mit
der
Insel. Es stellt
sich
die Frage, ob es
einen
Rundweg
gibt, bei
dem
man alle
7 Brücken
genau
einmal
überquert
und wieder
zum
Ausgangspunkt
zurück
gelangt. GeschichteDas Problem der
Königsberger
Brücken
stammt
von Leonhard Euler. Im
Jahre
1736 beweist
er, dass
es
keinen
solchen
Rundweg
geben
kann. Er
betrachtet
den allgemeinen
Fall mit
einer
beliebigen
Anzahl
Inseln
und Brücken
und zeigt, dass
ein
Rundweg
der
gesuchten
Art genau
dann
möglich
ist, wenn
sich
an keinem
der
Ufer
eine
ungerade
Zahl
von Brücken
befindet. Gibt
es
an genau
zwei
Ufern
eine
ungerade
Anzahl
Brücken, dann
existiert
ein
Weg, der
bei
diesen
beiden
Ufern
beginnt
und endet
und dabei
alle
Brücken
genau
einmal
überquert. Gibt
es, wie
in Königsberg, mehr
als
zwei
Gebiete, zu
denen
eine
ungerade
Zahl
von Brücken
führt, dann
kann
kein
Weg
existieren, der
genau
einmal
alle
Brücken
überquert.
Humboldt-Universität
zu Berlin
Graphs: Definitions
A,C,A,BA,B,C
EV
A edgevertex,node
A graph is a tuple
(V,E) with V a set of n
vertices and a set of m
edges E:
G=(V,E)
Example: Proteins –
vertices, interactions –
edges
B
C
vertex –
Knotenedge –
Kantetuple
–
Tupel, geordneteMenge
set –
Menge
VVE
Humboldt-Universität
zu Berlin
Graphs: Completeness
A,C,A,BA,B,C
EV
VVE
A edgevertex
B
C
Edge AB is has vertices A and B.
Knoten
A ist
inzidiert
mit
Kante
AB.
Be E0 the set of all
sub-sets of V with two elements.
A graph is complete, if E=E0 . a) b) c) d)
If G1 =(V1 ,E1 )G2 =(V2 ,E2 )
and021
21
EEEEE
: G1 and G2 are complementary.
d)
Humboldt-Universität
zu Berlin
Graph Types
BA
BA,E
,V
ABBA
BA,,,E
,V
Undirected graphs:
Directed graphs
(digraphs): directed edge (i,j) œ
E with i
denoting the head and j
denoting the tail of the edge.
A B
A B
Extension: Directed edge (i,j,s) E with s
{+1,-1} to represent activatory
or inhibitory influences.
A B
Humboldt-Universität
zu Berlin
Graph Types: Biparite
Graphs
A
B D
C
R1
Set of graph vertices decomposed into two disjoint sets such thatno two graph vertices within the same set are adjacent.
Graphs must represent two distinct classes of nodes such as metabolites (blue, circles) and reactions (yellow, boxes)
ATP
Fruc-6-P Fruc-1,6-P2
ADPR1
Humboldt-Universität
zu Berlin
Graph Representation: Adjacency Matrix
A B C D E F G
A 0 1 0 0 0 0 0
B 0 0 1 1 0 0 0
C 0 0 0 0 0 0 0
D 0 0 1 0 0 1 0
E 0 0 0 1 0 0 0
F 0 0 0 0 1 0 1
G 0 0 0 0 0 0 0
A B
C
ED
F G
◊ Adjacency matrix A: non-zero entries represent edges- quadratic-
unique assignment of adjacency matrix to graph
-
unique assignment of graph to adjacency matrix
◊ Bipartite graphs: sub-matrices for the two classes of nodes
◊ Alternative formats: edge lists, vertex lists
Adjacency matrix –Inzidenzmatrix
Humboldt-Universität
zu Berlin
Graph Theoretical Measures: DegreeA B C D E F G
A 0 1 0 0 0 0 0
B 0 0 1 1 0 0 0
C 0 0 0 0 0 0 0
D 0 0 1 0 0 1 0
E 0 0 0 1 0 0 0
F 0 0 0 0 1 0 1
G 0 0 0 0 0 0 0
2
13
Bo
Bi
B
k
kk
A B
C
ED
F G
◊ Number of edges to which a vertex is connected: Degree
k.
◊ For directed graphs: in-degree –
edges ending at a vertexout-degree –
edges starting a vertex
◊ Vertices with degree 0: isolated
Degree –Knotengrad
Be G a finite graph, v
the number of nodes, k
the number of edges and s1 , s2 ,…su the degrees of the individual nodes, then holds:
ksv
ii 2
1
Humboldt-Universität
zu Berlin
Graph Theoretical Measures: Degree
A B
C
ED
F G
◊ Global connectivity properties of a graph:
Average degree <k>
Degree distribution P(k)
kin012
P(kin
)1/74/72/7
kout012
P(kout
)1/74/72/7
<kin
> = (4x1 + 2x2)/7=8/7≈1,14
Degree distributions allow to distinguish between different types of networks
Humboldt-Universität
zu BerlinEinschub: Diskrete Wahrscheinlichkeitsverteilungen
Binomialverteilung:
Summe
aller
Wahrscheinlichkeiten
E(X) = np Erwartungswert
(vgl.: Mittelwert
für
sehr
viele
Wiederholungen)
Var(X) = np(1-p) Varianz
knk ppkn
kP
1
Eigenschaften
einer
Stichprobe: Wenn
das gewünschte
Ergebnis
eines
Versuchesdie Wahrscheinlichkeit
p
besitzt, und die Zahl
der
Versuche
n
ist, dann
gibt
die Binomialverteilung
an, mit
welcher
Wahrscheinlichkeit
sich
insgesamt
k
Erfolge
einstellen.
P(k) ist
die Wahrscheinlichkeit
(z.B. mit
n
Versuchen
aus
einem
Topf
von Bällen
k
schwarze
zu
ziehen)
p=1/2
P(k)
Humboldt-Universität
zu BerlinEinschub: Diskrete Wahrscheinlichkeitsverteilungen
Poissonverteilung:
Eigenschaften
einer
Stichprobe: Wie
vorher, nur
bei
sehr
kleiner
Wahrscheinlichkeit
der
Einzelereignisse, z.B. weil
n
sehr
groß.
-
Ereignisrate
(z.B. Fehlerrate
bei
der
DNS-Replikation)
E(X) = Erwartungswert
(vgl.: Mittelwert
für
sehr
viele
Wiederholungen)
Var(X) = Varianz
Exponentialverteilung:
0 20 40 60 80 100
0.8
0.6
0.4
0.2
0.0
E(X) = 1/
Var(X) = 1/
Humboldt-Universität
zu Berlin
Degree Distributions
Degree distribution of the World Wide Web from two different measurements: h, the 325 729-node sample of Albert et al. (1999); s, the measurements of over 200 million pages by Broder
et al. (2000);
(a)
degree distribution of the outgoing edges;
(b)
degree distribution of the incoming edges. The data have been binned logarithmically to reduce noise.
Albert & Barabasi, 2002, Rev Mod Phys
Humboldt-Universität
zu Berlin
Degree Distributions
Albert & Barabasi, 2002, Rev Mod Phys
The degree distribution of several real networks: (a) Internet at the router level. Data courtesy of Ramesh
Govindan;(b) movie actor collaboration network. After Barabasi
and Albert 1999. Note that if TV series are included as well,which aggregate a large number of actors, an exponential cutoff emerges for large k (Amaral
et al., 2000); (c) co-authorship network of high-
energy physicists. After Newman (2001a,2001b); (d) co-authorship network of neuroscientists. After Barabasi
et al. (2001).
Humboldt-Universität
zu Berlin
Degree Distributions
Jeong
H et al, 2000, Nature
Connectivity distributions P(k) for substrates. a, Archaeoglobus fulgidus (archae);
b, E. coli (bacterium);
c, Caenorhabditis elegans (eukaryote),
shown on a log±log
plot, counting separately the incoming (In) and outgoing links (Out) for each substrate.
kin (kout) corresponds to the number of reactions in which a substrate participates as a product (educt).
d, The connectivity distribution averaged over all 43 organisms.
Humboldt-Universität
zu Berlin
Random Graphs
First well-know example: Model of Paul Erdős
and Alfréd
Rényi
History: Erdős
number
beschreibt die Distanz im Graphen der Koautorenschaft
bezogen auf den Mathematiker Paul Erdős. Im Graphen werden die publizistisch verwandten Autoren als Knoten repräsentiert, zwischen denen jeweils dann eine Kante existiert, wenn sie eine Publikation
gemeinsam verfasst haben.Paul Erdős
selbst hat die Erdős-Zahl
0, alle Koautoren, mit welchen er publiziert hat, haben die Erdős-Zahl
1. Autoren, die mit Koautoren von Paul Erdős
publiziert haben, haben die Erdős-Zahl
2 usw. Wenn keine Verbindung in dieser Form zu einer Person herstellbar ist, ist ihre Erdős-Zahl
∞.Es zeigt sich, dass die Erdős-Zahl
der meisten Personen entweder unendlich oder erstaunlich gering ist. Letzteres rührt vor allem daher, dass Erdős
mit über 500 verschiedenen Wissenschaftlern gemeinsam publizierte und er in vielen Teilbereichen der Mathematik bewandert war.
Humboldt-Universität
zu Berlin
Random Graphs
n1
n2
n3
n4
n5
n1
x - x -
n2
- - x
n3
x -
n4
-
n5
A well-know example: Model of Paul Erdős
and Alfréd
Rényi
Start with N
nodes.
Connect every pair of nodes with probability p
Dice number z
[0;1]. If z<p
then connection
Obtain graph with approx. ½ pN
(N-1) edges
Degree distribution: Poisson distribution
Average degree: <k> = ½ pN
(N-1) * 2/N
= p(N-1) @
pN
Humboldt-Universität
zu Berlin
Random Graphs: Evolution
Construction of random graphs is called evolution.
Starting with a set of isolated vertices, the graph develops by the successive addition of random edges.
The graphs obtained at different stages of this process correspond to larger and larger connection probabilities p, eventually obtaining a fully connected graph
having the maximum number of edges n=N(N-1)/2 for p1.
Humboldt-Universität
zu Berlin
Random Networks
Questions:
Are real networks really random?
Display real networks organization principles?
Is a typical graph connected? (depending on p)
Does it contain a triangle of connected nodes?
Does its diameter depends on its size?
Humboldt-Universität
zu Berlin
Random Networks: Subgraphs
A graph G1 consisting of a set V1 of nodes and a set E1 of edges is a subgraph of a graph G={V,E} if all nodes in V1 are also nodes of V
and all edges in E1
are also edges of E.
A cycle of order k
is a closed loop of k
edges such that every two consecutive edges and only those have a common node.Average degree: 2
The opposite of cycles are the trees, which cannot form closed loops. More precisely, a graph is a tree of order k
if it has k
nodes and k-1 edges, and none
of its subgraphs
is a cycle. Average degree of a tree of order k: <k>=2-2/k
(2 for large trees)
TriangleRectangle
Humboldt-Universität
zu Berlin
Random Networks: Subgraphs
The threshold probabilities at which different subgraphs
appear in a random graph. For pN3/20 the graph consists of isolated nodes and edges. For p~N-3/2 trees of order 3 appear, while for p~N-4/3 trees of order 4 appear. At p~N-1 trees of all orders are present, and at the same time cycles of all orders appear. The probability p~N-2/3 marks the appearance of complete subgraphs
of order 4 and p~N-1/2 corresponds to complete subgraphs
of order 5. As z
approaches 0, the graph contains complete subgraphs
of increasing order.
Humboldt-Universität
zu Berlin
Degree Distribution
The degree distribution that results from the numerical simulation of a random graph. We generated a single random graph with N=10 000 nodes and connection probability P=0.0015, and calculated the number of nodes with degree k, Xk. The plot compares Xk
/N
with the expectation value of the Poisson distribution (13), E(Xk
)/N=P(ki
=k), and we can see that the deviation is small.
Humboldt-Universität
zu Berlin
Graph-theoretical Measure: Distance
Path: Connection between two vertices u and v without repetition of nodes (i.e. no backtracking, no loops)
Shortest path length
l(u,v) : Local measure for two nodes
Average shortest path length
<l>Global network property indicating navigability
A B
C
ED
F G
Humboldt-Universität
zu Berlin
Graph-theoretical Measure: Distance
Breadth-first search: Exploration of all nodes in a graph starting from those adjacent to a current node.
Dijkstra’s
algorithm: Construct shortest-path tree from a source to every other vertex (vertex number N: O(N2) )
A B
C
ED
F G
Humboldt-Universität
zu Berlin
Graph-theoretical Measure: Diameter
Strictly speaking, the diameter of a disconnected graph (i.e., one made up of severalisolated clusters) is infinite, but it can be defined as the maximum diameter of its clusters.
Random graphs tend to have small diameters, provided p is not too small.
• If <k> = pN
< 1, a typical graph is composed of isolated trees and its diameter
equals the diameter of a tree.• If <k> > 1, a giant cluster appears. The diameter of the graph equals the diameter of the giant cluster if <k> >3.5, and is proportional to ln(N)/ln(<k>).• If <k> >ln(N), almost every graph is totally connected. The diameters of the graphs having the same N
and <k> are concentrated on a few values around ln(N)/ln(<k>).
A B
C
ED
F G
The diameter
of a graph is the maximal distance between any pair of its nodes.
Humboldt-Universität
zu Berlin
Graph-theoretical Measures: Clustering
A B
C
ED
F G
Clustering coefficient C(v) for node v: Ratio between the number of edges linking nodes adjacent to v
and the total number of
possible edges among them (at most kv
(kv
-1)/2 for kv
neighbors)
C(D) =1/3
Adjacent nodes: B, C, E, FNumber of links:
2
Possible number of links: 6
Idea behind: In many networks, if node A is connected to B, and B is connected to C, then it is highly probable that A also has a directlink to C.
Humboldt-Universität
zu Berlin
Graph-theoretical Measures: Clustering
A B
C
ED
F G
Average clustering coefficient <C>:
Tendency of the network to form clusters or groups
Average clustering coefficient for all nodes with k
links C(k) :Diversity of cohesiveness of local neighborhoods
C(A) =0C(B) =1/3C(C) =1C(D) =1/3C(E) =1C(F) =1/3C(G) =0
<C>=3/7
Humboldt-Universität
zu Berlin
Graph-theoretical Measures: Clustering
A B
C
ED
F G
Complex networks exhibit a large degree of clustering. If we consider a node in a random graph and its nearest neighbors, the
probability that two of these neighbors are connected is equal to the probability that two randomly selected nodes are connected.
Humboldt-Universität
zu Berlin
Graph-theoretical Measures: Clustering
Clustering coefficients as predicted for random networks and Clustering coefficients for real networks(WWW, movie actors, co-authorship, E.coli
substrate graph, E.coli
reaction graph, food webs, word co-occurrence, power grids,…)
Humboldt-Universität
zu Berlin
Random Networks: Scale-free Networks
Most “natural”
networks do nothave a typical degree value,they are free of a characteristic“scale”: they are scale-free.
Their degree distribution followsa Power law:
P(k) ~ k-
Heterogeneity: hubs with highconnectivity, many nodes with low connectivity.
Humboldt-Universität
zu Berlin
Random Networks: Scale-free Networks
The probability that a node is highly connected is statistically
more significant than in a random graph, the network’s properties often being determined by a relatively small number of highly connected nodes that are known as hubs.
Barabási–Albert model of a scale-free network: Growth: Start with small number of nodes (m0 )At each time point add a new node with m ≤
m0 links to the network
Preferential attachment:Probability for a new edge is
where kI is the degree of node I and J is the index denoting the sum over network nodes.
The network that is generated by this growth process has a power-law degree distribution that is characterized by the degree exponent
= 3.
jj
ii k
kk
Humboldt-Universität
zu Berlin
Evolution of Scale-free Networks
After t
time steps
network with N = t + m0
nodes and m t
edges.
Continuum theory: ki
as continuous real variable, change proportional to (ki
)
322 kmkPt ~
jj
ii k
kk
Degree distribution independent of time tAnd independent of network size N
But proportional to m2
Humboldt-Universität
zu Berlin
Evolution of Scale-free Networks
After t
time steps
network with N = t + m0
nodes and m t
edges.
Master equation approach: probability p(k,ti
,t) that a node i
introduced at time ti
has degreek
at time t.
21
12
for2
2
1for121
12
112
11
kkkmmkP
mkm
mkkPkk
kP
ttkpt
kP
ttkpt
kttkpt
kttkp
iti
t
iii
,,lim
,,,,,,Main idea:
Dorogovtsev&Mendes-2001
Humboldt-Universität
zu Berlin
Scale-free Networks: Degree Distribution
Humboldt-Universität
zu Berlin
Scale-free Networks: Clustering coefficient
No inherent clustering coefficent
C(k)
Humboldt-Universität
zu BerlinScale-free Networks: Average Path Length
lk
1 l
lkN ~
Shortest path length l: distance between two vertices u
and v
with unit length edges
Fully connected network:
Rough estimation for random network:
Average number of nearest neighbors: <k>
vertices are at distance l
or closer
total number of vertices
kNl
lnln~
Humboldt-Universität
zu Berlin
Scale-free Networks: Average Path Length
NNl
CBNAl
lnlnln~
ln
Is obtained by fitting
1
12
1 zzzNl
lnln
Humboldt-Universität
zu Berlin
Scale-free Networks: Error and Attack Tolerance
Question: consider arbitrary connected graph of N
nodes and assume that a p
fraction of edges have been removed.
What is the probability of the resulting graph being still connected?Usually: existence of a threshold probability pc
.
More severe: removal of nodes
Humboldt-Universität
zu Berlin
Scale-free Networks: Error and Attack Tolerance
Node RemovalThe relative size S
(a),(b) and average path length l (c),(d) of the largest cluster in an initially connected network when a fraction f of the nodes are removed. (a),(c) Erdös-Renyi
random network with N=10 000 and <k>=4;(b),(d) scale-free network generated by the Barabasi-
Albert model with N=10 000 and
<k>=4. Ñ, random node removal; º, preferential removal of the most connected nodes.
Humboldt-Universität
zu Berlin
Random Networks: Scale-free Networks
Difference to classical random networks:
Growth of the network –
given number of nodes
Preferential attachment –
equal probability for all edges
Examples:
References in www: connections frequently to existing hubs
Metabolism: many molecules are involved in only a few (1,2) reactions, others (like ATP or water) in many
Wagner/Fell: highly connected molecules are evolutionary “old”
Humboldt-Universität
zu Berlin
Properties of Scale-Free Networks
Small-world –
short paths between arbitrary points
Robustness –
Topological robustness
Humboldt-Universität
zu Berlin
Watts & Strogatz
Model
Problem: real networks have short average path length and greater clustering coefficients than classical random graphs
Construction: Initially, a regular one dimensional lattice with periodical boundary conditions is present. Each of L vertices has z ≥
4 nearest neighbors. Then one takes all the edges of the lattice in turn and with probability p
rewires to randomly chosen vertices. In such a way, a number of far connections appears. Obviously, when p
is small, the situation has to be close to the original regular lattice. For large enough p, the network is similar to the classical random graph.
Humboldt-Universität
zu Berlin
Watts-Strogatz-Model
By definition of Watts and Strogatz, the smallworld
networks are those with “small”
average shortest
path lengths and “large” clustering coecients.
<k> remains constant During rewiring
Humboldt-Universität
zu Berlin
Random Networks: Hierarchical Networks
To account for the coexistence of modularity, local clustering and scale- free topology in many real systems it has to be assumed that clusters
combine in an iterative manner, generating a hierarchical network. The starting point of this construction is a small cluster of four densely linked nodes.Next, three replicas of this module are generated and the three externalnodes of the replicated clustersconnected to the central node ofthe old cluster, which produces alarge 16-node module. Threereplicas of this 16-node moduleare then generated and the 16peripheral nodes connected tothe central node of the oldmodule, which produces a newmodule of 64 nodes….
Humboldt-Universität
zu Berlin
Random Networks: Hierarchical Networks
Clustering coefficent
scales with the degree of the nodes
Humboldt-Universität
zu Berlin
Humboldt-Universität
zu Berlin
Humboldt-Universität
zu Berlin
Application to Metabolic Networks
Jeong
H et al, 2000, Nature
Humboldt-Universität
zu Berlin
Metabolic Networks: Degree Distributions
Humboldt-Universität
zu Berlin
Metabolic Networks: Hubs
Humboldt-Universität
zu Berlin
Metabolic Networks: Pathway Lengths
E.coli 43 different organisms
Humboldt-Universität
zu Berlin
Metabolic Networks: Robustness
Hubs removed first
Random removal
M=60 –
8% of substrates
Humboldt-Universität
zu Berlin
Protein-Protein-Interaction Networks
Yeast proteomea)
Map of protein-protein interactions
Largest cluster: 78% of all proteinsRed –
lethalGreen –
non-lethalOrange –
slow-growthYellow –
unknown
Cut-off: kc
=20
b) Connectivity distribution
c) Fraction of essential proteins withk
links
Random removal –
no effectHubs removal –
lethal
Humboldt-Universität
zu Berlin
Protein-Protein-Interaction Networks
Random removal –
no effectHubs removal –
lethal
93% of proteins have 5 or less linksonly 21% of them are essential
0.7% of proteins have more than 15 links62% of them are lethal
Humboldt-Universität
zu Berlin
Protein-Protein-Interaction Networks