fakultät für informatikinformatik 12
technische universität dortmund
Worst-CaseExecution Time Analysis
- Session 19 -
Heiko FalkTU DortmundInformatik 12
Germany
Slides use Microsoft cliparts. All Microsoft restrictions apply.
- 2 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Schedule of the course
Time Monday Tuesday Wednesday Thursday Friday
09:30-11:00
1: Orientation, introduction
2: Models of computation + specs
5: Models of computation + specs
9: Mapping of applications to platforms
13: Memory aware compilation
17: Memory aware compilation
11:00 Brief break Brief break Brief break Brief break
11:15-12:30
6: Lab*: Ptolemy
10: Lab*: Scheduling
14: Lab*: Mem. opt.
18: Lab*: Mem. opt.
12:30 Lunch Lunch Lunch Lunch Lunch
14:00-15:20
3: Models of computation + specs
7: Mapping of applications to platforms
11: High-level optimizations*
15: Memory aware compilation
19: WCET & compilers*
15:20 Break Break Break Break Break
15:40-17:00
4: Lab*: Kahn process networks
8: Mapping of applications to platforms
12: High-level optimizations*
16: Memory aware compilation
20: Wrap-up
* Dr. Heiko Falk
- 3 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Outline
Worst-Case Execution Time Definition Static WCET Analysis WCET-aware Compilation and Optimization
Function Specialization / Procedure Cloning Introduction and Code Examples Function Specialization and WCETEST
ResultsReferences & Summary
- 4 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Worst-Case Execution Time
Definition of the Worst-Case Execution Time (WCET): Upper bound for the execution time of a program irrespective of the program’s input data.
Problem:Determination of a program’s actual WCET is not computable!(WCET computation can be reduced to the Halting Problem)
- 5 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Estimation of Worst-Case Execution Times
Solution: Estimation of upper bounds for the actual (unknown) WCET
Requirements on WCET estimates: Safeness: WCET WCETEST! Tightness: WCETEST – WCET minimal
- 6 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Loop BoundAnalysis
AbsInt’sC R L2
W C ETEstim ates
D ecoderexec2crl
ValueAnalysis
PathAnalysis
P ipelineAnalysis
Executable C acheAnalysis
Workflow of the static WCET-Analyzer aiT
Input: Binary executable program P to be analyzed.
exec2crl: Disassembler, translates P into aiT’s own LIR CRL2.
Value Analysis: Determines potential contents of processor registers for any point in P’s execution time.Note: P is never executed, emulated or simulated by aiT! Only static analyses are applied.
- 7 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Loop BoundAnalysis
AbsInt’sC R L2
W C ETEstim ates
D ecoderexec2crl
ValueAnalysis
PathAnalysis
P ipelineAnalysis
Executable C acheAnalysis
Workflow of the static WCET-Analyzer aiT
Loop Bound Analysis: Tries to determine lower and upper iteration bounds for each loop of P.
Cache Analysis: Uses formal cache model, classifies each memory access within P as guaranteed cache hit or as cache miss.
- 8 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Loop BoundAnalysis
AbsInt’sC R L2
W C ETEstim ates
D ecoderexec2crl
ValueAnalysis
PathAnalysis
P ipelineAnalysis
Executable C acheAnalysis
Workflow of the static WCET-Analyzer aiT
Pipeline Analysis: Includes accurate model of processor pipeline.Depending on initial pipeline states, possible cache states etc., possible final pipeline states are determined per basic block.Result of pipeline analysis is the WCETEST of each basic block.
- 9 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Loop BoundAnalysis
AbsInt’sC R L2
W C ETEstim ates
D ecoderexec2crl
ValueAnalysis
PathAnalysis
P ipelineAnalysis
Executable C acheAnalysis
Workflow of the static WCET-Analyzer aiT
Path Analysis: Models all possible execution paths of P under consideration of the WCETEST of all basic blocks and computes the overall longest execution path leading to the global WCETEST of P.Output of aiT is e.g. the length of this longest path, i.e. the WCETEST of P.
- 10 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Loop BoundAnalysis
AbsInt’sC R L2
W C ETEstim ates
D ecoderexec2crl
ValueAnalysis
PathAnalysis
P ipelineAnalysis
Executable C acheAnalysis
Complexity of Static WCET-Analysis
Problem: WCET analysis intractable for computers! If WCET were computable, one could solve the Halting Problem by checking in O(1) if WCET < holds.
Reason: It is not computable how long P stays in a loop. Automatic loop bound analysis only feasible for simple classes of loops.(Similar problems arise for recursive functions.)
- 11 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Loop BoundAnalysis
AbsInt’sC R L2
W C ETEstim ates
D ecoderexec2crl
ValueAnalysis
PathAnalysis
P ipelineAnalysis
Executable C acheAnalysis
User Annotations for WCET Analysis
Way out: aiT user must provide information about e.g. minimal / maximal loop iterations and recursion depths.
Annotations file: Contains such user annotations (“Flow Facts”) and is – amongst program P to be analyzed – one more input to aiT.
Loop BoundAnalysis
AbsInt’sC R L2
C R L2 w ithW C ET
Estim ates
D ecoderexec2crl
ValueAnalysis
PathAnalysis
P ipelineAnalysis
Executable C acheAnalysis
U ser Anno-tations
[AbsInt Angewandte Informatik GmbH,
http://www.absint.com ]
- 12 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
ICD-CParser
ANSI C ICD-CIR
CodeSelector
LLIR
RegisterAllocator
LLIR CRL2
CRL2
aiT WCET
Analysis
CRL2 +
WCETEST
CRL2 LLIR
WCET-Opt. ASM
Analyses,
Optimi-
zations
A WCET-Aware C-Compiler (WCC)
Distinctive Feature:Tight integration of WCET analyzer aiT into WCC’s backend.
- 13 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Tight Coupling WCC aiT
• CRL2: Low-level intermediate representation (LIR) of aiT.• Structure of CRL2: Control Flow Graph (CFG) consisting of routines, basic blocks, instructions and operations.
• WCET analysis applied to low-level machine code coupling WCC aiT done in compiler backend.• Converter LLIR2CRL: translates WCC’s LIR into CRL2, executes aiT in the background.• After successful WCET analysis: relevant WCET data extracted from CRL2, imported into WCC.
- 14 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Imported WCET Data & Flow Facts in WCC
Most important WCET data imported into WCC: WCETEST of the entire program WCETEST of each single basic block Worst-Case execution frequencies of each CFG edge
Flow Fact Annotation within WCC: Annotation of e.g. loop iteration bounds directly in C source code: _Pragma( “loopbound min 10 max 10” ); Since compiler optimizations may restructure loops and thus their annotated bounds, WCC automatically keeps
Flow Facts consistent during all applied optimizations.
- 15 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Problems during WCETEST Minimization
The Worst-Case Execution Path (WCEP): WCET of a program P = length of longest execution path of P (WCEP) To minimize P’s WCETEST, optimizations must exclusively focus on those parts of P lying on the WCEP.
Optimization of parts not lying on the WCEP don’t reduce WCETEST at all!
Optimization strategies for WCETEST Minimization must have detailed knowledge about the WCEP.
WCET Analyzer aiT provides this detailed information about the WCEP, but…
- 16 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Switching WCEPs (1)
main() {
if ( … )
a();
else
d();
return;
}
a() {
…
b();
…
}
b() {
…
c();
…
}
d() {
…
c();
…
}
main
a
b
c
d
10 Cyc.
50 Cyc.
80 Cyc.
65 Cyc.
120 Cyc.WCETEST of a for
a in Main Mem.
Example:Scratchpad allocation of functions for WCETEST minimizationAbove: Function Call Graph
- 17 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Switching WCEPs (2)
main() {
if ( … )
a();
else
d();
return;
}
a() {
…
b();
…
}
b() {
…
c();
…
}
d() {
…
c();
…
}
main
a
b
c
d
10 Cyc.
50 Cyc.
80 Cyc.
65 Cyc.
120 Cyc.
WCETEST = 205 Cyc.
• Initial WCEP: main, a, b, c• Length of WCEP = WCETEST: 205• SPM-Allocation of b reduces WCETEST of b down to 40
- 18 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Switching WCEPs (3)
main() {
if ( … )
a();
else
d();
return;
}
a() {
…
b();
…
}
b() {
…
c();
…
}
d() {
…
c();
…
}
main
a
b
c
d
10 Cyc.
50 Cyc.
40 Cyc.
65 Cyc.
120 Cyc.
• Initial WCEP: main, a, b, c• Length of WCEP = WCETEST: 205• SPM-Allocation of b reduces WCETEST of b down to 40
- 19 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Switching WCEPs (4)
main() {
if ( … )
a();
else
d();
return;
}
a() {
…
b();
…
}
b() {
…
c();
…
}
d() {
…
c();
…
}
main
a
b
c
d
10 Cyc.
50 Cyc.
40 Cyc.
65 Cyc.
120 Cyc.
WCETEST = 195 Cyc.
• New WCEP: main, d, c• New WCETEST: 195WCEP has switched as side effect of optimization!
- 20 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Consequences for Compiler Optimizations
Optimizations aiming at WCETEST minimization… …always have to consider that the WCEP may switch after each and every decision taken by the optimization. …should not take their decision where to optimize on the basis of local information. Instead, they should always consider the global effects of a decision.
(SPM allocation of b in previous example leads to a local reduction of b’s WCETEST by 40 cycles. However, a global reduction of only 10 cycles was achieved!)
Challenge: To design completely new optimization strategies matching the above requirements and always considering the global CFG with its current WCEP.
- 21 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Function Specialization
Also known as “Procedure Cloning”: Well-known optimization technique (1993) Goals: To enable further standard optimizations and to reduce overhead for parameter passing during function calls Approach: For worthwhile functions, specialized copies (“Clones”) are created having less parameters than their original versions.
- 22 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Typical Function Calls
int f( int *x, int n, int p ) { for (i=0; i<n; i++) { x[i] = p * x[i];
if ( i == 10 ) { ... }
}
return x[n];
}
int main() {
... f( y, 5, 2 ) ...
... f( z, 5, 2 ) ...
return f( a, 5, 2 );
}
Observations: f is called several times. f has three parameters. Constants 5 and 2 are passed several times for parameters n and p.
- 23 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Further Observations
int f( int *x, int n, int p ) { for (i=0; i<n; i++) { x[i] = p * x[i];
if ( i == 10 ) { ... }
}
return x[n];
}
int main() {
... f( y, 5, 2 ) ...
... f( z, 5, 2 ) ...
return f( a, 5, 2 );
}
• f is so-called general purpose function: Via parameter n, an array x of arbitrary size can be processed.• Control flow within f depends on function parameter.
Within main, f is used for special purpose:Processing of arrays ofsize n = 5.
- 24 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
int f( int *x, int n, int p ) { for (i=0; i<n; i++) { x[i] = p * x[i];
if ( i == 10 ) { ... }
}
return x[n];
}
int main() {
... f_5_2( y ) ...
... f_5_2( z ) ...
return f_5_2( a );
}
int f_5_2( int *x ) { for (i=0; i<5; i++) { x[i] = 2 * x[i];
if ( i == 10 ) { ... }
}
return x[5];
}
Specialization of General Purpose Routines
- 25 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Expected Effects of Function Specialization
Reduction of the Average-Case Execution Time (ACET): Enabling of further standard optimizations in cloned function:
• Constant propagation and constant folding• Strength reduction• Control flow simplification
Less code to be executed in order to pass arguments to cloned function.
Increase of Code Size: Potentially, many new highly specialized clones are generated.
- 26 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Why Function Specialization and WCETEST?
Motivation: In Embedded Software: General Purpose functions heavily used in special purpose contexts. In particular, loop bounds are very often controlled by function parameters. Recall: Loop bounds are extremely critical for a precise WCET analysis. Function Specialization enables the extremely precise annotation of loop bounds!
[P. Lokuciejewski, H. Falk et al., Influence of Procedure Cloning on WCET Prediction, Salzburg, CODES+ISSS 2007]
- 27 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Loop Bound Annotations & Cloning
int f( int *x, int n, int p ) { // loopbound min 0, max n for (i=0; i<n; i++ ) {
x[i] = p * x[i];
if ( i == 10 ) { ... }
}
return x[n];
}
Original Code:Loop annotations extremely imprecise since annotation must cover all calls of f in a safe way.
If f is called somewhere with n = 2000 as maximal value, the annotation must always consider a maximum of 2000 iterations.
For calls like e.g. f(a, 5, 2), 2000 iterations are considered as well Massive WCET overestimation.
- 28 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Loop Bound Annotations & Cloning
int f_5_2( int *x ) { // loopbound min 5, max 5 for (i=0; i<5; i++) {
x[i] = 2 * x[i];
if ( i == 10 ) { ... }
}
return x[5];
}
Specialized Code:Loop annotations extremely precise since annotation covers all calls of f_5_2 exactly.
For calls like e.g. f_5_2(a), exactly 5 iterations are considered Exact WCET estimation.
Original function f or other additional clones of f fully independent of annotations within f_5_2 no interdependencies during WCET analysis for all these different clones of f.
- 29 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Workflow of Function Specialization
Attention when applying Function Specialization: Code size increases should be kept clearly in mind during specialization! Cloning of each function which is called at least twice with the same constant parameter usually unacceptable.
Parameterized Application of Function Specialization: Size G: Never specialize a function f having a size larger than G. Fraction K of same constant parameters: clone f only iff at least K% of all calls of f use the same constant parameters.
- 30 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Workflow of Function Specialization
Com pilerBackend
B inaryExecutable
ANSI-CSource
O ptim izedH igh-Level
IR
W CET
Tim ingAnalyzer
S im ulator
SimulatedTime
FlowFacts
IC D -CO ptim izer
• Function Specialization• Const. Folding & Propagation• Strength Reduction• If-Statement Optimization
• G = 2000 ICD-C Expressions• K = 50%
Considered Processors:• Infineon TriCore TC1796• ARM 7 TDMI (ARM & THUMB instruction sets)
- 31 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Relative WCETEST after Function Specialization
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
TriCore ARM Thumb
Rel
. WC
ET
ES
T [
%]
EPIC MPEG2 GSM Average
WCETEST improvements between 13% and up to 95%!
- 32 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Relative ACETs after Function Specialization
0%10%
20%30%40%50%
60%70%80%90%
100%110%
TriCore ARM Thumb
Re
l. A
CE
T [
%]
EPIC MPEG2 GSM Average
• Only marginal ACET improvements of max. 3%. Influence of overhead for parameter passing and effect of successive optimizations on ACET obviously marginal.
- 33 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Discussion of Results
Very astonishing: How can one optimization impact two apparently similar objectives like WCET and ACET so differently?
Reasons: Code before specialization only poorly annotatable aiT provides very imprecise WCET estimates
Code after specialization very precisely annotatable aiT provides highly precise WCET estimates
Function Specialization obviously effective in increasing the tightness of WCETEST (cf. Slide 4).The actually unknown (since intractable) WCET should be reduced by only a similar order of magnitude as the ACET.
- 34 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Relative Code Sizes after Specialization
0%25%50%75%
100%125%150%175%200%225%250%275%300%325%350%
TriCore ARM Thumb
Re
l. C
od
e S
ize
[%
]
EPIC MPEG2 GSM Average
Code size increases between 2% up to 325%!
- 35 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
References
Function Specialization: D. Bacon, S. Graham, O. Sharp, Compiler Transformations for High-Performance Computing, ACM Computing Surveys 26 (4), 1994. P. Lokuciejewski, H. Falk, M. Schwarzer et al., Influence of Procedure Cloning on WCET Prediction, CODES+ISSS, Salzburg 2007.
- 36 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
Summary
WCET Upper bound of a program’s execution time Impossible to compute a program’s actual WCET Static WCET analysis using aiT
Function Specialization / Procedure Cloning Specialization of general purpose functions Enabling of standard compiler optimizations in specialized functions, simplification of code required for parameter passing Small impact on ACET, huge impact on WCETEST
- 37 -technische universitätdortmund
fakultät für informatik
h. falk, informatik 12, 2008
TU Dortmund
12th International Workshop on Software & Compilersfor Embedded Systems April 23-24, 2009
Acropolis, Nice, France Co-located with DATE 2009 Paper Submission: around mid November 2008
Website & CfP: http://www.scopesconf.orgMailing List: http://www.scopesconf.org/list