Query optimization - Politecnico di Torino

Database Management Systems
Query optimization
Database Management Systems
Query optimization
DB
MG
DBMS Architecture
SQL INSTRUCTION
OPTIMIZER
CONCURRENCY CONTROL
MANAGEMENT OF ACCESS
METHODS
RELIABILITY
MANAGEMENT
BUFFER MANAGER
Index Files
Data Files
System
Catalog
DATABASE
DB
MG
Elena Baralis, Silvia Chiusano
Pag. 1
Politecnico di Torino
1
Database Management Systems
Query optimization
Query optimizer
It selects an efficient strategy for query execution
It is a fundamental building block of a relational
DBMS
It guarantees the data independence property
The form in which the SQL query is written does
not affect the way in which it is implemented
A physical reorganization of data does not require
rewriting SQL queries
3
DB
MG
Query optimizer
It automatically generates a query execution plan
It was formerly hard-coded by a programmer
The automatically generated execution plan is
usually more efficient
It evaluates many different alternatives
It exploits statistics on data, stored in the system
catalog, to make decisions
It exploits the best known strategies
It dynamically adapts to changes in the data
distribution
DB
MG
4
Elena Baralis, Silvia Chiusano
Pag. 2
Politecnico di Torino
2
Database Management Systems
Query optimization
Query optimizer
SQL
QUERY
LEXICAL, SYNTACTIC
AND SEMANTIC
ANALYSIS
DB
MG
Lexical, syntactic and semantic analysis
Analysis of a statement to detect
Lexical errors
e.g., misspelled keywords
Syntactic errors
errors in the grammar of the SQL language
Semantic errors
references to objects which do not actually exist in
the database (e.g, attributes or tables)
information in the data dictionary is needed
DB
MG
6
Elena Baralis, Silvia Chiusano
Pag. 3
Politecnico di Torino
3
Database Management Systems
Query optimization
Lexical, syntactic and semantic analysis
Output
Internal representation in (extended) relational
algebra
Why relational algebra?
It explicitly represents the order in which operators
are applied
It is procedural (different from SQL)
There is a corpus of theorems and properties
exploited to modify the initial query tree
7
DB
MG
Query optimizer
SQL
QUERY
LEXICAL, SYNTACTIC
AND SEMANTIC
ANALYSIS
DATA
DICTIONARY
INTERNAL REPRESENTATION
BASED ON RELATIONAL ALGEBRA
ALGEBRAIC
OPTIMIZATION
DB
MG
Elena Baralis, Silvia Chiusano
Pag. 4
Politecnico di Torino
4
Database Management Systems
Query optimization
Algebraic optimization
Execution of algebraic transformations
considered to be always beneficial
Example: anticipation of selection with respect to
join
Should eliminate the difference among different
formulations of the same query
This step is usually independent of the data
distribution
Output
Query tree in “canonical” form
9
DB
MG
Query optimizer
SQL
QUERY
LEXICAL, SYNTACTIC
AND SEMANTIC
ANALYSIS
DATA
DICTIONARY
INTERNAL REPRESENTATION
BASED ON RELATIONAL ALGEBRA
ALGEBRAIC
OPTIMIZATION
“CANONICAL” QUERY TREE
COST BASED
OPTIMIZATION
DB
MG
Elena Baralis, Silvia Chiusano
Pag. 5
Politecnico di Torino
5
Database Management Systems
Query optimization
Cost based optimization
Selection of the “best” execution plan by
evaluating execution cost
Selection of
the best access method for each table
the best algorithm for each relational operator
among available alternatives
Based on a cost model for access methods and
algorithms
Generation of the code implementing the best
strategy
11
DB
MG
Cost based optimization
Output
Access program in executable format
It exploits the internal structures of the DBMS
Set of dependencies
conditions on which the validity of the query plan
depends
e.g., the existence of an index
DB
MG
12
Elena Baralis, Silvia Chiusano
Pag. 6
Politecnico di Torino
6
Database Management Systems
Query optimization
Query optimizer
SQL
QUERY
LEXICAL, SYNTACTIC
AND SEMANTIC
ANALYSIS
DATA
DICTIONARY
INTERNAL REPRESENTATION
BASED ON RELATIONAL ALGEBRA
ALGEBRAIC
OPTIMIZATION
“CANONICAL” QUERY TREE
DATA PROFILES
(STATISTICS ON
DATA)
COST BASED
OPTIMIZATION
DB
MG
ACCESS PROGRAM
SET OF DEPENDENCIES
Execution modes
Compile and go
Compilation and immediate execution of the
statement
No storage of the query plan
Dependencies are not needed
DB
MG
14
Elena Baralis, Silvia Chiusano
Pag. 7
Politecnico di Torino
7
Database Management Systems
Query optimization
Execution modes
Compile and store
The access plan is stored in the database together
with its dependencies
It is executed on demand
It should be recompiled when the data structure
changes
15
DB
MG
Database Management Systems
Algebraic optimization
DB
MG
Elena Baralis, Silvia Chiusano
Pag. 8
Politecnico di Torino
8
Database Management Systems
Query optimization
Algebraic optimization
SQL
QUERY
LEXICAL, SYNTACTIC
AND SEMANTIC
ANALYSIS
DATA
DICTIONARY
INTERNAL REPRESENTATION
BASED ON RELATIONAL ALGEBRA
ALGEBRAIC
OPTIMIZATION
“CANONICAL” QUERY TREE
COST BASED
OPTIMIZATION
DB
MG
ACCESS PROGRAM
DATA PROFILES
(STATISTICS ON
DATA)
SET OF DEPENDENCIES
Algebraic optimization
It is based on equivalence transformations
Two relational expressions are equivalent if they
both produce the same query result for any
arbitrary database instance
Interesting transformations
reduce the size of the intermediate result to be
stored in memory
prepare an expression for the application of a
transformation which reduces the size of the
intermediate result
DB
MG
18
Elena Baralis, Silvia Chiusano
Pag. 9
Politecnico di Torino
9
Database Management Systems
Query optimization
Transformations
1. Atomization of selection
σF1 Ʌ F2 (E) ≡ σF2 (σF1 (E)) ≡ σF1 (σF2 (E))
19
DB
MG
Transformations
1. Atomization of selection
σF1 Ʌ F2 (E) ≡ σF2 (σF1 (E)) ≡ σF1 (σF2 (E))
2. Cascading projections
πX(E) ≡ πX (πX,Y(E))
DB
MG
20
Elena Baralis, Silvia Chiusano
Pag. 10
Politecnico di Torino
10
Database Management Systems
Query optimization
Transformations
1. Atomization of selection
σF1 Ʌ F2 (E) ≡ σF2 (σF1 (E)) ≡ σF1 (σF2 (E))
2. Cascading projections
πX(E) ≡ πX (πX,Y(E))
3. Anticipation of selection with respect to join
(pushing selection down)
σF (E1
E2) ≡ E1
(σF (E2))
F is a predicate on attributes in E2 only
21
DB
MG
Transformations
4. Anticipation of projection with respect to join
πL(E1
p E2)
≡ πL ((πL1, J(E1)
p(πL2,J(E2)))
L1 = L - Schema(E2)
L2 = L - Schema(E1)
J = set of attributes needed to evaluate join
predicate p
DB
MG
22
Elena Baralis, Silvia Chiusano
Pag. 11
Politecnico di Torino
11
Database Management Systems
Query optimization
Transformations
5. Join derivation from Cartesian product
σF (E1 × E2 ) ≡ E1
F E2
predicate F only relates attributes in E1 and E2
23
DB
MG
Transformations
5. Join derivation from Cartesian product
σF (E1 × E2 ) ≡ E1
F E2
predicate F only relates attributes in E1 and E2
6. Distribution of selection with respect to union
σF(E1 ∪ E2 ) ≡ (σF (E1) ) ∪ (σF (E2) )
DB
MG
24
Elena Baralis, Silvia Chiusano
Pag. 12
Politecnico di Torino
12
Database Management Systems
Query optimization
Transformations
5. Join derivation from Cartesian product
σF (E1 × E2 ) ≡ E1
F E2
predicate F only relates attributes in E1 and E2
6. Distribution of selection with respect to union
σF(E1 ∪ E2 ) ≡ (σF (E1) ) ∪ (σF (E2) )
7. Distribution of selection with respect to
difference
σF(E1 – E2) ≡ (σF (E1) ) – (σF (E2) )
≡ (σF (E1)) – E2
25
DB
MG
Transformations
8. Distribution of projection with respect to union
πX(E1 ∪ E2) ≡ (πX(E1)) ∪ (πX(E2))
DB
MG
26
Elena Baralis, Silvia Chiusano
Pag. 13
Politecnico di Torino
13
Database Management Systems
Query optimization
Transformations
8. Distribution of projection with respect to union
πX(E1 ∪ E2) ≡ (πX(E1)) ∪ (πX(E2))
Can projection be distributed with respect to
difference?
πX (E1 - E2) ≡ (πX(E1)) - (πX(E2))
27
DB
MG
Transformations
8. Distribution of projection with respect to union
πX(E1 ∪ E2) ≡ (πX(E1)) ∪ (πX(E2))
Can projection be distributed with respect to
difference?
πX (E1 - E2) ≡ (πX(E1)) - (πX(E2))
This equivalence only holds if X includes the
primary key or a set of attributes with the same
properties (unique and not null)
DB
MG
28
Elena Baralis, Silvia Chiusano
Pag. 14
Politecnico di Torino
14
Database Management Systems
Query optimization
Transformations
9. Other properties
σF1 V F2(E) ≡ (σF1 (E)) ∪ (σF2 (E))
σF1 Ʌ F2(E) ≡ (σF1 (E)) ∩ (σF2 (E))
29
DB
MG
Transformations
10. Distribution of join with respect to union
E
(E1 ∪ E2) ≡ (E
E1) ∪ (E
E2)
All Binary operators are commutative and
associative except for difference
DB
MG
30
Elena Baralis, Silvia Chiusano
Pag. 15
Politecnico di Torino
15
Database Management Systems
Query optimization
Example
Tables
EMP (Emp#, ………, Dept#, Salary)
DEPT (Dept#, DName,……………)
SQL query
SELECT DISTINCT DName
FROM EMP, DEPT
WHERE EMP.Dept#=DEPT.Dept#
AND Salary > 1000;
31
DB
MG
Example: Algebraic transformations
πDName (σEMP.Dept#=DEPT.Dept# Ʌ Salary
DB
MG
>1000
(EMP × DEPT))
32
Elena Baralis, Silvia Chiusano
Pag. 16
Politecnico di Torino
16
Database Management Systems
Query optimization
Example: Algebraic transformations
πDName (σEMP.Dept#=DEPT.Dept# Ʌ Salary
>1000
(EMP × DEPT))
Prop #1
πDName(σSalary >1000 (σEMP.Dept#=DEPT.Dept# (EMP×DEPT))
33
DB
MG
Example: Algebraic transformations
πDName (σEMP.Dept#=DEPT.Dept# Ʌ Salary
>1000
(EMP × DEPT))
Prop #1
πDName(σSalary >1000 (σEMP.Dept#=DEPT.Dept# (EMP×DEPT))
Prop #5
πDName(σSalary >1000 (EMP
DB
MG
DEPT)
34
Elena Baralis, Silvia Chiusano
Pag. 17
Politecnico di Torino
17
Database Management Systems
Query optimization
Example: Algebraic transformations
πDName(σSalary >1000 (EMP
DEPT)
Prop #3
πDName(σSalary >1000 (EMP))
DEPT)
35
DB
MG
Example: Algebraic transformations
πDName(σSalary >1000 (EMP
DEPT)
Prop #3
πDName(σSalary >1000 (EMP))
DEPT)
Prop #2 and #4
πDName ((πDept# (σSalary >1000(EMP))
DB
MG
(πDept#,DName(DEPT)))
36
Elena Baralis, Silvia Chiusano
Pag. 18
Politecnico di Torino
18
Database Management Systems
Query optimization
Example: Query tree
Final query tree
πDName
πDept#
πDept#,DName
σSalary>1000 DEPT
EMP
37
DB
MG
Example: Cardinalities
Cardinality (EMP) ≈ 10,000
Cardinality (DEPT) ≈ 100
Cardinality (EMP where Salary > 1000) ≈ 50
DB
MG
38
Elena Baralis, Silvia Chiusano
Pag. 19
Politecnico di Torino
19