Database Management Systems Query optimization Database Management Systems Query optimization DB MG DBMS Architecture SQL INSTRUCTION OPTIMIZER CONCURRENCY CONTROL MANAGEMENT OF ACCESS METHODS RELIABILITY MANAGEMENT BUFFER MANAGER Index Files Data Files System Catalog DATABASE DB MG Elena Baralis, Silvia Chiusano Pag. 1 Politecnico di Torino 1 Database Management Systems Query optimization Query optimizer It selects an efficient strategy for query execution It is a fundamental building block of a relational DBMS It guarantees the data independence property The form in which the SQL query is written does not affect the way in which it is implemented A physical reorganization of data does not require rewriting SQL queries 3 DB MG Query optimizer It automatically generates a query execution plan It was formerly hard-coded by a programmer The automatically generated execution plan is usually more efficient It evaluates many different alternatives It exploits statistics on data, stored in the system catalog, to make decisions It exploits the best known strategies It dynamically adapts to changes in the data distribution DB MG 4 Elena Baralis, Silvia Chiusano Pag. 2 Politecnico di Torino 2 Database Management Systems Query optimization Query optimizer SQL QUERY LEXICAL, SYNTACTIC AND SEMANTIC ANALYSIS DB MG Lexical, syntactic and semantic analysis Analysis of a statement to detect Lexical errors e.g., misspelled keywords Syntactic errors errors in the grammar of the SQL language Semantic errors references to objects which do not actually exist in the database (e.g, attributes or tables) information in the data dictionary is needed DB MG 6 Elena Baralis, Silvia Chiusano Pag. 3 Politecnico di Torino 3 Database Management Systems Query optimization Lexical, syntactic and semantic analysis Output Internal representation in (extended) relational algebra Why relational algebra? It explicitly represents the order in which operators are applied It is procedural (different from SQL) There is a corpus of theorems and properties exploited to modify the initial query tree 7 DB MG Query optimizer SQL QUERY LEXICAL, SYNTACTIC AND SEMANTIC ANALYSIS DATA DICTIONARY INTERNAL REPRESENTATION BASED ON RELATIONAL ALGEBRA ALGEBRAIC OPTIMIZATION DB MG Elena Baralis, Silvia Chiusano Pag. 4 Politecnico di Torino 4 Database Management Systems Query optimization Algebraic optimization Execution of algebraic transformations considered to be always beneficial Example: anticipation of selection with respect to join Should eliminate the difference among different formulations of the same query This step is usually independent of the data distribution Output Query tree in “canonical” form 9 DB MG Query optimizer SQL QUERY LEXICAL, SYNTACTIC AND SEMANTIC ANALYSIS DATA DICTIONARY INTERNAL REPRESENTATION BASED ON RELATIONAL ALGEBRA ALGEBRAIC OPTIMIZATION “CANONICAL” QUERY TREE COST BASED OPTIMIZATION DB MG Elena Baralis, Silvia Chiusano Pag. 5 Politecnico di Torino 5 Database Management Systems Query optimization Cost based optimization Selection of the “best” execution plan by evaluating execution cost Selection of the best access method for each table the best algorithm for each relational operator among available alternatives Based on a cost model for access methods and algorithms Generation of the code implementing the best strategy 11 DB MG Cost based optimization Output Access program in executable format It exploits the internal structures of the DBMS Set of dependencies conditions on which the validity of the query plan depends e.g., the existence of an index DB MG 12 Elena Baralis, Silvia Chiusano Pag. 6 Politecnico di Torino 6 Database Management Systems Query optimization Query optimizer SQL QUERY LEXICAL, SYNTACTIC AND SEMANTIC ANALYSIS DATA DICTIONARY INTERNAL REPRESENTATION BASED ON RELATIONAL ALGEBRA ALGEBRAIC OPTIMIZATION “CANONICAL” QUERY TREE DATA PROFILES (STATISTICS ON DATA) COST BASED OPTIMIZATION DB MG ACCESS PROGRAM SET OF DEPENDENCIES Execution modes Compile and go Compilation and immediate execution of the statement No storage of the query plan Dependencies are not needed DB MG 14 Elena Baralis, Silvia Chiusano Pag. 7 Politecnico di Torino 7 Database Management Systems Query optimization Execution modes Compile and store The access plan is stored in the database together with its dependencies It is executed on demand It should be recompiled when the data structure changes 15 DB MG Database Management Systems Algebraic optimization DB MG Elena Baralis, Silvia Chiusano Pag. 8 Politecnico di Torino 8 Database Management Systems Query optimization Algebraic optimization SQL QUERY LEXICAL, SYNTACTIC AND SEMANTIC ANALYSIS DATA DICTIONARY INTERNAL REPRESENTATION BASED ON RELATIONAL ALGEBRA ALGEBRAIC OPTIMIZATION “CANONICAL” QUERY TREE COST BASED OPTIMIZATION DB MG ACCESS PROGRAM DATA PROFILES (STATISTICS ON DATA) SET OF DEPENDENCIES Algebraic optimization It is based on equivalence transformations Two relational expressions are equivalent if they both produce the same query result for any arbitrary database instance Interesting transformations reduce the size of the intermediate result to be stored in memory prepare an expression for the application of a transformation which reduces the size of the intermediate result DB MG 18 Elena Baralis, Silvia Chiusano Pag. 9 Politecnico di Torino 9 Database Management Systems Query optimization Transformations 1. Atomization of selection σF1 Ʌ F2 (E) ≡ σF2 (σF1 (E)) ≡ σF1 (σF2 (E)) 19 DB MG Transformations 1. Atomization of selection σF1 Ʌ F2 (E) ≡ σF2 (σF1 (E)) ≡ σF1 (σF2 (E)) 2. Cascading projections πX(E) ≡ πX (πX,Y(E)) DB MG 20 Elena Baralis, Silvia Chiusano Pag. 10 Politecnico di Torino 10 Database Management Systems Query optimization Transformations 1. Atomization of selection σF1 Ʌ F2 (E) ≡ σF2 (σF1 (E)) ≡ σF1 (σF2 (E)) 2. Cascading projections πX(E) ≡ πX (πX,Y(E)) 3. Anticipation of selection with respect to join (pushing selection down) σF (E1 E2) ≡ E1 (σF (E2)) F is a predicate on attributes in E2 only 21 DB MG Transformations 4. Anticipation of projection with respect to join πL(E1 p E2) ≡ πL ((πL1, J(E1) p(πL2,J(E2))) L1 = L - Schema(E2) L2 = L - Schema(E1) J = set of attributes needed to evaluate join predicate p DB MG 22 Elena Baralis, Silvia Chiusano Pag. 11 Politecnico di Torino 11 Database Management Systems Query optimization Transformations 5. Join derivation from Cartesian product σF (E1 × E2 ) ≡ E1 F E2 predicate F only relates attributes in E1 and E2 23 DB MG Transformations 5. Join derivation from Cartesian product σF (E1 × E2 ) ≡ E1 F E2 predicate F only relates attributes in E1 and E2 6. Distribution of selection with respect to union σF(E1 ∪ E2 ) ≡ (σF (E1) ) ∪ (σF (E2) ) DB MG 24 Elena Baralis, Silvia Chiusano Pag. 12 Politecnico di Torino 12 Database Management Systems Query optimization Transformations 5. Join derivation from Cartesian product σF (E1 × E2 ) ≡ E1 F E2 predicate F only relates attributes in E1 and E2 6. Distribution of selection with respect to union σF(E1 ∪ E2 ) ≡ (σF (E1) ) ∪ (σF (E2) ) 7. Distribution of selection with respect to difference σF(E1 – E2) ≡ (σF (E1) ) – (σF (E2) ) ≡ (σF (E1)) – E2 25 DB MG Transformations 8. Distribution of projection with respect to union πX(E1 ∪ E2) ≡ (πX(E1)) ∪ (πX(E2)) DB MG 26 Elena Baralis, Silvia Chiusano Pag. 13 Politecnico di Torino 13 Database Management Systems Query optimization Transformations 8. Distribution of projection with respect to union πX(E1 ∪ E2) ≡ (πX(E1)) ∪ (πX(E2)) Can projection be distributed with respect to difference? πX (E1 - E2) ≡ (πX(E1)) - (πX(E2)) 27 DB MG Transformations 8. Distribution of projection with respect to union πX(E1 ∪ E2) ≡ (πX(E1)) ∪ (πX(E2)) Can projection be distributed with respect to difference? πX (E1 - E2) ≡ (πX(E1)) - (πX(E2)) This equivalence only holds if X includes the primary key or a set of attributes with the same properties (unique and not null) DB MG 28 Elena Baralis, Silvia Chiusano Pag. 14 Politecnico di Torino 14 Database Management Systems Query optimization Transformations 9. Other properties σF1 V F2(E) ≡ (σF1 (E)) ∪ (σF2 (E)) σF1 Ʌ F2(E) ≡ (σF1 (E)) ∩ (σF2 (E)) 29 DB MG Transformations 10. Distribution of join with respect to union E (E1 ∪ E2) ≡ (E E1) ∪ (E E2) All Binary operators are commutative and associative except for difference DB MG 30 Elena Baralis, Silvia Chiusano Pag. 15 Politecnico di Torino 15 Database Management Systems Query optimization Example Tables EMP (Emp#, ………, Dept#, Salary) DEPT (Dept#, DName,……………) SQL query SELECT DISTINCT DName FROM EMP, DEPT WHERE EMP.Dept#=DEPT.Dept# AND Salary > 1000; 31 DB MG Example: Algebraic transformations πDName (σEMP.Dept#=DEPT.Dept# Ʌ Salary DB MG >1000 (EMP × DEPT)) 32 Elena Baralis, Silvia Chiusano Pag. 16 Politecnico di Torino 16 Database Management Systems Query optimization Example: Algebraic transformations πDName (σEMP.Dept#=DEPT.Dept# Ʌ Salary >1000 (EMP × DEPT)) Prop #1 πDName(σSalary >1000 (σEMP.Dept#=DEPT.Dept# (EMP×DEPT)) 33 DB MG Example: Algebraic transformations πDName (σEMP.Dept#=DEPT.Dept# Ʌ Salary >1000 (EMP × DEPT)) Prop #1 πDName(σSalary >1000 (σEMP.Dept#=DEPT.Dept# (EMP×DEPT)) Prop #5 πDName(σSalary >1000 (EMP DB MG DEPT) 34 Elena Baralis, Silvia Chiusano Pag. 17 Politecnico di Torino 17 Database Management Systems Query optimization Example: Algebraic transformations πDName(σSalary >1000 (EMP DEPT) Prop #3 πDName(σSalary >1000 (EMP)) DEPT) 35 DB MG Example: Algebraic transformations πDName(σSalary >1000 (EMP DEPT) Prop #3 πDName(σSalary >1000 (EMP)) DEPT) Prop #2 and #4 πDName ((πDept# (σSalary >1000(EMP)) DB MG (πDept#,DName(DEPT))) 36 Elena Baralis, Silvia Chiusano Pag. 18 Politecnico di Torino 18 Database Management Systems Query optimization Example: Query tree Final query tree πDName πDept# πDept#,DName σSalary>1000 DEPT EMP 37 DB MG Example: Cardinalities Cardinality (EMP) ≈ 10,000 Cardinality (DEPT) ≈ 100 Cardinality (EMP where Salary > 1000) ≈ 50 DB MG 38 Elena Baralis, Silvia Chiusano Pag. 19 Politecnico di Torino 19