DDB Unit 2: Notes

https://drive.google.com/file/d/1uMepNKxfuQb3YtWFlGu4q5mdY9qzeTRZ/view?usp=sharing

Query processing in a Distributed Database Management System (D-DBMS) transforms a high-level user query (for example, SQL) into an efficient, low-level execution strategy that runs across multiple sites. The process has three main steps: Parsing and Translation, Optimisation, and Evaluation. The objective is to minimise a cost function, typically either total cost (the sum of all resources consumed) or response time (the elapsed time until the result is available). Both depend on local processing costs (CPU and I/O) and on inter-site communication costs.
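The difference between total cost and response time can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Transfer` class, the coefficients `T_MSG` and `T_TR`, and their values are hypothetical, and local CPU and I/O costs are left out so that only communication cost is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One inter-site message carrying `units` of data (hypothetical model)."""
    units: int


# Assumed cost coefficients; the values are illustrative only.
T_MSG = 10.0  # fixed cost of initiating and receiving one message
T_TR = 0.5    # cost of transmitting one unit of data


def transfer_cost(t: Transfer) -> float:
    """Communication cost of a single transfer."""
    return T_MSG + T_TR * t.units


def total_cost(transfers) -> float:
    """Sum of all communication work, whether or not it overlaps in time."""
    return sum(transfer_cost(t) for t in transfers)


def response_time(parallel_transfers) -> float:
    """Elapsed time when the transfers run in parallel: the slowest dominates."""
    return max(transfer_cost(t) for t in parallel_transfers)


# Two sites each ship data to a third site at the same time.
transfers = [Transfer(units=100), Transfer(units=40)]
print(total_cost(transfers))     # 60.0 + 30.0 = 90.0
print(response_time(transfers))  # max(60.0, 30.0) = 60.0
```

Under these assumptions a strategy that adds parallel work can raise total cost while lowering response time, which is why the optimiser has to know which of the two it is minimising.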

Query processing is layered. Query Decomposition first translates the query into a relational algebra query on global relations. Data Localisation then uses data distribution information to map that algebraic query onto the physical fragments. Finally, Global Query Optimisation selects the best execution strategy by ordering the relational algebra operators and communication primitives so that cost is minimised. The quality of the chosen strategy depends heavily on the accuracy of the database statistics and of the cost model.
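As a rough illustration of the Data Localisation layer, the sketch below drops fragments that cannot contribute to a range selection. The relation name `EMP`, the attribute `emp_no`, the fragment ranges, and the site names are all assumptions made up for this example; a real system would read them from its fragment schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    site: str
    low: int   # inclusive lower bound of emp_no stored in this fragment
    high: int  # inclusive upper bound of emp_no stored in this fragment


# Hypothetical horizontal fragmentation of a global EMP relation by emp_no.
FRAGMENTS = [
    Fragment("EMP1", "site_A", 1, 1000),
    Fragment("EMP2", "site_B", 1001, 2000),
    Fragment("EMP3", "site_C", 2001, 3000),
]


def localise(low: int, high: int, fragments=FRAGMENTS):
    """Return the fragments that may hold rows with low <= emp_no <= high.

    A fragment whose range does not overlap the selection range can only
    produce an empty result, so it is removed from the localised query.
    """
    return [f for f in fragments if f.low <= high and low <= f.high]


# Global query: SELECT * FROM EMP WHERE emp_no BETWEEN 900 AND 1200
relevant = localise(900, 1200)
print([f.name for f in relevant])  # ['EMP1', 'EMP2']
```

Here the query on the global relation becomes a union over two fragments only, so `site_C` is never contacted.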

Key topics covered:
  • Query Decomposition: The first layer of query processing. It transforms a relational calculus query (such as SQL) into a relational algebra query on global relations through four steps: normalisation, analysis, elimination of redundancy, and rewriting.
  • Data Localisation: The layer that identifies which fragments a query involves and, using data distribution information from the fragment schema, translates the query on global relations into a query on fragments.
  • Distributed Query Optimisation: The search for an execution strategy (an ordering of relational algebra operators and communication operators) that minimises a cost function. Compared with centralised optimisation, it must also decide where each operation runs (site selection) and how data is exchanged between sites.
  • Query Processing Objectives: In a distributed context, the goals are to transform a high-level query into an efficient execution strategy and to minimise a cost function, typically total cost or response time.
  • Semijoins: An operation that is particularly useful for distributed joins. A semijoin reduces the size of an operand relation before it is shipped, which reduces the amount of data exchanged between sites.
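The semijoin idea from the last bullet can be sketched as follows. The relations `EMP` and `DEPT`, their contents, and their placement on two sites are assumptions for illustration; the sites are only simulated, and relations are plain lists of dictionaries.

```python
# Hypothetical data: EMP is stored at site 1, DEPT at site 2.
EMP = [
    {"emp_no": 1, "name": "Ana", "dept_no": 10},
    {"emp_no": 2, "name": "Bo", "dept_no": 20},
    {"emp_no": 3, "name": "Cy", "dept_no": 30},
    {"emp_no": 4, "name": "Di", "dept_no": 40},
]
DEPT = [
    {"dept_no": 10, "dept_name": "Sales"},
    {"dept_no": 30, "dept_name": "R&D"},
]


def semijoin(r, s, attr):
    """R semijoin S: the rows of R that have a match in S on `attr`."""
    keys = {row[attr] for row in s}  # projection of S on the join attribute
    return [row for row in r if row[attr] in keys]


def join(r, s, attr):
    """Plain equi-join of R and S on `attr` using a hash index on S."""
    index = {}
    for row in s:
        index.setdefault(row[attr], []).append(row)
    return [{**a, **b} for a in r for b in index.get(a[attr], [])]


# Step 1: site 2 ships only the dept_no values of DEPT to site 1.
# Step 2: site 1 reduces EMP with the semijoin and ships the result to site 2.
reduced_emp = semijoin(EMP, DEPT, "dept_no")
# Step 3: site 2 joins the reduced EMP with DEPT.
result = join(reduced_emp, DEPT, "dept_no")

print(len(EMP), len(reduced_emp))            # 4 2
print(result == join(EMP, DEPT, "dept_no"))  # True
```

In this toy case only two of the four `EMP` rows cross the network and the join result is unchanged. The approach pays off only when the shipped join-attribute values plus the reduced relation are smaller than the full relation, so an optimiser would weigh it against shipping `EMP` whole.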
