Full Table Scan

When a DBMS sees a query of the form like

WHERE <condition>

The obvious thing to do is read through the tuples of R and report these tuples that satisfy the condition.

This is called a Full Table Scan.

Selective Query

If we have to report 80% of the tuples in R, it makes sense to do a full table scan.

Otherwise, if the query is very selective, and returns just a small percentage of the tuples, we might hope to do better.

Point Query

Consider a selection query with a single equality in the condition

FROM person
WHERE birth_year=1975

This is a Point Query: We look for a single value for “year”. Point queries are easy if data is sorted by the right attribute.

Range Query

Consider a selection query of the form

FROM person
WHERE year(birthdate) BETWEEN 1975 and 1985

This is a Range Query: We look for a range of values of “birthdate”.

Range queries are also easy if data is sorted by the right attribute, but often not be as selective as point queries.

Index & Index Scan

  • To speed up queries the DBMS may build an Index on the year attribute.
  • A database index is similar to an index in the back of a book:
    • for every piece of data you might be interested in, the index says where to find it.
    • The index itself is organised such that one can quickly do the lookup.
  • Looking for information in a relation with the help of an index is called Index Scan.

Primary Index

Secondary Index

Multi-attribute Indexes

Index Scan vs Full Table Scan

Point and range queries on the attribute(s) of the Primary Index are almost always best performed using an index scan.

Secondary Index should be used with high selectivity queries: As a rule of thumb, a secondary index scan is faster than a full table scan for queries returning less than 10% of a relation.

Choosing to Use an Index

Drawbacks of Indexing

  • Space Usage
    • Small for primary index
    • Similar to data size for secondary index
  • Time usage for keeping indexes updated under data insertion & data change
    • Small to medium for primary index
    • High for secondary index
  • Some joins can be executed using a modest number of index lookups, maybe faster than looking at all data
  • Some queries may be executed by only looking at the information in the index
    • Index only queries execution plan (coving index)
    • May need to read much less data