Friday, January 18, 2008

SQL Performance

* One: only "tune" sql after code is confirmed as working correctly.

* Two: ensure repeated sql statements are written absolutely identically to facilate efficient reuse: re-parsing can often be avoided for each subsequent use.

Writing best practices: all sql verbs in upper-case i.e. SELECT; separate all words with a single space; all sql verbs begin on a new line; sql verbs aligned right or left within the initial verb; set and maintain a table alias standard; use table aliases and when a query involves more than one table prefix all column names with their aliases. Whatever you do, be consistent.

* Three: code the query as simply as possible i.e. no unnecessary columns are selected, no unnecessary GROUP BY or ORDER BY.

* Four: it is the same or faster to SELECT by actual column name(s). The larger the table the more likely the savings.
Use:
SELECT customer_id, last_name, first_name, street, city FROM customer; Rather than:
SELECT * FROM customer;

* Five: do not perform operations on DB objects referenced in the WHERE clause:
Use:
SELECT client, date, amount FROM sales WHERE amount > 0;
Rather than:
SELECT client, date, amount FROM sales WHERE amount!= 0;

* Six: avoid a HAVING clause in SELECT statements - it only filters selected rows after all the rows have been returned. Use HAVING only when summary operations applied to columns will be restricted by the clause. A WHERE clause may be more efficient.
Use:
SELECT city FROM country WHERE city!= 'Vancouver' AND city!= 'Toronto'; GROUP BY city;
Rather than:
SELECT city FROM country GROUP BY city HAVING city!= 'Vancouver' AND city!= 'Toronto';

* Seven: when writing a sub-query (a SELECT statement within the WHERE or HAVING clause of another sql statement):
-- use a correlated (refers to at least one value from the outer query) sub-query when the return is relatively small and/or other criteria are efficient i.e. if the tables within the sub-query have efficient indexes.
-- use a noncorrelated (does not refer to the outer query) sub-query when dealing with large tables from which you expect a large return (many rows) and/or if the tables within the sub-query do not have efficient indexes.
-- ensure that multiple sub-queries are in the most efficient order.
-- remember that rewriting a sub-query as a join can sometimes increase efficiency.

* Eight: minimise the number of table lookups especially if there are sub-query SELECTs or multicolumn UPDATEs.

* Nine: when doing multiple table joins consider the benefits/costs for each of EXISTS, IN, and table joins. Depending on your data one or another may be faster.
Note: IN is usually the slowest.
Note: when most of the filter criteria are in the sub-query IN may be more efficient; when most of the filter criteria are in the parent-query EXISTS may be more efficient.

* Ten: where possible use EXISTS rather than DISTINCT.

* Eleven: where possible use a non-column expression (putting the column on one side of the operator and all the other values on the other). Non-column expressions are often processed earlier thereby speeding the query.
Use:
WHERE SALES < 1000/(1 + n);
Rather than:
WHERE SALES + (n * SALES) < 1000;

* Twelve: the most efficient method for storing large binary objects, i.e. multimedia objects, is to place them in the file system and place a pointer in the DB.

* Thirteen: Use inner-joint rather than left/right/cross joint

* Fourteen: In most of cases, GROUP+HAVING < WHERE

No comments: