Performance Tuning SQL Queries

Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.

The lesson on subqueries introduced the idea that you can sometimes create the same desired result set with a faster-running query. In this lesson, you’ll learn to identify when your queries can be improved, and how to improve them.

The theory behind query run time

A database is a piece of software that runs on a computer, and it is subject to the same limitations as all software: it can only process as much information as its hardware is capable of handling. The way to make a query run faster is to reduce the number of calculations that the software (and therefore the hardware) must perform. To do this, you’ll need some understanding of how SQL actually makes calculations. First, let’s address some of the high-level things that affect the number of calculations you need to make, and therefore your query’s runtime:

  • Table size: If your query hits one or more tables with millions of rows or more, it will usually take longer to run, because the database has more rows to read and process.
  • Joins: If your query joins two tables in a way that substantially increases the row count of the result set, your query is likely to be slow. There’s an example of this in the subqueries lesson.
  • Aggregations: Combining multiple rows to produce a result requires more computation than simply retrieving those rows.
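
As a rough illustration of the join point above, here is a minimal sketch using two small hypothetical tables (orders and visits are made up for this example and are not part of the tutorial’s data). It assumes a database that accepts multi-row VALUES lists, such as PostgreSQL or SQLite:

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
CREATE TABLE visits (visit_id INTEGER, customer_id INTEGER);

INSERT INTO orders VALUES (1, 100), (2, 100), (3, 100);
INSERT INTO visits VALUES (10, 100), (11, 100), (12, 100), (13, 100);

-- Each of the 3 orders matches each of the 4 visits for customer 100,
-- so the join returns 3 * 4 = 12 rows from only 7 input rows.
SELECT COUNT(*) AS joined_rows
  FROM orders o
  JOIN visits v
    ON o.customer_id = v.customer_id;
```

With real tables the same multiplication applies per matching key, which is how a join between two moderately sized tables can end up producing far more rows than either table contains.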

Query runtime also depends on some things related to the database itself that you can’t really control:

  • Other users running queries: The more queries running concurrently on a database, the more the database must process at a given time and the slower everything will run. It can be especially bad if others are running particularly resource-intensive queries that fulfill some of the above criteria.
  • Database software and optimization: This is something you probably can’t control, but if you know the system you’re using, you can work within its bounds to make your queries more efficient.

For now, let’s ignore the things you can’t control and work on the things you can.

Reducing table size

Filtering the data to include only the observations you need can dramatically improve query speed. How you do this will depend entirely on the problem you’re trying to solve. For example, if you’ve got time series data, limiting to a small time window can make your queries run much more quickly:

SELECT *
  FROM benn.sample_event_table
 WHERE event_date >= '2014-03-01'
   AND event_date <  '2014-04-01'

Keep in mind that you can always perform exploratory analysis on a subset of data, refine your work into a final query, then remove the limitation and run your work across the entire dataset. The final query might take a long time to run, but at least you can run the intermediate steps quickly.
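
One way this workflow might look is sketched below. The daily event count is an assumed example analysis, not something taken from this lesson; only the table name and the event_date column come from the query above.

```sql
-- Step 1: develop the logic on one month of data so each run is quick.
SELECT event_date,
       COUNT(*) AS events
  FROM benn.sample_event_table
 WHERE event_date >= '2014-03-01'
   AND event_date <  '2014-04-01'
 GROUP BY event_date
 ORDER BY event_date;

-- Step 2: once the output looks right, remove the WHERE clause and
-- run the same query across the entire table.
SELECT event_date,
       COUNT(*) AS events
  FROM benn.sample_event_table
 GROUP BY event_date
 ORDER BY event_date;
```

Only the filter changes between the two steps, so whatever you verified on the small window should carry over to the full run.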