Skip to content

Retrieval Mastery


This document covers the cornerstone of SQL: the art of asking precise questions using the $\text{SELECT}$, $\text{FROM}$, and $\text{WHERE}$ clauses, and shaping the results with $\text{ORDER BY}$ and $\text{LIMIT}$.

1. The Anatomy of a Query: $\text{SELECT}$ and $\text{FROM}$

Section titled “1. The Anatomy of a Query: $\text{SELECT}$ and $\text{FROM}$”

At its core, a query has two parts: what you want to see ($\text{SELECT}$) and where to find it ($\text{FROM}$).

  • $\text{SELECT}$: Specifies the columns (attributes) you want to retrieve. This is known as projection.
  • $\text{FROM}$: Specifies the table containing those columns.

To select specific columns, you list them by name.

  • Syntax:
    SELECT column1, column2, ... FROM table_name;
  • Example: Get the name and price of all products.
    SELECT ProductName, Price FROM Products;

$\text{SELECT *}$ is a shorthand to retrieve all columns from a table. While convenient for quick, manual exploration, it is a dangerous anti-pattern in production code.

  • Interview Insight: An interviewer might ask why SELECT * is bad practice. Your answer should cover these three points:

    1. Performance: It forces the database to fetch more data than necessary, increasing disk I/O and network traffic between the database and your application.
    2. Readability & Maintainability: The code’s intent is unclear. A future developer (or you) won’t know which specific columns the application logic actually depends on.
    3. Fragility: If the table schema changes (e.g., a new column is added or column order changes), your application code, which expects a specific set of columns in a specific order, might break unexpectedly.
  • Example (Good Practice): Explicitly state the columns you need.

    -- GOOD: Resilient, performant, and clear
    SELECT ProductID, ProductName, Price, StockQuantity FROM Products;
    -- BAD: Avoid in application code
    SELECT * FROM Products;

Aliases, created with the $\text{AS}$ keyword (which is optional), let you rename columns in your result set. This is crucial for readability and for compatibility with application code (e.g., mapping results to a DTO).

  • Syntax:
    SELECT column_name AS alias_name FROM table_name;
  • Example: Present product data with more user-friendly names.
    SELECT
    ProductName AS "Product Name",
    StockQuantity AS "Items In Stock"
    FROM Products;

You can perform calculations and transformations directly in your SELECT clause.

  • Calculations: Perform arithmetic on numeric columns.

  • Scalar Functions: Functions that operate on a single value and return a single value.

  • Example: Generate a product report showing the total value of stock for each item and display the product name in uppercase.

    SELECT
    UPPER(ProductName) AS "Product Name",
    Price,
    StockQuantity,
    Price * StockQuantity AS "Total Stock Value"
    FROM Products;
    • UPPER() is a string scalar function.
    • Price * StockQuantity is a calculated column.

The $\text{DISTINCT}$ keyword eliminates duplicate rows from your result set, showing only the unique combinations of the selected columns.

  • Syntax:
    SELECT DISTINCT column_name FROM table_name;
  • Example: Get a list of all unique Category IDs that actually have products associated with them.
    -- This will return each CategoryID only once, even if multiple products belong to it.
    SELECT DISTINCT CategoryID FROM Products;

The $\text{WHERE}$ clause is the most powerful part of a basic query. It filters the rows from the table, ensuring that only those meeting your specific criteria are returned. This is known as selection.

These are the fundamental building blocks of filtering.

OperatorMeaningExample Query
=Equal toSELECT * FROM Products WHERE CategoryID = 1; (Find all Electronics)
!= or <>Not equal toSELECT * FROM Orders WHERE Status != 'Delivered'; (Find incomplete orders)
>Greater thanSELECT * FROM Products WHERE Price > 100.00; (Find expensive products)
>=Greater than or equal toSELECT * FROM Products WHERE StockQuantity >= 50;
<Less thanSELECT * FROM Customers WHERE RegistrationDate < '2024-01-01';

B. Logical Operators: Combining Conditions (AND, OR, NOT)

Section titled “B. Logical Operators: Combining Conditions (AND, OR, NOT)”
  • $\text{AND}$: All conditions must be true.
  • $\text{OR}$: At least one condition must be true.
  • $\text{NOT}$: Reverses the result of a condition.

Interview Gold: Operator Precedence In SQL, $\text{AND}$ has higher precedence than $\text{OR}$. This means all AND clauses are evaluated before any OR clauses. This is a common source of bugs and a favorite interview topic. Always use parentheses () to enforce the order you intend.

  • Example: “Find all products that are either in Category 1 (Electronics) and cost more than $500, or have a stock quantity of 0.”

    -- THIS QUERY IS WRONG AND WILL PRODUCE INCORRECT RESULTS
    -- It finds (Electronics > $500) OR (Stock = 0), which is not the intent.
    SELECT ProductName, Price, StockQuantity, CategoryID
    FROM Products
    WHERE CategoryID = 1 AND Price > 500.00 OR StockQuantity = 0;
    -- THIS QUERY IS CORRECT because parentheses enforce the intended logic
    -- It finds Electronics that are ALSO expensive, OR finds any product that is out of stock.
    SELECT ProductName, Price, StockQuantity, CategoryID
    FROM Products
    WHERE (CategoryID = 1 AND Price > 500.00) OR (StockQuantity = 0);

These provide a more readable and often more performant syntax for common filtering tasks.

  • $\text{BETWEEN … AND …}$: Checks if a value is within an inclusive range.
    • Example: Find products with a price between $50 and $100.
      -- Clean and readable
      SELECT ProductName, Price FROM Products WHERE Price BETWEEN 50.00 AND 100.00;
      -- Equivalent, but more verbose
      SELECT ProductName, Price FROM Products WHERE Price >= 50.00 AND Price <= 100.00;
  • $\text{IN (…)}$: Checks if a value matches any value in a provided list.
    • Example: Find all orders with a status of ‘Pending’ or ‘Processing’.
      -- Clean and efficient
      SELECT OrderID, Status FROM Orders WHERE Status IN ('Pending', 'Processing');
      -- Equivalent, but scales poorly for long lists
      SELECT OrderID, Status FROM Orders WHERE Status = 'Pending' OR Status = 'Processing';

The LIKE operator is used for simple pattern matching in string columns, using two special wildcards:

  • %: Matches any sequence of zero or more characters.

  • _: Matches any single character.

  • Example: “Find any customer whose first name starts with ‘J’”.

    SELECT FirstName, LastName FROM Customers WHERE FirstName LIKE 'J%';
  • Example: “Find any product with ‘book’ anywhere in its name”.

    SELECT ProductName FROM Products WHERE ProductName LIKE '%book%';
    • Performance Note: Queries with a leading wildcard (%...) are often slow because they cannot use a standard B-Tree index effectively and may result in a full table scan.

E. The Critical Importance of Handling NULL

Section titled “E. The Critical Importance of Handling NULL”

This is one of the most important concepts for a SQL professional. NULL does not mean zero or an empty string; it means “unknown” or “missing data”.

  • The Trap: You cannot use standard comparison operators (=, !=) with NULL. Any comparison to NULL results in UNKNOWN, which the WHERE clause treats as false.

  • The Correct Way: You must use the special operators $\text{IS NULL}$ and $\text{IS NOT NULL}$.

  • Example: We added a nullable PhoneNumber column to Customers. Let’s find all customers who have not provided a phone number.

    -- INCORRECT: This will return 0 rows, ALWAYS.
    SELECT CustomerID, FirstName FROM Customers WHERE PhoneNumber = NULL;
    -- CORRECT: This will find all rows where PhoneNumber is missing.
    SELECT CustomerID, FirstName FROM Customers WHERE PhoneNumber IS NULL;

After selecting and filtering your data, these clauses give you control over the final presentation.

  • $\text{ORDER BY}$: Sorts the result set based on one or more columns.
    • $\text{ASC}$: Ascending order (A-Z, 1-10). This is the default.
    • $\text{DESC}$: Descending order (Z-A, 10-1).
  • $\text{LIMIT}$: Restricts the number of rows returned.

You can sort by multiple columns. The data is sorted by the first column, and then for any rows with the same value in the first column, they are sorted by the second column, and so on.

  • Example: Get all products, ordered first by CategoryID, and then by Price from highest to lowest within each category.
    SELECT ProductName, CategoryID, Price
    FROM Products
    ORDER BY CategoryID ASC, Price DESC;

This combination is the foundation of pagination in virtually every backend application.

  • $\text{LIMIT number}$: Returns only the first number of rows.

  • $\text{LIMIT number OFFSET offset}$: Skips the first offset rows, and then returns the next number of rows.

  • Example: In our application, we want to display products 10 per page.

    • To get Page 1:
      -- Skip 0 rows, get 10
      SELECT ProductID, ProductName, Price FROM Products ORDER BY ProductName LIMIT 10 OFFSET 0;
    • To get Page 2:
      -- Skip the first 10 rows, get the next 10
      SELECT ProductID, ProductName, Price FROM Products ORDER BY ProductName LIMIT 10 OFFSET 10;

This concludes our deep dive into the foundational retrieval clauses. Master these patterns, understand the nuances of NULL and operator precedence, and you will be well-equipped to write the majority of queries required in professional development.