Data Science/SQL

[Udemy] 15 Days of SQL - Day 5 : CASE WHEN, CAST, COALESCE, REPLACE

kyra 2024. 2. 26. 21:50

1. Mathematical Functions and Operators

  • Syntax
SELECT
film_id,
rental_rate as old_rental_rate,
CEILING(rental_rate * 1.1) as new_rental_rate
FROM film
 

9.3. Mathematical Functions and Operators

9.3. Mathematical Functions and Operators # Mathematical operators are provided for many PostgreSQL types. For types without standard mathematical conventions (e.g., …

www.postgresql.org

 

* Challenge

Write a query to create a list of the films including the relation of rental rate/ replacement cost where the rental rate is less than 4% of the replacement cost.

Create a list of that film_ids together with the percentage rounded to 2 decimal places.

SELECT
film_id,
ROUND(rental_rate / replacement_cost * 100, 2) AS percentage
FROM film
WHERE ROUND(rental_rate / replacement_cost * 100, 2) < 4
ORDER BY 2 ASC


2. CASE WHEN

  • Like If/Then statement, goes through a set of conditions and returns a value if a condition is met
  • Start with 'CASE' and end with 'END'
  • Syntax
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN condition3 THEN result3
ELSE result
END

 

SELECT
actual_departure-scheduled_departure,
CASE
WHEN actual_departure is null THEN 'no departure time',
WHEN actual_departure-scheduled_departure < '00:05' THEN 'On time'
WHEN actual_departure-scheduled_departure < '01:00' THEN 'A little late'
ELSE 'Late'
END
FROM flights

 

* Challenge

1) Write a query to find out how many tickets you have sold in the following categories:

- Low price ticket : total_amount < 20,000

- Mid price ticket : total_amount between 20,000 and 150,000

- High price ticket: total_amount >= 150,000

How many high price tickets has the company sold?

1) 

SELECT
COUNT(*),
CASE
WHEN total_amount < 20000 THEN 'low price ticket'
WHEN total_amount < 150000 THEN 'mid price ticket'
WHEN total_amount > 150000 THEN 'high price ticket'
END as ticket_price
FROM bookings
GROUP BY ticket_price



2) model answer

SELECT ticket_price, count(1)
FROM(SELECT
book_ref,
CASE
WHEN total_amount < 20000 THEN 'low price ticket'
WHEN total_amount < 150000 THEN 'mid price ticket'
ELSE 'high price ticket'
END as ticket_price
FROM bookings
) a
GROUP BY ticket_price;

 

* Tips

In terms of readability and simplicity, the second approach (directly categorizing data within the main query using CASE statements) is often preferred for straightforward categorizations. It is easier to understand and maintain, especially for simpler scenarios.
On the other hand, using subqueries (like in the first approach) can be advantageous when the categorization logic is complex or when the categorization needs to be reused in multiple queries.

 

 

2) Find out how many flights have departed in the following seasons:

- Winter : December, January, February

- Spring : March, April, May

- Summer : June, July, August

- Fall : September, October, November

SELECT
COUNT(scheduled_departure),
-- COUNT(*) is also available as we are using EXTRACT and specify 'scheduled_departure' in further lines.
CASE
WHEN EXTRACT(month from scheduled_departure) IN (12, 01, 02) THEN 'Winter'
WHEN EXTRACT(month from scheduled_departure) IN (03, 04, 05) THEN 'Spring'
WHEN EXTRACT(month from scheduled_departure) IN (06, 07, 08) THEN 'Summer'
WHEN EXTRACT(month from scheduled_departure) IN (09, 10, 11) THEN 'Fall'
END as season
FROM flights
GROUP BY season

 

3) Create a tier list in the following way:

1. Rating is 'PG' or 'PG-13' or length is more than 210 min: 'Great rating or long(tier 1)'

2. Description contains 'Drama' and length is more than 90 min: 'Long drama(tier 2)'

3. Description contains 'Drama' and length is not more than 9 min: 'Short drama(tier 3)'

4. Rental_rate less than $1: 'Very cheap(tier 4)'

If one movie can be in multiple categories it gets the higher tier assigned.

How can you filter to only those movies that appear in one of these 4 tiers?

SELECT
title,
CASE
WHEN rating IN ('PG','PG-13') OR length > 210 THEN 'Great rating or long (tier 1)'
WHEN description LIKE '%Drama%' AND length>90 THEN 'Long drama (tier 2)'
WHEN description LIKE '%Drama%' THEN 'Short drama (tier 3)'
WHEN rental_rate<1 THEN 'Very cheap (tier 4)'
END as tier_list
FROM film
WHERE 
CASE
WHEN rating IN ('PG','PG-13') OR length > 210 THEN 'Great rating or long (tier 1)'
WHEN description LIKE '%Drama%' AND length>90 THEN 'Long drama (tier 2)'
WHEN description LIKE '%Drama%' THEN 'Short drama (tier 3)'
WHEN rental_rate<1 THEN 'Very cheap (tier 4)'
END is not null

--WHERE case is not null cannot be used here as AS doesn't affect WHERE clause
-- try using NULLIF / NVL / COALESCE functions


3. CASE WHEN & SUM

  • Like If/Then statement, goes through a set of conditions and returns a value if a condition is met
  • Start with 'CASE' and end with 'END'
SELECT
SUM(CASE
WHEN rating in ('PG', 'G') THEN 1
ELSE 0
END) AS no_of_ratings_with_g_or_pg
FROM film

SELECT
SUM(CASE WHEN rating in ('G') THEN 1 ELSE 0 END) AS "G",
SUM(CASE WHEN rating in ('R') THEN 1 ELSE 0 END) AS "R",
SUM(CASE WHEN rating in ('NC-17') THEN 1 ELSE 0 END) AS "NC-17",
SUM(CASE WHEN rating in ('PG-13') THEN 1 ELSE 0 END) AS "PG-13",
SUM(CASE WHEN rating in ('PG') THEN 1 ELSE 0 END) AS "PG"
FROM film


4. COALESCE

  • Returns first value of list of values which is not null
  • Syntax
COALESCE(actual_arrival, scheduled_arrival)

COALESCE(actual_arrival, '1970-01-01 0:00') -- particular values can also be used, if both data types are the same.

5. CAST

  • Changes the data type of a value
  • Syntax
CAST(scheduled_arrival AS data type)

CAST(scheduled_arrival AS VARCHAR) -- change the data type to text

CAST(schedued_arrival AS DATE) -- timestamp to date
SELECT
COALESCE(CAST(actual_arrival-scheduled_arrival AS VARCHAR), 'not arrived')
FROM flights

 

null values have changed to 'not arrived'

 

 

 

 

 

 

 

 

 


6. REPLACE

  • Replaces text from a string in a column with another text
  • can be used to remove unnecessary blanks or letters
  • Syntax
REPLACE(flight_no, 'PG', '')

 

* delete blank space

SELECT
passenger_id,
REPLACE(passenger_id, ' ', ''), -- delete blanks
CAST(REPLACE(passenger_id, ' ', '') AS BIGINT)
FROM tickets

 

* remove letters

SELECT
flight_no,
CAST(REPLACE(flight_no, 'PG', '') AS INT)
FROM flights