The dataset consists of 8 columns, described as follows:
Day of Month: The specified day of the month
Day of Week: The specified day of the week
Reporting Carrier: Unique Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years.
Origin: Origin Airport
Dest: Destination Airport
Departure Delay Minutes: Difference in minutes between scheduled and actual departure time. Early departures set to 0.
Arrival Delay Minutes: Difference in minutes between scheduled and actual arrival time. Early arrivals set to 0.
Weather Delay: Weather delay in minutes
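To make the eight columns concrete, here is a minimal sketch of how they might be declared as a Hive external table over a file in HDFS. The table name, column names, column types, field delimiter, and HDFS path are all assumptions chosen for illustration; they are not given by the dataset description and should be adjusted to match the actual file.

```sql
-- Hypothetical table definition; every identifier and the LOCATION path are assumed.
CREATE EXTERNAL TABLE IF NOT EXISTS flights (
  day_of_month       INT,     -- Day of Month
  day_of_week        INT,     -- Day of Week
  reporting_carrier  STRING,  -- Unique Carrier Code, e.g. PA, PA(1), PA(2)
  origin             STRING,  -- Origin Airport
  dest               STRING,  -- Destination Airport
  dep_delay_minutes  DOUBLE,  -- Departure delay in minutes; early departures are 0
  arr_delay_minutes  DOUBLE,  -- Arrival delay in minutes; early arrivals are 0
  weather_delay      DOUBLE   -- Weather delay in minutes
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','      -- assumes a comma-separated text file
STORED AS TEXTFILE
LOCATION '/user/hive/flights';  -- assumed HDFS directory holding the data file
```

An external table is used in this sketch so that dropping the table leaves the underlying HDFS files in place.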
Querying data stored in HDFS with Hive
Problem statement:
Write an HQL statement that lists all flights whose departure delay is greater than the average departure delay, and shows by how many minutes each flight's delay exceeds that average.
Sample Solution