Fly to minimise delays

The 2009 ASA Statistical Computing and Graphics Data Expo consisted of flight arrival and departure details for all commercial flights on major carriers within the USA, from October 1987 to April 2008. This is a large dataset: there are nearly 120 million records in total, and it takes up 1.6 gigabytes of space compressed and 12 gigabytes uncompressed. The complete dataset, along with supplementary information and variable descriptions, can be downloaded from the Harvard Dataverse (https://doi.org/10.7910/DVN/HG7NV7). Choose any subset of (at least two) consecutive years and any of the supplementary information provided by the Harvard Dataverse to answer the following questions using the principles and tools you have learned in this course:
1. When is the best time of day, day of the week, and time of year to fly to minimise delays?
2. Do older planes suffer more delays?
3. How does the number of people flying between different locations change over time?
4. Can you detect cascading failures as delays in one airport create delays in others?
5. Use the available variables to construct a model that predicts delays.
To identify the best time of day, day of the week, and time of year to fly in order to minimise delays, we can start by looking at how average delays are distributed across categories such as hour, day, and month. Data visualisation techniques such as box-and-whisker plots, histograms, and scatterplots can give a better understanding of which factors tend to lead to the most consistent, least varying flight timings, and let us compare the performance of different airlines during certain seasons. We then take a closer look at the relationship between the age of planes (measured by the number of cycles flown) and the rate of delays they experience over the sample period. This involves analysing the dataset further and plotting the relationship between these two variables to find out whether any correlation exists between them. Next, we create a map that displays the movement of passengers on flights originating in different parts of the country, based on the available variables, to determine whether there are noticeable fluctuations over the course of the sample that could be related to a changing economic climate, the political situation, fluctuations in the cost of procuring fuel, and so on. Moreover, we can investigate potential cascading failures in detail, whereby delays at one airport cause delays at another through a chain reaction that ripples through the system. Finally, we use all relevant parameters in the dataset to construct a predictive model that predicts the potential level of delays for given combinations of origin, destination, hour, day, and month, taking into account seasonal trends and changes in the external environment, as well as weather conditions, which are also a common cause of disturbances to air traffic control operations.
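As a rough sketch of the first step described above, the average delay per hour, weekday, and month could be computed along the following lines. It assumes the column names ArrDelay, CRSDepTime, DayOfWeek, and Month, an hhmm encoding for scheduled times, and "NA" for missing values; none of these are stated above, so they should be checked against the variable descriptions. The helper names and the sample rows are hypothetical and exist only for illustration.

```python
import csv
import io
from collections import defaultdict


def mean_delay_by(rows, key):
    """Return the average ArrDelay per group; `key` maps a row to its group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        delay = row.get("ArrDelay", "")
        # Assumption: flights without a recorded arrival delay are marked "NA".
        if delay in ("", "NA"):
            continue
        group = key(row)
        totals[group] += float(delay)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def departure_hour(row):
    # Assumption: CRSDepTime is the scheduled local time encoded as hhmm.
    return int(row["CRSDepTime"]) // 100


# Invented rows, used only to show the shape of the computation.
SAMPLE = """Year,Month,DayOfWeek,CRSDepTime,ArrDelay
2006,1,1,600,-5
2006,1,1,615,3
2006,1,5,1730,40
2006,7,5,1745,20
2006,7,3,1200,NA
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_hour = mean_delay_by(rows, departure_hour)
by_weekday = mean_delay_by(rows, lambda r: int(r["DayOfWeek"]))
by_month = mean_delay_by(rows, lambda r: int(r["Month"]))

print(by_hour)  # {6: -1.0, 17: 30.0}
```

The mean is only one possible summary. Because delays are heavily skewed, the median or the share of flights delayed beyond some threshold may be more informative, and the same grouping function can be adapted for either.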

Sample Solution

The 2009 ASA Statistical Computing and Graphics Data Expo consists of flight arrival and departure details for all commercial flights on major carriers within the USA, from October 1987 to April 2008. For this analysis I have chosen two consecutive years, 2006 and 2007. I will also use variables such as origin and destination airports, scheduled departure and arrival times, and delay durations, together with the supplementary information provided by the Harvard Dataverse.
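One possible way to read the two chosen years without decompressing them to disk is sketched below. It assumes one bzip2-compressed CSV file per year named 2006.csv.bz2 and 2007.csv.bz2 inside a local directory; both the file naming and the hypothetical helper iter_flights are assumptions, not something the text above confirms.

```python
import bz2
import csv
from pathlib import Path

YEARS = (2006, 2007)  # the two consecutive years chosen for the analysis


def iter_flights(data_dir, years=YEARS):
    """Yield one dict per flight, reading the yearly files in order."""
    for year in years:
        # Assumption: files are named <year>.csv.bz2 in data_dir.
        path = Path(data_dir) / f"{year}.csv.bz2"
        # Text mode decompresses on the fly; latin-1 is a cautious choice
        # because it never fails to decode, whatever bytes the file holds.
        with bz2.open(path, mode="rt", newline="", encoding="latin-1") as handle:
            yield from csv.DictReader(handle)
```

Because the function is a generator, rows are streamed one at a time, so something like `sum(1 for _ in iter_flights("data"))` can count flights without holding both years in memory.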