, you are tasked with undertaking a country case study, using the data visualization, data
summary, and univariate and bivariate data analysis techniques covered in class.
To complete this assignment, you are asked to identify a country of interest (Colombia) and undertake the
following analyses:
(1) Time Series Analysis:
a. For your country of interest, find, download and properly format a time-series dataset
with at least four variables and at least 15 years of data [Note: it is ok if some of your
variables are missing observations in some years].
b. Generate a table of descriptive statistics for this dataset, as well as a correlation matrix.
c. Based on your descriptive statistics and correlation matrix, come up with three
interesting ways to visualize (i.e. plot) your time-series data. In each case, make sure you
are demonstrating something essential about the distribution of and/or relationship
between one or several variables in your dataset.
d. In 750 words or less, describe your time-series dataset, including:
i. Where you obtained your data, and any challenges you had in formatting it.
ii. What you observe in your table of descriptive statistics, and what the correlation
matrix tells you about the relationships in your data. E.g. Are there any concerns
with the distribution of specific variables? Which variables are more or less
strongly related to each other? What aspects of your dataset would you like to
explore further?
iii. Why you chose the three particular plots that you have provided, and what they
convey about your data. Briefly describe what each plot is intended to illustrate,
pointing out key features about the dataset that they are intended to visualize.
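
For those who want to cross-check their Excel output for step (b), the following is a minimal sketch in Python using only the standard library. It assumes the dataset has been read into one list per variable, with None marking a missing year. The variable names (var_a, var_b, var_c) and all values are hypothetical placeholders, not real data, and the pairwise treatment of missing years is one possible choice rather than a required method.

```python
import math
import statistics

# Placeholder values only (NOT real data); None marks a missing observation.
# Variable names are hypothetical stand-ins for the indicators you choose.
series = {
    "var_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "var_b": [2.0, 4.0, None, 8.0, 10.0, 12.0],
    "var_c": [6.0, 5.0, 4.0, 3.0, None, 1.0],
}


def describe(values):
    """Summary statistics computed over the non-missing observations."""
    obs = [v for v in values if v is not None]
    return {
        "n": len(obs),
        "mean": statistics.mean(obs),
        "median": statistics.median(obs),
        "stdev": statistics.stdev(obs),  # sample standard deviation
        "min": min(obs),
        "max": max(obs),
    }


def pearson(xs, ys):
    """Pearson correlation over the years in which both series are observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mean_x = statistics.mean([x for x, _ in pairs])
    mean_y = statistics.mean([y for _, y in pairs])
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    ss_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    ss_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(ss_x * ss_y)


# One row of descriptive statistics per variable.
stats_table = {name: describe(vals) for name, vals in series.items()}

# Full correlation matrix, computed pair by pair.
corr_matrix = {a: {b: pearson(series[a], series[b]) for b in series} for a in series}

for name, row in stats_table.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
for name, row in corr_matrix.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

The placeholder series are deliberately built to be perfectly linear in one another, so the correlations come out at +1 or -1; real data will not look like this. Note that the number of observations (n) differs by variable when years are missing, which is worth reporting alongside the other statistics.
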
(2) Cross-Section Analysis:
a. For your country of interest, now select a group of comparator countries and find,
download and properly format a cross-section dataset with at least four variables for a
particular year of your choosing. Ideally, you will select a grouping of at least 20
countries. Again, it is ok if some of your countries are missing observations for some of
your variables. Possible suggestions for country groupings might include:
i. Selecting countries in the same region or on the same continent;
ii. Selecting according to the World Bank’s Region, Income, or Lending Groups, or
utilizing the UN Development Programme’s four levels of human development as
defined in their 2020 Human Development Report;
iii. Selecting by governance type (e.g. according to the Polity IV index we reviewed in
class), or by membership in groups such as the OECD or the EU;
iv. or use some other clearly defined country grouping.
b. Generate a table of descriptive statistics for this dataset, as well as a correlation matrix.
c. Based on your descriptive statistics and correlation matrix, come up with three
interesting ways to visualize (i.e. plot) your cross-section data. In each case, make sure
you are demonstrating something essential about the distribution of and/or relationship
between one or several variables in your dataset.
d. In 750 words or less, describe your cross-section dataset, including:
i. Where you obtained your data, and any challenges you had in formatting it.
ii. What you observe in your table of descriptive statistics, and what the correlation
matrix tells you about the relationships in your data. E.g. Are there any concerns
with the distribution of specific variables? Which variables are more or less
strongly related to each other? What aspects of your dataset would you like to
explore further?
iii. Why you chose the three particular plots that you have provided, and what they
convey about your data. Briefly describe what each plot is intended to illustrate,
pointing out key features about the dataset that they are intended to visualize.
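
One way to approach the question about distributional concerns in (d)(ii) is to compare the mean with the median and to look for countries that sit far from the rest of the group. The sketch below illustrates this in Python with the standard library. The country labels (C1 to C8), the values, and the cutoff of 2 standard deviations are all assumptions chosen for illustration, not real data or a prescribed rule.

```python
import statistics

# Placeholder cross-section for a single year (NOT real data).
# Keys are hypothetical country labels; None marks a missing observation.
cross_section = {
    "C1": 10.0, "C2": 12.0, "C3": 11.0, "C4": 13.0,
    "C5": 12.0, "C6": 11.0, "C7": 50.0, "C8": None,
}


def distribution_check(data, z_cutoff=2.0):
    """Summarize one variable and flag countries far from the group mean."""
    observed = {k: v for k, v in data.items() if v is not None}
    values = list(observed.values())
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation
    # Standardized distance of each country from the group mean.
    z_scores = {k: (v - mean) / sd for k, v in observed.items()}
    flagged = sorted(k for k, z in z_scores.items() if abs(z) > z_cutoff)
    return {
        "n": len(values),
        "mean": mean,
        "median": median,
        "stdev": sd,
        "z_scores": z_scores,
        "flagged": flagged,
    }


result = distribution_check(cross_section)
print("n =", result["n"])
print("mean =", result["mean"], "median =", result["median"])
print("flagged:", result["flagged"])
```

In this placeholder example a single large value pulls the mean well above the median, which is the kind of skew worth mentioning in your write-up and possibly showing in one of your plots.
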
To Submit Your Report: You are asked to upload three separate files to the Brightspace page for
Assignment 1. These should include:
- An Excel workbook for your time-series dataset. This will include a tab with your formatted
dataset, and separate tabs/sheets for your table of descriptive statistics, your correlation matrix, and
each of your three plots (your time-series workbook should therefore have 6 tabs total).