The National Heart, Lung, and Blood Institute (NHLBI) created a teaching dataset that includes real but anonymized data collected as part of the Framingham Heart Study. The Framingham Heart Study is one of the most influential and longest-running epidemiological studies of risk factors for cardiovascular disease. The study started in 1948 and continues today to collect extensive data from original participants, their children, and their children’s children. Much of what we know about cardiovascular disease was discovered by investigators involved in the Framingham Heart Study. In fact, studies to date using data collected in the Framingham Heart Study have resulted in over 3000 publications in high-impact, peer-reviewed medical journals.
The Framingham Heart Study has been widely discussed in the media. WGBH in Boston produced a video documentary for PBS entitled “The Hidden Epidemic: Heart Disease in America” that details the history of heart disease in this country and highlights the Framingham Heart Study. In 2007, CBS News did a story on the study, its participants, and its impact. Additionally, research results from the Framingham Heart Study are communicated widely, most recently highlighting the discovery of a gene that may promote obesity and new data showing declining rates of dementia. Interested readers can visit the Framingham Heart Study website for a detailed history of this incredible study and its many contributions to preventive medicine.

Datasets for Analysis
NHLBI created a longitudinal teaching dataset that includes clinical, laboratory, and outcome data on n = 4434 participants. Each participant has between one and three observations, which represent examinations held approximately 6 years apart. There are a total of 11,627 observations in the full dataset. Detailed descriptions of the Framingham Heart Study dataset and of other public use datasets from NHLBI are available on the NHLBI Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) website.
Two datasets are available for analysis here—one is the complete dataset with n = 11,627 observations (or person-exams), and the second includes only data collected at the first examination for each participant (n = 4434). The two datasets are available as comma separated values (.csv) files for analysis in Excel, R, or other statistical computing packages. FHS-All.csv contains n = 11,627 observations and FHS-Exam1.csv contains n = 4434 observations.
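The relationship between the two files can be checked with a short script. The sketch below uses Python (the assignment itself names Excel and R) and assumes that the .csv column headers match the variable names RANDID and PERIOD. The function names are illustrative, and the toy rows are made up to show the logic; they are not study data.

```python
import csv
from collections import Counter


def read_rows(path):
    """Read a .csv file into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def first_exam_only(rows):
    """Keep only observations from exam cycle 1 (PERIOD == 1)."""
    return [r for r in rows if int(r["PERIOD"]) == 1]


def exams_per_participant(rows):
    """Count observations (person-exams) per participant, keyed by RANDID."""
    return Counter(r["RANDID"] for r in rows)


# Toy rows, made up for illustration. With the real file one would call
# read_rows("FHS-All.csv") instead (assumes the file is in the working directory).
toy = [
    {"RANDID": "1", "PERIOD": "1"},
    {"RANDID": "1", "PERIOD": "2"},
    {"RANDID": "2", "PERIOD": "1"},
]
baseline = first_exam_only(toy)
counts = exams_per_participant(toy)
print(len(toy), "person-exams;", len(counts), "participants;",
      len(baseline), "first-exam rows")
```

If the files are as described, the same calls on FHS-All.csv should give 11,627 person-exams and 4434 participants, with at most three exams per participant.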
Variables
The following variables are available in each dataset for analysis (extracted from the complete documentation file, available on the NHLBI BioLINCC website).
Variable Name | Description | Coding Details/Range
RANDID | Unique identification number for each participant | 2248–9999312
SEX | Participant sex | 1 = Male, 2 = Female
PERIOD | Exam cycle | 1, 2, 3
TIME | Number of days since first (baseline) exam | 0–4854
AGE | Age at exam, years | 32–81
SYSBP | Systolic blood pressure, mmHg | 83–295
DIABP | Diastolic blood pressure, mmHg | 30–150
BPMEDS | Use of anti-hypertensive medication | 0 = No, 1 = Yes
CURSMOKE | Currently smoking cigarettes | 0 = No, 1 = Yes
CIGPDAY | Number of cigarettes smoked per day | 0 (non-smoker)–90
TOTCHOL | Total serum cholesterol, mg/dL | 107–696
HDLC* | High density lipoprotein cholesterol, mg/dL | 10–189
LDLC* | Low density lipoprotein cholesterol, mg/dL | 20–565
BMI | Body mass index = weight (kg)/height (m)^2 | 14–57
GLUCOSE | Serum glucose, mg/dL | 39–478
DIABETES | Diabetes (glucose > 200 mg/dL or on treatment) | 0 = No, 1 = Yes
HEARTRTE | Heart rate, beats/minute | 37–220
PREVAP | Prevalent angina pectoris | 0 = No, 1 = Yes
PREVCHD | Prevalent coronary heart disease (CHD) | 0 = No, 1 = Yes
PREVMI | Prevalent myocardial infarction (MI) | 0 = No, 1 = Yes
PREVSTRK | Prevalent stroke | 0 = No, 1 = Yes
PREVHYP | Prevalent hypertension | 0 = No, 1 = Yes
The following are outcome events, coded 1 if the event occurred during the follow-up (only the first event is recorded).
ANGINA | Angina pectoris | 0 = No, 1 = Yes
HOSPMI | Hospitalized for MI | 0 = No, 1 = Yes
MI_FCHD | Hospitalized for MI or fatal CHD | 0 = No, 1 = Yes
ANYCHD | Any coronary heart disease event | 0 = No, 1 = Yes
STROKE | Stroke | 0 = No, 1 = Yes
CVD | Cardiovascular disease | 0 = No, 1 = Yes
HYPERTEN | Hypertension | 0 = No, 1 = Yes
DEATH | Death from any cause | 0 = No, 1 = Yes
The following are numbers of days from the first (baseline) exam to the first event during the follow-up. If no event occurred, time is end of follow-up, death, or last known contact date.
TIMEAP | Time from baseline to first angina
TIMEMI | Time from baseline to first myocardial infarction
TIMEMIFC | Time from baseline to first MI or fatal CHD
TIMECHD | Time from baseline to first CHD
TIMESTRK | Time from baseline to first stroke
TIMECVD | Time from baseline to first cardiovascular disease
TIMEHYP | Time from baseline to first hypertension
TIMEDTH | Time from baseline to death
* Available only at period = 3 exam, missing otherwise

Design, conduct, and summarize the results of the analyses outlined below using FHS-Exam1, the Framingham Heart Study dataset that includes one observation per participant.
Analytic approaches and coding for solutions are detailed in the Excel file.

  1. Describe the study sample.
    Complete the following table to describe the study sample using data collected at the first examination for each participant (n = 4434). Summarize your results in three to four sentences.
    Patient Characteristic* Total Sample (n = 4434)
    Age, years
    Male sex
    Systolic blood pressure, mmHg
    Hypertension
    Use of anti-hypertensive medication
    Current smoker
    Total serum cholesterol, mg/dL
    Serum glucose, mg/dL
    Stroke
  * Mean (standard deviation) or n (%)
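
The two summary formats in the table footnote, mean (standard deviation) for continuous characteristics and n (%) for yes/no characteristics, can be sketched as follows. This is one possible approach in Python, not the solution from the Excel file. It assumes the column names match the variable list, treats blank or non-numeric cells as missing, and uses the non-missing count as the percentage denominator. The toy rows are made up for illustration.

```python
from math import isnan
from statistics import mean, stdev


def to_float(value):
    """Return a float, or None when the cell is blank, non-numeric, or NaN."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if isnan(number) else number


def values(rows, column):
    """Non-missing numeric values of one column."""
    return [v for v in (to_float(r.get(column)) for r in rows) if v is not None]


def mean_sd(rows, column):
    """Format a continuous characteristic as 'mean (SD)' (sample SD)."""
    v = values(rows, column)
    return f"{mean(v):.1f} ({stdev(v):.1f})"


def n_pct(rows, column, code=1):
    """Format a coded characteristic as 'n (%)' among non-missing values."""
    v = values(rows, column)
    n = sum(1 for x in v if x == code)
    return f"{n} ({100 * n / len(v):.1f}%)"


# Toy rows, made up for illustration; the last row has a missing AGE.
toy = [
    {"AGE": "40", "SEX": "1"},
    {"AGE": "50", "SEX": "2"},
    {"AGE": "60", "SEX": "2"},
    {"AGE": "", "SEX": "1"},
]
age_summary = mean_sd(toy, "AGE")        # continuous: mean (SD)
male_summary = n_pct(toy, "SEX", code=1)  # SEX is coded 1 = Male
print("Age, years:", age_summary)
print("Male sex:", male_summary)
```

Because SEX is coded 1 = Male and 2 = Female rather than 0/1, the "Male sex" row counts the code 1 explicitly.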
  2. Compare risk factors in men and women.
    Complete the following table to compare men and women using data collected at the first examination for each participant (n = 4434). Summarize your results in three to four sentences.
    Patient Characteristic* Men (n = 1944) Women (n = 2490)
    Age, years
    Systolic blood pressure, mmHg
    Hypertension
    Use of anti-hypertensive medication
    Current smoker
    Total serum cholesterol, mg/dL
    Serum glucose, mg/dL
    Stroke
  * Mean (standard deviation) or n (%)
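
A stratified version of the same summaries gives the men and women columns. The sketch below is a hedged Python illustration that assumes the SEX coding from the variable list (1 = Male, 2 = Female) and column names that match the variable names. The helper names and toy rows are made up for illustration.

```python
from math import isnan
from statistics import mean, stdev


def to_float(value):
    """Return a float, or None when the cell is blank, non-numeric, or NaN."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if isnan(number) else number


def values(rows, column):
    """Non-missing numeric values of one column."""
    return [v for v in (to_float(r.get(column)) for r in rows) if v is not None]


def split_by_sex(rows):
    """Group rows by SEX using the coding 1 = Male, 2 = Female."""
    groups = {"Men": [], "Women": []}
    for r in rows:
        code = to_float(r.get("SEX"))
        if code == 1:
            groups["Men"].append(r)
        elif code == 2:
            groups["Women"].append(r)
    return groups


def compare_by_sex(rows, continuous, binary):
    """Build one column of summaries per sex: mean (SD) or n (%)."""
    table = {}
    for label, group in split_by_sex(rows).items():
        column = {"n": len(group)}
        for name in continuous:
            v = values(group, name)
            column[name] = f"{mean(v):.1f} ({stdev(v):.1f})"
        for name in binary:
            v = values(group, name)
            n = sum(1 for x in v if x == 1)
            column[name] = f"{n} ({100 * n / len(v):.1f}%)"
        table[label] = column
    return table


# Toy rows, made up for illustration only.
toy = [
    {"SEX": "1", "SYSBP": "120", "CURSMOKE": "1"},
    {"SEX": "1", "SYSBP": "130", "CURSMOKE": "0"},
    {"SEX": "2", "SYSBP": "110", "CURSMOKE": "0"},
    {"SEX": "2", "SYSBP": "130", "CURSMOKE": "0"},
    {"SEX": "2", "SYSBP": "150", "CURSMOKE": "1"},
]
result = compare_by_sex(toy, continuous=["SYSBP"], binary=["CURSMOKE"])
for label, column in result.items():
    print(label, column)
```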
  3. What characteristics are associated with serum glucose?
    Use simple and multivariable linear regression analysis to complete the following table relating the characteristics listed to glucose as a continuous variable. Before conducting the analysis, be sure that all participants have complete data on all analysis variables. If participants are excluded due to missing data, report the number excluded. Then, describe how each characteristic is related to glucose level. Are crude and multivariable effects similar? What might explain or account for any differences?
    Outcome Variable: Glucose, mg/dL

Characteristic | Crude Models: Regression Coefficient | Crude Models: p-value | Multivariable Model: Regression Coefficient | Multivariable Model: p-value
Age, years
Male sex
Systolic blood pressure, mmHg
Total serum cholesterol, mg/dL
Current smoker
Diabetes
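
The complete-case step and the difference between crude and multivariable coefficients can be illustrated with a small sketch. The Python code below fits ordinary least squares by solving the normal equations in plain Python. That is a simplification chosen to keep the example dependency-free, since a statistics package would normally use a more stable decomposition and would also report the p-values the table asks for. The function names are illustrative and the toy rows are made up: GLUCOSE is constructed as exactly 50 + 0.5 × AGE + 40 × DIABETES so the expected coefficients are known.

```python
def complete_cases(rows, columns):
    """Return (numeric rows with all columns present, number of rows excluded)."""
    kept = []
    for r in rows:
        try:
            vals = [float(r[c]) for c in columns]
        except (KeyError, TypeError, ValueError):
            continue  # blank or non-numeric cell: exclude the participant
        if all(v == v for v in vals):  # v != v only for NaN
            kept.append(vals)
    return kept, len(rows) - len(kept)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(y, xs):
    """Least-squares coefficients [intercept, b1, ...] via (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in xs]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Toy rows, made up for illustration; the last row is missing GLUCOSE.
toy = [
    {"GLUCOSE": "70", "AGE": "40", "DIABETES": "0"},
    {"GLUCOSE": "75", "AGE": "50", "DIABETES": "0"},
    {"GLUCOSE": "110", "AGE": "40", "DIABETES": "1"},
    {"GLUCOSE": "115", "AGE": "50", "DIABETES": "1"},
    {"GLUCOSE": "120", "AGE": "60", "DIABETES": "1"},
    {"GLUCOSE": "", "AGE": "55", "DIABETES": "0"},
]
kept, excluded = complete_cases(toy, ["GLUCOSE", "AGE", "DIABETES"])
y = [row[0] for row in kept]
crude_age = ols(y, [[row[1]] for row in kept])   # glucose on age only
adjusted = ols(y, [row[1:] for row in kept])     # glucose on age and diabetes
print("excluded for missing data:", excluded)
print("crude age coefficient:", round(crude_age[1], 3))
print("adjusted age coefficient:", round(adjusted[1], 3))
```

In this toy data the older participants are more often diabetic, so the crude age coefficient is larger than the adjusted one; confounding of this kind is one possible explanation when crude and multivariable effects differ. For the "Male sex" row, SEX (coded 1/2) would first need to be recoded to a 0/1 indicator.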

  4. Who is most likely to have prevalent coronary heart disease?
    Test if there are significant differences in the following risk factors between persons with and without prevalent coronary heart disease (CHD). Summarize the statistical results in the table below and then compare risk factors in persons with and without prevalent CHD. Be sure to indicate what statistical tests were used in the footnote to the table and in a brief summary of a paragraph or less.

Patient Characteristic* | History of CHD (n = 194) | No History of CHD (n = 4240) | p-value*
Age, years
Systolic blood pressure, mmHg
Diastolic blood pressure, mmHg
Total serum cholesterol, mg/dL
Body mass index

  * Mean (standard deviation). P-values are based on two independent samples t tests.
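
The two independent samples t test named in the footnote can be sketched as below. This Python illustration uses the pooled-variance (equal variances) form, which is an assumption, since the footnote does not say which variant is intended. The Python standard library has no t distribution, so the p-value here is a normal approximation, which is reasonable only for large samples such as the 194 and 4240 participants in this comparison. A statistics package should be used for an exact p-value. The toy values are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_t_test(a, b):
    """Two independent samples t test with pooled variance.

    Returns (t statistic, degrees of freedom, approximate two-sided p-value).
    The p-value uses the standard normal in place of the t distribution,
    so it is only a large-sample approximation.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Toy systolic blood pressures, made up for illustration only.
chd = [140, 150, 160]
no_chd = [120, 130, 140]
t, df, p_approx = pooled_t_test(chd, no_chd)
print(f"t = {t:.3f}, df = {df}, approximate p = {p_approx:.3f}")
```

With samples as small as the toy data, the normal approximation understates the p-value; it is shown only to make the calculation concrete.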
  5. Describe the data collection methods used in the study.
  6. Provide a summary of the data analysis as a whole. The summary should include pertinent information from all the questions above and should not exceed 300 words.
