Here's how we would analyze our self-esteem measure using CTT:
- Reliability: CTT emphasizes the reliability of the test as a whole. We would want to know how consistently the measure yields similar scores over time (test-retest reliability), how consistent the items are with one another (internal consistency reliability, often measured using Cronbach's alpha), and, if applicable, how closely scores agree across equivalent versions of the test (parallel-forms reliability) or across raters (inter-rater reliability). Because reliability is defined as the proportion of observed-score variance that is true-score variance, a high coefficient (e.g., above 0.70 or 0.80, depending on the context) indicates that most of the variation in observed scores reflects differences in true self-esteem rather than measurement error.
- Validity: CTT also addresses the validity of the test – whether it measures what it is intended to measure (self-esteem). We might examine:
- Content Validity: Do the items on the measure adequately represent the domain of self-esteem? This is often a subjective judgment by experts.
- Criterion-Related Validity: Does the test score correlate with an external criterion measured at the same time (concurrent validity) or predict future outcomes related to self-esteem (predictive validity)? For example, we might correlate our self-esteem scores with measures of social anxiety (where we would expect a negative correlation) or academic performance.
- Construct Validity: Does the test score relate to other constructs in a way that aligns with the theoretical understanding of self-esteem? This often involves examining correlations with related (convergent validity) and unrelated (discriminant validity) constructs.
- Item Analysis (in a limited way): CTT can involve some basic item analysis, such as calculating item difficulty (for dichotomous items, the proportion of respondents who endorse the item in the keyed direction; for Likert-type items, the item mean) and item discrimination (the extent to which an item differentiates between individuals with high and low overall test scores). For example, we might compute the corrected item-total correlation, i.e., the correlation between an item score and the total of the remaining items. However, CTT item statistics are sample-dependent: they can change depending on the characteristics of the group taking the test.
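- Standard Error of Measurement (SEM): CTT provides an estimate of the SEM, the standard deviation of the measurement errors, usually computed as $\text{SEM} = SD_X\sqrt{1 - r_{XX}}$, where $SD_X$ is the standard deviation of observed scores and $r_{XX}$ is the reliability coefficient. This lets us build a confidence interval around an individual's observed score (e.g., roughly ±1.96 SEM for 95% confidence) to estimate the range within which their true score likely falls. Note that CTT applies the same SEM to every respondent, regardless of where their score falls.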
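The CTT statistics above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the data are a hypothetical set of six respondents answering four items on a 1 to 4 agreement scale (negatively worded items assumed already reverse-scored), and the function names `cronbach_alpha`, `item_analysis`, and `sem_interval` are our own. It computes Cronbach's alpha, each item's mean as a Likert-style difficulty index, the corrected item-total correlation as a discrimination index, and the SEM with a 95% confidence interval.

```python
from statistics import mean, variance, stdev

# Hypothetical responses: 6 respondents x 4 items, scored 1 (strongly disagree)
# to 4 (strongly agree); negatively worded items assumed already reverse-scored.
RESPONSES = [
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [2, 2, 1, 2],
    [4, 3, 4, 4],
    [1, 2, 1, 1],
    [3, 4, 3, 3],
]

def cronbach_alpha(data):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(data[0])
    items = list(zip(*data))                  # transpose: one tuple per item
    totals = [sum(row) for row in data]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_analysis(data):
    """Per item: (mean score, corrected item-total correlation), where the
    correlation is between the item and the total of the *other* items."""
    results = []
    for j, item in enumerate(zip(*data)):
        rest = [sum(row) - row[j] for row in data]
        results.append((mean(item), pearson(item, rest)))
    return results

def sem_interval(observed, sd_total, reliability, z=1.96):
    """SEM = SD * sqrt(1 - reliability); returns (SEM, lower, upper)."""
    sem = sd_total * (1 - reliability) ** 0.5
    return sem, observed - z * sem, observed + z * sem

if __name__ == "__main__":
    alpha = cronbach_alpha(RESPONSES)
    print(f"Cronbach's alpha: {alpha:.3f}")
    for j, (m, r) in enumerate(item_analysis(RESPONSES), start=1):
        print(f"item {j}: mean={m:.2f}, corrected item-total r={r:.2f}")
    totals = [sum(row) for row in RESPONSES]
    sem, lo, hi = sem_interval(totals[1], stdev(totals), alpha)
    print(f"SEM={sem:.2f}; 95% CI for respondent 2: [{lo:.1f}, {hi:.1f}]")
```

With only six respondents these numbers illustrate the arithmetic, not a usable reliability estimate; CTT statistics from such a small sample would be highly unstable.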
Item Response Theory (IRT) Perspective:
IRT offers a more sophisticated approach by focusing on the individual items within the self-esteem measure rather than just the overall test score. IRT models the probability of a specific response to an item as a mathematical function of the individual's underlying trait level (self-esteem in this case) and certain item characteristics.
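As a concrete example of "probability as a function of trait level and item characteristics", the sketch below uses the three-parameter logistic (3PL) model, one common model for dichotomous (agree/disagree) items. The model choice, the logistic metric without the 1.7 scaling constant, and the parameter values are assumptions for illustration: `a` is discrimination, `b` is difficulty (location), and `c` is the lower asymptote. Setting `c = 0` gives the two-parameter (2PL) model. Likert-type self-esteem items would instead call for a polytomous model such as the graded response model.

```python
import math

def icc_3pl(theta, a, b, c=0.0):
    """Probability of endorsing an item under the 3PL model:
    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b))).
    With c = 0 this reduces to the 2PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

if __name__ == "__main__":
    # Hypothetical item: fairly discriminating (a = 1.5), located slightly
    # above average self-esteem (b = 0.5), no lower asymptote (c = 0).
    for theta in (-2, -1, 0, 0.5, 1, 2):
        print(f"theta={theta:+.1f}  P(endorse)={icc_3pl(theta, 1.5, 0.5):.3f}")
```

Plotting `icc_3pl` over a range of trait values produces the item's characteristic curve; at `theta == b` the probability is 0.5 when `c = 0`.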
Here's how we would analyze our self-esteem measure using IRT:
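- Item Parameters: IRT estimates item parameters that, provided the model fits the data, are invariant across different groups of respondents up to a linear rescaling of the trait metric (a key advantage over CTT's sample-dependent item statistics). Common parameters include: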
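- Difficulty (b parameter): This indicates the level of self-esteem at which an individual has a 50% probability of endorsing the item in the keyed direction (e.g., agreeing with a positive self-esteem statement). The 50% interpretation holds for models without a guessing parameter; when a lower asymptote c is included, the probability at b is (1 + c)/2.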
- Discrimination (a parameter): This indicates how well the item differentiates between individuals with different levels of self-esteem. A higher discrimination parameter means the probability of endorsing the item changes more rapidly as self-esteem levels change.
- Guessing (c parameter - less common in self-esteem scales): This represents the probability that an individual with very low self-esteem would still endorse the item (more relevant for multiple-choice tests where guessing is possible).
- Person Parameters (Trait Levels): IRT estimates the trait level (self-esteem) of each individual based on their pattern of responses to all the items. This trait level is typically represented on a continuous scale. Unlike CTT, where the score is simply the sum of the item responses, IRT uses the item parameters to weight each response (in models with a discrimination parameter, more discriminating items carry more weight) and provides a standard error for each individual's estimate.
- Item Characteristic Curves (ICCs): IRT allows us to visualize the relationship between an individual's self-esteem level and the probability of endorsing each item through ICCs. These curves provide detailed information about how each item functions across the range of self-esteem.
- Test Information Function (TIF): IRT provides a TIF, which indicates how precisely the measure estimates self-esteem at each trait level $\theta$; the standard error of the trait estimate is approximately $SE(\theta) = 1/\sqrt{I(\theta)}$, where $I(\theta)$ is the test information at that level. Unlike CTT's single reliability coefficient, the TIF can show that the test measures some ranges of self-esteem more precisely than others.
- Differential Item Functioning (DIF): IRT allows us to examine whether different groups of individuals (e.g., men and women) with the same underlying level of self-esteem respond differently to specific items. This is crucial for identifying potential bias in the measure.
Key Differences:
In summary:
- CTT would provide us with an overall sense of the reliability and validity of our self-esteem measure as a whole, along with some basic information about the difficulty and discrimination of the items within a specific sample.
- IRT would offer a more nuanced understanding of the individual items, their characteristics, and how they function across different levels of self-esteem. It would provide more precise estimates of individual self-esteem levels and allow us to assess the reliability of the measure at different points on the self-esteem continuum. IRT also offers powerful tools for detecting potential bias in individual items that CTT struggles to address effectively.
Ultimately, IRT provides a more detailed and flexible framework for understanding the psychometric properties of our self-esteem measure compared to the more traditional approach of CTT. However, IRT models are more complex and require larger datasets for accurate parameter estimation. The choice of which theory to apply often depends on the research question, the characteristics of the measure, and the available data.
Let's break down how a self-esteem measure would be analyzed from the perspectives of Classical Test Theory (CTT) and Item Response Theory (IRT).
Classical Test Theory (CTT) Perspective:
From a CTT perspective, our focus would be primarily on the overall test score as a reflection of an individual's true self-esteem, acknowledging that this observed score contains some degree of error. The central equation of CTT is:
$$X = T + E$$
Where:
- $X$ = Observed score on the self-esteem measure
- $T$ = True score (formally, the expected value of the observed score over hypothetical repeated administrations; the individual's level of self-esteem that we aim to estimate)
- $E$ = Error score (random fluctuations or factors unrelated to true self-esteem that influence the observed score)
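The CTT decomposition of an observed score into true score plus error (X = T + E) can be illustrated with a small simulation. The values are assumptions chosen for illustration: true scores are drawn from a normal distribution with mean 50 and SD 10, and errors are independent of true scores with mean 0 and SD 5. Because CTT assumes error is uncorrelated with true score, Var(X) = Var(T) + Var(E), so reliability, defined as Var(T)/Var(X), should come out near 100/125 = 0.80.

```python
import random

def population_variance(xs):
    """Variance with divisor n (the simulated sample is treated as the population)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def simulate_ctt(n=50_000, true_mean=50.0, true_sd=10.0, error_sd=5.0, seed=42):
    """Simulate X = T + E under CTT assumptions and return
    (Var(T), Var(E), Var(X), reliability = Var(T) / Var(X))."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n)]  # T
    errors = [rng.gauss(0.0, error_sd) for _ in range(n)]            # E: mean 0, independent of T
    observed = [t + e for t, e in zip(true_scores, errors)]          # X = T + E
    var_t = population_variance(true_scores)
    var_e = population_variance(errors)
    var_x = population_variance(observed)
    return var_t, var_e, var_x, var_t / var_x

if __name__ == "__main__":
    var_t, var_e, var_x, rel = simulate_ctt()
    print(f"Var(T)={var_t:.1f}  Var(E)={var_e:.1f}  Var(X)={var_x:.1f}")
    print(f"Reliability Var(T)/Var(X) = {rel:.3f} (theoretical 100/125 = 0.800)")
```

In real data T and E are never observed separately, which is why CTT estimates reliability indirectly through test-retest, parallel-forms, or internal-consistency designs.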