Innovative Approaches To HRQL Measurement

Clinical service delivery can be improved by refining the measurement of common symptoms and related HRQL, and by connecting that measurement to clinical practice improvement efforts as well as to the comprehensive study of multiple outcomes. The availability of multidimensional self-report HRQL instruments has allowed investigators to measure the impact of disease and its treatment on well-being and functioning. These instruments can capture important group differences or changes over time and better document the adverse and positive impact of disease, treatment side effects, and tumor response.222-228 Their success over the last 15 years in articulating the impact of cancer and its treatments has led many to request the logical next step: measuring HRQL in individuals, tracking their progress over time, and using responses to inform care. Unfortunately, these compact multidimensional instruments are too coarse for individual assessment.43,229-231 Their brevity, critical for inclusion in large-scale studies, is a major limitation in the individual assessment of cancer survivors. Measurement error at the individual level is unacceptably large, and the resulting confidence intervals are too broad for diagnostic and treatment decision-making. The recent introduction of item response theory (IRT) and its application in computerized adaptive testing offer a solution by providing brief assessments that are precise enough for individual classification.231-236 We229,232,237-239 and others have shown that computer adaptive testing, used with refined, well-defined banks of questions, can select only those questions that provide the most health information and thereby increase precision.

5.1. Item Response Theory and Item Banks

Item response theory is a family of mathematical models used to determine the characteristics (such as difficulty) of test questions and to estimate each person's level on the underlying dimension being measured.239-241 It posits an underlying, unobserved trait on which items are hierarchically arrayed. The three most popular unidimensional IRT models are the one-, two-, and three-parameter logistic models, named for the number of item parameters each incorporates.242
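As a minimal sketch of the three logistic models, the function below computes the probability that a person at trait level theta endorses a single dichotomous item. The parameter names (b for difficulty, a for discrimination, c for the lower asymptote) follow common IRT notation and are assumptions of this illustration, not taken from the text; HRQL items are often polytomous, which this simplified form does not cover.

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of endorsing a dichotomous item under a logistic IRT model.

    theta: person's level on the latent trait
    b:     item difficulty (one-parameter model uses only this)
    a:     item discrimination (added by the two-parameter model)
    c:     lower asymptote, or "guessing" (added by the three-parameter model)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Same person and item difficulty under each model (illustrative values).
p_1pl = irt_probability(theta=0.0, b=0.0)
p_2pl = irt_probability(theta=0.0, b=0.0, a=2.0)
p_3pl = irt_probability(theta=0.0, b=0.0, a=2.0, c=0.2)
```

With c fixed at 0 and a fixed at 1 the function reduces to the one-parameter form, so the three models can be read as successive generalizations of the same curve.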

An item bank is composed of carefully calibrated questions that develop, define, and quantify a common theme and thus provide an operational definition of a "trait."238,243-249 A good bank covers the entire continuum of the latent trait being measured (Figure 2). The items in the bank are concrete manifestations of positions along the continuum that represent differing amounts of that trait. An HRQL item bank can provide a basis for designing the best possible set of questions for any particular application. A well-calibrated item bank makes it possible to compare the amount of a given trait for survivors who complete different sets of questions in the bank. Not only does this allow for tailored, "adaptive" testing, it also allows comparison across studies using different items from the same bank. Because all items are calibrated onto one common scale, one can compare HRQL results across diverse groups of survivors and item sets. A well-organized item bank with wide-ranging item difficulties can also enable one to select items to construct a wide variety of tests, depending on the target populations and purpose of assessment. At a given difficulty level, any chosen item should provide the same increment of information. In the specific context of HRQL (or one of its dimensions), however, the content of questions at comparable difficulty levels may vary in clinical relevance. By using item bank information, the user is then able to select an item, within a given difficulty level, according to its clinical relevance. Specific items can thus be selected from among those in the bank to maximize both the precision of the estimate and the clinical relevance of the questions. With computer adaptive tests, collaborative interaction between clinicians and programmers of the algorithm allows one to select the best set of items to obtain an estimate.
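The idea that respondents who answer different questions can still be placed on one scale can be sketched as follows. The bank, its item names, and its difficulty values are hypothetical, and the sketch assumes a one-parameter (Rasch) model with a simple grid-search maximum likelihood estimate rather than any particular published scoring procedure.

```python
import math

# Hypothetical calibrated bank: item id -> difficulty on the common (logit) scale.
BANK = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0, "q6": 0.5}

def p_endorse(theta, b):
    """One-parameter logistic probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, responses):
    """Log-likelihood of a set of 0/1 responses, keyed by item id."""
    total = 0.0
    for item, x in responses.items():
        p = p_endorse(theta, BANK[item])
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def estimate_theta(responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of the trait level."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses))

# Two respondents who share no items still receive scores on the same metric.
form_a = {"q1": 1, "q2": 1, "q3": 0}   # easier short form
form_b = {"q4": 0, "q5": 0, "q6": 1}   # harder short form
theta_a = estimate_theta(form_a)
theta_b = estimate_theta(form_b)
```

Because both estimates are expressed in the units of the bank's calibration, theta_a and theta_b can be compared directly even though the two forms have no questions in common.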

Figure 2. Calibrated Item Bank.

Figure 2 illustrates how three different researchers, studying three different types of cancer, can access the same generic item bank and select unique short forms of varying length and clinical content, and yet still produce a score for each person across the three trials that is on the same metric. An added feature is that, because the IRT measurement model uses logistic regression as its basis, this common metric across the three short forms and clinical trials is on an interval scale. The interval-scale nature of the metric allows parametric statistics to be applied to trial data, offering more statistical power and perhaps reducing sample size requirements for trials in which the sample size is driven by the HRQL endpoint. Finally, as also indicated in Figure 2, IRT item banks permit a degree of precision in assessing the individual person. Because they are replete with related items that are calibrated on a continuum of the concept being measured, IRT item banks are built for computerized adaptive testing, which selects the most informative questions from the bank until a sufficiently precise estimate of the person's score is obtained.

5.2. Computer Adaptive Testing

Computer adaptive testing is a method of administering tests by computer, based on the psychometric framework of IRT. Adaptive tests are greatly facilitated by a computer because of the computational requirements of the algorithm and the logistics of item and data management.250,251 Items are selected on the basis of the examinee's responses to previously administered items.239,252-254 This process uses an algorithm to estimate person "ability" and then chooses the best next item, enabling test administration based on specifications such as content coverage and test length. The capacity to rank all examinees on the same continuum, even if they have not been given any common items, allows for a test that is individually tailored to each examinee. With item banking, each patient need only answer a subset of items to obtain a measure that accurately estimates what would have been obtained by administering the entire set of items.
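The estimate-then-select cycle described above can be sketched as a short loop. This is an illustration under stated assumptions, not the algorithm used in the cited work: it assumes a one-parameter (Rasch) model, a standard normal prior on the trait, selection of the item with the greatest information at the current estimate, and a stopping rule based on the posterior standard deviation. The bank, the function names, and the deterministic simulated respondent are all hypothetical.

```python
import math

def p_endorse(theta, b):
    """One-parameter logistic probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def posterior_mean_sd(responses, bank, grid):
    """Posterior mean and SD of theta on a grid, with a standard normal prior."""
    weights = []
    for t in grid:
        log_w = -0.5 * t * t  # log of the (unnormalized) prior
        for item, x in responses.items():
            p = p_endorse(t, bank[item])
            log_w += math.log(p if x == 1 else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, max_items=10, target_sd=0.6):
    """Administer items adaptively; return (estimate, sd, items in order asked)."""
    grid = [-4.0 + 0.05 * i for i in range(161)]
    responses = {}
    theta, sd = 0.0, 1.0  # start at the prior mean and SD
    while len(responses) < min(max_items, len(bank)) and sd > target_sd:
        remaining = [i for i in bank if i not in responses]

        def information(item):
            # Rasch item information p(1 - p) peaks where difficulty matches theta.
            p = p_endorse(theta, bank[item])
            return p * (1.0 - p)

        item = max(remaining, key=information)
        responses[item] = answer(item)
        theta, sd = posterior_mean_sd(responses, bank, grid)
    return theta, sd, list(responses)

# Hypothetical bank of 25 items spanning -3 to +3 logits.
bank = {f"item{k}": -3.0 + 0.25 * k for k in range(25)}
true_theta = 1.0  # assumed trait level of a simulated respondent

def answer(item):
    # Deterministic simulated respondent: endorses items easier than true_theta.
    return 1 if bank[item] < true_theta else 0

estimate, sd, administered = run_cat(bank, answer)
```

Content-coverage constraints of the kind mentioned above could be layered on by restricting the remaining list before the selection step; that refinement is omitted here to keep the sketch short.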

