Online estimation of ability in adaptive testing
In adaptive testing the updating of the ability estimate plays a central role. Generally, adaptive testing consists of the following steps:
1. Select an optimal item from an item pool given the current ability estimate of the test-taker
2. Present the item and await the response of the test-taker
3. Update the ability estimate of the test-taker given the new response
Additionally, we need to provide an initial ability estimate for the test-taker before the first item is administered. A stopping criterion is also required to terminate the test.
In this guide we are going to implement a simple adaptive test using PersonParameters.jl to update the ability values in step 3 of the testing procedure.
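The steps listed above can be sketched as a generic loop. This is only a minimal illustration of the control flow: the function `adaptive_test`, its four callbacks, and the `max_items` safety cap are hypothetical names introduced here, not part of PersonParameters.jl.

```julia
# Generic adaptive testing loop. The callbacks `select`, `respond`, `update`
# and `stop` are hypothetical placeholders, not part of PersonParameters.jl.
function adaptive_test(select, respond, update, stop; init, max_items = 100)
    estimate = init
    items = Int[]
    responses = Int[]
    while length(items) < max_items          # safety cap (an added assumption)
        item = select(estimate)              # step 1: select an item
        y = respond(item)                    # step 2: present it, await the response
        push!(items, item)
        push!(responses, y)
        estimate = update(items, responses)  # step 3: update the ability estimate
        stop(estimate) && break              # evaluate the stopping criterion
    end
    return (; estimate, items, responses)
end

# Toy callbacks, only to show the control flow
toy_select = estimate -> 1                          # always select item 1
toy_respond = item -> 1                             # always respond correctly
toy_update = (items, responses) -> sum(responses)   # "estimate" = number of correct responses
toy_stop = estimate -> estimate >= 3                # stop after three correct responses

result = adaptive_test(toy_select, toy_respond, toy_update, toy_stop; init = 0)
length(result.items)   # 3
```

The remainder of this guide fills in the item selection, the update, and the stopping criterion with real implementations, and replaces the loop with a reactive listener.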
Preparation
Setting up the item pool
To keep things simple our item pool will consist of 100 items with item parameters given by a Rasch model. We draw the item difficulty values from a standard normal distribution.
item_pool = randn(100)
100-element Vector{Float64}:
-1.9085844610100242
-0.3193501309246828
-1.3706003089561634
-0.4765395263704167
-0.18406197886030887
0.07956351890906067
0.9653035713225864
-0.7412845170310357
-0.6708770393386021
2.210995604236078
⋮
0.17993853097914825
0.26604455391591014
-0.388363107314558
-0.13520179370852234
0.290395546661944
-0.198443768875884
1.0461801589213602
-0.41526658756575163
0.5608774661943376
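Because the difficulties are drawn at random, the item pool differs between runs. One possible way to make the draw reproducible is to seed the global random number generator first. The seed value below is arbitrary, and the values printed in this guide were not necessarily generated with it.

```julia
using Random

Random.seed!(1234)          # arbitrary seed, chosen only for illustration
item_pool_a = randn(100)

Random.seed!(1234)          # resetting the seed reproduces the same draws
item_pool_b = randn(100)

item_pool_a == item_pool_b  # true
```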
Defining the item selection procedure
Next, we need to define a function to select the optimal item (step 1). In adaptive testing this is usually the item that maximises the item information given the current ability estimate.
Therefore we need a function that takes the item_pool and the current ability estimate theta as inputs, calculates the information for each item in item_pool at theta, and returns the id of the item with maximum information.
Information functions for the Rasch model are available in ItemResponseFunctions.jl.
using ItemResponseFunctions: OnePL, information
function select_item(item_pool, theta)
infos = [information(OnePL, theta, beta) for beta in item_pool]
return argmax(infos)
end
select_item (generic function with 1 method)
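To see what maximum information means under the Rasch model, the following sketch writes the formulas out by hand. With P = 1 / (1 + exp(-(theta - beta))), the item information is P * (1 - P), which peaks at 0.25 when theta equals beta. Selecting by maximum information is therefore equivalent to selecting the item whose difficulty is closest to the current ability estimate. The helpers `rasch_prob` and `rasch_info` and the difficulty values are made up for illustration and are independent of ItemResponseFunctions.jl.

```julia
# Hand-written Rasch formulas (hypothetical helpers)
rasch_prob(theta, beta) = 1 / (1 + exp(-(theta - beta)))
rasch_info(theta, beta) = rasch_prob(theta, beta) * (1 - rasch_prob(theta, beta))

betas = [-1.5, -0.4, 0.1, 0.9, 2.2]   # made-up item difficulties
theta = 0.0

by_information = argmax([rasch_info(theta, b) for b in betas])
by_distance = argmin(abs.(betas .- theta))

by_information == by_distance   # true, both pick item 3
```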
WARNING
For simplicity items are selected with replacement from the item pool.
In a real-world application one would prefer to track the exposed items and only select items that weren't previously exposed to the test-taker.
Calling the function at theta = 0.0 returns a valid item id.
select_item(item_pool, 0.0)
68
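As noted in the warning above, a real application would usually exclude items that were already administered. A minimal sketch of such a variant could look as follows. The function `select_unexposed_item` is hypothetical, and the Rasch information is written out by hand so that the example does not depend on any package.

```julia
# Rasch item information, written out by hand
function rasch_info(theta, beta)
    p = 1 / (1 + exp(-(theta - beta)))
    return p * (1 - p)
end

# Hypothetical variant of item selection that skips already administered items
function select_unexposed_item(item_pool, theta, administered)
    candidates = setdiff(eachindex(item_pool), administered)
    isempty(candidates) && error("item pool exhausted")
    infos = [rasch_info(theta, item_pool[i]) for i in candidates]
    return candidates[argmax(infos)]
end

pool = [-1.0, 0.05, 0.1, 1.5]                            # made-up item difficulties
first_item = select_unexposed_item(pool, 0.0, Int[])     # 2
second_item = select_unexposed_item(pool, 0.0, [2])      # 3
```

In the test logic of this guide, the vector items could be passed as the `administered` argument.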
Defining the stopping criterion
Our stopping criterion in this example is also based on the ability estimate. The test should only be stopped once the estimate is sufficiently precise. In other words: we stop the test only if the standard error of the ability estimate is below threshold.
The stopping criterion returns false if the criterion is not met, meaning the test continues and a new item is selected. If the stopping criterion is met, true is returned and the test is stopped.
using PersonParameters: PersonParameter, se
function stop(estimate::PersonParameter; threshold = 0.5)
return se(estimate) < threshold
end
stop (generic function with 1 method)
A small test confirms the stopping rule works as intended.
stop(PersonParameter(0.0, 0.6)) # should return false
false
stop(PersonParameter(0.0, 0.2)) # should return true
true
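The threshold also implies a minimum test length. The sketch below assumes that the standard error equals the inverse square root of the test information. This is an assumption about the estimator rather than something this guide states, although the standard errors logged later in the guide are consistent with it. Since a Rasch item contributes at most 0.25 information, n items can at best yield a standard error of 1 / sqrt(0.25 * n). The helper names are hypothetical.

```julia
# Best-case standard error after n Rasch items, assuming
# se = 1 / sqrt(test information) and at most 0.25 information per item
min_se(n) = 1 / sqrt(0.25 * n)

# Smallest test length whose best-case standard error falls below the threshold
function min_test_length(threshold)
    threshold > 0 || throw(ArgumentError("threshold must be positive"))
    n = 1
    while min_se(n) >= threshold
        n += 1
    end
    return n
end

min_se(16)              # 0.5, not yet below the threshold
min_test_length(0.5)    # 17
```

Under these assumptions, a test with the default threshold of 0.5 cannot stop before 17 items have been administered.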
Implementing the test logic
Now that the item selection and stopping criterion are defined, we can move on to code the test logic. Recall that we must
1. Await the response of the test-taker,
2. Update the ability value given the new response,
3. Evaluate the stopping rule and either present the next item to the test-taker according to select_item, or stop the test.
We also want to store the responses, the administered items, and the intermediate ability estimates:
responses = Int[]
estimates = [PersonParameter(0.0, Inf)]
items = [rand(eachindex(item_pool))]
1-element Vector{Int64}:
1
INFO
The objects estimates and items already include initial values. For the ability estimate the initial value was fixed at 0.0. For the initial item a random item was chosen from the item pool.
The following function update implements the test logic as described above. It makes use of Observables.jl, running every time a new response is observed.
using Observables: Observable, on
using PersonParameters: person_parameter, value, WLE
response = Observable(0)
is_stopped = Observable(false)
update = on(response) do y
if !is_stopped[]
push!(responses, y)
@info "new response: y = $y"
theta = person_parameter(OnePL, responses, item_pool[items], WLE())
push!(estimates, theta)
@info "new ability estimate: theta = $(value(theta)), se = $(se(theta))"
if stop(theta)
@info "stopping criterion reached: se = $(se(theta)) < 0.5"
is_stopped[] = true
else
@info "stopping criterion not reached: se = $(se(theta)) > 0.5"
new_item = select_item(item_pool, value(theta))
push!(items, new_item)
@info "current item: $(new_item)"
return new_item
end
end
end
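To see the observer mechanism in isolation, here is a small sketch that is unrelated to the test logic. It registers a listener with `on`, triggers it by assigning to the observable, and removes it again with `off`. The names `counter`, `seen`, and `listener` are made up for this example.

```julia
using Observables: Observable, on, off

counter = Observable(0)
seen = Int[]

# `on` registers a listener and returns a handle to it
listener = on(counter) do x
    push!(seen, x)
end

counter[] = 1    # triggers the listener
counter[] = 1    # triggers it again, even though the value is unchanged
off(listener)    # unregister the listener
counter[] = 2    # no longer recorded

seen             # [1, 1]
```

That an unchanged value still triggers the listener matters for adaptive testing, because two consecutive responses are often identical.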
Step-by-step explanation
The first part of the implementation is to define our observables. On the one hand, we need an observable for new responses, response. The test logic will run whenever response is updated.
On the other hand, we need an observable that tells us whether the test is active or has been stopped according to our stopping rule. The observable is_stopped tells us exactly that.
response = Observable(0)
is_stopped = Observable(false)
The function update contains the updating procedure once a new response is observed. It runs whenever response is updated and the test is not stopped, i.e. is_stopped[] == false.
update = on(response) do y
if !is_stopped[]
# ...
end
end
Within update the steps described above are executed. First, we commit the new response to storage.
update = on(response) do y
if !is_stopped[]
push!(responses, y)
# ...
end
end
Then the new ability is estimated using person_parameter and also committed to storage. For adaptive testing purposes we choose the WLE algorithm, since it provides finite ability estimates even if all responses are 0 or all responses are 1.
update = on(response) do y
if !is_stopped[]
push!(responses, y)
theta = person_parameter(OnePL, responses, item_pool[items], WLE())
push!(estimates, theta)
# ...
end
end
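The reason for preferring WLE can be illustrated with a hand-written sketch for the Rasch model. For an all-correct response pattern the maximum likelihood score, the sum of y - P over all items, is positive for every theta, so the likelihood has no finite maximum. Warm's weighted likelihood estimator adds the correction term J / (2 * I) to the score, with I the sum of P * (1 - P) and J the sum of P * (1 - P) * (1 - 2P), which yields a finite root. The code below is not the PersonParameters.jl implementation: all function names are hypothetical, and the bisection bracket of -10 to 10 is an assumption.

```julia
rasch_prob(theta, beta) = 1 / (1 + exp(-(theta - beta)))

# Maximum likelihood score function: its root would be the ML estimate
ml_score(theta, responses, betas) =
    sum(y - rasch_prob(theta, b) for (y, b) in zip(responses, betas))

# Warm's weighted likelihood estimating function for the Rasch model
function wle_score(theta, responses, betas)
    p = rasch_prob.(theta, betas)
    info = sum(p .* (1 .- p))
    correction = sum(p .* (1 .- p) .* (1 .- 2 .* p)) / (2 * info)
    return sum(responses .- p) + correction
end

# Plain bisection on a bracket that is assumed to contain the root
function wle_rasch(responses, betas; lo = -10.0, hi = 10.0, iterations = 100)
    for _ in 1:iterations
        mid = (lo + hi) / 2
        if wle_score(mid, responses, betas) > 0
            lo = mid
        else
            hi = mid
        end
    end
    return (lo + hi) / 2
end

beta_1 = -1.9085844610100242             # difficulty of item 1 in the pool shown above
ml_score(5.0, [1], [beta_1]) > 0         # true: the score stays positive
wle_rasch([1], [beta_1])                 # ≈ -0.80997
```

For a single correct response to item 1, the sketch agrees with the first ability estimate logged later in this guide.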
Finally, the stopping criterion is evaluated. If it is reached, the test is terminated by setting is_stopped[] = true. Otherwise, a new item is selected by select_item, committed to storage and provided to the test-taker.
update = on(response) do y
if !is_stopped[]
push!(responses, y)
theta = person_parameter(OnePL, responses, item_pool[items], WLE())
push!(estimates, theta)
if stop(theta)
is_stopped[] = true
else
new_item = select_item(item_pool, value(theta))
push!(items, new_item)
return new_item
end
end
end
INFO
In the original definition of update, @info statements are placed throughout for logging purposes.
Administering the test
With all our test logic in place we can administer the test to a virtual test-taker. We assume that the test-taker has a true ability and that their responses follow the Rasch model. Thus, we can define a respond function that gives us a random response to the item, given the expected probability of a correct response under the Rasch model.
using Distributions: Bernoulli
using ItemResponseFunctions: irf
function respond(beta; true_theta = 0.0)
prob = irf(OnePL, true_theta, beta, 1)
return Int(rand(Bernoulli(prob)))
end
respond (generic function with 1 method)
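For readers who want to see the response model spelled out, the following sketch simulates responses without Distributions.jl or ItemResponseFunctions.jl. It assumes the usual Rasch response probability P = 1 / (1 + exp(-(theta - beta))). The names `rasch_prob` and `simulate_response` are hypothetical, and the seed is arbitrary.

```julia
using Random

# Hand-written Rasch response probability (hypothetical helper)
rasch_prob(theta, beta) = 1 / (1 + exp(-(theta - beta)))

# Draw a 0/1 response that is correct with probability rasch_prob(true_theta, beta)
simulate_response(rng, beta; true_theta = 0.0) =
    Int(rand(rng) < rasch_prob(true_theta, beta))

rng = MersenneTwister(1)   # arbitrary seed
draws = [simulate_response(rng, -1.0) for _ in 1:10_000]

# The observed proportion should be close to rasch_prob(0.0, -1.0), about 0.731
proportion_correct = sum(draws) / length(draws)
```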
The virtual test-taker then responds to the administered items until the stopping criterion is met.
while !is_stopped[]
# get the item difficulty
current_item = last(items)
beta = item_pool[current_item]
# respond
response[] = respond(beta)
end
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.8099721723419143, se = 2.309401076758503
[ Info: stopping criterion not reached: se = 2.309401076758503 > 0.5
[ Info: current item: 58
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -1.3638735496687246, se = 1.4669900004712164
[ Info: stopping criterion not reached: se = 1.4669900004712164 > 0.5
[ Info: current item: 3
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.8180648042890541, se = 1.221596564798906
[ Info: stopping criterion not reached: se = 1.221596564798906 > 0.5
[ Info: current item: 58
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.32297299266066554, se = 1.1153930153077691
[ Info: stopping criterion not reached: se = 1.1153930153077691 > 0.5
[ Info: current item: 2
[ Info: new response: y = 1
[ Info: new ability estimate: theta = 0.14767452484942956, se = 1.063568994642235
[ Info: stopping criterion not reached: se = 1.063568994642235 > 0.5
[ Info: current item: 92
[ Info: new response: y = 1
[ Info: new ability estimate: theta = 0.6163450733547612, se = 1.0391410827075063
[ Info: stopping criterion not reached: se = 1.0391410827075063 > 0.5
[ Info: current item: 32
[ Info: new response: y = 0
[ Info: new ability estimate: theta = 0.2924461709293114, se = 0.8710063456102501
[ Info: stopping criterion not reached: se = 0.8710063456102501 > 0.5
[ Info: current item: 96
[ Info: new response: y = 0
[ Info: new ability estimate: theta = 0.02506826847145942, se = 0.7765744102156711
[ Info: stopping criterion not reached: se = 0.7765744102156711 > 0.5
[ Info: current item: 79
[ Info: new response: y = 1
[ Info: new ability estimate: theta = 0.27362580920349505, se = 0.7409616105883491
[ Info: stopping criterion not reached: se = 0.7409616105883491 > 0.5
[ Info: current item: 42
[ Info: new response: y = 0
[ Info: new ability estimate: theta = 0.06104764935875984, se = 0.6827492937228274
[ Info: stopping criterion not reached: se = 0.6827492937228274 > 0.5
[ Info: current item: 48
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.1300889399884594, se = 0.6413268157271816
[ Info: stopping criterion not reached: se = 0.6413268157271816 > 0.5
[ Info: current item: 95
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.3039828466805544, se = 0.6103446676458212
[ Info: stopping criterion not reached: se = 0.6103446676458212 > 0.5
[ Info: current item: 46
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.4647848285824869, se = 0.5862946551081328
[ Info: stopping criterion not reached: se = 0.5862946551081328 > 0.5
[ Info: current item: 4
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.31806563476643934, se = 0.5606163850467831
[ Info: stopping criterion not reached: se = 0.5606163850467831 > 0.5
[ Info: current item: 2
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.456548088082302, se = 0.5415356093594295
[ Info: stopping criterion not reached: se = 0.5415356093594295 > 0.5
[ Info: current item: 56
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.327251565883756, se = 0.5212687420604935
[ Info: stopping criterion not reached: se = 0.5212687420604935 > 0.5
[ Info: current item: 2
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.20594676096866088, se = 0.5046800914037389
[ Info: stopping criterion not reached: se = 0.5046800914037389 > 0.5
[ Info: current item: 43
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.09122983058723957, se = 0.49089385133439956
[ Info: stopping criterion reached: se = 0.49089385133439956 < 0.5
As is evident from the logging statements, the test stops after 18 items have been administered. The final estimate is about -0.09 with a standard error of 0.49.
last(estimates)
PersonParameter{Float64}(-0.09122983058723957, 0.49089385133439956)
The following table contains all tracked data from the virtual test.
using MarkdownTables
init = (; step = 0, item = "", response = "", estimate = value(estimates[1]), se = se(estimates[1]))
data = [(;
step = i,
item = items[i],
response = responses[i],
estimate = value(estimates[i + 1]),
se = se(estimates[i + 1])
) for i in eachindex(items)]
markdown_table(vcat(init, data))
step | item | response | estimate | se |
---|---|---|---|---|
0 | | | 0.0 | Inf |
1 | 1 | 1 | -0.8099721723419143 | 2.309401076758503 |
2 | 58 | 0 | -1.3638735496687246 | 1.4669900004712164 |
3 | 3 | 1 | -0.8180648042890541 | 1.221596564798906 |
4 | 58 | 1 | -0.32297299266066554 | 1.1153930153077691 |
5 | 2 | 1 | 0.14767452484942956 | 1.063568994642235 |
6 | 92 | 1 | 0.6163450733547612 | 1.0391410827075063 |
7 | 32 | 0 | 0.2924461709293114 | 0.8710063456102501 |
8 | 96 | 0 | 0.02506826847145942 | 0.7765744102156711 |
9 | 79 | 1 | 0.27362580920349505 | 0.7409616105883491 |
10 | 42 | 0 | 0.06104764935875984 | 0.6827492937228274 |
11 | 48 | 0 | -0.1300889399884594 | 0.6413268157271816 |
12 | 95 | 0 | -0.3039828466805544 | 0.6103446676458212 |
13 | 46 | 0 | -0.4647848285824869 | 0.5862946551081328 |
14 | 4 | 1 | -0.31806563476643934 | 0.5606163850467831 |
15 | 2 | 0 | -0.456548088082302 | 0.5415356093594295 |
16 | 56 | 1 | -0.327251565883756 | 0.5212687420604935 |
17 | 2 | 1 | -0.20594676096866088 | 0.5046800914037389 |
18 | 43 | 1 | -0.09122983058723957 | 0.49089385133439956 |
Additional information
using Pkg
Pkg.status()
Status `~/work/PersonParameters.jl/PersonParameters.jl/docs/Project.toml`
[31c24e10] Distributions v0.25.113
[e30172f5] Documenter v1.8.0
[4710194d] DocumenterVitepress v0.1.3
[18e85bec] ItemResponseFunctions v0.2.0
[1862ce21] MarkdownTables v1.1.0
[510215fc] Observables v0.5.5
[ede86a6c] PersonParameters v0.2.1 `~/work/PersonParameters.jl/PersonParameters.jl`