Online estimation of ability in adaptive testing

In adaptive testing, updating the ability estimate plays a central role. Generally, an adaptive test repeats the following steps:

1. Select an optimal item from the item pool given the test-taker's current ability estimate
2. Present the item and await the test-taker's response
3. Update the test-taker's ability estimate given the new response

Additionally, we need an initial ability estimate for the test-taker before the first response is given, and a stopping criterion that terminates the test once it is met.
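The three steps, together with the initial estimate and the stopping criterion, form a loop. Below is a minimal sketch of that loop in plain Julia. run_adaptive_test and its four callback arguments are hypothetical names used only for illustration. The max_items cap is an added safeguard that is not part of the procedure described above. The toy callbacks at the end only exercise the control flow.

```julia
# Generic adaptive-testing loop. All callbacks are hypothetical placeholders:
#   select(theta, administered)      -> item id        (step 1)
#   respond(item)                    -> 0 or 1         (step 2)
#   estimate(administered, ys)       -> new ability    (step 3)
#   should_stop(theta, administered) -> Bool           (stopping criterion)
function run_adaptive_test(select, respond, estimate, should_stop;
                           theta0 = 0.0, max_items = 50)
    theta = theta0                  # initial ability estimate before any response
    administered = Int[]
    ys = Int[]
    while length(administered) < max_items
        item = select(theta, administered)
        y = respond(item)
        push!(administered, item)
        push!(ys, y)
        theta = estimate(administered, ys)
        should_stop(theta, administered) && break
    end
    return theta, administered, ys
end

# Toy usage: items in order, always correct, proportion correct, stop after 3 items
theta, administered, ys = run_adaptive_test(
    (theta, adm) -> length(adm) + 1,
    item -> 1,
    (adm, ys) -> sum(ys) / length(ys),
    (theta, adm) -> length(adm) >= 3,
)
```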

In this guide, we implement a simple adaptive test that uses PersonParameters.jl to update the ability estimate in step 3.

Preparation

Setting up the item pool

To keep things simple, our item pool consists of 100 items with item parameters given by a Rasch model. We draw the item difficulties from a standard normal distribution.

julia

item_pool = randn(100)

100-element Vector{Float64}:
 -1.9085844610100242
 -0.3193501309246828
 -1.3706003089561634
 -0.4765395263704167
 -0.18406197886030887
  0.07956351890906067
  0.9653035713225864
 -0.7412845170310357
 -0.6708770393386021
  2.210995604236078
  ⋮
  0.17993853097914825
  0.26604455391591014
 -0.388363107314558
 -0.13520179370852234
  0.290395546661944
 -0.198443768875884
  1.0461801589213602
 -0.41526658756575163
  0.5608774661943376
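The draw above is not seeded. Re-running it therefore produces a different pool, and the item ids and estimates reported later will differ. If reproducibility matters, one option is to pass an explicitly seeded random number generator from Julia's Random standard library. The seed below is arbitrary and chosen only for illustration.

```julia
using Random

rng = Xoshiro(2024)          # arbitrary seed (illustrative assumption)
item_pool = randn(rng, 100)  # same distribution as above, but reproducible
```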

Defining the item selection procedure

Next, we define a function that selects the optimal item (step 1). In adaptive testing, this is usually the item that maximises the item information at the current ability estimate.

Therefore, we need a function that takes the item_pool and the current ability estimate theta as inputs, calculates the information of each item in item_pool at theta, and returns the id of the item with maximum information.

Information functions for the Rasch model are available in ItemResponseFunctions.jl.

julia

using ItemResponseFunctions: OnePL, information

function select_item(item_pool, theta)
    infos = [information(OnePL, theta, beta) for beta in item_pool]
    return argmax(infos)
end

select_item (generic function with 1 method)
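For the Rasch model, item information has a simple closed form: I(θ) = P(θ)(1 − P(θ)), with P(θ) = 1 / (1 + exp(−(θ − β))). We assume this is what information(OnePL, theta, beta) evaluates; the ItemResponseFunctions.jl documentation should be consulted to confirm. Because this quantity peaks at θ = β and decreases as |θ − β| grows, selecting by maximum information amounts to picking the item whose difficulty is closest to θ. The helper names rasch_prob and rasch_information are hypothetical.

```julia
# Rasch (1PL) response probability and item information, written out by hand
rasch_prob(theta, beta) = 1 / (1 + exp(-(theta - beta)))

function rasch_information(theta, beta)
    p = rasch_prob(theta, beta)
    return p * (1 - p)          # I(θ) = P(θ)(1 − P(θ)), at most 0.25 (reached at θ = β)
end

pool = [-1.2, -0.3, 0.4, 1.5]   # toy difficulties
theta = 0.1
by_information = argmax([rasch_information(theta, b) for b in pool])
by_distance = argmin(abs.(pool .- theta))   # same item: difficulty closest to theta
```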

WARNING

For simplicity, items are selected with replacement from the item pool.

In a real-world application, one would track the exposed items and select only items that have not previously been administered to the test-taker.
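Here is one way selection without replacement could look. This is a sketch only: the function name is hypothetical, and the Rasch information is computed by hand rather than through ItemResponseFunctions.jl so that the block runs on its own. Items listed in administered are skipped, and nothing is returned once the pool is exhausted.

```julia
# Hypothetical variant of select_item that skips already administered items.
# Returns `nothing` once every item in the pool has been administered.
function select_item_without_replacement(item_pool, theta, administered)
    best_id, best_info = nothing, -Inf
    for (id, beta) in pairs(item_pool)
        id in administered && continue          # item already exposed
        p = 1 / (1 + exp(-(theta - beta)))      # Rasch probability of a correct response
        info = p * (1 - p)                      # Rasch item information
        if info > best_info
            best_id, best_info = id, info
        end
    end
    return best_id
end
```

To use it in update, the call select_item(item_pool, value(theta)) would become select_item_without_replacement(item_pool, value(theta), items). The test should then also stop when the function returns nothing.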

Calling the function at theta = 0.0 returns a valid item id.

julia

select_item(item_pool, 0.0)

Defining the stopping criterion

Our stopping criterion in this example is also based on the ability estimate. The test stops only once the estimate is sufficiently precise, i.e., once the standard error of the ability estimate falls below a predefined threshold.

The stopping criterion returns false if the criterion is not met, meaning the test continues and a new item is selected. If the stopping criterion is met, true is returned and the test is stopped.

julia

using PersonParameters: PersonParameter, se

function stop(estimate::PersonParameter; threshold = 0.5)
    return se(estimate) < threshold
end

stop (generic function with 1 method)

A small test confirms the stopping rule works as intended.

julia

stop(PersonParameter(0.0, 0.6))  # should return false

false

julia

stop(PersonParameter(0.0, 0.2))  # should return true

true
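The threshold also implies a minimum test length. In the run further below, the standard error after the first response (se ≈ 2.31, for a correct answer with P = 0.75) equals 1/√(0.75 · 0.25). This is consistent with the usual asymptotic relation SE = 1/√I(θ), where I(θ) is the sum of the Rasch item informations Pᵢ(1 − Pᵢ). Assuming that relation holds, each item contributes at most 0.25 information, so SE < 0.5 requires I > 4, which means at least 17 items. The run below stops after 18. The helper min_items below is hypothetical and computes this bound for any threshold.

```julia
# Smallest n with 1/sqrt(0.25 n) < threshold, i.e. n > 4 / threshold^2,
# assuming SE = 1/sqrt(test information) and at most 0.25 information per Rasch item.
min_items(threshold) = floor(Int, 4 / threshold^2) + 1

min_items(0.5)
min_items(0.3)
```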

Implementing the test logic

Now that item selection and the stopping criterion are defined, we can implement the test logic. Recall that, until the test ends, we must repeatedly

1. await the response of the test-taker,
2. update the ability estimate given the new response,
3. evaluate the stopping rule and either present the next item to the test-taker according to select_item, or stop the test.

We also want to store the responses, the administered items, and the intermediate ability estimates:

julia

responses = Int[]
estimates = [PersonParameter(0.0, Inf)]
items = [rand(eachindex(item_pool))]

1-element Vector{Int64}:
 1

INFO

The objects estimates and items already contain initial values. The initial ability estimate is fixed at 0.0 with an infinite standard error, and the first item is drawn at random from the item pool.

The following function update implements the test logic described above. It uses Observables.jl and runs every time a new response is observed.

julia

using Observables: Observable, on
using PersonParameters: person_parameter, value, WLE

response = Observable(0)
is_stopped = Observable(false)

update = on(response) do y
    if !is_stopped[]
        push!(responses, y)
        @info "new response: y = $y"

        theta = person_parameter(OnePL, responses, item_pool[items], WLE())
        push!(estimates, theta)
        @info "new ability estimate: theta = $(value(theta)), se = $(se(theta))"

        if stop(theta)
            @info "stopping criterion reached: se = $(se(theta)) < 0.5"
            is_stopped[] = true
        else
            @info "stopping criterion not reached: se = $(se(theta)) > 0.5"
            new_item = select_item(item_pool, value(theta))
            push!(items, new_item)
            @info "current item: $(new_item)"
            return new_item
        end
    end
end

Step-by-step explanation

The first part of the implementation defines our observables. First, we need an observable for new responses, response. The test logic runs whenever response is updated.

Second, we need an observable that tells us whether the test is still active or has been stopped according to our stopping rule. The observable is_stopped serves exactly that purpose.

julia

response = Observable(0)
is_stopped = Observable(false)

The function update contains the updating procedure for a newly observed response. It runs whenever response is updated, but it only acts while the test has not been stopped, i.e., while is_stopped[] == false.

julia

update = on(response) do y
    if !is_stopped[]
        # ...
    end
end
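The design relies on one property of observables: assigning to an observable with obs[] = x immediately runs every listener registered with on. The toy counter below illustrates this. It also uses off, which detaches a listener, as one possible alternative to the is_stopped guard for retiring update once a test is finished. The names counter, seen and listener are illustrative only.

```julia
using Observables: Observable, on, off

counter = Observable(0)
seen = Int[]

# `on` registers a listener and returns a handle that can later be passed to `off`
listener = on(counter) do x
    push!(seen, x)
end

counter[] = 1      # listener runs right away and records 1
counter[] = 2      # records 2
off(listener)      # detach the listener
counter[] = 3      # value changes, but nothing is recorded anymore
```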

Within update, the steps described above are executed. First, we commit the new response to storage:

julia

update = on(response) do y
    if !is_stopped[]
        push!(responses, y)
        # ...
    end
end

Then the new ability is estimated using person_parameter and also committed to storage. For adaptive testing purposes, we choose the weighted likelihood estimator (WLE), since it provides finite ability estimates even if all responses are 0 or all responses are 1.

julia

update = on(response) do y
    if !is_stopped[]
        push!(responses, y)

        theta = person_parameter(OnePL, responses, item_pool[items], WLE())
        push!(estimates, theta)
        # ...
    end
end
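The estimating equation shows why WLE stays finite. For the Rasch model, Warm's weighted likelihood estimator solves Σᵢ(yᵢ − Pᵢ) + J(θ)/(2I(θ)) = 0, where I(θ) = Σᵢ PᵢQᵢ, J(θ) = Σᵢ PᵢQᵢ(1 − 2Pᵢ) and Qᵢ = 1 − Pᵢ. The correction term tends to +1/2 as θ → −∞ and to −1/2 as θ → +∞, so a root exists even when every response is 1 (or every response is 0). The maximum likelihood score Σᵢ(yᵢ − Pᵢ), by contrast, never reaches zero in that case. Below is a from-scratch sketch that solves the equation by bisection. It is not PersonParameters.jl's implementation, and the names rasch_p and wle_rasch are hypothetical.

```julia
rasch_p(theta, beta) = 1 / (1 + exp(beta - theta))

# Warm's WLE for the Rasch model via bisection; returns (estimate, 1/sqrt(I(estimate)))
function wle_rasch(ys, betas; tol = 1e-10)
    function score(t)
        p = rasch_p.(t, betas)
        pq = p .* (1 .- p)                          # item informations P(1 - P)
        return sum(ys .- p) + sum(pq .* (1 .- 2 .* p)) / (2 * sum(pq))
    end
    # score is positive far below and negative far above the item difficulties
    lo, hi = minimum(betas) - 10, maximum(betas) + 10
    while hi - lo > tol
        mid = (lo + hi) / 2
        score(mid) > 0 ? (lo = mid) : (hi = mid)
    end
    theta_hat = (lo + hi) / 2
    p = rasch_p.(theta_hat, betas)
    return theta_hat, 1 / sqrt(sum(p .* (1 .- p)))
end

wle_rasch([1, 1, 1], [-0.5, 0.0, 0.5])   # finite even though every response is correct
```

For a single correct response, the equation reduces to 3/2 − 2P = 0, so P = 3/4 and θ̂ = β + log 3. With β ≈ −1.9086 (item 1 in the run below), this gives θ̂ ≈ −0.81 and SE = 1/√(0.75 · 0.25) ≈ 2.31, which match the values in the first log entry.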

Finally, the stopping criterion is evaluated. If it is met, the test is terminated by setting is_stopped[] = true. Otherwise, a new item is selected by select_item, committed to storage, and presented to the test-taker.

julia

update = on(response) do y
    if !is_stopped[]
        push!(responses, y)

        theta = person_parameter(OnePL, responses, item_pool[items], WLE())
        push!(estimates, theta)

        if stop(theta) 
            is_stopped[] = true
        else
            new_item = select_item(item_pool, value(theta)) 
            push!(items, new_item) 
            return new_item 
        end
    end
end

INFO

The full definition of update shown earlier additionally contains @info statements for logging purposes.

Administering the test

With all the test logic in place, we can administer the test to a virtual test-taker. We assume that the test-taker has a true ability and responds according to the Rasch model. Thus, we can define a respond function that draws a random response to an item, with the probability of a correct response given by the Rasch model.

julia

using Distributions: Bernoulli
using ItemResponseFunctions: irf

function respond(beta; true_theta = 0.0)
    prob = irf(OnePL, true_theta, beta, 1)
    return Int(rand(Bernoulli(prob)))
end

respond (generic function with 1 method)
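Distributions.jl is used here only for the Bernoulli draw. An equivalent draw needs nothing beyond Julia's standard library, because a uniform number in [0, 1) falls below p with probability p. Passing an explicit random number generator also makes the simulated test-taker reproducible. respond_base is a hypothetical name, and the seed is arbitrary.

```julia
using Random

# Bernoulli(prob) draw without Distributions.jl: rand(rng) is uniform on [0, 1)
function respond_base(beta; true_theta = 0.0, rng = Random.default_rng())
    prob = 1 / (1 + exp(-(true_theta - beta)))   # Rasch probability of a correct response
    return Int(rand(rng) < prob)
end

rng = Xoshiro(1)
ys = [respond_base(0.0; rng = rng) for _ in 1:10_000]
sum(ys) / length(ys)   # should be close to 0.5, since true_theta == beta
```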

The virtual test-taker then responds to the administered items until the stopping criterion is met.

julia

while !is_stopped[]
    # get the item difficulty
    current_item = last(items)
    beta = item_pool[current_item]
    # respond
    response[] = respond(beta)
end

[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.8099721723419143, se = 2.309401076758503
[ Info: stopping criterion not reached: se = 2.309401076758503 > 0.5
[ Info: current item: 58
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -1.3638735496687246, se = 1.4669900004712164
[ Info: stopping criterion not reached: se = 1.4669900004712164 > 0.5
[ Info: current item: 3
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.8180648042890541, se = 1.221596564798906
[ Info: stopping criterion not reached: se = 1.221596564798906 > 0.5
[ Info: current item: 58
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.32297299266066554, se = 1.1153930153077691
[ Info: stopping criterion not reached: se = 1.1153930153077691 > 0.5
[ Info: current item: 2
[ Info: new response: y = 1
[ Info: new ability estimate: theta = 0.14767452484942956, se = 1.063568994642235
[ Info: stopping criterion not reached: se = 1.063568994642235 > 0.5
[ Info: current item: 92
[ Info: new response: y = 1
[ Info: new ability estimate: theta = 0.6163450733547612, se = 1.0391410827075063
[ Info: stopping criterion not reached: se = 1.0391410827075063 > 0.5
[ Info: current item: 32
[ Info: new response: y = 0
[ Info: new ability estimate: theta = 0.2924461709293114, se = 0.8710063456102501
[ Info: stopping criterion not reached: se = 0.8710063456102501 > 0.5
[ Info: current item: 96
[ Info: new response: y = 0
[ Info: new ability estimate: theta = 0.02506826847145942, se = 0.7765744102156711
[ Info: stopping criterion not reached: se = 0.7765744102156711 > 0.5
[ Info: current item: 79
[ Info: new response: y = 1
[ Info: new ability estimate: theta = 0.27362580920349505, se = 0.7409616105883491
[ Info: stopping criterion not reached: se = 0.7409616105883491 > 0.5
[ Info: current item: 42
[ Info: new response: y = 0
[ Info: new ability estimate: theta = 0.06104764935875984, se = 0.6827492937228274
[ Info: stopping criterion not reached: se = 0.6827492937228274 > 0.5
[ Info: current item: 48
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.1300889399884594, se = 0.6413268157271816
[ Info: stopping criterion not reached: se = 0.6413268157271816 > 0.5
[ Info: current item: 95
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.3039828466805544, se = 0.6103446676458212
[ Info: stopping criterion not reached: se = 0.6103446676458212 > 0.5
[ Info: current item: 46
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.4647848285824869, se = 0.5862946551081328
[ Info: stopping criterion not reached: se = 0.5862946551081328 > 0.5
[ Info: current item: 4
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.31806563476643934, se = 0.5606163850467831
[ Info: stopping criterion not reached: se = 0.5606163850467831 > 0.5
[ Info: current item: 2
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.456548088082302, se = 0.5415356093594295
[ Info: stopping criterion not reached: se = 0.5415356093594295 > 0.5
[ Info: current item: 56
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.327251565883756, se = 0.5212687420604935
[ Info: stopping criterion not reached: se = 0.5212687420604935 > 0.5
[ Info: current item: 2
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.20594676096866088, se = 0.5046800914037389
[ Info: stopping criterion not reached: se = 0.5046800914037389 > 0.5
[ Info: current item: 43
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.09122983058723957, se = 0.49089385133439956
[ Info: stopping criterion reached: se = 0.49089385133439956 < 0.5

As is evident from the logging statements, the test stops after 18 items have been administered. The final estimate is about -0.09 with a standard error of 0.49.

julia

last(estimates)

PersonParameter{Float64}(-0.09122983058723957, 0.49089385133439956)
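The while loop above terminates only because the stopping rule eventually fires. An operational test would usually also cap the test length. The sketch below puts the pieces of this guide together in plain Julia, without Observables.jl, and adds a max_items cap. It makes several assumptions. Estimation uses Warm's WLE for the Rasch model with SE = 1/√I(θ). Items are selected with replacement by minimum |β − θ|, which is equivalent to maximum information under the Rasch model. The test starts at θ = 0 rather than with a random item. simulate_cat, its keyword arguments and the helper functions are hypothetical names.

```julia
using Random

rasch_p(theta, beta) = 1 / (1 + exp(beta - theta))

# Warm's WLE for the Rasch model, solved by bisection; returns (estimate, SE)
function wle_rasch(ys, betas; tol = 1e-10)
    function score(t)
        p = rasch_p.(t, betas)
        pq = p .* (1 .- p)
        return sum(ys .- p) + sum(pq .* (1 .- 2 .* p)) / (2 * sum(pq))
    end
    lo, hi = minimum(betas) - 10, maximum(betas) + 10
    while hi - lo > tol
        mid = (lo + hi) / 2
        score(mid) > 0 ? (lo = mid) : (hi = mid)
    end
    t = (lo + hi) / 2
    p = rasch_p.(t, betas)
    return t, 1 / sqrt(sum(p .* (1 .- p)))
end

# Hypothetical end-to-end simulation with a maximum test length as a safeguard
function simulate_cat(item_pool, true_theta; threshold = 0.5, max_items = 50,
                      rng = Random.default_rng())
    theta, se_hat = 0.0, Inf                        # initial estimate and standard error
    items, ys = Int[], Int[]
    while se_hat >= threshold && length(items) < max_items
        item = argmin(abs.(item_pool .- theta))     # most informative item, with replacement
        push!(items, item)
        push!(ys, Int(rand(rng) < rasch_p(true_theta, item_pool[item])))
        theta, se_hat = wle_rasch(ys, item_pool[items])
    end
    return (theta = theta, se = se_hat, items = items, responses = ys)
end

rng = Xoshiro(3)
result = simulate_cat(randn(rng, 100), 0.0; rng = rng)
```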

The following table contains all tracked data from the virtual test.

julia

using MarkdownTables

init = (; step = 0, item = "", response = "", estimate = value(estimates[1]), se = se(estimates[1]))
data = [(;
    step = i,
    item = items[i],
    response = responses[i],
    estimate = value(estimates[i + 1]),
    se = se(estimates[i + 1])
) for i in eachindex(items)]

markdown_table(vcat(init, data))

step	item	response	estimate	se
0			0.0	Inf
1	1	1	-0.8099721723419143	2.309401076758503
2	58	0	-1.3638735496687246	1.4669900004712164
3	3	1	-0.8180648042890541	1.221596564798906
4	58	1	-0.32297299266066554	1.1153930153077691
5	2	1	0.14767452484942956	1.063568994642235
6	92	1	0.6163450733547612	1.0391410827075063
7	32	0	0.2924461709293114	0.8710063456102501
8	96	0	0.02506826847145942	0.7765744102156711
9	79	1	0.27362580920349505	0.7409616105883491
10	42	0	0.06104764935875984	0.6827492937228274
11	48	0	-0.1300889399884594	0.6413268157271816
12	95	0	-0.3039828466805544	0.6103446676458212
13	46	0	-0.4647848285824869	0.5862946551081328
14	4	1	-0.31806563476643934	0.5606163850467831
15	2	0	-0.456548088082302	0.5415356093594295
16	56	1	-0.327251565883756	0.5212687420604935
17	2	1	-0.20594676096866088	0.5046800914037389
18	43	1	-0.09122983058723957	0.49089385133439956

Additional information

julia

using Pkg
Pkg.status()

Status `~/work/PersonParameters.jl/PersonParameters.jl/docs/Project.toml`
  [31c24e10] Distributions v0.25.113
  [e30172f5] Documenter v1.8.0
  [4710194d] DocumenterVitepress v0.1.3
  [18e85bec] ItemResponseFunctions v0.2.0
  [1862ce21] MarkdownTables v1.1.0
  [510215fc] Observables v0.5.5
  [ede86a6c] PersonParameters v0.2.1 `~/work/PersonParameters.jl/PersonParameters.jl`
