Online estimation of ability in adaptive testing

In adaptive testing the updating of the ability estimate plays a central role. Generally, adaptive testing consists of the following steps:

  1. Select an optimal item from an item pool given the current ability estimate of the test-taker

  2. Present the item and await the response of the test-taker

  3. Update the ability estimate of the test-taker given the new response

Additionally, we need to provide an initial ability estimate for the test-taker before the first item is answered. A stopping criterion is also required so that the test terminates once the criterion is met.

In this guide we are going to implement a simple adaptive test using PersonParameters.jl to update the ability values in step 3 of the testing procedure.

Preparation

Setting up the item pool

To keep things simple our item pool will consist of 100 items with item parameters given by a Rasch model. We draw the item difficulty values from a standard normal distribution.

julia
item_pool = randn(100)
100-element Vector{Float64}:
 -1.9085844610100242
 -0.3193501309246828
 -1.3706003089561634
 -0.4765395263704167
 -0.18406197886030887
  0.07956351890906067
  0.9653035713225864
 -0.7412845170310357
 -0.6708770393386021
  2.210995604236078

  0.17993853097914825
  0.26604455391591014
 -0.388363107314558
 -0.13520179370852234
  0.290395546661944
 -0.198443768875884
  1.0461801589213602
 -0.41526658756575163
  0.5608774661943376

Defining the item selection procedure

Next, we need to define a function to select the optimal item (step 1). In adaptive testing this is usually the item that maximises the item information given the current ability estimate.

Therefore we need a function that takes the item_pool and current ability estimate theta as inputs, calculates the information for each item in item_pool at theta and returns the id of the item with maximum information.

Information functions for the Rasch model are available in ItemResponseFunctions.jl.

julia
using ItemResponseFunctions: OnePL, information

function select_item(item_pool, theta)
    infos = [information(OnePL, theta, beta) for beta in item_pool]
    return argmax(infos)
end
select_item (generic function with 1 method)
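
As a short aside on why maximising information works here: under the standard Rasch parameterisation (assumed here to be what OnePL implements, with no additional scaling constant), the item information has a closed form and peaks where the item difficulty equals the ability. Selecting the most informative item therefore amounts to picking the item whose difficulty is closest to the current estimate.

```latex
P(\theta, \beta) = \frac{\exp(\theta - \beta)}{1 + \exp(\theta - \beta)}, \qquad
I(\theta, \beta) = P(\theta, \beta)\,\bigl(1 - P(\theta, \beta)\bigr)

\frac{\partial I}{\partial \theta} = P\,(1 - P)\,(1 - 2P) = 0
\;\Longleftrightarrow\; P = \tfrac{1}{2}
\;\Longleftrightarrow\; \theta = \beta, \qquad
I_{\max} = \tfrac{1}{4}
```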

WARNING

For simplicity items are selected with replacement from the item pool.

In a real-world application one would prefer to track the exposed items and only select items that weren't previously exposed to the test-taker.

Calling the function at theta = 0.0 returns a valid item id.

julia
select_item(item_pool, 0.0)
68
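
The warning above mentions tracking exposed items. A minimal sketch of such a selection rule is given below. The names rasch_info and select_unexposed_item, as well as the administered argument, are hypothetical and not part of PersonParameters.jl or ItemResponseFunctions.jl. The information function is written out by hand (assuming the standard Rasch form P(1 - P)) so that the block runs without any packages.

```julia
# Hypothetical helper: Rasch item information P * (1 - P), written out by hand
function rasch_info(theta, beta)
    p = 1 / (1 + exp(-(theta - beta)))
    return p * (1 - p)
end

# Hypothetical variant of select_item that skips already administered items.
# Returns `nothing` once every item in the pool has been used.
function select_unexposed_item(item_pool, theta, administered)
    candidates = setdiff(eachindex(item_pool), administered)
    isempty(candidates) && return nothing
    infos = [rasch_info(theta, item_pool[i]) for i in candidates]
    return candidates[argmax(infos)]
end

pool = [-1.0, 0.0, 0.5]
first_pick = select_unexposed_item(pool, 0.0, Int[])          # item 2 (difficulty 0.0)
second_pick = select_unexposed_item(pool, 0.0, [first_pick])  # item 3 (difficulty 0.5)
```

In the test logic, the items vector could play the role of administered, and a return value of nothing would have to be treated as an additional reason to stop the test.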

Defining the stopping criterion

Our stopping criterion in this example is also based on the ability estimate. The test is stopped only once the estimate is sufficiently precise, that is, once the standard error of the ability estimate falls below a predefined threshold.

The stopping criterion returns false if the criterion is not met, meaning the test continues and a new item is selected. If the stopping criterion is met, true is returned and the test is stopped.

julia
using PersonParameters: PersonParameter, se

function stop(estimate::PersonParameter; threshold = 0.5)
    return se(estimate) < threshold
end
stop (generic function with 1 method)

A small test confirms the stopping rule works as intended.

julia
stop(PersonParameter(0.0, 0.6))  # should return false
false
julia
stop(PersonParameter(0.0, 0.2))  # should return true
true
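
To get a feeling for what a threshold of 0.5 implies, the following sketch estimates a lower bound on the test length. It assumes that the standard error is the inverse square root of the summed Rasch item information, and that a single item contributes at most 0.25 information. The helper names rasch_prob, rasch_info, asymptotic_se and min_items are hypothetical and written by hand; they are not functions of PersonParameters.jl.

```julia
# Hypothetical hand-written Rasch helpers (no packages required)
rasch_prob(theta, beta) = 1 / (1 + exp(-(theta - beta)))
rasch_info(theta, beta) = rasch_prob(theta, beta) * (1 - rasch_prob(theta, beta))

# Assumed relation: se = 1 / sqrt(test information)
asymptotic_se(theta, betas) = 1 / sqrt(sum(rasch_info(theta, b) for b in betas))

# Smallest number of items n with n * 0.25 > 1 / threshold^2,
# i.e. the best case where every item is perfectly targeted
min_items(threshold) = floor(Int, 4 / threshold^2) + 1

se_16 = asymptotic_se(0.0, zeros(16))  # 16 perfectly targeted items give se = 0.5
n_min = min_items(0.5)                 # 17 items are needed for se < 0.5
```

Under these assumptions no test can stop before 17 items with the default threshold. Real tests will usually need a few more, because items are rarely targeted perfectly.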

Implementing the test logic

Now that the item selection and stopping criterion are defined, we can move on to code the test logic. Recall that we must

  1. Await the response of the test-taker,

  2. Update the ability value given the new response,

  3. Evaluate the stopping rule and either present the next item to the test-taker according to select_item, or stop the test.

We also want to store the responses, the administered items, and the intermediate ability estimates:

julia
responses = Int[]
estimates = [PersonParameter(0.0, Inf)]
items = [rand(eachindex(item_pool))]
1-element Vector{Int64}:
 1

INFO

The objects estimates and items already include initial values. For the ability estimate the initial value was fixed at 0.0. For the initial item a random item was chosen from the item pool.

The following function update implements the test logic as described above. It makes use of Observables.jl, running every time a new response is observed.

julia
using Observables: Observable, on
using PersonParameters: person_parameter, value, WLE

response = Observable(0)
is_stopped = Observable(false)

update = on(response) do y
    if !is_stopped[]
        push!(responses, y)
        @info "new response: y = $y"

        theta = person_parameter(OnePL, responses, item_pool[items], WLE())
        push!(estimates, theta)
        @info "new ability estimate: theta = $(value(theta)), se = $(se(theta))"

        if stop(theta)
            @info "stopping criterion reached: se = $(se(theta)) < 0.5"
            is_stopped[] = true
        else
            @info "stopping criterion not reached: se = $(se(theta)) > 0.5"
            new_item = select_item(item_pool, value(theta))
            push!(items, new_item)
            @info "current item: $(new_item)"
            return new_item
        end
    end
end

Step-by-step explanation

The first part of the implementation is to define our observables. On one hand we need an observable for new responses, response. The test logic will run whenever response is updated.

On the other hand we need an observable to tell us if the test is active or stopped according to our stopping rule. The observable is_stopped tells us exactly that.

julia
response = Observable(0)
is_stopped = Observable(false)

The function update contains the updating procedure that is executed once a new response is observed. It runs whenever response is updated and the test has not been stopped, i.e. is_stopped[] == false.

julia
update = on(response) do y
    if !is_stopped[]
        # ...
    end
end
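
The listener mechanism can be tried in isolation. The sketch below uses only Observable and on from Observables.jl; the names seen, obs and listener are made up for illustration. It shows that the listener is not called for the initial value, and that assigning the same value twice in a row still triggers it (assuming the default settings of Observable), which matters here because consecutive responses are often identical.

```julia
using Observables: Observable, on

seen = Int[]          # records every value the listener receives
obs = Observable(0)   # the initial value 0 does not trigger any listener

listener = on(obs) do y
    push!(seen, y)
end

obs[] = 1   # triggers the listener
obs[] = 1   # same value again, the listener is still called
obs[] = 0

seen        # contains 1, 1, 0
```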

Within update the steps described above are executed. First, we commit the new response to storage:

julia
update = on(response) do y
    if !is_stopped[]
        push!(responses, y)
        # ...
    end
end

Then the new ability is estimated using person_parameter and also committed to storage. For adaptive testing purposes we choose the WLE algorithm, since it provides finite ability estimates even if all responses are 0 or all responses are 1.

julia
update = on(response) do y
    if !is_stopped[]
        push!(responses, y)

        theta = person_parameter(OnePL, responses, item_pool[items], WLE())
        push!(estimates, theta)
        # ...
    end
end
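
The reason an estimator such as WLE is preferred early in the test can be illustrated with the maximum likelihood score function of the Rasch model, which is the sum of y - P over all answered items. The sketch below is a hand-written illustration with hypothetical names (rasch_prob, score) and made-up difficulties; it does not use PersonParameters.jl. For an all-correct response pattern the score is positive for every ability value, so an unpenalised maximum likelihood estimate does not exist as a finite number.

```julia
# Hypothetical hand-written helpers (no packages required)
rasch_prob(theta, beta) = 1 / (1 + exp(-(theta - beta)))

# Derivative of the Rasch log-likelihood with respect to theta
score(theta, ys, betas) = sum(y - rasch_prob(theta, b) for (y, b) in zip(ys, betas))

betas = [-1.0, 0.0, 1.0]   # made-up item difficulties
all_correct = [1, 1, 1]
mixed = [1, 1, 0]

# All-correct pattern: the score never crosses zero on this grid
never_zero = all(score(t, all_correct, betas) > 0 for t in -10:0.5:10)

# Mixed pattern: the score changes sign, so a finite root exists
has_root = score(-10.0, mixed, betas) > 0 > score(10.0, mixed, betas)
```

After the first response of an adaptive test the pattern is always all-correct or all-incorrect, so this situation arises in every test.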

Finally, the stopping criterion is evaluated. If it is met, the test is terminated by setting is_stopped[] = true. Otherwise, a new item is selected by select_item, committed to storage and presented to the test-taker.

julia
update = on(response) do y
    if !is_stopped[]
        push!(responses, y)

        theta = person_parameter(OnePL, responses, item_pool[items], WLE())
        push!(estimates, theta)

        if stop(theta) 
            is_stopped[] = true
        else
            new_item = select_item(item_pool, value(theta)) 
            push!(items, new_item) 
            return new_item 
        end
    end
end

INFO

In the original definition of update, @info statements are placed throughout for logging purposes.

Administering the test

With all our test logic in place we can administer the test to a virtual test-taker. We assume that the test-taker has a true ability and that their responses follow the Rasch model. Thus, we can define a respond function that returns a random response to an item, drawn with the probability of a correct response under the Rasch model.

julia
using Distributions: Bernoulli
using ItemResponseFunctions: irf

function respond(beta; true_theta = 0.0)
    prob = irf(OnePL, true_theta, beta, 1)
    return Int(rand(Bernoulli(prob)))
end
respond (generic function with 1 method)
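
For readers who want to see what happens inside respond, the following sketch reproduces the idea without Distributions.jl or ItemResponseFunctions.jl. The function respond_plain and the helper rasch_prob are hypothetical stand-ins that assume the standard Rasch response probability; a seeded random number generator is passed in explicitly to make the simulation reproducible.

```julia
using Random: Xoshiro

# Hypothetical hand-written Rasch probability of a correct response
rasch_prob(theta, beta) = 1 / (1 + exp(-(theta - beta)))

# Hypothetical stand-in for `respond`: returns 1 with probability P, else 0
function respond_plain(rng, beta; true_theta = 0.0)
    return Int(rand(rng) < rasch_prob(true_theta, beta))
end

rng = Xoshiro(1)
n = 10_000
easy = sum(respond_plain(rng, -2.0) for _ in 1:n) / n  # share correct on an easy item
hard = sum(respond_plain(rng, 2.0) for _ in 1:n) / n   # share correct on a hard item
```

With a true ability of 0.0, the share of correct responses should be close to 0.88 for the easy item and close to 0.12 for the hard item.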

The virtual test-taker then responds to the administered items until the stopping criterion is met.

julia
while !is_stopped[]
    # get the item difficulty
    current_item = last(items)
    beta = item_pool[current_item]
    # respond
    response[] = respond(beta)
end
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.8099721723419143, se = 2.309401076758503
[ Info: stopping criterion not reached: se = 2.309401076758503 > 0.5
[ Info: current item: 58
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -1.3638735496687246, se = 1.4669900004712164
[ Info: stopping criterion not reached: se = 1.4669900004712164 > 0.5
[ Info: current item: 3
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.8180648042890541, se = 1.221596564798906
[ Info: stopping criterion not reached: se = 1.221596564798906 > 0.5
[ Info: current item: 58
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.32297299266066554, se = 1.1153930153077691
[ Info: stopping criterion not reached: se = 1.1153930153077691 > 0.5
[ Info: current item: 2
[ Info: new response: y = 1
[ Info: new ability estimate: theta = 0.14767452484942956, se = 1.063568994642235
[ Info: stopping criterion not reached: se = 1.063568994642235 > 0.5
[ Info: current item: 92
[ Info: new response: y = 1
[ Info: new ability estimate: theta = 0.6163450733547612, se = 1.0391410827075063
[ Info: stopping criterion not reached: se = 1.0391410827075063 > 0.5
[ Info: current item: 32
[ Info: new response: y = 0
[ Info: new ability estimate: theta = 0.2924461709293114, se = 0.8710063456102501
[ Info: stopping criterion not reached: se = 0.8710063456102501 > 0.5
[ Info: current item: 96
[ Info: new response: y = 0
[ Info: new ability estimate: theta = 0.02506826847145942, se = 0.7765744102156711
[ Info: stopping criterion not reached: se = 0.7765744102156711 > 0.5
[ Info: current item: 79
[ Info: new response: y = 1
[ Info: new ability estimate: theta = 0.27362580920349505, se = 0.7409616105883491
[ Info: stopping criterion not reached: se = 0.7409616105883491 > 0.5
[ Info: current item: 42
[ Info: new response: y = 0
[ Info: new ability estimate: theta = 0.06104764935875984, se = 0.6827492937228274
[ Info: stopping criterion not reached: se = 0.6827492937228274 > 0.5
[ Info: current item: 48
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.1300889399884594, se = 0.6413268157271816
[ Info: stopping criterion not reached: se = 0.6413268157271816 > 0.5
[ Info: current item: 95
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.3039828466805544, se = 0.6103446676458212
[ Info: stopping criterion not reached: se = 0.6103446676458212 > 0.5
[ Info: current item: 46
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.4647848285824869, se = 0.5862946551081328
[ Info: stopping criterion not reached: se = 0.5862946551081328 > 0.5
[ Info: current item: 4
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.31806563476643934, se = 0.5606163850467831
[ Info: stopping criterion not reached: se = 0.5606163850467831 > 0.5
[ Info: current item: 2
[ Info: new response: y = 0
[ Info: new ability estimate: theta = -0.456548088082302, se = 0.5415356093594295
[ Info: stopping criterion not reached: se = 0.5415356093594295 > 0.5
[ Info: current item: 56
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.327251565883756, se = 0.5212687420604935
[ Info: stopping criterion not reached: se = 0.5212687420604935 > 0.5
[ Info: current item: 2
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.20594676096866088, se = 0.5046800914037389
[ Info: stopping criterion not reached: se = 0.5046800914037389 > 0.5
[ Info: current item: 43
[ Info: new response: y = 1
[ Info: new ability estimate: theta = -0.09122983058723957, se = 0.49089385133439956
[ Info: stopping criterion reached: se = 0.49089385133439956 < 0.5

As is evident from the logging statements, the test stops after 18 items have been administered. The final estimate is about -0.09 with a standard error of 0.49.

julia
last(estimates)
PersonParameter{Float64}(-0.09122983058723957, 0.49089385133439956)

The following table contains all tracked data from the virtual test.

julia
using MarkdownTables

init = (; step = 0, item = "", response = "", estimate = value(estimates[1]), se = se(estimates[1]))
data = [(;
    step = i,
    item = items[i],
    response = responses[i],
    estimate = value(estimates[i + 1]),
    se = se(estimates[i + 1])
) for i in eachindex(items)]

markdown_table(vcat(init, data))
| step | item | response | estimate | se |
|---|---|---|---|---|
| 0 | | | 0.0 | Inf |
| 1 | 1 | 1 | -0.8099721723419143 | 2.309401076758503 |
| 2 | 58 | 0 | -1.3638735496687246 | 1.4669900004712164 |
| 3 | 3 | 1 | -0.8180648042890541 | 1.221596564798906 |
| 4 | 58 | 1 | -0.32297299266066554 | 1.1153930153077691 |
| 5 | 2 | 1 | 0.14767452484942956 | 1.063568994642235 |
| 6 | 92 | 1 | 0.6163450733547612 | 1.0391410827075063 |
| 7 | 32 | 0 | 0.2924461709293114 | 0.8710063456102501 |
| 8 | 96 | 0 | 0.02506826847145942 | 0.7765744102156711 |
| 9 | 79 | 1 | 0.27362580920349505 | 0.7409616105883491 |
| 10 | 42 | 0 | 0.06104764935875984 | 0.6827492937228274 |
| 11 | 48 | 0 | -0.1300889399884594 | 0.6413268157271816 |
| 12 | 95 | 0 | -0.3039828466805544 | 0.6103446676458212 |
| 13 | 46 | 0 | -0.4647848285824869 | 0.5862946551081328 |
| 14 | 4 | 1 | -0.31806563476643934 | 0.5606163850467831 |
| 15 | 2 | 0 | -0.456548088082302 | 0.5415356093594295 |
| 16 | 56 | 1 | -0.327251565883756 | 0.5212687420604935 |
| 17 | 2 | 1 | -0.20594676096866088 | 0.5046800914037389 |
| 18 | 43 | 1 | -0.09122983058723957 | 0.49089385133439956 |

Additional information

julia
using Pkg
Pkg.status()
Status `~/work/PersonParameters.jl/PersonParameters.jl/docs/Project.toml`
  [31c24e10] Distributions v0.25.113
  [e30172f5] Documenter v1.8.0
  [4710194d] DocumenterVitepress v0.1.3
  [18e85bec] ItemResponseFunctions v0.2.0
  [1862ce21] MarkdownTables v1.1.0
  [510215fc] Observables v0.5.5
  [ede86a6c] PersonParameters v0.2.1 `~/work/PersonParameters.jl/PersonParameters.jl`