Simulate Item Level Data

Simulate data that closely resembles real item-level PRO data.

An IRT model is used to generate the item responses.

Simulate scores

These scores will be used as the “theta” (latent trait) in the IRT model.

You can include group and time effects via the regression formula.

library(COA34)
set.seed(6172022)

# Regression coefficients, in the order of the model matrix implied by
# reg.formula below: intercept, Group, Time, Group:Time interaction
Beta.PRO <- matrix(c(0, 0, 1, -2), nrow = 4, ncol = 1)

sim <- COA34::sim_dat(N = 1000, 
                      number.groups = 2, 
                      number.timepoints = 2,
                      reg.formula = formula(~ Group + Time + Time*Group), 
                      Beta = Beta.PRO,
                      cor.value = 0.8, 
                      var.values = 1)

dat <- sim$dat # pull the dataframe

Simulate Validator Variables (Reference Measures)

  • Simulate validator variables that have a specified correlation with the theta scores (true scores).

  • IMPORTANT: this means that the correlation is between the IRT latent variable theta and the validator variable. The correlation between the sum-score or mean score and the validator variable will, by definition, be lower. If instead you want to simulate validator variables with a specified correlation with the sum-score or mean score, you need to do that later (see below).

  • Make validator variable 1 a categorical variable with 5 response categories.

  • Consider validator variable 1 to be your PGIS anchor; note that it is specified here to be highly correlated with the PRO score (0.8).

  • The other validator variables will be continuous (specify NA in n.cat).


out <- COA34::sim_val_var_v2(dat = dat, 
                             PRO.score = 'Y_comp',
                             n.val = 5,
                             n.cat = c(5, NA, NA, NA, NA),
                             cor.val.ref = c(0.8, 0.7, 0.6, 0.5, 0.4)  )


dat2 <- out$dat # just use the dataframe 
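
As a quick check, you can look at the realized correlations between the theta score and the validator variables. A minimal sketch, assuming the validator columns are numeric and named Val_1 through Val_5 (Val_1 is used as the anchor later in this vignette):

# Quick check: realized correlations between theta and the validators
round(cor(dat2[, c('Y_comp', paste0('Val_', 1:5))]), 2)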

Implement Drop-out

You can implement drop-out of several types: MCAR (missing completely at random), MAR (missing at random), or MNAR (missing not at random).

dat3 <- COA34::dropout(dat = dat2, type_dropout = c('mcar'), prop.miss = .2)
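
To verify the drop-out, check the realized proportion of missing values. A minimal sketch, assuming the function appends a Y_mcar column (as used below); the realized proportion may differ from prop.miss depending on how drop-out is applied across timepoints:

# Quick check: proportion of missing values in the MCAR score
mean(is.na(dat3$Y_mcar))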

Simulate Item Responses via an IRT Model

Here we simulate item responses using both the complete scores (Y_comp) and the MCAR scores (Y_mcar).

j <- 1 # number of items
k <- 5 # number of response categories

ir <- COA34::sim_irt_item(dat = dat2, 
                          J = j, 
                          K = k,
                          latent.variable = 'Y_comp', 
                          time.var = 'Time')

ir_NA <- COA34::sim_irt_item(dat = dat3, 
                             J = j, 
                             K = k,
                             latent.variable = 'Y_mcar', 
                             time.var = 'Time')

# Use dataframes only:
dat4 <- ir$dat
dat5 <- ir_NA$dat
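
A quick sanity check on the generated item responses; a minimal sketch, assuming the items are named Item_1, Item_2, etc. (the default naming shown below):

# Inspect the response distribution of the generated item
table(dat4$Item_1, useNA = 'ifany')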

The function sim_irt_item() uses default item parameters. If you are familiar with IRT models and want to use your own item parameters to generate the items, you can do so by creating a matrix of item parameters. Each item (row) needs a slope and K - 1 intercepts, so the matrix has K columns for K response categories:


# Default item param:
ir$item.param
#>        slope intercept_1 intercept_2 intercept_3 intercept_4
#> Item_1     2          -2  -0.6666667   0.6666667           2

# Create new item parameters in a matrix:
new.item.param <- matrix(c(3, -2, -1, 1, 2), nrow = j, ncol = k, byrow = TRUE,
                         dimnames = list(c('Item_1'),
                                         c('slope', 'intercept_1', 'intercept_2', 'intercept_3', 'intercept_4')))

# Make sure the columns are named 'slope' and 'intercept_1', 'intercept_2',
# etc.; the function uses those names to pull the values.

ir.new <- COA34::sim_irt_item(dat = dat2, 
                              item.param = new.item.param,
                              J = j, 
                              K = k,
                              latent.variable = 'Y_comp', 
                              time.var = 'Time')

ir.new$item.param
#>        slope intercept_1 intercept_2 intercept_3 intercept_4
#> Item_1     3          -2          -1           1           2

Scoring Algorithm

Create the sum score for both dataframes; this will be your PRO score going forward. Of course, it could be an average or whatever scoring algorithm is appropriate (see the sketch after the code below).

dat4$sum_score <- apply(dat4[, paste0('Item_', 1:j), drop = FALSE], 1, sum)
dat5$sum_score <- apply(dat5[, paste0('Item_', 1:j), drop = FALSE], 1, sum)
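
For instance, a minimal sketch of a mean score as an alternative (mean_score is just an illustrative name):

# Alternative scoring: a mean of the item responses
dat4$mean_score <- rowMeans(dat4[, paste0('Item_', 1:j), drop = FALSE])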

Note on validator variables (reference measures)

NOTE: if you want the validator variables to have a specific correlation with the observed/calculated score (e.g., sum-score), you should simulate validator variables at this stage.

Just change what is passed to PRO.score. The code below is for illustration only and is not run.

# NOT RUN
out <- COA34::sim_val_var_v2(dat = dat, 
                             PRO.score = 'sum_score',
                             n.val = 5,
                             n.cat = c(5, NA, NA, NA, NA),
                             cor.val.ref = c(0.8, 0.7, 0.6, 0.5, 0.4)  )


dat <- out$dat # pull the dataframe

Write out data file

For SAS users: this seems to be the best way to write out the data to a text file; it makes the file easiest to read into SAS.

write.table(dat4, na = '.', quote = F, sep = ', ', row.names = F,
            file = 'sim_data_PRO_17June2022.txt')
write.table(dat5, na = '.', quote = F, sep = ', ', row.names = F,
            file = 'sim_data_PRO_Missing_17June2022.txt')

Estimate MWPC Thresholds

Now that we have the data simulated, let’s do the MWPC (meaningful within-patient change) analysis.

Compute Anchor Change Score

This is for a PGIS anchor; there is no need for this step with a PGIC, because the raw responses are already anchor change scores.

# Validator variable 1 (`Val_1`) is the anchor 
dat6 <- COA34::compute_anchor_delta(dat = dat4,
                                   subject.id = 'USUBJID',
                                   time.var = 'Time',
                                   anchor = 'Val_1') 
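
A quick way to inspect the result; a minimal sketch, assuming the function appends a Val_1_delta column (as used below):

# Inspect the distribution of anchor change scores
table(dat6$Val_1_delta, useNA = 'ifany')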

Compute PRO Change Score

Pretty much what it says. You can do this by hand if you’d prefer. This function will use the first and final timepoints automatically, assuming, of course, that your Time variable is correctly ordered.

dat7 <- COA34::compute_change_score(dat = dat6,
                                    subject.id = 'USUBJID',
                                    time.var = 'Time',
                                    score = 'sum_score')

Ability to Detect Change

Use the anchor change scores in the ability-to-detect-change analysis: compute the correlation between the anchor change score and the PRO change score.

For an anchor to be considered appropriate for use in estimating MWPC thresholds, this correlation should be at least about 0.3, with some recommending 0.4 or higher.


cor(dat7$sum_score_delta, dat7$Val_1_delta)
#> [1] 0.4318808
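
If the change scores contain missing values (e.g., the drop-out data in dat5), restrict the correlation to complete pairs; a minimal sketch:

# With missing data, compute the correlation on complete pairs
cor(dat7$sum_score_delta, dat7$Val_1_delta, use = 'pairwise.complete.obs')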

Compute Anchor Group/Category

From the anchor change score, consolidate subjects into the anchor groups you’ll use. This function automatically creates 5 anchor groups. Note: if you need a different definition of the anchor groups/categories, you’ll have to do it by hand (see the sketch after the code below).

dat8 <- COA34::compute_anchor_group(dat = dat7,
                                    anchor.variable = 'Val_1_delta')
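
A minimal sketch of defining anchor groups by hand, assuming you want three groups based on the sign of the change score; the cut points and labels are illustrative, and the direction of ‘improvement’ depends on how your anchor is coded:

# Hand-rolled anchor grouping: 3 categories from the anchor change score
dat8$anchor.groups.custom <- cut(dat8$Val_1_delta,
                                 breaks = c(-Inf, -1, 0, Inf),
                                 labels = c('Improved', 'Stable', 'Deteriorated'))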

Compute MWPC Thresholds

This function computes a table of the MWPC threshold values.

thr <- COA34::compute_thresholds_v2(dat = dat8,
                                    anchor.group = 'anchor.groups',
                                    time.var = 'Time',
                                    change.score = 'sum_score_delta')

eCDF and ePDF

There are functions to output the eCDF and ePDF; more detail on these functions is available in other vignettes. Note that in the case of single-item endpoints an ePDF doesn’t make much sense; it really should be a barplot (see the sketch after the code below).

library(ggplot2)
ecdf <- COA34::ggplot2_eCDF_v2(dat = dat8,
                               anchor.group = 'anchor.groups',
                               time.var = 'Time',
                               change.score = 'sum_score_delta')


epdf <- COA34::ggplot2_ePDF(dat = dat8,
                            anchor.group = 'anchor.groups',
                            time.var = 'Time',
                            change.score = 'sum_score_delta')

# Note that you have the ggplot2 objects here; you can plot and/or modify them.
# See the other vignette on MWPC for more details.
# plot(ecdf)
# plot(epdf)
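
A minimal sketch of a barplot alternative for a single-item endpoint, assuming the anchor.groups column created above (variable names are illustrative; adapt to your data):

# Barplot of the change score by anchor group
bp <- ggplot(dat8, aes(x = factor(sum_score_delta), fill = anchor.groups)) +
  geom_bar(position = 'dodge') +
  labs(x = 'Change score', y = 'Count')
# plot(bp)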

Output Tables and Figures

We need to output tables and figures; figures are actually easiest. Adjust the width and height manually to make the output look good. Start with 8.5 by 5.5 inches; that seems to fit well into most reports in MS Word.

#eCDF
png(file = 'Example_eCDF.png', units="in", width=8.5, height=5.5, res=300)

  print(ecdf) # print() ensures the plot renders when the script is source()d

dev.off()


# ePDF
png(file = 'Example_ePDF.png', units="in", width=8.5, height=5.5, res=300)

  print(epdf)

dev.off()

Tables: for tables, there are a few approaches. You can google “R write out table to MS Word” and I bet there are a dozen ways to do it, all decent. I’ll put a few examples here.

Customized

Code is available here: https://github.com/CJangelo/R2Word/blob/main/R/dump_df_mat_to_file.R

library(R2Word)

print.dir <- getwd() # output directory; change to wherever you want the file

R2Word::dump_df_mat_to_file(out = thr,
                            table.title = 'Anchor Group Thresholds',
                            NA.string = '-',
                            decimals = c(2, 2, 0, 1),
                            file.name = 'thr',
                            print.dir = print.dir)

Another approach

Most other R packages help you make summary tables and then output them; see the tableone and gtsummary R packages.

https://www.r-bloggers.com/2021/11/publication-ready-tables-with-flextable-and-own-theme-in-r/
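
For example, a minimal sketch using the flextable package from the link above (assumes flextable is installed and that thr can be coerced to a data frame):

# NOT RUN
library(flextable)
ft <- flextable::flextable(as.data.frame(thr)) # coerce to a data frame first
flextable::save_as_docx(ft, path = 'thresholds.docx')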