a08_Vignette_All_Together.Rmd
Simulate data in a way that is very similar to the real thing. An IRT model is used to generate the item responses, with the simulated score serving as the "theta" (latent variable) in the IRT model. A group or time effect can be included via the regression formula.
Simulate validator variables that have a specified correlation with the theta scores (true scores).
IMPORTANT: this means that the correlation is between the IRT latent variable theta and the validator variable. The correlation between the sum score or mean score and the validator variable will, by definition, be lower. If you instead want to simulate validator variables with a specified correlation with the sum score or mean score, you need to do that later (see below).
Make validator variable 1 a categorical variable with 5 response categories. Consider that to be your PGIS anchor; note that it is specified here to be highly correlated with the PRO score (0.8). The other validator variables will be continuous (specify NA in n.cat).
out <- COA34::sim_val_var_v2(dat = dat,
                             PRO.score = 'Y_comp',
                             n.val = 5,
                             n.cat = c(5, NA, NA, NA, NA),
                             cor.val.ref = c(0.8, 0.7, 0.6, 0.5, 0.4))
dat2 <- out$dat # just use the dataframe
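As a quick sanity check, you can look at the achieved correlations between the latent variable and the simulated validators. This is a minimal sketch; it assumes the validator columns are named Val_1 through Val_5 (consistent with the anchor Val_1 used later in this vignette) and are stored as numeric.
# Rough check of achieved correlations between theta (Y_comp) and the validators.
# Assumes validator columns are named Val_1 ... Val_5 and are numeric:
round(cor(dat2$Y_comp, dat2[, paste0('Val_', 1:5)], use = 'pairwise.complete.obs'), 2)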
Here we simulate item responses using both the complete data (Y_comp) and the MCAR data (Y_mcar).
j <- 1 # number of items
k <- 5 # number of response categories
ir <- COA34::sim_irt_item(dat = dat2,
                          J = j,
                          K = k,
                          latent.variable = 'Y_comp',
                          time.var = 'Time')

# dat3 is the dataframe containing the MCAR scores (Y_mcar):
ir_NA <- COA34::sim_irt_item(dat = dat3,
                             J = j,
                             K = k,
                             latent.variable = 'Y_mcar',
                             time.var = 'Time')
# Use dataframes only:
dat4 <- ir$dat
dat5 <- ir_NA$dat
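Before scoring, it can help to eyeball the simulated responses with a frequency table. Item_1 is the default item name, as shown in the item parameter output below.
# Distribution of simulated item responses (complete vs. MCAR data):
table(dat4$Item_1, useNA = 'ifany')
table(dat5$Item_1, useNA = 'ifany')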
The function sim_irt_item() uses default item parameters. If you are familiar with IRT models and want to use your own item parameters in item generation, you can do so by creating a matrix of the item parameters:
# Default item param:
ir$item.param
#> slope intercept_1 intercept_2 intercept_3 intercept_4
#> Item_1 2 -2 -0.6666667 0.6666667 2
# Create new item parameters in a matrix:
new.item.param <- matrix(c(3, -2, -1, 1, 2), nrow = j, ncol = k, byrow = TRUE,
                         dimnames = list(c('Item_1'),
                                         c('slope', 'intercept_1', 'intercept_2',
                                           'intercept_3', 'intercept_4')))
# Make sure they are named 'slope' and 'intercept'; the function uses
# those names to pull the values.
ir.new <- COA34::sim_irt_item(dat = dat2,
                              item.param = new.item.param,
                              J = j,
                              K = k,
                              latent.variable = 'Y_comp',
                              time.var = 'Time')
ir.new$item.param
#> slope intercept_1 intercept_2 intercept_3 intercept_4
#> Item_1 3 -2 -1 1 2
Create the sum score for both dataframes; this will be your PRO score going forward. Of course, it could be an average or whatever scoring algorithm is appropriate.
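A minimal sketch of the scoring step, assuming the item responses sit in columns named Item_1, Item_2, etc. (the default naming shown above):
# Sum across the item columns to create the PRO score; with J = 1 item
# the sum score is just that single item. Rows with missing items get NA,
# which is the behavior we want for the MCAR data:
item.cols <- grep('^Item_', names(dat4), value = TRUE)
dat4$sum_score <- rowSums(dat4[, item.cols, drop = FALSE])
dat5$sum_score <- rowSums(dat5[, item.cols, drop = FALSE])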
NOTE: if you want the validator variables to have a specified correlation with the observed/calculated score (e.g., the sum score), you should simulate the validator variables at this stage. Just change what is passed to PRO.score. The code below is for illustration, but is not run.
# NOT RUN
out <- COA34::sim_val_var_v2(dat = dat,
                             PRO.score = 'sum_score',
                             n.val = 5,
                             n.cat = c(5, NA, NA, NA, NA),
                             cor.val.ref = c(0.8, 0.7, 0.6, 0.5, 0.4))
dat <- out$dat # pull the dataframe
For SAS users: this seems to be the most reliable way to write out data to a text file so that it is easy to read into SAS.
write.table(dat4, na = '.', quote = FALSE, sep = ', ', row.names = FALSE,
            file = 'sim_data_PRO_17June2022.txt')
write.table(dat5, na = '.', quote = FALSE, sep = ', ', row.names = FALSE,
            file = 'sim_data_PRO_Missing_17June2022.txt')
Now that we have the data simulated, let’s do the MWPC analysis.
This step is for a PGIS anchor; there is no need for it if you are using a PGIC, because the raw responses are already anchor change scores.
# Validator variable 1 (`Val_1`) is the anchor
dat6 <- COA34::compute_anchor_delta(dat = dat4,
                                    subject.id = 'USUBJID',
                                    time.var = 'Time',
                                    anchor = 'Val_1')
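It is worth looking at the distribution of the anchor change scores before collapsing them into groups; Val_1_delta is the variable created by compute_anchor_delta(), per its use below.
# Frequency of anchor change scores:
table(dat6$Val_1_delta, useNA = 'ifany')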
Pretty much what it says; you can do this by hand if you'd prefer. This function will use the first and final timepoints automatically, assuming, of course, that your Time variable is correctly ordered.
dat7 <- COA34::compute_change_score(dat = dat6,
                                    subject.id = 'USUBJID',
                                    time.var = 'Time',
                                    score = 'sum_score')
Use these anchor change scores in the ability-to-detect-change analysis. You'll compute correlations between this anchor change score and the PRO change score.
For an anchor to be considered appropriate for use in estimating MWPC thresholds, this correlation should be at least 0.3 (some recommend 0.4).
cor(dat7$sum_score_delta, dat7$Val_1_delta)
#> [1] 0.4318808
From the anchor change score, consolidate into the anchor groups you'll use. This function automatically creates 5 anchor groups. Note: if you need a different definition of the anchor groups/categories, you'll have to do it by hand.
dat8 <- COA34::compute_anchor_group(dat = dat7,
                                    anchor.variable = 'Val_1_delta')
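Check the anchor group sizes before estimating thresholds; sparse groups will make the threshold estimates unstable. The column name anchor.groups matches its use in the function calls below.
# Subjects per anchor group:
table(dat8$anchor.groups, useNA = 'ifany')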
The function below computes a table of the MWPC threshold values.
thr <- COA34::compute_thresholds_v2(dat = dat8,
                                    anchor.group = 'anchor.groups',
                                    time.var = 'Time',
                                    change.score = 'sum_score_delta')
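You can print thr to review the thresholds directly before exporting the table to MS Word at the end of this vignette:
print(thr)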
There are functions to output the eCDF and ePDF; more detail on these functions is available in other vignettes. Note that in the case of single-item endpoints an ePDF doesn't make much sense; it really should be a barplot.
library(ggplot2)
ecdf <- COA34::ggplot2_eCDF_v2(dat = dat8,
                               anchor.group = 'anchor.groups',
                               time.var = 'Time',
                               change.score = 'sum_score_delta')

epdf <- COA34::ggplot2_ePDF(dat = dat8,
                            anchor.group = 'anchor.groups',
                            time.var = 'Time',
                            change.score = 'sum_score_delta')
# Note that you have the ggplot2 objects here; you can plot and/or modify them.
# See the other vignette on MWPC for more details.
# plot(ecdf)
# plot(epdf)
Now output the tables and figures; figures are actually the easier of the two. Adjust the width and height manually to make the output look good. Start with 8.5 and 5.5, which seems to fit well into most reports in MS Word.
# eCDF
png(file = 'Example_eCDF.png', units="in", width=8.5, height=5.5, res=300)
ecdf
dev.off()
# ePDF
png(file = 'Example_ePDF.png', units="in", width=8.5, height=5.5, res=300)
epdf
dev.off()
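Alternatively, since ecdf and epdf are ggplot2 objects, ggplot2::ggsave() achieves the same thing without the png()/dev.off() pair:
# Same output via ggsave():
ggplot2::ggsave('Example_eCDF.png', plot = ecdf, width = 8.5, height = 5.5,
                units = 'in', dpi = 300)
ggplot2::ggsave('Example_ePDF.png', plot = epdf, width = 8.5, height = 5.5,
                units = 'in', dpi = 300)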
Tables: for tables, there are a few approaches. You can google "R write out table to MS Word" and find a dozen ways to do it, all decent. I'll put a few examples here.
Code is available here: https://github.com/CJangelo/R2Word/blob/main/R/dump_df_mat_to_file.R
It's essentially the flextable R package with a custom function to yield the desired number of decimals in each number. The round_numbers() function is what is useful here: https://github.com/CJangelo/R2Word/blob/main/R/round_numbers.R
Then the rest is just using the flextable R package to finalize formatting and output the table to MS Word.
library(R2Word)

# print.dir must be set to the directory where the table should be written:
print.dir <- getwd()

R2Word::dump_df_mat_to_file(out = thr,
                            table.title = 'Anchor Group Thresholds',
                            NA.string = '-',
                            decimals = c(2, 2, 0, 1),
                            file.name = 'thr',
                            print.dir = print.dir)