Software Project Setup
- Name the project folder with the sponsor name, a short descriptor to help you remember it, and the project code:
- Example: ABBV_CONSISTENCY_20222
- Within this folder:
- R project with project number
- README file for dated, topic-labeled notes to yourself. Anything you need to jot down goes here.
- Keep a running record of everything
- Folders: DATA_20222, CODE_20222, TABLES_20222, FIGURES_20222, DOCUMENTS_20222
- “DEPRECATED” subfolder to hold old, archived versions
- “SANDBOX” subfolder to hold any code where you were experimenting with something that is not for use, e.g. “mmrm_testing_df_for_meeting_check”
- Create a browser folder containing bookmarks related to the project
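The folder layout above can be scripted so every project starts the same way. This is a minimal sketch: `setup_project()` and its argument names are hypothetical, the README file name is an assumption, and placing DEPRECATED and SANDBOX directly under the project folder is one possible reading of the notes.

```r
# Hypothetical helper that builds the folder layout described above.
setup_project <- function(root, sponsor, descriptor, code) {
  # e.g. ABBV_CONSISTENCY_20222
  project_dir <- file.path(root, paste(sponsor, descriptor, code, sep = "_"))

  folders <- c(
    paste0(c("DATA", "CODE", "TABLES", "FIGURES", "DOCUMENTS"), "_", code),
    "DEPRECATED",  # assumption: kept at the top level of the project
    "SANDBOX"
  )
  for (f in folders) {
    dir.create(file.path(project_dir, f), recursive = TRUE, showWarnings = FALSE)
  }

  # Start the running README with a dated first entry.
  readme <- file.path(project_dir, "README.txt")
  if (!file.exists(readme)) {
    writeLines(paste0(format(Sys.Date()), " - Project folder created."), readme)
  }
  invisible(project_dir)
}

# Built under tempdir() here so the example leaves nothing behind.
project_dir <- setup_project(tempdir(), "ABBV", "CONSISTENCY", "20222")
list.files(project_dir)
```

The R project file itself is easiest to create from RStudio inside the resulting folder.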
Data Management
Match on CSR Tables
- First, match the endpoints reported in the CSR (clinical study report).
- At the very least, match the primary endpoints.
- If this isn’t possible, match on sample size, or any available official outcomes.
- The details of the data matching should be specified in the SAP (statistical analysis plan).
Factor variables
- All of your variables that are factors should be carefully ordered and labeled.
- This is done using the factor() function, specifically the “levels” and “labels” arguments.
- Do not do any analysis until you are confident in how you’re implementing this.
- “Should this variable be a character or a factor?”
- Usually I make my variables either factor or numeric. Only subject IDs are left as character variables.
- I do this because I am not interested in the subject IDs, and there are so many that a factor variable would get unwieldy.
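As an illustration of the “levels” and “labels” arguments, here is a small sketch. The data frame, column names, and severity codes are made up for the example.

```r
# Hypothetical raw data; names and codes are illustrative only.
raw <- data.frame(
  USUBJID  = c("001-001", "001-002", "001-003", "001-004"),
  severity = c(2, 0, 1, 2),
  stringsAsFactors = FALSE
)

raw$severity_f <- factor(
  raw$severity,
  levels = c(0, 1, 2),                      # values as stored, in the order you want
  labels = c("Mild", "Moderate", "Severe")  # display label for each level, same order
)

# Cross-tabulate the raw codes against the new factor to confirm the mapping.
table(raw$severity, raw$severity_f, useNA = "ifany")
levels(raw$severity_f)

# Subject IDs stay as character.
class(raw$USUBJID)
```

The cross-tabulation is the check that matters: every raw code should land in exactly one labeled level, and unexpected codes show up as NA.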
Data Management Function - data_mgmt()
- Create a function for data management
- Source that file, run that function to yield managed data
- Do this each time you run analysis code; this ensures all results are fully replicable
- Any changes to data management should be made in that single file
- This prevents cluttering your global environment with additional objects
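A minimal sketch of this pattern follows. The file name `CODE_20222/data_mgmt.R`, the column names, and the treatment codes are assumptions for illustration; only the function name `data_mgmt()` comes from the notes.

```r
# Contents of a hypothetical file such as CODE_20222/data_mgmt.R
data_mgmt <- function(path) {
  # Read subject IDs as character so leading zeros are preserved.
  dat <- read.csv(path, colClasses = c(USUBJID = "character"))
  dat$ARM <- factor(dat$ARM,
                    levels = c("PBO", "TRT"),
                    labels = c("Placebo", "Treatment"))
  dat$AVAL <- as.numeric(dat$AVAL)
  dat  # intermediate objects stay inside the function
}

# Stand-in for the real raw data file, written to a temporary location.
raw_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(USUBJID = c("001", "002", "003"),
             ARM     = c("TRT", "PBO", "TRT"),
             AVAL    = c(1.5, 2.0, NA)),
  raw_path, row.names = FALSE
)

# In an analysis script you would first source the file, then call the function:
# source("CODE_20222/data_mgmt.R")
dat <- data_mgmt(raw_path)
str(dat)
```

Because everything happens inside the function, the only object left in the global environment is the managed data set.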
Descriptive Statistics
- Routine part of every analysis
- Example: in psychometric analysis, you’re expected to compute item response distributions
- Descriptive statistics should be computed and presented in tables and figures
- gtsummary R package
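For the psychometric example, item response distributions can be tabulated in base R, with gtsummary used for a formatted table when it is installed. This is a sketch on made-up item data; the item names and the 0 to 3 scoring range are assumptions.

```r
# Hypothetical item-level data: two items scored 0-3, one missing response.
items <- data.frame(
  item1 = c(0, 1, 1, 2, 3, 3, 3, NA),
  item2 = c(0, 0, 1, 1, 2, 2, 3, 3)
)

# Counts per response option; fixing the levels keeps unused options visible.
item_counts <- sapply(items, function(x) table(factor(x, levels = 0:3)))
item_counts

# Missing responses per item, reported separately.
n_missing <- colSums(is.na(items))
n_missing

# Optional formatted summary table, only if gtsummary is available.
if (requireNamespace("gtsummary", quietly = TRUE)) {
  items_f <- as.data.frame(lapply(items, factor, levels = 0:3))
  gtsummary::tbl_summary(items_f)
}
```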
Coding Best Practices
- Getting started: Outline your code on paper
- Especially important for bigger projects with multiple functions
- Which objects get passed to which functions?
- Functions
- Sandbox: Wrapping code in functions can make it hard to troubleshoot while you’re still experimenting. If this is going to be a problem, write it all out hard-coded and confirm it works.
- Make it a function: once you know it works, make it a function.
- Writing Functions
- Build functions sequentially
- First write the procedure, hard code all variables
- Test it! Make sure it works!
- One at a time, change the variables from hard-coded to things you can pass
- Test it, make sure it still works
- Proceed to change next variable
- After it has ALL been generalized, then wrap it in a function
- Pass all of the variables to the function
- Make it throw an error if any of the variables are NOT passed to the function
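The sequence above might look like this for a simple group-mean calculation. The data and the function name `mean_by_group()` are hypothetical; the point is the progression from hard-coded to generalized, and the up-front check for arguments that were not passed.

```r
# Step 1: hard-coded procedure. Test it and confirm it works.
dat <- data.frame(ARM = c("A", "A", "B", "B"), AVAL = c(1, 3, 2, 6))
hard_coded <- tapply(dat$AVAL, dat$ARM, mean)
hard_coded

# Final step: after generalizing one variable at a time, wrap it in a function.
mean_by_group <- function(data, value, group) {
  # Fail immediately if any argument was not supplied.
  if (missing(data) || missing(value) || missing(group)) {
    stop("`data`, `value`, and `group` must all be supplied.", call. = FALSE)
  }
  stopifnot(is.data.frame(data),
            value %in% names(data),
            group %in% names(data))
  tapply(data[[value]], data[[group]], mean)
}

# The function should reproduce the hard-coded result exactly.
result <- mean_by_group(dat, value = "AVAL", group = "ARM")
all.equal(result, hard_coded)
```

R would eventually complain about a missing argument on its own, but only when the argument is first used; the explicit check fails before any work is done and gives a clearer message.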
Testing Code
Code very slowly in order to code very quickly
- Write a line of code
- Run line of code
- Evaluate output
- Check length/dimensions
- View output – does it align with what you anticipated?
- Test it – if you change something, does it break or output something undesired?
- Missing value handling – what happens if there’s an NA or NaN in the input?
- Any other way it could break? Make a note.
- Each time you test that line of code, re-run all the preceding code
- Ensures you are not overwriting anything
- Your output won’t depend on your previous test
- After you’re confident that you don’t have any errors in that line of code, proceed to next line of code
- TO MAXIMIZE SPEED, MINIMIZE ERRORS!
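A sketch of this run-and-inspect loop for a single line of code, using a made-up input vector that deliberately contains both NA and NaN:

```r
# Hypothetical input with one NA and one NaN.
x <- c(4, 8, NA, NaN, 15)

# Write one line, run it, then inspect the result before moving on.
z_first <- (x - mean(x)) / sd(x)
length(z_first) == length(x)  # dimensions as expected?
all(is.na(z_first))           # every element is missing: mean() and sd()
                              # propagate NA/NaN by default

# NA and NaN are not the same thing: is.na() flags both, is.nan() only NaN.
sum(is.na(x))
sum(is.nan(x))

# Revised line after the check; re-run from the top so nothing depends on
# objects left over from the previous attempt.
z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
z
```

Whether dropping missing values is the right handling depends on the analysis; the point here is that the behaviour was checked deliberately before moving to the next line.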
Statistics Resources & References
- We all need reminders for ourselves, and references to point clients to
- Common statistical tests are linear models (or: how to do basic stats)
- Missing Data in clinical trials (and analysis in RCT more broadly)
- Code Guidelines
- Jargon
- Bug report