Evaluating Students’ Code as a Learning Product

Allison Theobold

Today’s layout

  1. How has students’ code been analyzed?
  2. Another (qualitative) way to analyze students’ code
  3. Examples of how this framework could be used

A bit about me…

“Supporting Data-Intensive Environmental Science Research: Data Science Skills for Scientific Practitioners of Statistics”

An image of an oceanside landscape, with a field of blooming California poppy flowers in the background.

How has students’ code been analyzed?

In the beginning there was…

Then came

A comparison of formula and tidyverse syntaxes (McNamara 2023)

  • lines of code produced
  • unique functions used
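Both metrics can be computed without running a student's code. The sketch below is an illustration only, not the procedure used in the cited studies: it assumes the submission is available as a single string (here, two statements taken from these slides) and relies on base R's `parse()` and `utils::getParseData()`. The object names `student_code`, `n_lines`, and `unique_functions` are made up for this example.

```r
# Example submission: two statements taken from the slides.
student_code <- "anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))"

# Lines of code produced: count the non-blank lines.
code_lines <- strsplit(student_code, "\n", fixed = TRUE)[[1]]
n_lines <- sum(nzchar(trimws(code_lines)))

# Unique functions used: parse (without evaluating) and keep the tokens
# that the R parser labels as function calls.
parsed <- parse(text = student_code, keep.source = TRUE)
tokens <- utils::getParseData(parsed)
is_call <- tokens$token == "SYMBOL_FUNCTION_CALL"
unique_functions <- sort(unique(tokens$text[is_call]))

print(n_lines)
print(unique_functions)
```

Operators such as `$`, `~`, and `<-` receive their own token types, so they are not counted as functions here; whether they should be is a choice for the researcher.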

And finally

Rafalski et al. (2019) extended these same ideas to compare students’ ability to write accurate code across three different R syntaxes: the tidyverse, base R, and the tilde style.

  • number of errors
  • time to completion

An alternative way to analyze students’ code

A framework for analyzing students’ code (Schulte 2008)

| | Text Surface | Program Execution | Function |
|---|---|---|---|
| Macrostructure | Understanding the overall structure of the program | Understanding the “algorithm” of the program | Understanding the goal / purpose of the program (in its context) |
| Relations | References between blocks, e.g., method calls, object creation | Sequence of method calls, object sequence diagrams | Understanding how sub-goals are related to goals, how function is achieved by subfunctions |
| Blocks | Regions of interest (ROI) that syntactically or semantically build a unit | Operation of a block, a method, or a ROI (as a sequence of statements) | Function of a block, may be seen as a sub-goal |
| Atoms | Language elements | Operation of a statement | Function of a statement, only understandable in context |


with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))

Text Surface

How is whitespace being used?

Program Execution

What operation(s) does this statement carry out?


Function

How is this statement related to the broader context of the program?


anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)  
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))  

Program Execution

What operation(s) does this block carry out?


Function

How is this block related to the broader context of the program?

Relationships Between Blocks

anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)  
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))  

posterior2 <- lm(ProximateAnalysisDataOutlier$PSUP ~ ProximateAnalysisDataOutlier$Lipid)
with(ProximateAnalysisDataOutlier, plot(PSUP~Lipid, las=1, xlab = "Whole-body Lipid Content (%)", ylab = "UP Fatmeter Reading"))

Okay, but how would this type of analysis look?

Your turn!

RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 1]

How would you describe the action(s) being taken in this statement?

Coding students’ code

RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 1]

Descriptive code

“Filters a vector of values using extraction operator, based on an equality relation with a variable selected from dataframe using $ operator”

In-vivo code

“Uses [ ] and == to filter vector, uses $ to select variable”
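To make the descriptive code concrete, the statement can be unpacked one operation at a time. This is a minimal sketch: the data frame below is a hypothetical stand-in for the students' `RPMA2GrowthSub` data (the values are invented), and the intermediate names `ages`, `is_age_one`, `weights`, and `age_one_weights` exist only for illustration.

```r
# Hypothetical stand-in for the students' data; the values are invented.
RPMA2GrowthSub <- data.frame(
  Age    = c(1, 2, 1, 3),
  Weight = c(10.5, 42.0, 12.5, 88.0)
)

# `$` selects one variable from the data frame as a vector.
ages <- RPMA2GrowthSub$Age

# `==` builds the equality relation: one TRUE / FALSE per observation.
is_age_one <- ages == 1

# `$` again, this time selecting the variable to be filtered.
weights <- RPMA2GrowthSub$Weight

# `[ ]` extracts the elements of the vector where the relation is TRUE.
age_one_weights <- weights[is_age_one]

# The student's one-line statement gives the same result.
one_line <- RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 1]
print(identical(age_one_weights, one_line))
```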

Uncovering emergent themes

linearAnterior <- lm(PADataNoOutlier$Lipid ~ PADataNoOutlier$PSUA)

early <- subset(RPMA2Growth, StockYear < 2006)  

Weight5 <- mean(RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 5], na.rm = TRUE)

gas <- gas[!(substr(gas$sampleID,3,3) %in% c("b","c")), ]   

obsD <- subset(gas, gas$carboy == "D")$N15_N2_Ar

lowerCIBound <- pMat[1:mlleIndex,1][which.min(abs(mlleCI+likelihoods[1:mlleIndex]))]

Data wrangling

Statements of code whose purpose is to prepare a dataset for analysis and / or visualization


  • selecting variables
  • filtering observations
  • mutating variables
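For reference, here is one way each of the three operations can look in base R. This is a sketch on a hypothetical data frame: `growth`, its values, and the `Length` and `WeightKg` columns are assumptions made for this example, and students may well express the same operations differently (for instance with `subset()`, as in the statements above).

```r
# Hypothetical data; names and values are invented for illustration.
growth <- data.frame(
  Age    = c(1, 2, 1, 5),
  Weight = c(10.5, 42.0, 12.5, 310.0),
  Length = c(95, 160, 101, 340)
)

# Selecting variables: keep only some columns.
selected <- growth[, c("Age", "Weight")]

# Filtering observations: keep only the rows meeting a condition.
filtered <- growth[growth$Age == 1, ]

# Mutating variables: add a column computed from an existing one.
mutated <- transform(growth, WeightKg = Weight / 1000)

print(names(selected))
print(nrow(filtered))
print(names(mutated))
```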

An alternative direction

Process coding:

uses gerunds (“-ing” words) to connote action in the data (Saldaña 2013)

  • Particularly relevant to describing the processes of human actions
  • Can be intertwined with time, such that actions can emerge, change, or occur in particular sequences.

Your turn!

plot(EarlyLengthAge$meanLE ~ EarlyLengthAge$Age, 
     las = 1, ylab = "Fork Length (mm)", xlab = "Age")

lines(EarlyLengthAge$meanLE ~ EarlyLengthAge$Age)

points(MidLengthAge$meanLM ~ MidLengthAge$Age, col = "red")

lines(MidLengthAge$meanLM ~ MidLengthAge$Age, col = "red")

legend(15, 600, legend = c("1998-2003", "2006-2017"), 
       col = c("black", "red"), lty = 1:1, cex = 0.8)

How would you describe the process being enacted in this block of code?

How is that process different from this block of code?

     plot(Lipid ~ PSUA, 
          las = 1, 
          col = ifelse(PADataNoOutlier$`Fork Length` <  280,

How could this be used?

Concept dependence

How does a student’s conceptual model of a dataset inform how they filter data?

(atoms; program execution)

Program environment

How do the visualizations produced by students who learn ggplot differ from those who learn “base” R?

(blocks; program execution)

Linguistic structure

How do students name objects they will use later?

(relationships; text)
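A question like this starts with pulling the object names out of students' code. The sketch below is one possible first step, not a full analysis: it parses three statements taken from these slides and collects the left-hand side of every top-level assignment. The helper `is_assignment()` and the object `assigned_names` are hypothetical names introduced here.

```r
# Three statements taken from the slides.
student_code <- "
linearAnterior <- lm(PADataNoOutlier$Lipid ~ PADataNoOutlier$PSUA)
early <- subset(RPMA2Growth, StockYear < 2006)
Weight5 <- mean(RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 5], na.rm = TRUE)
"

# Parse into a list of unevaluated expressions (nothing is run).
expressions <- as.list(parse(text = student_code))

# Hypothetical helper: TRUE for top-level calls to `<-` or `=`.
is_assignment <- function(e) {
  is.call(e) &&
    (identical(e[[1]], as.name("<-")) || identical(e[[1]], as.name("=")))
}

# The second element of an assignment call is its left-hand side.
assignments <- Filter(is_assignment, expressions)
assigned_names <- vapply(
  assignments,
  function(e) deparse(e[[2]]),
  character(1)
)

print(assigned_names)
```

The extracted names are only the raw material; judging what a name communicates (for example, `early` versus `Weight5`) remains a qualitative step.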

Learning trajectory

How do students’ exploratory data analyses change over the duration of a course?

(macrostructure; function / purpose)

Why is this important for data science education?

“Data science education faces a multitude of open questions surrounding the teaching and learning of data science, and we posit the horizon of research in data science education critically inspects student learning from the perspective of the learner.” (Theobold, Wickstrom, and Hancock 2023)

How can we distinguish merely interesting learning from effective learning? (Wiggins and McTighe 2005)


Practical considerations

How much code should I collect?

  • Driven by the research question!
    • Amount of each student’s code
    • Number of students

How do readers trust my analysis?

  • Trust comes from:

    • confirmability
    • reliability
    • credibility
    • transferability

Excellent resources: Creswell & Poth (2018); Merriam & Tisdell (2016); Miles et al. (2020)


Corbin, J., and A. Strauss. 2008. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. Thousand Oaks, CA: Sage.
Creswell, J. W., and C. N. Poth. 2018. Qualitative Inquiry & Research Design. Thousand Oaks, CA: Sage.
Jadud, M. C. 2005. “A First Look at Novice Compilation Behavior Using BlueJ.” Computer Science Education 15 (1): 25–40. https://doi.org/10.1080/08993400500056530.
———. 2006. “Methods and Tools for Exploring Novice Compilation Behavior.” In Proceedings of the 2nd International Workshop on Computing Education Research (ICER), 73–84. Canterbury, United Kingdom: Association for Computing Machinery. https://doi.org/10.1145/1151588.1151600.
McCall, D., and M. Kölling. 2014. “Meaningful Categorisation of Novice Programmer Errors.” In 2014 IEEE Frontiers in Education Conference (FIE) Proceedings, 1–8. Madrid, Spain. https://doi.org/10.1109/FIE.2014.7044420.
McNamara, A. 2023. “Teaching Modeling in Introductory Statistics: A Comparison of Formula and Tidyverse Syntaxes.” ArXiv. https://arxiv.org/abs/2201.12960.
Merriam, S. B., and E. J. Tisdell. 2016. Qualitative Research. San Francisco, CA: John Wiley & Sons.
Miles, M. B., A. M. Huberman, and J. Saldaña. 2020. Qualitative Data Analysis. Thousand Oaks, CA: Sage.
Rafalski, T., P. M. Uesbeck, C. Panks-Meloney, P. Daleiden, W. Allee, A. McNamara, and A. Stefik. 2019. “A Randomized Controlled Trial on the Wild Wild West of Scientific Computing with Student Learners.” In Proceedings of the 2019 ACM Conference on International Computing Education Research, 239–47. https://doi.org/10.1145/3291279.3339421.
Saldaña, J. 2013. The Coding Manual for Qualitative Researchers. Thousand Oaks, CA: Sage.
Schulte, Carsten. 2008. “Block Model.” Proceedings of the Fourth International Workshop on Computing Education Research, September. https://doi.org/10.1145/1404520.1404535.
Theobold, Allison S., Megan M. Wickstrom, and Stacey A. Hancock. 2023. “Coding Code: Qualitative Methods for Investigating Data Science Skills.” Journal of Statistics and Data Science Education. https://doi.org/10.1080/26939169.2023.2277847.
Wiggins, G., and J. McTighe. 2005. Understanding by Design. 2nd ed. Alexandria: Association for Supervision and Curriculum Development (ASCD).