Evaluating Students’ Code as a Learning Product
Allison Theobold
Today’s layout
A bit about me…
“Supporting Data-Intensive Environmental Science Research: Data Science Skills for Scientific Practitioners of Statistics”
How has students’ code been analyzed?
A comparison of formula and tidyverse syntaxes (McNamara 2023)
Rafalski et al. (2019) extended these same ideas to compare students’ ability to write accurate code across three different R syntaxes: the tidyverse, base R, and the tilde style.
An alternative way to analyze students’ code
A framework for analyzing students’ code (Schulte 2008)
| | Text Surface | Program Execution | Function |
|---|---|---|---|
| Macrostructure | Understanding the overall structure of the program | Understanding the “algorithm” of the program | Understanding the goal / purpose of the program (in its context) |
| Relations | References between blocks, e.g., method calls, object creation | Sequence of method calls, object sequence diagrams | Understanding how sub-goals are related to goals, how function is achieved by subfunctions |
| Blocks | Regions of interest (ROI) that syntactically or semantically build a unit | Operation of a block, a method, or a ROI (as a sequence of statements) | Function of a block, may be seen as a sub-goal |
| Atoms | Language elements | Operation of a statement | Function of a statement, only understandable in context |
Atom
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))
Text Surface
How is whitespace being used?
Program Execution
What operation(s) does this statement carry out?
Function
How is this statement related to the broader context of the program?
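One way to make the atom-level "text surface" concrete is to let R list the language elements of the statement. The sketch below is only an illustration, not the method used in this work: it parses the statement as a string (so no data are needed) and uses `utils::getParseData()` to recover its tokens. The object names `statement`, `atoms`, `calls`, and `tilde_spaced` are hypothetical.

```r
# The atom from the slide, stored as text so it can be parsed without the data.
statement <- "with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))"

# keep.source = TRUE is required for getParseData() to return token positions.
parsed <- parse(text = statement, keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# Terminal tokens are the individual language elements (symbols, operators, constants).
atoms <- tokens[tokens$terminal, c("token", "text")]
print(atoms, row.names = FALSE)

# Which functions are called in this statement?
calls <- atoms$text[atoms$token == "SYMBOL_FUNCTION_CALL"]

# A simple text-surface question: is the formula operator surrounded by spaces?
tilde_spaced <- grepl(" ~ ", statement, fixed = TRUE)
```

A token listing like this only describes the text surface; deciding what the statement does, and why, still requires a human reading in context.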
Block
anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)
summary(anterior)
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))
abline(anterior)
plot(anterior)
Program Execution
What operation(s) does this block carry out?
Function
How is this block related to the broader context of the program?
Relationships Between Blocks
anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)
summary(anterior)
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))
abline(anterior)
plot(anterior)
posterior2 <- lm(ProximateAnalysisDataOutlier$PSUP ~ ProximateAnalysisDataOutlier$Lipid)
summary(posterior2)
with(ProximateAnalysisDataOutlier, plot(PSUP~Lipid, las=1, xlab = "Whole-body Lipid Content (%)", ylab = "UP Fatmeter Reading"))
abline(posterior2)
plot(posterior2)
posterior2
Okay, but how would this type of analysis look?
Your turn!
How would you describe the action(s) being taken in this statement?
Coding students’ code
Descriptive code
“Filters a vector of values using extraction operator, based on an equality relation with a variable selected from dataframe using $ operator”
In-vivo code
“Uses [ ] and == to filter vector, uses $ to select variable”
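To see what these two codes are describing, it can help to unpack a statement of that shape one operator at a time. The sketch below uses a small made-up dataframe, `fish`, with hypothetical columns `Age` and `Weight`; it is not the students' data, and the intermediate names are for illustration only.

```r
# Hypothetical toy data, invented for this illustration.
fish <- data.frame(Age = c(4, 5, 5, 6),
                   Weight = c(310, 420, NA, 500))

# `$` selects a variable from the dataframe.
ages <- fish$Age

# `==` builds a logical vector from an equality relation.
is_five <- ages == 5

# `[ ]` extracts the elements of a vector where the logical vector is TRUE.
weights_at_five <- fish$Weight[is_five]

# The same three steps written as a single statement.
weights_one_line <- fish$Weight[fish$Age == 5]

# Summarizing the filtered vector, ignoring the missing value.
mean_weight_at_five <- mean(weights_one_line, na.rm = TRUE)
```

The descriptive code records the action in the researcher's words; the in-vivo code stays close to the symbols that appear in the statement itself.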
Uncovering emergent themes
linearAnterior <- lm(PADataNoOutlier$Lipid ~ PADataNoOutlier$PSUA)
early <- subset(RPMA2Growth, StockYear < 2006)
Weight5 <- mean(RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 5], na.rm = TRUE)
gas <- gas[!(substr(gas$sampleID,3,3) %in% c("b","c")), ]
obsD <- subset(gas, gas$carboy == "D")$N15_N2_Ar
lowerCIBound <- pMat[1:mlleIndex,1][which.min(abs(mlleCI+likelihoods[1:mlleIndex]))]
Data wrangling
Statements of code whose purpose is to prepare a dataset for analysis and / or visualization
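Two statements can share this purpose while differing in how they operate. As a hedged illustration (the dataframe `growth` and its values are made up, though `StockYear` echoes the student code above), the sketch below filters the same data with `subset()` and with `[ ]`, and shows one place where the two approaches behave differently: rows with a missing value in the condition.

```r
# Hypothetical toy data with one missing stock year.
growth <- data.frame(StockYear = c(2004, 2005, 2007, NA),
                     Weight = c(1.2, 1.5, 1.9, 2.1))

# Filtering with subset(): rows where the condition is NA are dropped.
by_subset <- subset(growth, StockYear < 2006)

# Filtering with [ ] and a logical vector: an NA condition yields a row of NAs.
by_bracket <- growth[growth$StockYear < 2006, ]

nrow(by_subset)
nrow(by_bracket)
```

Both statements would likely receive the same thematic code (data wrangling), while their descriptive codes would differ.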
Sub-themes
An alternative direction
Your turn!
plot(EarlyLengthAge$meanLE ~ EarlyLengthAge$Age,
las = 1, ylab = "Fork Length (mm)", xlab = "Age")
lines(EarlyLengthAge$meanLE ~ EarlyLengthAge$Age)
points(MidLengthAge$meanLM ~ MidLengthAge$Age, col = "red")
lines(MidLengthAge$meanLM ~ MidLengthAge$Age, col = "red")
legend(15, 600, legend = c("1998-2003", "2006-2017"),
col = c("black", "red"), lty = 1:1, cex = 0.8)
How would you describe the process being enacted in this block of code?
How is that process different from this block of code?
How could this be used?
Concept dependence
How does a student’s concept model of a dataset inform how they filter data?
(atoms; program execution)
Program environment
How do the visualizations produced by students who learn ggplot differ from those who learn “base” R?
(blocks; program execution)
Linguistic structure
How do students name objects they will use later?
(relationships; text surface)
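A question like this needs the names students assign before any qualitative reading can begin. One possible starting point, sketched here under the assumption that the student code is available as text, is to parse it and pull out the left-hand side of each top-level `<-` assignment. The three statements are taken from the blocks shown earlier; `student_code`, `is_assignment`, and `object_names` are hypothetical names.

```r
# Student statements stored as text; nothing is evaluated, only parsed.
student_code <- '
anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)
summary(anterior)
posterior2 <- lm(ProximateAnalysisDataOutlier$PSUP ~ ProximateAnalysisDataOutlier$Lipid)
'

exprs <- parse(text = student_code)

# A top-level expression is an assignment if it is a call to `<-`.
is_assignment <- vapply(
  exprs,
  function(e) is.call(e) && identical(e[[1]], as.name("<-")),
  logical(1)
)

# The second element of an assignment call is the object being named.
object_names <- unname(vapply(
  exprs[is_assignment],
  function(e) deparse(e[[2]]),
  character(1)
))

object_names
```

This only catches `<-` at the top level; assignments made with `=` or inside function calls would need additional handling.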
Learning trajectory
How do students’ exploratory data analyses change over the duration of a course?
(macrostructure; function / purpose)
Why is this important for data science education?
“Data science education faces a multitude of open questions surrounding the teaching and learning of data science, and we posit the horizon of research in data science education critically inspects student learning from the perspective of the learner.” (Theobold, Wickstrom, and Hancock 2023)
How can we distinguish merely interesting learning from effective learning? (Wiggins and McTighe 2005)
Questions?
Practical considerations
How much code should I collect?
How do readers trust my analysis?
Trust comes from: