# code chunk for loading packages
Lab 3: Student Evaluations of Teaching
In this lab, we will be using the dplyr
package to explore student evaluations of teaching data. You are expected to use functions from dplyr
to do your data manipulation!
Part 1: GitHub Workflow
Step 1: Making a Copy from GitHub Classroom
You can access the Lab 3 materials through the Lab 3 assignment on GitHub Classroom. We’re going to follow the same steps as last week to make your own copy of this repository:
Use these steps to make a copy of the Lab 2 repository: List of Steps to Copy the Lab Assignment from GitHub Classroom
Step 2: Making a Small Change
Now, find the lab-3-student.qmd
file in the “Files” tab in the lower right hand corner. Click on this file to open it.
At the top of the document (in the YAML) there is an author
line that says "Your name here!"
. Change this to be your name and save your file either by clicking on the blue floppy disk or with a shortcut (command / control + s).
Step 3: Pushing Your Lab to GitHub
Now for our last step, we need to commit the files to our repo.
- Click the “Git” tab in upper right pane
- Check the “Staged” box for the
lab-3-student.qmd
file - Click “Commit”
- In the box that opens, type a message in “Commit message”, such as “Added my name”.
- Click “Commit”.
- Click the green “Push” button to send your local changes to GitHub.
RStudio will display something like:
>>> /usr/bin/git push origin HEAD:refs/heads/main
To https://github.com/atheobold/introduction-to-quarto-allison-theobold.git
3a2171f..6d58539 HEAD -> main
Step 4: Let’s get started wranging some data!
Part 2: Some Words of Advice
Part of learning to program is learning from a variety of resources. Thus, I expect you will use resources that you find on the internet. There is, however, an important balance between copying someone else’s code and using their code to learn. Therefore, if you use external resources, I want to know about it.
If you used Google, you are expected to “inform” me of any resources you used by pasting the link to the resource in a code comment next to where you used that resource.
If you used ChatGPT, you are expected to “inform” me of the assistance you received by (1) indicating somewhere in the problem that you used ChatGPT (e.g., below the question prompt or as a code comment), and (2) downloading and attaching the
.txt
file containing your entire conversation with ChatGPT.
Additionally, you are permitted and encouraged to work with your peers as you complete lab assignments, but you are expected to do your own work. Copying from each other is cheating, and letting people copy from you is also cheating. Please don’t do either of those things.
Setting Up Your Code Chunks
- The first chunk of your Quarto document should be to declare your libraries (probably only
tidyverse
for now). - The second chunk of your Quarto document should be to load in your data.
Save Regularly, Render Often
- Be sure to save your work regularly.
- Be sure to render your file every so often, to check for errors and make sure it looks nice.
- Make sure your Quarto document does not contain
View(dataset)
orinstall.packages("package")
, both of these will prevent rendering. - Check your Quarto document for occasions when you looked at the data by typing the name of the data frame. Leaving these in means the whole dataset will print out and this looks unprofessional. Remove these!
- If all else fails, you can set your execution options to
error: true
, which will allow the file to render even if errors are present.
- Make sure your Quarto document does not contain
Part 3: Let’s Start Working with the Data!
The Data
The teacher_evals
dataset contains student evaluations of reaching (SET) collected from students at a University in Poland. There are SET surveys from students in all fields and all levels of study offered by the university. 1
The SET questionnaire that every student at this university completes is as follows:
Evaluation survey of the teaching staff of University of Poland. Please complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don’t agree; 1 - I strongly don’t agree.
Question 1: I learned a lot during the course.
Question 2: I think that the knowledge acquired during the course is very useful.
Question 3: The professor used activities to make the class more engaging.
Question 4: If it was possible, I would enroll for a course conducted by this lecturer again.
Question 5: The classes started on time.
Question 6: The lecturer always used time efficiently.
Question 7: The lecturer delivered the class content in an understandable and efficient way.
Question 8: The lecturer was available when we had doubts.
Question 9. The lecturer treated all students equally regardless of their race, background and ethnicity.
These data are from the end of the winter semester of the 2020-2021 academic year. In the period of data collection, all university classes were entirely online amid the COVID-19 pandemic. While expected learning outcomes were not changed, the online mode of study could have affected grading policies and could have implications for data.
Average SET scores were combined with many other variables, including:
- characteristics of the teacher (degree, seniority, gender, SET scores in the past 6 semesters).
- characteristics of the course (time of day, day of the week, course type, course breadth, class duration, class size).
- percentage of students providing SET feedback.
- course grades (mean, standard deviation, percentage failed for the current course and previous 6 semesters).
This rich dataset allows us to investigate many of the biases in student evaluations of teaching that have been reported in the literature and to formulate new hypotheses.
Before tackling the problems below, study the description of each variable included in the teacher_evals_codebook.pdf
inside the “resources” folder.
1. Load the appropriate R packages for your analysis.
2. Load in the teacher_evals
data. Hint: You should use the here()
function from the here package!
# code chunk for importing the data
Data Inspection + Summary
3. Provide a brief overview (~4 sentences) of the dataset.
It is always good practice to start an analysis by getting a feel for the data and providing a quick summary for readers. You do not need to show any code for this question, although you probably want to use code to get some information about the data (e.g., summary(data)
, glimpse(data)
, dim(data)
, etc.). Things to think about – where did the data come from? what sort of data are provided (context and data type)? how much data do you have? etc.
# you may want to use code to answer this question
4. What is the unit of observation (i.e. a single row in the dataset) identified by?
It is not one instructor per row! It’s also not just one class per row!
# you may want to use code to answer this question
5. Use one dplyr
pipeline to clean the data by:
- renaming the
gender
variablesex
- removing all courses with fewer than 10 respondents
- changing data types in whichever way you see fit (e.g., is the instructor ID really a numeric data type?)
- only keeping the columns we will use –
course_id
,teacher_id
,question_no
,no_participants
,resp_share
,SET_score_avg
,percent_failed_cur
,academic_degree
,seniority
, andsex
Assign your cleaned data to a new variable named teacher_evals_clean
–- use these data going forward.
Helpful functions: rename()
, mutate()
, as.factor()
It would be most efficient to use across()
in combination with mutate()
to complete this task.
# code chunk for Q4
5. How many unique instructors and unique courses are present in the cleaned dataset?
Helpful functions: summarize()
, n_distinct()
# code chunk for Q5
6. One teacher-course combination has some missing values, coded as NA
. Which instructor has these missing values? Which course? What variable are the missing values in?
Helpful functions: filter()
, if_any()
# code chunk for Q6
7. What are the demographics of the instructors in this study? Investigate the variables academic_degree
, seniority
, and sex
and summarize your findings in ~3 complete sentences.
You’ll need to first wrangle your data to have each instructor represented only once.
Helpful functions: select()
, distinct(___, .keep_all = TRUE)
, count()
, summary()
# code chunk for Q7
8. Each course seems to have used a different subset of the nine evaluation questions. How many teacher-course combinations asked all nine questions?
# code chunk for Q8
Rate my Professor
Helpful functions: filter()
, group_by()
, summarize()
, slice()
Useful variables: question_no
, teacher_id
, SET_score_avg
, percent_failed_cur
, resp_share
, academic_degree
, seniority
, sex
9. Which instructors had the highest and lowest average rating for Question 1 (I learnt a lot during the course.) across all their courses?
# code chunk for Q9
10. Which instructors with one year of experience had the highest and lowest average percentage of students failing in the current semester across all their courses?
# code chunk for Q10
11. Which female instructors with either a doctorate or professor degree had the highest and lowest average percent of students responding to the evaluation across all their courses?
# code chunk for Q11
Footnotes
Citation: journal, Under blind review in refereed. University SET data, with faculty and courses characteristics. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-09-12. https://doi.org/10.3886/E149801V1↩︎