Lab 3: Student Evaluations of Teaching

Author

Instructions

Part One: Set-up

In this lab, we will be using the dplyr package to explore student evaluations of teaching data. You are expected to use functions from dplyr to do your data manipulation!

Accessing the Lab

Download the template Lab 3 Quarto file here: lab-3-student.qmd

Download the data for the lab here: teacher_evals.csv

Important

Be sure to save both files in the Lab 3 folder, inside your Week 3 folder, inside your STAT 331 folder!

Part Two: Some Words of Advice

Part of learning to program is learning from a variety of resources. Thus, I expect you will use resources that you find on the internet. There is, however, an important balance between copying someone else’s code and using their code to learn. Therefore, if you use external resources, I want to know about it.

  • If you used Google, you are expected to “inform” me of any resources you used by pasting the link to the resource in a code comment next to where you used that resource.

  • If you used ChatGPT, you are expected to “inform” me of the assistance you received by (1) indicating somewhere in the problem that you used ChatGPT (e.g., below the question prompt or as a code comment), and (2) including a link to the conversation you had with Chat.

    • Note: If I cannot open the link to your conversation with Chat, you will earn a “Growing” on that problem and will be required to submit a revised link.

Additionally, you are permitted and encouraged to work with your peers as you complete lab assignments, but you are expected to do your own work. Copying from each other is cheating, and letting people copy from you is also cheating. Please don’t do either of those things.

Setting Up Your Code Chunks

  • The first chunk of your Quarto document should be to declare your libraries (probably only tidyverse for now).
  • The second chunk of your Quarto document should be to load in your data.

Save Regularly, Render Often

  • Be sure to save your work regularly.
  • Be sure to render your file every so often, to check for errors and make sure it looks nice.
    • Make sure your Quarto document does not contain View(dataset) or install.packages("package"); both of these will prevent rendering.
    • Check your Quarto document for occasions when you looked at the data by typing the name of the data frame. Leaving these in means the whole dataset will print out and this looks unprofessional. Remove these!
    • If all else fails, you can set your execution options to error: true, which will allow the file to render even if errors are present.

Part Three: Let’s Start Working with the Data!

The Data

The teacher_evals dataset contains student evaluations of teaching (SET) collected from students at a university in Poland. There are SET surveys from students in all fields and all levels of study offered by the university. 1

The SET questionnaire that every student at this university completes is as follows:

Evaluation survey of the teaching staff of University of Poland. Please complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way:

5: I strongly agree

4: I agree

3: Neutral

2: I don’t agree

1: I strongly don’t agree

Question 1: I learned a lot during the course.

Question 2: I think that the knowledge acquired during the course is very useful.

Question 3: The professor used activities to make the class more engaging.

Question 4: If it was possible, I would enroll for a course conducted by this lecturer again.

Question 5: The classes started on time.

Question 6: The lecturer always used time efficiently.

Question 7: The lecturer delivered the class content in an understandable and efficient way.

Question 8: The lecturer was available when we had doubts.

Question 9: The lecturer treated all students equally regardless of their race, background and ethnicity.

These data are from the end of the winter semester of the 2020-2021 academic year. During the period of data collection, all university classes were entirely online amid the COVID-19 pandemic. While expected learning outcomes were not changed, the online mode of study could have affected grading policies and could have implications for the data.

Average SET scores were combined with many other variables, including:

  1. characteristics of the teacher (degree, seniority, gender, SET scores in the past 6 semesters).
  2. characteristics of the course (time of day, day of the week, course type, course breadth, class duration, class size).
  3. percentage of students providing SET feedback.
  4. course grades (mean, standard deviation, percentage failed for the current course and previous six semesters).

This rich dataset allows us to investigate many of the biases in student evaluations of teaching that have been reported in the literature and to formulate new hypotheses.

Before tackling the problems below, study the description of each variable in the data dictionary here. You will want to have this open and refer back to it!

1. Load the appropriate R packages for your analysis. Hint: The ggplot2 package and the readr package can both be loaded with the tidyverse package.

```{r}
#| label: packages
# Code to load in packages necessary for the analysis
```

2. Load in the teacher_evals data. Hint: You should use the here() function from the here package!

```{r}
#| label: load-data
# Code to load in the data for the analysis
```

Data Inspection + Summary

3. Provide a brief overview (~4 sentences) of the dataset.

Note

It is always good practice to start an analysis by getting a feel for the data and providing a quick summary for readers. You do not need to show any code for this question, although you probably want to use code to get some information about the data (e.g., summary(data), glimpse(data), dim(data), etc.). Things you should think about:

  • Where did the data come from?
  • What is the context of the data?
  • How much data do you have?
  • What sort of data do you have (e.g., variables, data types)?
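As a rough illustration of those inspection functions, here is a minimal sketch on a made-up tibble. The name toy_evals and its columns are hypothetical and are not the lab data.

```r
library(dplyr)

# Hypothetical toy data (NOT the lab's teacher_evals file)
toy_evals <- tibble(
  teacher_id = c(101, 101, 102),
  course_id  = c("A1", "A1", "B2"),
  score_avg  = c(4.5, 3.9, 4.1)
)

dim(toy_evals)      # number of rows, then number of columns
glimpse(toy_evals)  # one line per column: name, type, first few values
summary(toy_evals)  # per-column summaries (quartiles and mean for numeric columns)
```

Note that teacher_id is stored as a number here even though it is really a label; this is the kind of thing a quick inspection can reveal.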

4. What identifies the unit of observation (i.e., a single row in the dataset)?

Warning

It is not one instructor per row! It’s also not just one class per row!

5. Use one dplyr pipeline to clean the data by:

  • renaming the gender variable to sex
  • removing all courses with fewer than 10 respondents
  • changing data types in whichever way you see fit (e.g., is the instructor ID really a numeric data type?)
  • only keeping the columns we will use: course_id, teacher_id, question_no, no_participants, resp_share, SET_score_avg, percent_failed_cur, academic_degree, seniority, and sex

Assign your cleaned data to a new variable named teacher_evals_clean. Use these data going forward.

Tip

Helpful functions: rename(), mutate(), as.character()

It would be most efficient to use across() in combination with mutate() if you are changing the data type of three or more variables.
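To show what across() inside mutate() can look like, here is a small sketch on a hypothetical tibble. The names toy_courses and the *_code columns are invented for illustration, and the native pipe |> assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical toy data: three ID-like columns stored as numbers
toy_courses <- tibble(
  course_code = c(10, 11, 12),
  room_code   = c(1, 2, 2),
  term_code   = c(3, 3, 4),
  enrolled    = c(25, 40, 12)
)

# One mutate() call converts every column whose name ends in "_code"
toy_courses_clean <- toy_courses |>
  mutate(across(.cols = ends_with("_code"), .fns = as.character))
```

The alternative would be three separate as.character() calls, one per column; across() collapses them into a single call, and enrolled stays numeric because it does not match the selection.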

6. How many unique instructors and unique courses are present in the cleaned dataset?

Tip

Helpful functions: summarize(), n_distinct()
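As a sketch of how these two functions fit together, the example below counts unique values in two columns with a single summarize() call. The tibble toy_visits and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: one row per visit
toy_visits <- tibble(
  doctor = c("d1", "d1", "d2", "d3", "d3"),
  clinic = c("north", "south", "north", "north", "north")
)

# One summarize() call can return several counts at once
toy_counts <- toy_visits |>
  summarize(
    n_doctors = n_distinct(doctor),
    n_clinics = n_distinct(clinic)
  )
```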

7. What are the demographics of the instructors in this study? Investigate the variables academic_degree, seniority, and sex and summarize your findings in ~3 complete sentences.

Tip

You’ll need to first wrangle your data to have each instructor represented only once.

Helpful functions: select(), distinct(___, .keep_all = TRUE), count(), summarize()
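One possible way to reduce repeated rows to one row per person before counting is sketched below. The tibble toy_reviews and its columns are made up for illustration, and the sketch assumes each person's characteristics are the same on every one of their rows.

```r
library(dplyr)

# Hypothetical toy data: each chef appears once per dish reviewed
toy_reviews <- tibble(
  chef_id = c(1, 1, 2, 3, 3, 3),
  dish    = c("a", "b", "c", "d", "e", "f"),
  level   = c("senior", "senior", "junior", "junior", "junior", "junior")
)

# Keep the first row for each chef, retaining all other columns
toy_chefs <- toy_reviews |>
  distinct(chef_id, .keep_all = TRUE)

# Count chefs (not reviews) at each level
level_counts <- toy_chefs |>
  count(level)
```

Counting on toy_reviews directly would tally reviews instead of chefs, which would overstate the groups whose members appear on many rows.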

8. One teacher-course combination has some missing values, coded as NA. Which instructor has these missing values? Which course? What variable are the missing values in?

Tip

Helpful functions: filter(), if_any(), is.na()
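Here is a minimal sketch of how these functions can combine to keep only the rows with a missing value in any column. The tibble toy_scores is hypothetical.

```r
library(dplyr)

# Hypothetical toy data with one missing quiz score
toy_scores <- tibble(
  student = c("s1", "s2", "s3"),
  quiz    = c(9, NA, 7),
  exam    = c(80, 75, 90)
)

# Keep rows where at least one column is NA
rows_with_na <- toy_scores |>
  filter(if_any(.cols = everything(), .fns = is.na))
```

Looking at the returned row then shows which column holds the NA.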

9. Each course seems to have used a different subset of the nine evaluation questions. How many teacher-course combinations asked all nine questions?

Rate my Professor

Tip

Helpful functions: filter(), group_by(), summarize(), slice()

Useful variables: question_no, teacher_id, SET_score_avg, percent_failed_cur, resp_share, academic_degree, seniority, sex
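The general pattern of averaging within groups and then picking out a top row can be sketched as follows. The tibble toy_ratings is hypothetical, and slice_max() is used here as one member of the slice() family; other approaches work too.

```r
library(dplyr)

# Hypothetical toy data: two ratings per barista
toy_ratings <- tibble(
  barista = c("b1", "b1", "b2", "b2", "b3", "b3"),
  rating  = c(5, 4, 3, 2, 4, 4)
)

# Average rating per barista
barista_means <- toy_ratings |>
  group_by(barista) |>
  summarize(mean_rating = mean(rating))

# Row with the highest average (ties are kept by default)
top_barista <- barista_means |>
  slice_max(mean_rating, n = 1)
```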

10. Which instructors who had at least five courses reviewed in the data had the highest and lowest average rating for Question 1 (I learned a lot during the course.) across all their courses?

11. Which instructors with one year of experience had the highest and lowest average percentage of students failing in the current semester across all their courses?

12. Which instructor(s) with either a doctorate or professor degree had the highest and lowest average percent of students responding to the evaluation across all their courses? Include how many years the instructor had worked (seniority) and their sex in your output.

Important: Efficiency

This week, we are thinking about efficiency in terms of the number of times a function is called. If you are using the same function multiple times, I would strongly recommend considering if you can collapse these into one function call.
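As one illustration of collapsing repeated calls, the sketch below finds the highest and lowest rows first with two filter() calls and then with one. The tibble toy_means is hypothetical.

```r
library(dplyr)

# Hypothetical toy data: one average per barista
toy_means <- tibble(
  barista     = c("b1", "b2", "b3"),
  mean_rating = c(4.5, 2.5, 4.0)
)

# Two calls to filter(): one for the maximum, one for the minimum
highest <- toy_means |> filter(mean_rating == max(mean_rating))
lowest  <- toy_means |> filter(mean_rating == min(mean_rating))

# One call to filter(): both conditions joined with |
extremes <- toy_means |>
  filter(mean_rating == max(mean_rating) | mean_rating == min(mean_rating))
```

Both versions identify the same rows; the second calls filter() once and returns them in a single table.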

Footnotes

  1. Citation: Under blind review in refereed journal. University SET data, with faculty and courses characteristics. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-09-12. https://doi.org/10.3886/E149801V1