Lab 3: Student Evaluations of Teaching

Author

Instructions

In this lab, we will be using the dplyr package to explore student evaluations of teaching data. You are expected to use functions from dplyr to do your data manipulation!

Part 1: GitHub Workflow

Step 1: Making a Copy from GitHub Classroom

You can access the Lab 3 materials through the Lab 3 assignment on GitHub Classroom. We’re going to follow the same steps as last week to make your own copy of this repository:

Same process as last week!

Use these steps to make a copy of the Lab 2 repository: List of Steps to Copy the Lab Assignment from GitHub Classroom

Step 2: Making a Small Change

Now, find the lab-3-student.qmd file in the “Files” tab in the lower right hand corner. Click on this file to open it.

At the top of the document (in the YAML) there is an author line that says "Your name here!". Change this to be your name and save your file either by clicking on the blue floppy disk or with a shortcut (command / control + s).

Step 3: Pushing Your Lab to GitHub

Now for our last step, we need to commit the files to our repo.

Click the “Git” tab in upper right pane
Check the “Staged” box for the lab-3-student.qmd file
Click “Commit”
In the box that opens, type a message in “Commit message”, such as “Added my name”.
Click “Commit”.
Click the green “Push” button to send your local changes to GitHub.

RStudio will display something like:

>>> /usr/bin/git push origin HEAD:refs/heads/main
To https://github.com/atheobold/introduction-to-quarto-allison-theobold.git
   3a2171f..6d58539  HEAD -> main

Step 4: Let’s get started wranging some data!

Part 2: Some Words of Advice

Part of learning to program is learning from a variety of resources. Thus, I expect you will use resources that you find on the internet. There is, however, an important balance between copying someone else’s code and using their code to learn. Therefore, if you use external resources, I want to know about it.

If you used Google, you are expected to “inform” me of any resources you used by pasting the link to the resource in a code comment next to where you used that resource.
If you used ChatGPT, you are expected to “inform” me of the assistance you received by (1) indicating somewhere in the problem that you used ChatGPT (e.g., below the question prompt or as a code comment), and (2) downloading and attaching the .txt file containing your entire conversation with ChatGPT.

Additionally, you are permitted and encouraged to work with your peers as you complete lab assignments, but you are expected to do your own work. Copying from each other is cheating, and letting people copy from you is also cheating. Please don’t do either of those things.

Setting Up Your Code Chunks

The first chunk of your Quarto document should be to declare your libraries (probably only tidyverse for now).
The second chunk of your Quarto document should be to load in your data.

Save Regularly, Render Often

Be sure to save your work regularly.
Be sure to render your file every so often, to check for errors and make sure it looks nice.
- Make sure your Quarto document does not contain View(dataset) or install.packages("package"), both of these will prevent rendering.
- Check your Quarto document for occasions when you looked at the data by typing the name of the data frame. Leaving these in means the whole dataset will print out and this looks unprofessional. Remove these!
- If all else fails, you can set your execution options to error: true, which will allow the file to render even if errors are present.

Part 3: Let’s Start Working with the Data!

The Data

The teacher_evals dataset contains student evaluations of reaching (SET) collected from students at a University in Poland. There are SET surveys from students in all fields and all levels of study offered by the university. ¹

The SET questionnaire that every student at this university completes is as follows:

Evaluation survey of the teaching staff of University of Poland. Please complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don’t agree; 1 - I strongly don’t agree.

Question 1: I learned a lot during the course.

Question 2: I think that the knowledge acquired during the course is very useful.

Question 3: The professor used activities to make the class more engaging.

Question 4: If it was possible, I would enroll for a course conducted by this lecturer again.

Question 5: The classes started on time.

Question 6: The lecturer always used time efficiently.

Question 7: The lecturer delivered the class content in an understandable and efficient way.

Question 8: The lecturer was available when we had doubts.

Question 9. The lecturer treated all students equally regardless of their race, background and ethnicity.

These data are from the end of the winter semester of the 2020-2021 academic year. In the period of data collection, all university classes were entirely online amid the COVID-19 pandemic. While expected learning outcomes were not changed, the online mode of study could have affected grading policies and could have implications for data.

Average SET scores were combined with many other variables, including:

characteristics of the teacher (degree, seniority, gender, SET scores in the past 6 semesters).
characteristics of the course (time of day, day of the week, course type, course breadth, class duration, class size).
percentage of students providing SET feedback.
course grades (mean, standard deviation, percentage failed for the current course and previous 6 semesters).

This rich dataset allows us to investigate many of the biases in student evaluations of teaching that have been reported in the literature and to formulate new hypotheses.

Before tackling the problems below, study the description of each variable included in the teacher_evals_codebook.pdf inside the “resources” folder.

1. Load the appropriate R packages for your analysis.

# code chunk for loading packages

2. Load in the teacher_evals data. Hint: You should use the here() function from the here package!

# code chunk for importing the data

Data Inspection + Summary

3. Provide a brief overview (~4 sentences) of the dataset.

Note

It is always good practice to start an analysis by getting a feel for the data and providing a quick summary for readers. You do not need to show any code for this question, although you probably want to use code to get some information about the data (e.g., summary(data), glimpse(data), dim(data), etc.). Things to think about – where did the data come from? what sort of data are provided (context and data type)? how much data do you have? etc.

# you may want to use code to answer this question

4. What is the unit of observation (i.e. a single row in the dataset) identified by?

Warning

It is not one instructor per row! It’s also not just one class per row!

# you may want to use code to answer this question

5. Use one dplyr pipeline to clean the data by:

renaming the gender variable sex
removing all courses with fewer than 10 respondents
changing data types in whichever way you see fit (e.g., is the instructor ID really a numeric data type?)
only keeping the columns we will use – course_id, teacher_id, question_no, no_participants, resp_share, SET_score_avg, percent_failed_cur, academic_degree, seniority, and sex

Assign your cleaned data to a new variable named teacher_evals_clean –- use these data going forward.

Tip

Helpful functions: rename(), mutate(), as.factor()

It would be most efficient to use across() in combination with mutate() to complete this task.

# code chunk for Q4

5. How many unique instructors and unique courses are present in the cleaned dataset?

Tip

Helpful functions: summarize(), n_distinct()

# code chunk for Q5

6. One teacher-course combination has some missing values, coded as NA. Which instructor has these missing values? Which course? What variable are the missing values in?

Tip

Helpful functions: filter(), if_any()

# code chunk for Q6

7. What are the demographics of the instructors in this study? Investigate the variables academic_degree, seniority, and sex and summarize your findings in ~3 complete sentences.

Tip

You’ll need to first wrangle your data to have each instructor represented only once.

Helpful functions: select(), distinct(___, .keep_all = TRUE), count(), summary()

# code chunk for Q7

8. Each course seems to have used a different subset of the nine evaluation questions. How many teacher-course combinations asked all nine questions?

# code chunk for Q8

Rate my Professor

Tip

Helpful functions: filter(), group_by(), summarize(), slice()

Useful variables: question_no, teacher_id, SET_score_avg, percent_failed_cur, resp_share, academic_degree, seniority, sex

9. Which instructors had the highest and lowest average rating for Question 1 (I learnt a lot during the course.) across all their courses?

# code chunk for Q9

10. Which instructors with one year of experience had the highest and lowest average percentage of students failing in the current semester across all their courses?

# code chunk for Q10

11. Which female instructors with either a doctorate or professor degree had the highest and lowest average percent of students responding to the evaluation across all their courses?

# code chunk for Q11

Footnotes

Citation: journal, Under blind review in refereed. University SET data, with faculty and courses characteristics. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-09-12. https://doi.org/10.3886/E149801V1 ↩︎

--- title: "Lab 3: Student Evaluations of Teaching" author: "Instructions" format: html: embed-resources: true code-tools: true toc: true editor: source execute: echo: true warning: false message: false --- In this lab, we will be using the `dplyr` package to explore student evaluations of teaching data. **You are expected to use functions from `dplyr` to do your data manipulation!** # Part 1: GitHub Workflow ## Step 1: Making a Copy from GitHub Classroom You can access the Lab 3 materials through the [Lab 3 assignment on GitHub Classroom](https://classroom.github.com/a/qwGgpcu4). We're going to follow the same steps as last week to make your own copy of this repository:  ::: {.callout-tip collapse="true"} # Same process as last week! Use these steps to make a copy of the Lab 2 repository: [List of Steps to Copy the Lab Assignment from GitHub Classroom](https://scribehow.com/shared/Copying_Labs_from_GitHub_into_RStudio__ATprsf2MTYmuPP86Xna1Eg) ::: ## Step 2: Making a Small Change Now, find the `lab-3-student.qmd` file in the "Files" tab in the lower right hand corner. Click on this file to open it. At the top of the document (in the YAML) there is an `author` line that says `"Your name here!"`. Change this to be your name and save your file either by clicking on the blue floppy disk or with a shortcut (command / control + s). ## Step 3: Pushing Your Lab to GitHub Now for our last step, we need to [commit the files to our repo](https://happygitwithr.com/existing-github-first#stage-and-commit). - Click the "Git" tab in upper right pane - Check the "Staged" box for the `lab-3-student.qmd` file - Click "Commit" - In the box that opens, type a message in "Commit message", such as "Added my name". - Click "Commit". - Click the green "Push" button to send your local changes to GitHub. RStudio will display something like: ``` >>> /usr/bin/git push origin HEAD:refs/heads/main To https://github.com/atheobold/introduction-to-quarto-allison-theobold.git 3a2171f..6d58539 HEAD -> main ``` ## Step 4: Let's get started wranging some data! # Part 2: Some Words of Advice Part of learning to program is learning from a variety of resources. Thus, I expect you will use resources that you find on the internet. There is, however, an important balance between copying someone else's code and *using their code to learn*. Therefore, if you use external resources, I want to know about it. - If you used Google, you are expected to "inform" me of any resources you used by **pasting the link to the resource in a code comment next to where you used that resource**. - If you used ChatGPT, you are expected to "inform" me of the assistance you received by (1) indicating somewhere in the problem that you used ChatGPT (e.g., below the question prompt or as a code comment), and (2) downloading and attaching the `.txt` file containing your **entire** conversation with ChatGPT. Additionally, you are permitted and encouraged to work with your peers as you complete lab assignments, but **you are expected to do your own work**. Copying from each other is cheating, and letting people copy from you is also cheating. Please don't do either of those things. ## Setting Up Your Code Chunks - The first chunk of your Quarto document should be to *declare your libraries* (probably only `tidyverse` for now). - The second chunk of your Quarto document should be to *load in your data*. ## Save Regularly, Render Often - Be sure to **save** your work regularly. - Be sure to **render** your file every so often, to check for errors and make sure it looks nice. - Make sure your Quarto document does not contain `View(dataset)` or `install.packages("package")`, both of these will prevent rendering. - Check your Quarto document for occasions when you looked at the data by typing the name of the data frame. Leaving these in means the whole dataset will print out and this looks unprofessional. **Remove these!** - If all else fails, you can set your execution options to `error: true`, which will allow the file to render even if errors are present. # Part 3: Let's Start Working with the Data! ## The Data The `teacher_evals` dataset contains student evaluations of reaching (SET) collected from students at a University in Poland. There are SET surveys from students in all fields and all levels of study offered by the university. [^1] [^1]: Citation: journal, Under blind review in refereed. *University SET data, with faculty and courses characteristics.* Ann Arbor, MI: Inter-university Consortium for Political and Social Research \[distributor\], 2021-09-12. <https://doi.org/10.3886/E149801V1> The SET questionnaire that every student at this university completes is as follows: > Evaluation survey of the teaching staff of University of Poland. Please complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don’t agree; 1 - I strongly don’t agree. > > Question 1: I learned a lot during the course. > > Question 2: I think that the knowledge acquired during the course is very useful. > > Question 3: The professor used activities to make the class more engaging. > > Question 4: If it was possible, I would enroll for a course conducted by this lecturer again. > > Question 5: The classes started on time. > > Question 6: The lecturer always used time efficiently. > > Question 7: The lecturer delivered the class content in an understandable and efficient way. > > Question 8: The lecturer was available when we had doubts. > > Question 9. The lecturer treated all students equally regardless of their race, background and ethnicity. These data are from the end of the winter semester of the 2020-2021 academic year. In the period of data collection, all university classes were entirely online amid the COVID-19 pandemic. While expected learning outcomes were not changed, the online mode of study could have affected grading policies and could have implications for data. **Average SET scores** were combined with many other variables, including: 1. **characteristics of the teacher** (degree, seniority, gender, SET scores in the past 6 semesters). 2. **characteristics of the course** (time of day, day of the week, course type, course breadth, class duration, class size). 3. **percentage of students providing SET feedback.** 4. **course grades** (mean, standard deviation, percentage failed for the current course and previous 6 semesters). This rich dataset allows us to **investigate many of the biases in student evaluations of teaching** that have been reported in the literature and to formulate new hypotheses. Before tackling the problems below, study the description of each variable included in the `teacher_evals_codebook.pdf` inside the "resources" folder. **1. Load the appropriate R packages for your analysis.** ```{r} #| label: setup # code chunk for loading packages ``` **2. Load in the `teacher_evals` data.** *Hint: You should use the `here()` function from the __here__ package!* ```{r} #| label: load-data # code chunk for importing the data ``` ### Data Inspection + Summary **3. Provide a brief overview (\~4 sentences) of the dataset.** ::: callout-note It is always good practice to start an analysis by getting a feel for the data and providing a quick summary for readers. You do not need to show any code for this question, although you probably want to use code to get some information about the data (e.g., `summary(data)`, `glimpse(data)`, `dim(data)`, etc.). Things to think about – where did the data come from? what sort of data are provided (context and data type)? how much data do you have? etc. ::: ```{r} #| label: explore-data # you may want to use code to answer this question ``` **4. What is the unit of observation (i.e. a single row in the dataset) identified by?** ::: callout-warning It is not one instructor per row! It’s also not just one class per row! ::: ```{r} #| label: row-identification # you may want to use code to answer this question ``` **5. Use _one_ `dplyr` pipeline to clean the data by:** - **renaming the `gender` variable `sex`** - **removing all courses with fewer than 10 respondents** - **changing data types in whichever way you see fit (e.g., is the instructor ID really a numeric data type?)** - **only keeping the columns we will use -- `course_id`, `teacher_id`, `question_no`, `no_participants`, `resp_share`, `SET_score_avg`, `percent_failed_cur`, `academic_degree`, `seniority`, and `sex`** **Assign your cleaned data to a new variable named `teacher_evals_clean` –- use these data going forward.** ::: callout-tip Helpful functions: `rename()`, `mutate()`, `as.factor()` It would be most efficient to use `across()` in combination with `mutate()` to complete this task. ::: ```{r} #| label: data-cleaning # code chunk for Q4 ``` **5. How many unique instructors and unique courses are present in the cleaned dataset?** ::: callout-tip Helpful functions: `summarize()`, `n_distinct()` ::: ```{r} #| label: unique-courses # code chunk for Q5 ``` **6. One teacher-course combination has some missing values, coded as `NA`. Which instructor has these missing values? Which course? What variable are the missing values in?** ::: callout-tip Helpful functions: `filter()`, `if_any()` ::: ```{r} #| label: uncovering-missing-values # code chunk for Q6 ``` **7. What are the demographics of the instructors in this study? Investigate the variables `academic_degree`, `seniority`, and `sex` and summarize your findings in \~3 complete sentences.** ::: callout-tip You’ll need to first wrangle your data to have each instructor represented only once. Helpful functions: `select()`, `distinct(___, .keep_all = TRUE)`, `count()`, `summary()` ::: ```{r} #| label: exploring-demographics-of-instructors # code chunk for Q7 ``` **8. Each course seems to have used a different subset of the nine evaluation questions. How many teacher-course combinations asked all nine questions?** ```{r} #| label: teacher-course-asked-every-question # code chunk for Q8 ``` ## Rate my Professor ::: callout-tip Helpful functions: `filter()`, `group_by()`, `summarize()`, `slice()` Useful variables: `question_no`, `teacher_id`, `SET_score_avg`, `percent_failed_cur`, `resp_share`, `academic_degree`, `seniority`, `sex` ::: **9. Which instructors had the highest and lowest average rating for Question 1 (I learnt a lot during the course.) across all their courses?** ```{r} #| label: question-1-high-low # code chunk for Q9 ``` **10. Which instructors with one year of experience had the highest and lowest average percentage of students failing in the current semester across all their courses?** ```{r} #| label: one-year-experience-failing-students # code chunk for Q10 ``` **11. Which female instructors with either a doctorate or professor degree had the highest and lowest average percent of students responding to the evaluation across all their courses?** ```{r} #| label: female-instructor-student-response # code chunk for Q11 ```