congress_age %>%
  group_by(state) %>%
  summarize(mean_age = mean(age))
Code Speed
Fortunately, we stand on the shoulders of giants. Many skilled computer scientists have put a lot of time and effort into writing R functions that run as fast as possible.
Even without deep knowledge of computer algorithms or R's inner workings, you can often speed up your code by relying on these well-optimized functions and avoiding a few common pitfalls.
First, though: as you start thinking about writing faster code, you’ll need a way to check whether your improvements actually sped up the code.
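As a minimal sketch of such a check, base R's system.time() can time two versions of the same task. The two functions below (slow_sum_sq and fast_sum_sq) are hypothetical examples made up for illustration; the tictoc and microbenchmark packages offer more convenient or more precise timing for the same purpose.

```r
# Toy task: the sum of squares of 1..n, written two ways.
slow_sum_sq <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i^2   # adds one element at a time
  }
  total
}

fast_sum_sq <- function(n) {
  sum(seq_len(n)^2)        # vectorized: one call over the whole vector
}

n <- 1e6

# system.time() reports user, system, and elapsed time in seconds.
time_slow <- system.time(res_slow <- slow_sum_sq(n))
time_fast <- system.time(res_fast <- fast_sum_sq(n))

# Always confirm that the "improved" version still gives the same answer.
same_answer <- isTRUE(all.equal(res_slow, res_fast))
print(same_answer)

# Compare wall-clock times.
print(c(slow = time_slow[["elapsed"]], fast = time_fast[["elapsed"]]))
```

A single timing can be noisy, so it is worth running the comparison a few times before drawing conclusions.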
Stat 541 only: lobstr::obj_size()
Use faster existing functions
Because R has so many packages, there are often many functions to perform the same task. Not all functions are created equal!
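To illustrate with a small sketch, here are two base R ways to compute row means of a matrix. The matrix m is made-up data, and the timings will vary by machine, so treat this as a pattern for comparing functions rather than a benchmark result.

```r
set.seed(1)
m <- matrix(rnorm(1e6), nrow = 1000)   # hypothetical 1000 x 1000 matrix

# Option 1: apply() calls mean() once per row.
time_apply <- system.time(means_apply <- apply(m, 1, mean))

# Option 2: rowMeans() is a dedicated function for exactly this task.
time_rowmeans <- system.time(means_rowmeans <- rowMeans(m))

# Both options should agree (up to floating-point tolerance).
results_match <- isTRUE(all.equal(means_apply, means_rowmeans))
print(results_match)

# Elapsed seconds for each option.
print(c(apply = time_apply[["elapsed"]], rowMeans = time_rowmeans[["elapsed"]]))
```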
data.table
For speeding up work with data frames, especially large ones, data.table is one of the fastest packages available!
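As a rough sketch of what data.table syntax looks like, the example below computes a grouped total on made-up data (the sales data frame and its columns are hypothetical). It assumes the data.table package may or may not be installed, so the data.table part is skipped if it is missing; base R's aggregate() serves as the reference answer.

```r
# Hypothetical toy data: 100,000 sales records across 26 stores.
set.seed(541)
sales <- data.frame(
  store  = sample(LETTERS, 1e5, replace = TRUE),
  amount = runif(1e5, min = 1, max = 100)
)

# Reference answer in base R: total amount per store (sorted by store).
base_totals <- aggregate(amount ~ store, data = sales, FUN = sum)

dt_matches_base <- NA
if (requireNamespace("data.table", quietly = TRUE)) {
  sales_dt <- data.table::as.data.table(sales)

  # The general form is DT[i, j, by]: "take rows i, compute j, grouped by by".
  # keyby groups like by, and also sorts the result by the grouping column.
  dt_totals <- sales_dt[, .(amount = sum(amount)), keyby = store]

  dt_matches_base <- isTRUE(all.equal(base_totals$amount, dt_totals$amount))
  print(head(dt_totals))
}
print(dt_matches_base)
```

In everyday use you would normally call library(data.table) once at the top of the script instead of using the data.table:: prefix.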
- Here is some dplyr code. Rewrite it in data.table syntax.
- Using either the tictoc or microbenchmark package, perform a speed test to see which code is faster, and by how much.
Tip 3: Only improve what needs improving
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, [they] will be wise to look carefully at the critical code; but only after that code has been identified.” — Donald Knuth.
Speed and computational efficiency are important, but there can be a trade-off. If you make a small speed improvement but your code becomes overly complex, confusing, or difficult to edit and test, it’s not worth it!
Also, consider your time and energy as the programmer. Suppose you have a function that takes 30 minutes to run. It relies on a subfunction that takes 30 seconds to run. Should you really spend your effort making the subfunction take only 10 seconds? If the subfunction runs only once, that saves 20 seconds out of 1,800, a speedup of about 1%.
The art of finding the slow bits of your code where you can make the most improvement is called profiling.
📖 Required Reading: Profiling
📖 Recommended Reading: Measuring Performance
For this reading, I strongly encourage you to skim. The code aspects are mostly more complicated than we expect in this class, but the general overview of the principles behind speeding up code is helpful.
In your work, you will most likely “profile” manually, by tictoc-ing bits of your code and so forth, not by these fancy methods.
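A minimal sketch of this kind of manual profiling, using base R's system.time() on each step of a made-up three-step analysis (the step names and sizes are hypothetical). With tictoc, you would wrap each step in tic() and toc() instead.

```r
set.seed(541)

# Time each step separately and collect the elapsed seconds by name.
timings <- c(
  simulate = system.time(
    x <- matrix(rnorm(2e6), ncol = 20)     # step 1: generate data
  )[["elapsed"]],
  column_sds = system.time(
    col_sds <- apply(x, 2, sd)             # step 2: per-column summary
  )[["elapsed"]],
  summarize = system.time(
    result <- range(col_sds)               # step 3: final summary
  )[["elapsed"]]
)

# Elapsed seconds per step, slowest first. The slowest step is
# where optimization effort is most likely to pay off.
print(sort(timings, decreasing = TRUE))
```

If one step dominates the total, that is the only step worth optimizing first; shaving time off the others will barely change the overall run time.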
- It specifies the output type.
- It specifies the input type.
- It performs fewer safety checks.
- It performs fewer iterations.
- Here are two other ways to compute the square root of a vector. Which is fastest? Use microbenchmarking to test your answers!
- Why is mean(x) slower than sum(x)/length(x)?
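One way to set up these comparisons is sketched below. The two alternative square-root expressions are assumptions chosen for illustration, since the exercise's own examples are not shown here. The code uses microbenchmark if it is installed and otherwise falls back to a repeated system.time() loop.

```r
set.seed(541)
x <- runif(1e5)

# Assumed alternatives to sqrt(x); these are illustrative stand-ins.
sqrt_power  <- function(x) x^0.5
sqrt_explog <- function(x) exp(log(x) / 2)

# Check that every version gives the same answer before timing anything.
sqrt_agree <- isTRUE(all.equal(sqrt(x), sqrt_power(x))) &&
  isTRUE(all.equal(sqrt(x), sqrt_explog(x)))
mean_agree <- isTRUE(all.equal(mean(x), sum(x) / length(x)))
print(c(sqrt_agree = sqrt_agree, mean_agree = mean_agree))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  # Each expression is run `times` times; the summary shows the distribution.
  print(microbenchmark::microbenchmark(
    sqrt(x), sqrt_power(x), sqrt_explog(x),
    times = 100
  ))
  print(microbenchmark::microbenchmark(
    mean(x), sum(x) / length(x),
    times = 100
  ))
} else {
  # Fallback: repeat each expression many times so the total is measurable.
  reps <- 200
  print(system.time(for (i in seq_len(reps)) sqrt(x)))
  print(system.time(for (i in seq_len(reps)) sqrt_power(x)))
  print(system.time(for (i in seq_len(reps)) sqrt_explog(x)))
}
```

When reading microbenchmark output, the median is usually a more stable summary than the mean, because occasional slow runs can skew the average.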
Optional: Super advanced mode
The following are some suggestions of ways you can level-up your code speed. These are outside the scope of this class, but you are welcome to explore them!
- Dive deep into the R efficiency rabbit hole…
- Parallelize your computations
- Make your code run in C++ with Rcpp
- Learn to use recursion
- Learn more about memory allocation and garbage collection in R