
YouTube view counts of Linear Algebra lectures

I am learning linear algebra these days by watching the excellent series of lectures by Prof. Gilbert Strang on YouTube. Along the way, I thought it would be interesting to look at the view counts of all the lectures. I expect the view counts to decline for later lectures.

Alright, first let's load some R packages for scraping data from YouTube.

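library(plyr)    # load before dplyr so plyr does not mask dplyr's functions
library(dplyr)   # data manipulation and the %>% pipe
library(rvest)   # web scraping
library(stringr) # string handling
library(ggplot2) # plotting
library(knitr)   # kable() for tables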

Next, I looked up the URL of the playlist that contains all the lectures. To find the right CSS selector, I followed this tutorial.

# read the playlist page first (the CSS class below matches YouTube's 2015 layout)
page = read_html("https://www.youtube.com/playlist?list=PLE7DDD91010BC51F8")
lectures = html_nodes(page, ".yt-uix-tile-link")
# length(lectures) # 35 videos
# extract the lecture number from each video title
lec_names = html_text(lectures) %>% 
  str_replace("^.*Lec ([b0-9]*) .*", "\\1")
lec_names[lec_names == "24b"] = "24.5"
lec_names = as.numeric(lec_names)
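The regular expression keeps only the token that follows "Lec " in each video title, and "24b" is then mapped to 24.5 so the extra quiz-review lecture sits between 24 and 25 on the x-axis. Here is a base-R sketch of the same step using `sub()`; the titles are hypothetical, since the exact title format on the playlist is an assumption:

```r
# Hypothetical playlist titles; the real title format is an assumption here
titles <- c("Lec 1 | MIT 18.06 Linear Algebra, Spring 2005",
            "Lec 24b | MIT 18.06 Linear Algebra, Spring 2005",
            "Lec 34 | MIT 18.06 Linear Algebra, Spring 2005")

# Capture the token after "Lec " (digits, possibly with a trailing "b")
lec_names <- sub("^.*Lec ([b0-9]*) .*", "\\1", titles)

# Map the bonus lecture "24b" to 24.5 so it sorts between 24 and 25
lec_names[lec_names == "24b"] <- "24.5"
lec_num <- as.numeric(lec_names)
print(lec_num)
```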

Next, build the URL of every lecture and scrape its view count and description.

# get the full URL of every lecture (html_attr() is vectorized over nodes)
url_all = paste0("https://www.youtube.com", html_attr(lectures, "href"))

# for each lecture, get the view count and the description
view_all = sapply(url_all, function(x){
  print(x) # progress
  xx = read_html(x)
  view_count = html_nodes(xx, ".watch-view-count") %>% html_text() %>%
    gsub("[^0-9]", "", .) %>% # drop commas and any " views" label
    as.numeric()
  lect_descrip = html_nodes(xx, "#eow-description") %>% html_text() %>% 
    gsub("^(.*)View the complete.*$", "\\1", .) %>% str_trim()
  print(lect_descrip)
  list(view_count, as.character(lect_descrip))
})
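The scraped view count is text with thousands separators, so it has to be cleaned before `as.numeric()`. Stripping only commas fails if the text also carries a label such as " views" (whether it does depends on the page layout, which is an assumption here); removing every non-digit handles both cases. A minimal base-R sketch with hypothetical strings, where `parse_views()` is a helper made up for illustration:

```r
# Hypothetical scraped strings, with and without a " views" label
raw <- c("1,471,018", "421,456 views", "50,540")

# Removing only commas leaves " views" behind, so as.numeric() gives NA there
only_commas <- suppressWarnings(as.numeric(gsub(",", "", raw)))

# Hypothetical helper: drop every non-digit character, then convert
parse_views <- function(x) as.numeric(gsub("[^0-9]", "", x))
views <- parse_views(raw)
print(views)
```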

Now, combine lecture names with their view counts.

# view_all is a 2 x N list-matrix stored column by column:
# odd positions hold view counts, even positions hold descriptions
view = unlist(view_all[seq_along(view_all) %% 2 == 1])
descrip = unlist(view_all[seq_along(view_all) %% 2 == 0])
# remove trailing notes that start with "*"
descrip = sapply(descrip, function(x){
  if(str_detect(x, "\\*")){
    str_replace(x, "^(.*)\\*+.*$", "\\1")
  } else{
    x
  }
}, USE.NAMES = FALSE)
dat = tibble(lec = lec_names, view = view, description = descrip)
kable(data.frame(lec = lec_names, view = format(view, big.mark = ","),
                 description = descrip), format = "html")
View count  Description
 1,471,018  Lecture 1: The Geometry of Linear Equations.
   421,456  Lecture 2: Elimination with Matrices.
   359,628  Lecture 3: Multiplication and Inverse Matrices.
   304,766  Lecture 4: Factorization into A = LU
   210,564  Lecture 5: Transposes, Permutations, Spaces R^n.
   199,004  Lecture 6: Column Space and Nullspace.
   151,769  Lecture 7: Solving Ax = 0: Pivot Variables, Special Solutions.
   138,087  Lecture 8: Solving Ax = b: Row Reduced Form R.
   151,718  Lecture 9: Independence, Basis, and Dimension.
   132,971  Lecture 10: The Four Fundamental Subspaces.
   104,452  Lecture 11: Matrix Spaces; Rank 1; Small World Graphs.
    82,273  Lecture 12: Graphs, Networks, Incidence Matrices.
    77,250  Lecture 13: Quiz 1 Review.
   108,408  Lecture 14: Orthogonal Vectors and Subspaces.
    99,687  Lecture 15: Projections onto Subspaces.
    96,329  Lecture 16: Projection Matrices and Least Squares.
    95,593  Lecture 17: Orthogonal Matrices and Gram-Schmidt.
    90,094  Lecture 18: Properties of Determinants.
    79,339  Lecture 19: Determinant Formulas and Cofactors.
    85,189  Lecture 20: Cramer's Rule, Inverse Matrix, and Volume.
   159,954  Lecture 21: Eigenvalues and Eigenvectors.
   109,883  Lecture 22: Diagonalization and Powers of A.
    84,893  Lecture 23: Differential Equations and exp(At).
    84,173  Lecture 24: Markov Matrices; Fourier Series.*
    36,172  Lecture 24b : Quiz 2 Review.*
    59,755  Lecture 25: Symmetric Matrices and Positive Definiteness.*
    62,566  Lecture 26: Complex Matrices; Fast Fourier Transform.
    58,041  Lecture 27: Positive Definite Matrices and Minima.
    70,082  Lecture 28: Similar Matrices and Jordan Form.
    85,714  Lecture 29: Singular Value Decomposition.
    99,162  Lecture 30: Linear Transformations and Their Matrices.
    61,037  Lecture 31: Change of Basis; Image Compression.
    36,158  Lecture 32: Quiz 3 Review.
    55,596  Lecture 33: Left and Right Inverses; Pseudoinverse.
    50,540  Lecture 34: Final Course Review.
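The odd/even indexing used to split `view_all` works because each call inside `sapply()` returns a two-element list, so `sapply()` simplifies the results into a 2 × N matrix of mode list, stored column by column. A small base-R sketch with made-up pages (hypothetical stand-ins for the lecture URLs) shows this, along with an equivalent row-based split:

```r
# Hypothetical pages standing in for the lecture URLs
pages <- c("a", "bb", "ccc")

# Each call returns a two-element list, like the scraper does
res <- sapply(pages, function(p) {
  list(nchar(p) * 100, paste("Description of", p))
})

# sapply() simplified the length-2 lists into a 2 x 3 list-matrix
print(dim(res))

# Column-major storage: odd positions hold counts, even positions descriptions
counts_by_parity <- unlist(res[seq_along(res) %% 2 == 1])
descrip_by_parity <- unlist(res[seq_along(res) %% 2 == 0])

# Equivalent and easier to read: take rows 1 and 2, dropping the names
counts_by_row <- unname(unlist(res[1, ]))
descrip_by_row <- unname(unlist(res[2, ]))
```

Selecting rows with `res[1, ]` and `res[2, ]` gives the same vectors without relying on the parity trick.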

Finally, let’s plot it.

# plot view count against lecture number
ggplot(dat, aes(x = lec, y = view)) +
  geom_point(color = "red", size = 2) + 
  geom_line(color = "blue") +
  labs(x = "Lecture", y = "YouTube view count",
       title = "YouTube view counts of Linear Algebra lectures taught by\nGilbert Strang, Spring 2005")


Wow, the first lecture has by far the most views: 1,471,030 (as of 2015-06-21 23:00 Central Time)! However, the second lecture has about one million fewer views than the first. It will be interesting to find out why lectures 21 and 22 have more views than their neighbors (I'm getting there; I'm at lecture 14 now. Eigenvalues!). The last lecture has about 50K views. Does this mean about 50K people finished all the lectures?

It clearly shows how hard it is to be persistent.
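The rough numbers in the last paragraph can be checked against the table. A quick sketch that types in three counts from the table (lectures 1, 2 and 34) rather than re-scraping; note that a view is not a unique viewer (people rewatch and skip around), so the final ratio is only a loose proxy for how many people finished the course:

```r
# View counts copied from the table: lectures 1, 2 and 34
first  <- 1471018
second <- 421456
last   <- 50540

# Absolute drop from lecture 1 to lecture 2
drop_1_to_2 <- first - second

# Views of the final lecture as a fraction of the first lecture's views
retention <- last / first

cat(sprintf("Drop from lecture 1 to 2: %s views\n",
            format(drop_1_to_2, big.mark = ",")))
cat(sprintf("Lecture 34 relative to lecture 1: %.1f%%\n", 100 * retention))
```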