I am learning linear algebra these days by watching the excellent series of lectures taught by Prof. Gilbert Strang on YouTube. Along the way, I thought it would be interesting to look at the view count of each lecture. I expect the view counts to decline for later lectures.
Alright, first load some R packages to get the data from YouTube.
library(plyr)
library(dplyr)
library(rvest) # for web scraping
library(stringr) # string handling
library(ggplot2) # plotting
library(knitr)
Then I searched online for the URL of the playlist that contains all the lectures. To find the correct CSS selector, I followed this tutorial.
# the playlist first
url = read_html("https://www.youtube.com/playlist?list=PLE7DDD91010BC51F8") # read_html() replaces the deprecated html()
lectures = html_nodes(url, ".yt-uix-tile-link")
# length(lectures) # 35 videos
# get lecture names
lec_names = html_text(lectures) %>%
  sapply(function(x) str_replace(x, "^.*Lec ([b0-9]*) .*", "\\1")) %>%
  unname() %>% as.character()
lec_names[lec_names == "24b"] = 24.5
lec_names = as.numeric(lec_names)
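To see what the regular expression above does, here is a minimal sketch on a few sample titles. The titles are hypothetical stand-ins for what `html_text(lectures)` returns, and base R's `sub()` is used in place of `str_replace()` so the snippet runs without extra packages.

```r
# Hypothetical sample titles; the real ones come from html_text(lectures).
titles <- c("Lec 1 | MIT 18.06 Linear Algebra, Spring 2005",
            "Lec 24 | MIT 18.06 Linear Algebra, Spring 2005",
            "Lec 24b | MIT 18.06 Linear Algebra, Spring 2005",
            "Lec 25 | MIT 18.06 Linear Algebra, Spring 2005")

# Keep only the token that follows "Lec ": digits, optionally with a "b".
lec_id <- sub("^.*Lec ([b0-9]*) .*", "\\1", titles)

# Place the extra review session halfway between lectures 24 and 25.
lec_id[lec_id == "24b"] <- "24.5"
lec_num <- as.numeric(lec_id)
print(lec_num)
```

Coding "24b" as 24.5 keeps the lecture axis numeric, so the review session is plotted between its two neighbors.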
Then, get the URLs of all lectures and extract their view counts.
# get url for all lectures
url_all = ldply(lectures, function(x){
  paste0("https://www.youtube.com", html_attr(x, name = "href"))
})
# for each lecture, get the view count
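view_all = sapply(url_all$V1, function(x){
  print(x) # show progress
  xx = read_html(x) # read_html() replaces the deprecated html()
  # view count: drop the thousands separators, then convert to a number
  view_count = html_nodes(xx, ".watch-view-count") %>% html_text() %>%
    gsub(",", "", .) %>%
    as.numeric()
  # description: keep only the text before "View the complete ..."
  lect_descrip = html_nodes(xx, "#eow-description") %>% html_text() %>%
    gsub("^(.*)View the complete.*$", "\\1", .) %>% str_trim()
  print(lect_descrip)
  list(view_count, as.character(lect_descrip))
})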
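The comma-stripping step above only works if the scraped text is a bare number such as "1,471,018". As a hedged sketch, a slightly more defensive parser could drop every non-digit character instead. The helper name `parse_view_count` is hypothetical, and whether the page text ever carries a suffix like " views" is an assumption, not something checked here.

```r
# Hypothetical helper: keep only the digits, then convert to a number.
parse_view_count <- function(x) {
  as.numeric(gsub("[^0-9]", "", x))
}

parse_view_count("1,471,018")      # bare number with thousands separators
parse_view_count("421,456 views")  # assumed variant with a trailing label
parse_view_count("")               # no digits found: returns NA
```

An `NA` in the result is a useful signal that the selector matched nothing or the page layout changed.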
Now, combine lecture names with their view counts.
# combine lecture names with view count
view = unlist(view_all[(1:length(view_all)) %% 2 == 1])
# remove some notes that start with *.
descrip = unlist(view_all[(1:length(view_all)) %% 2 == 0])
descrip = sapply(descrip, function(x){
  if(str_detect(x, "\\*")){
    str_replace(x, "^(.*)\\*+.*$", "\\1")
  } else{
    x
  }
})
dat = data_frame(lec = lec_names, view = view, description = descrip)
kable(data.frame(lec = lec_names, view = format(view, big.mark = ","), description = descrip), format = "html")
view | description |
---|---|
1,471,018 | Lecture 1: The Geometry of Linear Equations. |
421,456 | Lecture 2: Elimination with Matrices. |
359,628 | Lecture 3: Multiplication and Inverse Matrices. |
304,766 | Lecture 4: Factorization into A = LU |
210,564 | Lecture 5: Transposes, Permutations, Spaces R^n. |
199,004 | Lecture 6: Column Space and Nullspace. |
151,769 | Lecture 7: Solving Ax = 0: Pivot Variables, Special Solutions. |
138,087 | Lecture 8: Solving Ax = b: Row Reduced Form R. |
151,718 | Lecture 9: Independence, Basis, and Dimension. |
132,971 | Lecture 10: The Four Fundamental Subspaces. |
104,452 | Lecture 11: Matrix Spaces; Rank 1; Small World Graphs. |
82,273 | Lecture 12: Graphs, Networks, Incidence Matrices. |
77,250 | Lecture 13: Quiz 1 Review. |
108,408 | Lecture 14: Orthogonal Vectors and Subspaces. |
99,687 | Lecture 15: Projections onto Subspaces. |
96,329 | Lecture 16: Projection Matrices and Least Squares. |
95,593 | Lecture 17: Orthogonal Matrices and Gram-Schmidt. |
90,094 | Lecture 18: Properties of Determinants. |
79,339 | Lecture 19: Determinant Formulas and Cofactors. |
85,189 | Lecture 20: Cramer's Rule, Inverse Matrix, and Volume. |
159,954 | Lecture 21: Eigenvalues and Eigenvectors. |
109,883 | Lecture 22: Diagonalization and Powers of A. |
84,893 | Lecture 23: Differential Equations and exp(At). |
84,173 | Lecture 24: Markov Matrices; Fourier Series.* |
36,172 | Lecture 24b : Quiz 2 Review.* |
59,755 | Lecture 25: Symmetric Matrices and Positive Definiteness.* |
62,566 | Lecture 26: Complex Matrices; Fast Fourier Transform. |
58,041 | Lecture 27: Positive Definite Matrices and Minima. |
70,082 | Lecture 28: Similar Matrices and Jordan Form. |
85,714 | Lecture 29: Singular Value Decomposition. |
99,162 | Lecture 30: Linear Transformations and Their Matrices. |
61,037 | Lecture 31: Change of Basis; Image Compression. |
36,158 | Lecture 32: Quiz 3 Review. |
55,596 | Lecture 33: Left and Right Inverses; Pseudoinverse. |
50,540 | Lecture 34: Final Course Review. |
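The odd/even indexing used to split `view_all` works because `sapply()` simplifies a list of two-element lists into a 2-row list matrix, which R stores column by column. Here is a small sketch with a toy stand-in for the scraping function (`fake_scrape` is hypothetical) that shows the layout and an equivalent row-based extraction.

```r
# Toy stand-in for the scraping function: returns list(count, description).
fake_scrape <- function(i) list(i * 100, paste("Lecture", i))

view_all <- sapply(1:3, fake_scrape)
print(dim(view_all))  # 2 rows (count, description) by 3 columns (lectures)

# Column-major storage: odd positions hold counts, even ones descriptions.
idx  <- seq_along(view_all)
odd  <- unlist(view_all[idx %% 2 == 1])
even <- unlist(view_all[idx %% 2 == 0])

# Row indexing expresses the same selection more directly.
by_row_counts <- unlist(view_all[1, ])
by_row_descr  <- unlist(view_all[2, ])
```

Both approaches give the same vectors. The row-based form may be easier to read when revisiting the code later.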
Finally, let’s plot it.
# plot
ggplot(dat, aes(x = lec, y = view)) +
  geom_point(color = "red", size = 2) +
  geom_line(color = "blue") +
  labs(x = "Lectures", y = "YouTube view count",
       title = "YouTube view counts of Linear Algebra lectures taught by\nGilbert Strang, Spring 2005")
Wow, the first lecture has by far the most views: 1,471,030 (as of 2015-06-21 23:00 Central Time)! However, the view count of the second lecture is about one million lower than the first. It will be interesting to find out why lectures 21 and 22 have more views than their neighbors (I am getting there; I am at lecture 14 now. Eigenvalues!). The last lecture has about 50K views. Does this mean about 50K people finished all the lectures?
It clearly shows how hard it is to be persistent.
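To put rough numbers on the drop-off, here is a short sketch that uses three view counts copied from the table above. One caveat: a view is not a unique viewer, so these ratios are only a loose proxy for how many people kept going.

```r
# View counts copied from the table above.
first  <- 1471018  # Lecture 1
second <- 421456   # Lecture 2
last   <- 50540    # Lecture 34

# Absolute drop between the first two lectures.
drop_1_to_2 <- first - second

# Views relative to the first lecture, in percent.
share <- round(100 * c(second = second / first, last = last / first), 1)

print(drop_1_to_2)
print(share)
```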