Resources
Places for learning R and Data Science:
Swirl: Learn R in R. It sounds hard but it really isn’t.
DataCamp: A little heavy on hand-holding, but great for beginners.
Coursera R programming course
Udemy R programming course
Introduction to R (book) by Alex Douglas, Deon Roos, Francesca Mancini, Ana Couto & David Lusseau
Rafael Irizarry Teaching Materials: Harvard statistics professor’s amazing collection of teaching materials.
BookDown: A collection of free, open-source books written by some of the leading people in the R community. Check out these in particular:
- R for data science by Hadley Wickham and Garrett Grolemund
- Hands-on programming with R by Garrett Grolemund
- R programming for data science by Roger Peng
- Introduction to Data Science by Rafael Irizarry
- R Markdown definitive guide by Yihui Xie, J. J. Allaire, Garrett Grolemund
- R Markdown cookbook by Yihui Xie, Christophe Dervieux, Emily Riederer
- Data Science Live Book by Pablo Casas (for understanding common issues that arise in data analysis and machine learning)
- And many, many more in their archives!
The Big Book of R: A large collection of all things R. Some of the books I listed above are also included there. It is a nice compendium summarizing many great resources for learning R.
Gaston Sanchez, UC Berkeley: So many R tutorials and vignettes that it will blow your mind.
Statistical tools for high-throughput data analysis (STHDA): Maintained by Alboukadel Kassambara (PhD in Bioinformatics and Cancer Biology), who authored several helpful R packages including ggpubr, survminer, ggcorrplot, and factoextra.
useR! Machine Learning Tutorial: Tutorial from the 2016 useR! conference focusing on using machine learning algorithms in R.
Fantastic datasets and where to find them:
- Kaggle: Community curated datasets from all sorts of disciplines
- Harvard Dataverse: Harvard-managed database containing ~100K datasets from various sources.
- Our World in Data: Numbers of the World
- UCI Machine Learning Data Repository: “A collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms”
Websites that give you a helping hand when you are stuck:
- Stack Overflow: For coding problems
- Cross Validated: For questions about statistics and whatnot
- Biostars: Bioinformatics forum with contributions from people across the globe
- R-bloggers: A collection of R blogs from across the globe. You never know what gems you’ll discover here. Many thanks to Tal Galili for creating and maintaining the platform!
Places for understanding statistics and machine learning better:
StatQuest: A great way of learning statistics and machine learning concepts without getting into heavy mathematics.
Introduction to Statistical Learning: Perfect for understanding how statistics and machine learning work, with minimal maths.
Elements of Statistical Learning: The big brother of Introduction to Statistical Learning above, offering a more detailed dive into the concepts.
Setosa.io: A blog for visually explaining things. Great for understanding things like principal component analysis.
For learning more about Git/GitHub:
- Happy Git with R: All the good things together: R, RStudio, and Git
- Git Docs Tutorial
- Git Book