VIBS-689/675: Single-cell Data Analysis via Machine Learning (Spring 2024)
Instructor: James Cai (jcai@tamu.edu)
Lecture & computer lab: Tues & Thurs 10 AM - 12 PM, VIDI (bldg #1813) Rm 103 & 115
Course Description
The aim of the course is to provide a practical introduction to the analysis of single-cell data using machine learning. Topics will range from data visualization/exploration to advanced machine learning methods. Practical examples and applications will be illustrated using MATLAB, Julia, and R.
Principles and concepts in single-cell RNA sequencing (scRNAseq) experiments; real-world applications of scRNAseq with examples; machine learning (ML) methods for single-cell data analysis; practical and effective ML methods and concepts; applications of ML methods to high-dimensional scRNAseq data; algorithm design and development of scientific software using high-level, high-performance scientific computing languages; emerging techniques for integrative single-cell data analysis, and the assumptions, advantages, and limitations of these techniques.
Course Milestones
1. Introduction to single-cell technologies (10x Genomics); INTRODUCTION: mathematical prerequisites and programming languages MATLAB and Julia, data exploration and visualization [Refs. 1, 2, 3] (1/16)
2. Getting expression counts (UMIs) for individual cells (10x Cell Ranger and salmon-alevin pipelines); VECTORS & MATRICES: linear algebra, norms, distances, eigenvalues and eigenvectors [Refs. 1, 2, 3, 4, 5, 6] (1/23)
3. Basic statistics of scRNAseq data and QC; DISTANCE & SIMILARITY [Refs. 1] (1/30)
4. scRNAseq data modeling and normalization; PROBABILITY & DISTRIBUTIONS [Refs. 1, 2, 3] (2/6)
5. Dimension reduction and visualization; DIMENSION REDUCTION: PCA, t-SNE, UMAP, PHATE [Refs. 1, 2, 3, 4, 5, 6] (2/13)
6. Clustering analysis of high-dimensional scRNAseq data; CLUSTERING: k-means, DBSCAN, spectral clustering [Refs. 1, 2, 3, 4, 5, 6, 7] (2/20)
7. Marker genes and cell type identification; CLASSIFICATION: classifiers, LDA, random forest, k-NN, neural network (2/27)
8. Feature selection, identification of highly variable genes; KERNELS: kernel methods, support vector machine [Refs. 1] (3/5)
=== Spring Break ===
9. Differential expression analyses with scRNA-seq data; REGRESSION & REGULARIZATION [Refs. 1, 2] (3/19)
10. scRNAseq vs bulk RNAseq; COMPONENT & DECOMPOSITION: SVD, ICA, NMF [Refs. 1, 2, 3, 4] (3/26)
11. Pseudotime and trajectory analyses; OPTIMIZATION: gradient descent, quadratic programming, evolutionary algorithms [Refs. 1, 2, 3] (4/2)
12. Manifold alignment for combining different types of data, e.g., scRNAseq and scATACseq; TENSOR FACTORIZATION & MANIFOLD LEARNING [Refs. 1] (4/9)
13. Construction of single-cell gene regulatory networks (scGRNs); NETWORK & COMMUNITY: centrality analysis, community detection, clustering with multi-layer graphs [Refs. 1, 2, 3, 4] (4/16)
14. Student presentation (4/23)
15. Final exam (4/30)
Learning objectives/outcomes
Participants will acquire a working knowledge of running a full analysis pipeline on scRNAseq data. After this course, participants should be able to:
Develop and implement scRNAseq data visualization/exploration solutions
Understand advanced data analysis strategies in the context of scRNAseq research
Understand principles and applications of machine learning
Implement reproducible workflows of data analysis in real-world applications
Course prerequisites
Basic proficiency in a programming language, basic knowledge of statistics, and basic knowledge of linear algebra
Resources
MATLAB and Simulink Training (mathworks.com)
Introducing MATLAB Fundamental Classes (Data Types) - Video - MATLAB (mathworks.com)
https://probml.github.io/pml-book/book1.html
https://twitter.com/WomenInStat/status/1346477715398922248?s=20
NCHU 2020 single-cell data science lecture series
https://canvas.tamu.edu/courses/282934