VIBS-689/675: Single-cell Data Analysis via Machine Learning (Spring 2024)

Instructor: James Cai (jcai@tamu.edu)

Lecture & computer lab: Tues & Thurs 10 AM - 12 PM, VIDI (bldg #1813) Rm 103 & 115

Course Description

This course provides a practical introduction to the analysis of single-cell data using machine learning. Topics range from data visualization and exploration to advanced machine learning methods. Practical examples and applications are illustrated in MATLAB, Julia, and R.

Principles and concepts in single-cell RNA sequencing (scRNAseq) experiments; real-world applications of scRNAseq with examples; machine learning (ML) methods for single-cell data analysis; practical and effective ML methods and concepts; applications of ML methods to high-dimensional scRNAseq data; algorithm design and development of scientific software using high-level, high-performance scientific computing languages; emerging techniques for integrative single-cell data analysis; and the assumptions, advantages, and limitations of these techniques.

Course Milestones

1. Introduction to single-cell technologies (10x Genomics); INTRODUCTION: mathematical prerequisites, programming languages (MATLAB and Julia), data exploration and visualization [Refs. 1, 2, 3] (1/16)

2. Getting expression counts (UMIs) for individual cells (10x Genomics Cell Ranger and Salmon-Alevin pipelines); VECTORS & MATRICES: linear algebra, norms, distances, eigenvalues and eigenvectors [Refs. 1, 2, 3, 4, 5, 6] (1/23)

3. Basic statistics of scRNAseq data and QC; DISTANCE & SIMILARITY [Ref. 1] (1/30)

4. scRNAseq data modeling and normalization; PROBABILITY & DISTRIBUTIONS [Refs. 1, 2, 3] (2/6)

5. Dimension reduction and visualization; DIMENSION REDUCTION: PCA, t-SNE, UMAP, PHATE [Refs. 1, 2, 3, 4, 5, 6] (2/13)

6. Clustering analysis of high-dimensional scRNAseq data; CLUSTERING: k-means, DBSCAN, spectral clustering [Refs. 1, 2, 3, 4, 5, 6, 7] (2/20)

7. Marker genes and cell type identification; CLASSIFICATION: classifiers, LDA, random forest, k-NN, neural network (2/27)

8. Feature selection and identification of highly variable genes; KERNELS: kernel methods, support vector machines [Ref. 1] (3/5)

=== Spring Break === 

9. Differential expression analyses with scRNA-seq data; REGRESSION & REGULARIZATION [Refs. 1, 2] (3/19)

10. scRNAseq vs. bulk RNAseq; COMPONENT & DECOMPOSITION: SVD, ICA, NMF [Refs. 1, 2, 3, 4] (3/26)

11. Pseudotime and trajectory analyses; OPTIMIZATION: gradient descent, quadratic programming, evolutionary algorithms [Refs. 1, 2, 3] (4/2)

12. Manifold alignment for combining different types of data, e.g., scRNAseq and scATACseq; TENSOR FACTORIZATION & MANIFOLD LEARNING [Ref. 1] (4/9)

13. Construction of single-cell gene regulatory networks (scGRNs); NETWORK & COMMUNITY: centrality analysis, community detection, clustering with multi-layer graphs [Refs. 1, 2, 3, 4] (4/16)

14. Student presentations (4/23)

15. Final exam (4/30)

Learning objectives/outcomes

Acquire a working knowledge of running a full analysis pipeline on scRNAseq data. After this course, participants should be able to:

Course prerequisites

Basic proficiency in a programming language, basic knowledge of statistics, and basic knowledge of linear algebra.