VIBS-689: Single-cell Data Analysis via Machine Learning (Spring 2020)

Instructor: James Cai (

Lecture & computer lab: Tues & Thurs 10 AM - 12 PM, VIDI (bldg #1813) Rm 104 & 115

Course Description

The aim of the course is to provide a practical introduction to the analysis of single-cell data using machine learning. Topics will range from data visualization/exploration to advanced machine learning methods. Practical examples and applications will be illustrated using MATLAB, R, and Julia.

Topics include:

  • Principles and concepts in single-cell RNA sequencing (scRNAseq) experiments
  • Real-world applications of scRNAseq, with examples
  • Machine learning (ML) methods for single-cell data analysis
  • Practical and effective ML methods and concepts
  • Applications of ML methods to high-dimensional scRNAseq data
  • Algorithm design and development of scientific software using high-level, high-performance scientific programming languages
  • Emerging techniques for integrative single-cell data analysis, and the assumptions, advantages, and limitations of these techniques

Course Milestones

1. Introduction to single-cell technologies (10x Genomics and C1); INTRODUCTION: mathematical prerequisites and the programming languages MATLAB/R/Julia, data exploration and visualization (1/14)

2. Getting expression counts (UMIs) for individual cells (10x Cell Ranger and Salmon alevin pipelines); VECTORS & MATRICES: linear algebra, norms, distances, eigenvalues and eigenvectors (1/21)

3. Basic statistics and quality control (QC) of scRNAseq data; DISTANCE & SIMILARITY (1/28)

4. scRNAseq data modeling and normalization; PROBABILITY & DISTRIBUTIONS (2/4)

5. Dimension reduction and visualization; DIMENSION REDUCTION: PCA, t-SNE, UMAP, PHATE (2/11)

6. Clustering analysis of high-dimensional scRNAseq data; CLUSTERING: k-means, DBSCAN, spectral clustering (2/18)

7. Marker genes and cell type identification; CLASSIFICATION: classifiers, linear discriminant analysis (LDA), random forest, k-NN, neural networks (2/25)

8. Feature selection, identification of highly variable genes; KERNELS: kernel methods, support vector machine (3/3)

=== Spring Break (3/9-3/13 and 3/16-3/20) ===

9. Differential expression analysis of scRNAseq data; REGRESSION & REGULARIZATION (3/24)

10. scRNAseq vs bulk RNAseq; COMPONENT & DECOMPOSITION: SVD, ICA, NMF (3/31)

11. Pseudotime and trajectory analyses; OPTIMIZATION: gradient descent, quadratic programming, evolutionary algorithm (4/7)

12. Manifold alignment for combining different types of data, e.g., scRNAseq and scATACseq; TENSOR FACTORIZATION & MANIFOLD LEARNING (4/14)

13. Construction of single-cell gene regulatory networks (scGRNs); NETWORK & COMMUNITY: centrality analysis, community detection, clustering with multi-layer graphs (4/21)

14. Final exam (4/28)

Learning objectives/outcomes

The course aims to provide a working knowledge of running a full analysis pipeline on scRNAseq data. After this course, participants should be able to:

  • Develop and implement scRNAseq data visualization/exploration solutions
  • Understand advanced data analysis strategies in the context of scRNAseq research
  • Understand principles and applications of machine learning
  • Implement reproducible data analysis workflows in real-world applications

Course prerequisites

Basic knowledge of a programming language, basic statistics, and basic linear algebra.