VIBS-675: Single-Cell Data Analysis via Machine Learning (Spring 2025)

Lecture & computer lab: Tues & Thurs 10 AM - 12 PM, VIDI (bldg #1813) Rm 103 & 115

Course Description

This course provides a practical introduction to the analysis of single-cell data using machine learning. Topics range from data visualization and exploration to advanced machine learning methods. Practical examples and applications are illustrated in MATLAB, Julia, and R.

Principles and concepts of single-cell RNA sequencing (scRNAseq) experiments; real-world applications of scRNAseq with examples; machine learning (ML) methods for single-cell data analysis; practical and effective ML methods and concepts; applications of ML methods to high-dimensional scRNAseq data; algorithm design and development of scientific software using high-level, high-performance scientific computing languages; emerging techniques for integrative single-cell data analysis, together with the assumptions, advantages, and limitations of these techniques.

Course Milestones

1. Introduction to single-cell technologies (10x Genomics); INTRODUCTION: mathematical prerequisites, the programming languages MATLAB and Julia, data exploration and visualization [Refs. 1, 2, 3] (1/14)

2. Getting expression counts (UMIs) for individual cells (10x Cell Ranger and salmon-alevin pipelines); VECTORS & MATRICES: linear algebra, norms, distances, eigenvalues and eigenvectors [Refs. 1, 2, 3, 4, 5, 6] (1/21, winter storm; lecture moved to 1/28)

3. Basic statistics of scRNAseq data and QC; DISTANCE & SIMILARITY [Refs. 1] (2/4)

4. scRNAseq data modeling and normalization; PROBABILITY & DISTRIBUTIONS [Refs. 1, 2, 3] (2/11)

5. Dimension reduction and visualization; DIMENSION REDUCTION: PCA, t-SNE, UMAP, PHATE [Refs. 1, 2, 3, 4, 5, 6] (2/18)

6. Clustering analysis of high-dimensional scRNAseq data; CLUSTERING: k-means, DBSCAN, spectral clustering [Refs. 1, 2, 3, 4, 5, 6, 7] (2/25)

7. Marker genes and cell type identification; CLASSIFICATION: classifiers, LDA, random forest, k-NN, neural network (3/4)

=== Spring Break ===

8. Feature selection and identification of highly variable genes; KERNELS: kernel methods, support vector machines [Refs. 1] (guest lectures on 3/18; lecture moved to 3/20)

9. Differential expression analyses with scRNA-seq data; REGRESSION & REGULARIZATION [Refs. 1, 2] (3/25)

10. scRNAseq vs bulk RNAseq; COMPONENT & DECOMPOSITION: SVD, ICA, NMF [Refs. 1, 2, 3, 4] (4/1)

11. Pseudotime and trajectory analyses; OPTIMIZATION: gradient descent, quadratic programming, evolutionary algorithm [Refs. 1, 2, 3] (4/8)

12. Manifold alignment for combining different types of data, e.g., scRNAseq and scATACseq; TENSOR FACTORIZATION & MANIFOLD LEARNING [Refs. 1] (4/15)

13. Construction of single-cell gene regulatory networks (scGRNs); NETWORK & COMMUNITY: centrality analysis, community detection, clustering with multi-layer graphs [Refs. 1, 2, 3, 4] (4/22)

14. Student presentation (4/29)

15. Final exam (5/6)

Learning objectives/outcomes

Participants will acquire a working knowledge of running a full analysis pipeline on scRNA-seq data. After this course, participants should be able to:

Develop and implement scRNA-seq data visualization/exploration solutions
Understand advanced data analysis strategies in the context of scRNA-seq research
Understand principles and applications of machine learning
Implement reproducible data analysis workflows in real-world applications

Course prerequisites

Basic proficiency in a programming language, basic statistics, and basic linear algebra

Link to Team VIBS-675

Team - VIBS-675 (2025) | General | Microsoft Teams

Resources

MATLAB and Simulink Training (mathworks.com)

Introducing MATLAB Fundamental Classes (Data Types) - Video - MATLAB (mathworks.com)

https://probml.github.io/pml-book/book1.html

Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data | Nature Protocols

https://twitter.com/WomenInStat/status/1346477715398922248?s=20

NCHU 2020 single-cell data science lecture series

https://canvas.tamu.edu/courses/282934
