VIBS-689: Machine Learning for Biologists (Spring 2020)

Instructor: James Cai (jcai@tamu.edu)

Lecture & computer lab: Tues & Thurs 10 AM - 12 PM, VIDI (bldg #1813) Rm 104 & 115

Course Description

This course provides a practical introduction to the analysis of “omics” and single-cell data using machine learning. Topics range from data visualization and exploration to advanced machine learning methods. Practical examples and applications are illustrated using MATLAB, R, and Julia.

The course covers practical and effective machine learning methods and concepts; principles and concepts in statistical experimental design and analytics; single-cell data analysis, population genetics, and evolutionary genomics; applications of quantitative approaches (computation, statistics, and mathematics) to the analysis of large-scale, complex biological data sets; algorithm design and development of scientific software using high-level, high-performance scientific computing languages; and emerging techniques for integrative data analysis, along with the assumptions, advantages, and limitations of these techniques.

Course Milestones

1. BASIC MATH & MATLAB/R/JULIA: data exploration, visualization, programming languages (1/14)

2. VECTORS & MATRICES: linear algebra, norms, distances, eigenvalues and eigenvectors (1/21)

3. DISTRIBUTIONS, CURVE FITTING & OPTIMIZATION: anomaly detection, gradient descent, evolutionary algorithms (1/28)

4. CORRELATION & REGRESSION: multivariate analysis, regularization, LASSO (2/4)

5. ENTROPY & MUTUAL INFORMATION (2/11)

6. COMPONENT & DECOMPOSITION: SVD, ICA, NMF (2/18)

7. DIMENSION REDUCTION: PCA, t-SNE, UMAP, PHATE (2/25)

8. CLUSTERING: k-means, nearest neighbor, DBSCAN, spectral clustering (3/3)

=== Spring Break (3/9-3/13) ===

9. CLASSIFICATION: classifiers, LDA, random forests, neural networks (3/17)

10. KERNELS: support vector machines (3/24)

11. NETWORK & COMMUNITY: centrality analysis, community detection, clustering with multi-layer graphs (3/31)

12. DIFFERENTIAL NETWORK ANALYSIS: WGCNA, single-cell regulatory network, graph comparison (4/7)

13. TENSOR FACTORIZATION & MANIFOLD LEARNING (4/14)

14. Student presentations (4/21) and final exam (4/28)

Learning objectives/outcomes

Participants will acquire a working knowledge of running a full classification/profiling pipeline on omics data (e.g., gene expression data). After this course, participants should be able to:

  • Develop and implement data visualization/exploration solutions
  • Understand advanced data analysis strategies in the context of omics research
  • Understand principles and applications of machine learning
  • Implement reproducible data analysis workflows in real-world applications

Course prerequisites

Basics of a programming language, basic statistical knowledge, and basic knowledge of linear algebra.