VIBS-### - Machine Learning for Biologists

Instructor: James Cai (jcai@tamu.edu)

Meeting & Lab: Tue & Thu 1:30 PM - 3:30 PM, VIDI (bldg 1813) Rm 104 & 115

Office hours: Wed 10-12, VMR (bldg 1811) Rm 384

Course Description

This course provides a practical introduction to the analysis of “omics” data using machine learning methods. Topics range from data visualization and exploration to advanced data analysis and machine learning. Practical examples and applications are illustrated using MATLAB and R.

Course Milestones

1. BASIC MATH & MATLAB: data exploration, visualization, programming languages

2. VECTOR & MATRIX: linear algebra, data structure, norm, cosine distance

3. CURVE FITTING & OPTIMIZATION: http://www.cs.grinnell.edu/~weinman/code/index.shtml

4. CORRELATION & REGRESSION: multivariate, regularization, LASSO

5. DECOMPOSITION: eigen, SVD, PCA, ICA, NMF

6. DIMENSION REDUCTION AND BEYOND: Linear Discriminant Analysis [12] and Locality Preserving Projection [19], https://core.ac.uk/download/pdf/4820214.pdf

7. CLUSTERING: K-means, spectral clustering

8. CLASSIFICATION: classifiers, performance measures, diagnostics

9. ENTROPY & MUTUAL INFORMATION

10.

11. NETWORK CENTRALITY & COMMUNITY: centrality analysis (node importance); community detection (groups of well-connected nodes)

12. DIFFERENTIAL NETWORK ANALYSIS: co-expression networks, graph comparison, community detection; "Clustering with Multi-Layer Graphs: A Spectral Perspective", https://arxiv.org/abs/1106.2233

13. TENSOR LEARNING & MANIFOLD LEARNING

Target audience

Computational biologists, bioinformaticians, biological data analysts.

Learning objectives/outcomes

Participants will acquire a working knowledge of running a full classification/profiling pipeline on omics data (e.g., gene expression). After this course, participants should be able to:

  • Develop and implement data visualization/exploration solutions
  • Understand advanced data analysis strategies in the context of omics research
  • Understand principles and applications of machine learning
  • Implement reproducible data analysis workflows in real-world applications

Course prerequisites

Basics of a programming language, basic statistical knowledge, and basic knowledge of linear algebra.