Mission:

The mission of the Integrated Health Science Facility Core (IHSFC) is to assist investigators in extracting knowledge from complex data by applying data mining approaches to current problems in biological systems and by identifying new issues in biomedical research. The IHSFC specializes in the analysis of “wide” data, characteristic of multi-omics data sets such as those produced by genomics, proteomics, or metabolomics experiments. In these data sets, the number of features is much greater than the number of samples, a characteristic that poses special problems in analysis. For this purpose, the IHSFC has acquired training, software, and tools for both supervised (classification) and unsupervised learning. Strong emphasis is placed on feature reduction, with the intent of eliminating “noisy” features and retaining the most informative ones. The IHSFC will assist in selecting the most appropriate data analysis approach based on investigator need and type of data. Some of these approaches are described below.
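As a rough illustration of feature reduction on “wide” data, the sketch below keeps only the features with the highest variance across samples. It is a minimal example and not the facility's actual pipeline: the function name `top_variance_features`, the variance criterion, and the toy values are all assumptions made for illustration.

```python
from statistics import pvariance


def top_variance_features(samples, k):
    """Return the (sorted) indices of the k features with the highest variance.

    `samples` is a list of rows, one row per sample, one column per feature.
    Variance filtering is only one of many possible feature-reduction criteria.
    """
    n_features = len(samples[0])
    variances = [pvariance([row[j] for row in samples]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 3 samples, 6 features (more features than samples).
data = [
    [1.0, 5.0, 0.2, 3.0, 9.0, 0.5],
    [1.1, 1.0, 0.2, 3.1, 2.0, 0.5],
    [0.9, 9.0, 0.2, 2.9, 5.0, 0.5],
]

keep = top_variance_features(data, 2)
reduced = [[row[j] for j in keep] for row in data]
print(keep)     # indices of the retained features
print(reduced)  # the data restricted to those features
```

In this toy case, the near-constant features are discarded and only the two columns that vary most across samples are retained.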

Supervised learning (Classification):

The goal of supervised learning is to predict an outcome or class. Supervised classification is a specialized area of machine learning that uses computers to detect patterns and trends. In supervised learning, a complex “omics” data set is analyzed using various computational algorithms to determine the behavior of the data in relation to a particular event or to identify the types or subtypes of disease present. A wide range of classifiers (supervised learning methods) is available, each with its own strengths and weaknesses. Classifier performance depends greatly on the characteristics of the data being analyzed. The analysis therefore involves determining a suitable classifier for a given problem and identifying the behavior of a particular disease with respect to a particular response. Below are some of the classifiers.
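To make the idea of predicting a class concrete, here is a minimal sketch of one simple classifier, a nearest-centroid rule, evaluated with leave-one-out cross-validation. The choice of classifier, the function names, and the toy data with “control” and “disease” labels are assumptions for illustration only and are not taken from the facility's toolset.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest (Euclidean) to x."""
    centroids = {}
    for label in sorted(set(train_y)):
        members = [row for row, y in zip(train_x, train_y) if y == label]
        centroids[label] = centroid(members)
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy data: two well-separated classes, two features per sample.
xs = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9], [3.0, 3.1], [3.2, 2.9], [2.9, 3.3]]
ys = ["control", "control", "control", "disease", "disease", "disease"]

print(leave_one_out_accuracy(xs, ys))
print(nearest_centroid_predict(xs, ys, [3.0, 3.0]))
```

Holding out each sample before training gives a less optimistic estimate of performance than scoring on the training data. This matters most when samples are few relative to features.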

Unsupervised learning:

Unsupervised learning is used to identify patterns or differences that exist in the data without reference to a known outcome. It may reveal distinctive patterns that are representative of a particular event that dominates, or is expressed in, the data. It provides an understanding of what information the data (for example, gene expression data) contain and indicates whether and how further analysis can be done. A few of the available techniques are described below.
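As one example of how patterns can be found without class labels, the following sketch groups samples by agglomerative (hierarchical) clustering with average linkage. It is an illustrative sketch under stated assumptions: the function names, the Euclidean distance, and the toy expression profiles are hypothetical, and other techniques may suit a given data set better.

```python
import math
from itertools import combinations


def average_linkage(a, b, data):
    """Mean pairwise Euclidean distance between two clusters of sample indices."""
    total = sum(math.dist(data[i], data[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative_clusters(data, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters (a < b) with the smallest average linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], data),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy expression profiles: 5 samples, 3 features each.
profiles = [
    [1.0, 1.1, 0.9],
    [5.0, 5.2, 4.9],
    [1.2, 0.9, 1.0],
    [4.8, 5.1, 5.0],
    [0.9, 1.0, 1.1],
]

groups = agglomerative_clusters(profiles, 2)
print(groups)  # lists of sample indices, one list per cluster
```

No labels are supplied here. The grouping comes only from the distances between profiles, so the resulting clusters still need to be interpreted against what is known about the samples.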

Tools/Software:

Below are some of the software tools available:

Contact information if you have a project or data to analyze:

Allan R. Brasier, MD
MRB 8.122
301 University Blvd.
Galveston, TX 77555-1060
Phone: (409) 772-2824
Email: arbrasie@utmb.edu

Sundar S. Victor, M.S.
MRB 5.138
301 University Blvd.
Galveston, TX 77550
Phone: (409) 772-2178
Email: ssvictor@utmb.edu