Data Mining Algorithms for Genomic Analysis

This dissertation, "Data Mining Algorithms for Genomic Analysis" by Sio-iong, Ao, 區小勇, was obtained from The University of Hong Kong (Pokfulam, Hong Kong) and is being sold pursuant to the Creative Commons Attribution 3.0 Hong Kong License. The content of this dissertation has not been altered in any way. The formatting has been altered to make the dissertation easier to print and read. All rights not granted by the above license are retained by the author.



Abstract of thesis entitled

DATA MINING ALGORITHMS FOR GENOMIC ANALYSIS

Submitted by

Ao, Sio Iong

for the degree of Doctor of Philosophy at The University of Hong Kong in January 2007



As a result of many genome-sequencing projects, hundreds of genomes from all branches of life have become available. One important task now is to find ways to explain the organization and function of each genome. Data mining algorithms are useful for extracting patterns from the data and presenting them in a way that improves our understanding of the structure, relation, and function of the subjects. In this work, data mining algorithms have been developed for solving some frontier problems in genomic analysis.

The human genome is estimated to contain about ten million single-nucleotide polymorphisms (SNPs), so the complete screening of all the SNPs in a genomic region is an expensive undertaking. The problem of selecting a subset of informative SNPs (tag SNPs) has been formulated as a hierarchical clustering problem, with a suitable similarity function developed for measuring the distances between the clusters. The proposed algorithm takes account of both functional and linkage disequilibrium information, with the asymmetry thresholds for different SNPs, and avoids a difficulty of the block-detecting methods, which can produce different block boundaries. Experimental results indicate that the algorithm is cost-effective for tag-SNP selection. It can produce more compact clusters, which improves the efficiency of association studies.
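As a rough illustration of clustering-based tag-SNP selection, the sketch below groups SNPs by complete-linkage agglomerative clustering on pairwise r² and picks one tag per cluster. It is not the dissertation's algorithm: the r² similarity, the single symmetric threshold of 0.8, the toy haplotypes, and all function names are assumptions made for illustration, and the functional information and per-SNP thresholds described above are omitted.

```python
from itertools import combinations


def r_squared(a, b):
    """Pairwise LD (r^2) between two biallelic SNPs coded 0/1 per haplotype."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:  # a monomorphic SNP carries no LD information
        return 0.0
    d = p_ab - p_a * p_b
    return d * d / denom


def cluster_snps(snps, threshold=0.8):
    """Complete-linkage clustering: merge the two clusters whose weakest
    cross-cluster r^2 is largest, until that value falls below threshold."""
    m = len(snps)
    r2 = [[r_squared(snps[i], snps[j]) for j in range(m)] for i in range(m)]
    clusters = [[i] for i in range(m)]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            link = min(r2[i][j] for i in clusters[x] for j in clusters[y])
            if link > best:
                best, best_pair = link, (x, y)
        if best < threshold:
            break
        x, y = best_pair  # x < y, so deleting y leaves index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters, r2


def pick_tags(clusters, r2):
    """Choose, per cluster, the SNP with the highest total r^2 to its members."""
    return [max(c, key=lambda i: sum(r2[i][j] for j in c)) for c in clusters]


# Toy data (hypothetical): four SNPs observed on eight haplotypes.
snps = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
clusters, r2 = cluster_snps(snps, threshold=0.8)
tags = pick_tags(clusters, r2)
```

Here SNPs 0 and 1 are in perfect LD, as are SNPs 2 and 3, so genotyping one tag from each cluster would capture all four in this toy case.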

Linkage disequilibrium maps (LD maps) offer several advantages for genomic analysis. In this study, the construction of the LD map has been formulated as a non-parametric constrained unidimensional scaling problem based on the LD information among the SNPs. This differs from the previous LD map, which is derived from the Malecot model. Two procedures have been considered for solving this problem: one formulates it as a least-squares problem with nonnegativity constraints, and the other uses iterative algorithms. The proposed maps can accommodate recombination events that have accumulated. An application of the proposed LD maps to the human genome is presented. The linkage disequilibrium patterns in the LD maps can provide genomic information such as hot and cold recombination regions, and can facilitate the study of recent selective sweeps across the human genome.
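One way to read the least-squares formulation is sketched below, under assumptions that are not taken from the dissertation. If the SNP order along the chromosome is fixed, each map distance between SNPs i and j is the sum of the nonnegative gaps between adjacent SNPs from i to j, so fitting target LD distances becomes a nonnegative least-squares problem in the gaps. The projected-gradient solver, the input format, and the names `fit_ld_map` and `ld_dist` are hypothetical.

```python
def fit_ld_map(dist, n_snps, steps=2000):
    """Fit 1-D map positions for SNPs in a fixed order.

    dist maps (i, j) with i < j to a target LD distance. Minimizes
    0.5 * sum((sum(gaps[i:j]) - dist[i, j])**2) subject to gaps >= 0
    by projected gradient descent.
    """
    pairs = sorted(dist)
    # Safe step size 1/L: L is bounded by (max row sum) * (max column sum)
    # of the 0/1 design matrix that links pairs to the gaps they span.
    max_span = max(j - i for i, j in pairs)
    max_cover = max(
        sum(1 for i, j in pairs if i <= k < j) for k in range(n_snps - 1)
    )
    lr = 1.0 / (max_span * max_cover)
    gaps = [0.0] * (n_snps - 1)
    for _ in range(steps):
        grad = [0.0] * (n_snps - 1)
        for i, j in pairs:
            resid = sum(gaps[i:j]) - dist[(i, j)]
            for k in range(i, j):
                grad[k] += resid
        # Gradient step followed by projection onto the nonnegative orthant.
        gaps = [max(0.0, g - lr * d) for g, d in zip(gaps, grad)]
    positions = [0.0]
    for g in gaps:
        positions.append(positions[-1] + g)
    return positions


# Toy target distances (hypothetical) consistent with positions 0, 1, 1, 3.
ld_dist = {(0, 1): 1.0, (0, 2): 1.0, (0, 3): 3.0,
           (1, 2): 0.0, (1, 3): 2.0, (2, 3): 2.0}
positions = fit_ld_map(ld_dist, n_snps=4)
```

In this toy case SNPs 1 and 2 land on the same map position, which is the kind of zero-length interval a nonnegativity constraint permits. For real data a dedicated solver such as `scipy.optimize.nnls` would be a more practical choice than this hand-written loop.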

Microarrays have been the most widely used tool for assessing differences in mRNA abundance in biological samples. Previous studies have successfully employed a principal component analysis-neural network (PCA-NN) as a classifier of gene types, with continuous inputs and discrete outputs. An algorithm has been developed for testing the predictability of gene expression time series with PCA and NN components, using continuous numerical inputs and outputs. Comparisons of the results suggest that this approach is a more realistic model of the gene network from a continuous perspective.
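A minimal sketch of a PCA-then-neural-network predictor with continuous inputs and outputs is given below. It assumes NumPy is available and uses a synthetic sine series in place of gene expression data; the lag of 5, the two retained components, the network size, and every function name are illustrative assumptions rather than the dissertation's settings.

```python
import numpy as np


def lagged_windows(series, lag):
    """Build (window of `lag` past values, next value) training pairs."""
    X = np.array([series[t:t + lag] for t in range(len(series) - lag)])
    y = np.array(series[lag:])
    return X, y


def pca_fit(X, n_components):
    """Return the column means and the top principal directions (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def train_mlp(Z, y, hidden=4, steps=500, lr=0.05, seed=0):
    """One-hidden-layer tanh network trained by full-batch gradient descent
    on mean squared error. Returns the parameters and the loss history."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(Z.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    losses = []
    for _ in range(steps):
        h = np.tanh(Z @ W1 + b1)
        err = h @ w2 + b2 - y
        losses.append(float(np.mean(err ** 2)))
        g_pred = 2.0 * err / len(y)                  # dLoss/dPrediction
        g_h = np.outer(g_pred, w2) * (1.0 - h ** 2)  # back through tanh
        W1 -= lr * (Z.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
        w2 -= lr * (h.T @ g_pred)
        b2 -= lr * g_pred.sum()
    return (W1, b1, w2, b2), losses


# Synthetic stand-in for an expression time series (not real data).
series = np.sin(np.linspace(0.0, 8.0 * np.pi, 200))
X, y = lagged_windows(series, lag=5)
mean, components = pca_fit(X, n_components=2)
Z = (X - mean) @ components.T  # continuous PCA scores fed to the network
params, losses = train_mlp(Z, y)
```

Comparing `losses[0]` with `losses[-1]` gives a quick check that the network is fitting the series; a real predictability test would evaluate on held-out time points rather than on the training loss.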

DOI: 10.5353/th_b3831982



Data mining
Algorithms
Genomics - Data processing

196 pages, Paperback

Published January 27, 2017

About the author

Sio-Iong Ao

71 books
