The beginning of the twenty-first century has been characterized by an explosion of biological information. The avalanche of data grows daily as a consequence of advances in molecular biology, genomics, and proteomics. The challenge for today's biologists lies in decoding this huge and complex body of data in order to better understand how our genes shape who we are, how our genome evolved, and how we function.
Without annotation and data mining, the information produced by, for example, high-throughput genomic sequencing projects is of limited use. Bioinformatics is the application of computer science and technology to the management and analysis of biological data in an effort to address biological questions. The work presented in this thesis focuses on the use of Grid and High Performance Computing to solve computationally expensive bioinformatics tasks, where the very large amount of available data and the complexity of the tasks call for new solutions for efficient data analysis and interpretation.
Three major research topics are addressed. The first is the use of grids to distribute the execution of sequence-based proteomic analysis, with applications in optimal epitope selection and in a proteome-wide effort to map the linear epitopes in the human proteome. The second is the application of grid technology to genetic association studies, which enabled the analysis of thousands of simulated genotypes. The third is the development and application of an economics-based model for grid job scheduling and resource administration.
Applying the grid-based technology developed in this investigation resulted in the successful tagging and linking of chromosomal regions in Alzheimer's disease, the proteome-wide mapping of linear epitopes, and the development of market-based resource allocation in the Grid for scientific applications.