Publications
P. Church, A. Wong, M. Brock, A. Goscinski
The cost and time of deploying HPC applications on clouds are a problem. Instead of conducting their research, discipline specialists are forced to carry out activities for application deployment, publication and ease of access. In response, a new approach for HPC application deployment and access in clouds is proposed. The major innovations are a new approach to deploying and executing HPC applications on IaaS and PaaS clouds, and exposing HPC applications as services. Through three case studies, this paper demonstrates the feasibility and effectiveness of the proposed approach, which could lead to the building of a SaaS library of discipline-oriented services invocable through user-friendly, discipline-specific interfaces. The new approach will reduce the time and money needed to deploy and expose discipline HPC applications.
P. Church, A. Goscinski, K. Holt, M. Inouye, A. Ghoting, K. Makarychev, R. Matthias
The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large datasets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in half the time and with one quarter of the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
P. Church, A. Goscinski
The increasing amount of data collected in the fields of physics and bioinformatics allows researchers to build realistic, and therefore accurate, models and simulations and gain a deeper understanding of complex systems. This analysis often comes at the cost of greatly increased processing requirements. Cloud computing, which provides on-demand resources, can offset increased analysis requirements. While clouds are beneficial to researchers, their adoption has been slow due to network and performance uncertainties. We compare the performance of cloud computers to clusters to make clear the advantages and limitations of clouds. Focus has been put on understanding how virtualization and the underlying network affect the performance of High Performance Computing (HPC) applications. Collected results indicate that performance comparable to high performance clusters is achievable on cloud computers, depending on the type of application run.
P. Church, A. Goscinski, C. Lefevre, A. Wong
Gene Expression Comparative Analysis allows bioinformatics researchers to discover the conserved or specific functional regulation of genes. This is achieved through comparisons between quantitative gene expression measurements obtained in different species on different platforms to address a particular biological system. Comparisons are made more difficult by the need to map orthologous genes between species, pre-process the data (normalization) and carry out post-analysis (statistical and correlation analysis). In this paper we introduce a web-based software package called EXP-PAC, which provides online interfaces for database construction and query of data, and makes use of a high performance computing platform of computer clusters to run gene sequence mapping and normalization methods in parallel. Thus, EXP-PAC facilitates the integration of gene expression data for comparative analysis and the online sharing, retrieval and visualization of complex multi-specific and multi-platform gene expression results.
P. Church, A. Goscinski, C. Lefevre, A. Wong
Gene Expression Comparative Analysis allows bioinformatics researchers to discover the functional regulation of genes. This is achieved through comparisons between datasets representing the quantities of substances in a biological system. Unnatural variations can be introduced during the data collection and digitization process, so normalization algorithms must be applied to data before any accurate comparison can be made. There exist many different normalization methods, each of which gives a different result. Comparing differently normalized datasets can allow for the discovery of crucial regulated genes that may otherwise be hidden due to errors in a single normalization study. In this paper we introduce a web-based software package called EXP-PAC, which makes use of a high performance computing platform of computer clusters to run multiple normalization methods in parallel. By generating multiple normalized datasets concurrently, we enable researchers to improve the accuracy of their research with almost no extra time cost.
J. S. Church, R. J. Denning, D. J. Evans
J. Y. Cai, J. S. Church, S. M. Smith
When dyeing a machine-washable cotton/wool blend, the affinity of the anionic dyes towards wool is substantially increased due to the chlorine/Hercosett treatment of the wool component in the blend. As a result, the wool is often dyed much darker than the cotton, or heavily stained by cotton dyes. It is therefore difficult to achieve a solid shade with satisfactory fastness properties. To dye the blend successfully in a one-bath process, dye blocking agents must be used to control the dye partition between the two fibres. Syntans have been most widely used for this purpose, but can cause hue changes and reduced lightfastness. This paper introduces an improved method for one-bath dyeing of machine-washable cotton/wool blends by using a new class of blocking agent as an alternative to syntans.
J. S. Church, G. L. Corino, D. J. Evans