Posters
1
Lynda B.M. Ellis Laboratory Medicine and Pathology
2 Jennifer L. Seffernick, Larry P. Wackett, and Patsy C. Babbitt Biochemistry, Molecular Biology and Biophysics Defining the expanse of sequence space within highly divergent superfamilies is a difficult task. Previous methodologies have yielded either few sequences or high false positive rates. Here results from a new method are presented. This method modified the program Shotgun for use with large numbers of PSI-BLAST outputs. The new program was tested by identifying sequence space for the amidohydrolase superfamily. The original program, Shotgun, had a false positive rate of 34% with this superfamily, while the new program has a reduced false positive rate of 1.2%. This procedure has allowed for expansion of the amidohydrolase superfamily to over 600 nonidentical members.
3 David Fermin1, Xinqiang Han1, Maralyssa Bann1, Mohammad-Karim Ezzat1, Robert Hebert1, Suzanne Grindle2, Soon Park1, Yingjie Chen1, Robert Bache1, Leslie Miller1, Jennifer Hall1 1Medicine, 2Cancer Center Informatics Core Heart failure affects an estimated 4.7 million Americans. The five-year survival rate of patients with heart failure is approximately 50%. Implantation of a mechanical ventricular assist device (VAD) has been shown to significantly decrease mortality in patients with refractory heart failure. A microarray chip representing 22,283 genes was used to examine transcriptome alterations in paired human heart samples in response to ventricular unloading with a VAD. We paired this approach with a supervised learning technique based on the theory of support vector machines to rank genes in order of their ability to discriminate hearts by their underlying etiology as well as in response to ventricular unloading. Initial analyses of 26 failing heart samples prior to implantation of the VAD revealed a distinct etiology-dependent stratification of genomic profiles. Gene ranking analysis identified 16 genes out of 22,283 that best discriminated idiopathic cardiomyopathy from ischemic cardiomyopathy versus acute myocardial infarction with an error rate of less than 6%. In line with these findings, the genomic signature of the unloaded heart was highly dependent upon the underlying etiology. We identified a subset of genes that best discriminated post-VAD from pre-VAD hearts within each cohort with error rates of less than 4%. In conclusion, our studies suggest that the genomic signature of the failing heart is dictated by the underlying cause of the disease. Furthermore, we demonstrated that the genomic response of the heart to ventricular support is uniquely dependent upon the underlying etiology of heart failure.
4 Martina Stromvik1, J. Johnson1, J. Schupp2, C. Schmidt1, J. Crow1, E. Shoop3, P. Keim2, R. Shoemaker4, L. Vodkin5, E. Retzel1 1 Center for Computational Genomics and Bioinformatics, University of Minnesota, Minneapolis, MN, 55455; 2 Department of Biology, Northern Arizona University, Flagstaff, AZ, 86011; 3 Mathematics and Computer Science, Macalester College, St. Paul, MN, 55105; 4 USDA-ARS, Iowa State University, Ames, IA, 50011; 5 Department of Crop Sciences, University of Illinois, Urbana, 61801. TableView is a portable Java application for visual data mining and exploration. Any tabular data can be loaded and shown in multiple ways, such as in scatter plots, 3D scatter plots, parallel coordinates, line graphs, clusters and histograms. Points of interest can be selected in any of the views, and that selection is simultaneously shown in the other views. Using TableView, we mined an aggregated set of soybean SAGE (Serial Analysis of Gene Expression) tag counts, orientation prediction scores, EST counts, EST contigs and annotation data. We show how different information can be obtained from this integrated data set. Some examples of questions we can answer are: which tags were highly abundant in particular tissues; which tags were annotated as belonging to gene families of interest; which tags did not match any consensus sequences; and which consensus sequences and tags had been annotated as unknown. http://ccgb.umn.edu/software/java/apps/TableView/
5 Bo Kyeng Hou1, Lawrence P. Wackett1, and Lynda B.M. Ellis2 1BioTechnology Institute and 2Laboratory Medicine and Pathology We have developed a system to predict microbial catabolism, using the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD, http://umbbd.ahc.umn.edu/) as a knowledge base. The present system, available on the web (http://umbbd.ahc.umn.edu/predict/), can predict biodegradation of most of the major aliphatic and aromatic organic functional groups containing C, H, N, O and halogens. It can duplicate at least one known biodegradation pathway for 60% of the compounds in an 84-member validation set; most pathways that did not completely duplicate known metabolism could plausibly occur in nature. Users are encouraged, and have begun, to submit additional biotransformation rules and comment on existing rules; the system will further develop under the direction of the scientific community.
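As a rough illustration of the rule-based idea in abstract 5, the sketch below chains biotransformation rules to enumerate candidate pathways. It is only a toy under stated assumptions: the `RULES` table, the functional-group labels, and the `predict_pathways` function are hypothetical and are not taken from the UM-BBD rule set, which operates on chemical structures rather than text labels.

```python
from collections import deque

# Hypothetical, simplified rules: functional group -> groups it can become.
# Illustrative only; these are NOT the UM-BBD biotransformation rules.
RULES = {
    "primary alcohol": ["aldehyde"],
    "aldehyde": ["carboxylic acid"],
    "nitrile": ["amide", "carboxylic acid"],
    "amide": ["carboxylic acid"],
}


def predict_pathways(start, max_steps=3):
    """Enumerate rule chains from `start`, breadth first, up to max_steps steps."""
    pathways = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        # Candidate next products, skipping anything already visited (no cycles)
        nexts = [p for p in RULES.get(path[-1], []) if p not in path]
        if not nexts or len(path) - 1 >= max_steps:
            # Dead end or step limit reached: record the pathway if any rule fired
            if len(path) > 1:
                pathways.append(path)
            continue
        for product in nexts:
            queue.append(path + [product])
    return pathways


for pathway in predict_pathways("nitrile"):
    print(" -> ".join(pathway))
```

A predicted pathway could then be compared against known metabolism for the same compound, which is the kind of check the validation set above describes.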
6 Cheryl M. Dvorak1, Chad R. Ramler1, Kendra A. Hyland1, Yongqing Zhang2, Scott Fahrenkrug2, and Michael P. Murtaugh1 1Veterinary Pathobiology, 2Animal Science
7 Matthew Rasmussen1, Mukund Deshpande1, George Karypis1, James Johnson2, John A. Crow2, and Ernest F. Retzel2 1Computer Science & Engineering, 2Computational Genomics and Bioinformatics
8 Steven B. Cannon1, Andrew Baumgarten1, Georgiana May1,2, and Nevin D. Young1,3 1Plant Biology, 2Ecology, Evolution and Behavior, 3Plant Pathology
9 Kevin Roberg-Perez, Suzanne Grindle, and Donald Connelly Cancer Center Informatics Core
10 Lisa Herron-Olson1, James M. Musser2 and Vivek Kapur1 1Biomedical Genomics Center and Veterinary Pathobiology, 2Rocky Mountain Laboratories, NIAID, NIH, Hamilton, MT. Staphylococcus aureus (SA) is a leading cause of mastitis in dairy cattle worldwide despite the institution of extensive hygiene and prevention programs such as low level antimicrobial administration. In addition, SA causes significant morbidity and mortality among humans. Our previous analyses of the clonal population structure of SA recovered from animals and humans suggested that whole-genome sequencing and comparative analyses would enhance our understanding of the molecular mechanisms underlying pathogenesis and host specificity of SA. We here present the results of whole genomic sequencing of RF122, a bovine mastitis-associated strain of SA. Together with comparative genomic analyses of RF122 with the human pathogenic strains Mu50 and N315 (hospital-acquired MRSA) and MW2 (community-acquired MRSA), our analyses provide key insight on the genetic basis for pathogenicity and host specificity of SA.
11 Zheng Jin Tu, Patton Fast, Wayne Xu, Yuk Sham Supercomputing Institute The Supercomputing Institute for Digital Simulation and Advanced Computation is an interdisciplinary research program of the University of Minnesota. The Institute has a state-of-the-art bioinformatics infrastructure, large databases, a wide range of software, and technical skills to support bioinformatics and genomics research as well as large-scale, high-throughput bioinformatics project development. The Supercomputing Institute has most of the popular software for bioinformatics (BLAST, GCG, EMBOSS, Phred/Phrap ...), evolution (PHYLIP, PAUP, PAML ...), and microarray data analysis (Expressionist, GeneTraffic, GeneSpring, Spotfire ...). We also have numerous local biological databases available, including GenBank and human genome sequence data. Statistical packages such as SAS and R are available and can be used for bioinformatics data analysis. For database application development, IBM DB2, Oracle, and SQL Server are all available at the Institute. Please contact the Institute's user support staff to discuss your needs, projects, databases, and software requirements. For more information, please check the Institute's Computational Biology and Genomics web page at http://www.msi.umn.edu/user_support/compgen/.
12 John Garbe and Yang Da Animal Science Graphical pedigree visualization is helpful for studying the relationships among individuals, gene flows from generation to generation, and the population structure. However, graphical visualization of large complex pedigrees is often a humanly impossible task. Pedigraph provides rapid graphical visualization of large, complex pedigrees, with options for controlling colors, drawing size, page size and margins, drawing styles, extraction and highlighting of partial pedigrees involving selected individuals. Pedigraph can display all individuals in the data set, or display the number of offspring in each family by gender and trait status such as disease or normal phenotypes. Pedigraph requires a simple text pedigree file as user input, and is able to draw pedigrees with complex inbreeding structures over multiple generations in a population with a large number of individuals, as is common in animal populations. A trial version of the program is available at http://animalgene.umn.edu.
13 Jayprakash Vasdewani1, Arvind Raghavan2, Cavan S. Reilly3, George Karypis1 and Paul R. Bohjanen2 1Computer Science and Engineering, 2Microbiology, 3Biostatistics The 5' and 3' untranslated regions (UTRs) of certain mRNA transcripts contain sequences that are known to mediate mRNA stability. For example, AU-rich elements (AREs) in the 3' UTRs of cytokine and proto-oncogene transcripts mediate mRNA degradation, while 5' UTR sequences in IL-2 transcripts mediate pathway-dependent transcript stabilization. We predict that each mRNA transcript regulated at the level of mRNA stability contains one or more cis-regulatory sequences that interact with the cellular mRNA degradation machinery, and that coordinately regulated transcripts may contain conserved cis-regulatory sequences. Using probabilistic approaches to motif prediction, we compare the outputs of the Gibbs sampler and MEME (Multiple EM for Motif Elicitation) algorithms over a range of motif lengths and mRNA stability profiles. By clustering T cell transcripts based on sequence similarity scores and mRNA stability parameters, and then comparing these sets of clusters, we wish to test our hypothesis that the occurrence of cis-regulatory sequences can be correlated with mRNA stability profiles.
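To make the notion of a cis-regulatory sequence in abstract 13 concrete, the sketch below counts occurrences of the AUUUA pentamer, which is commonly described as the core of AU-rich elements, in a UTR sequence. This is a plain pattern scan, not the Gibbs sampler or MEME, and the function name and example sequence are made up for illustration.

```python
import re


def count_are_pentamers(utr: str) -> int:
    """Count AUUUA pentamers, including overlapping ones, in an RNA or cDNA sequence."""
    # Accept DNA-style input by converting T to U
    rna = utr.upper().replace("T", "U")
    # A lookahead matches at every start position, so overlapping
    # pentamers such as those in AUUUAUUUA are each counted
    return len(re.findall(r"(?=AUUUA)", rna))


example_utr = "GCAUUUAUUUAUUUACG"  # hypothetical 3' UTR fragment
print(count_are_pentamers(example_utr))
```

A count like this could serve as one simple per-transcript feature to set alongside mRNA stability measurements; the probabilistic motif finders named in the abstract go further by discovering motifs that are not specified in advance.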
14 Kevin Messner Bio-Medical and Magrath Libraries The Biology Student Workbench project (http://bsw.ncsa.uiuc.edu), located primarily at the University of Illinois at Urbana-Champaign, provides training and curricular materials for biology teachers to include bioinformatics in their curricula. The project supports the Biology Workbench (http://workbench.sdsc.edu) and the Student Interface to the Biology Workbench (http://bsw.ncsa.uiuc.edu/cgi-bin/sib.py) as primary toolkits for investigation. The goals of the project are to 1) create a strong suite of tools usable by non-experts to conduct bioinformatics inquiries; 2) provide inquiry-based curricular materials for classroom use; and 3) establish a community of scientists and educators to support the use of computational research tools in education.
15 Xu Guo1, Huilin Qi2, Catherine M. Verfaillie2, and Wei Pan1 1Biostatistics and 2Medicine Longitudinal gene expression data arise from time-course microarray experiments, which are designed to study biological processes in a temporal fashion by taking samples from the same subject at different time points to measure gene expression levels. We apply generalized estimating equation techniques to construct a robust statistic, which is a variant of the robust Wald statistic, for longitudinal gene expression data to detect genes with temporal changes in expression. We associate significance levels to the proposed statistic by either incorporating the idea of the Significance Analysis of Microarrays (SAM) method (Tusher et al., 2001) or using the mixture model method (MMM) (Pan et al., 2002) to identify significant genes. The utility of the statistic is demonstrated through its application to an important study of osteoblast lineage-specific differentiation.
16 Xiaohong Huang and Wei Pan Biostatistics Using gene expression data to classify (or predict) tumor types has received much research attention recently. Due to special features of gene expression data, several new methods have been proposed, including the weighted voting scheme of Golub et al (1999), the compound covariate method of Hedenfalk et al (2001) (originally proposed by Tukey (1993)), and the shrunken centroids method of Tibshirani et al (2002). These methods look different and are more or less ad hoc. Here we point out a close connection of the three methods with a linear regression model. Casting the classification problem in the general framework of linear regression naturally leads to new alternatives, such as modified partial least squares (PLS) methods and penalized PLS (PPLS) methods. Using two real data sets, we show the competitive performance of our new methods when compared with the other three methods.
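One way to see the linear structure mentioned in abstract 16 is to write out the weighted voting scheme of Golub et al. (1999), in which each gene votes with weight (mean1 - mean2) / (sd1 + sd2) applied to its distance from the midpoint of the two class means. The sketch below is a minimal version under stated assumptions: the function names and the toy data are hypothetical, no gene selection step is included, and it is not the authors' PLS or PPLS method.

```python
from statistics import mean, stdev


def fit_weighted_voting(class1, class2):
    """Return one (weight, midpoint) pair per gene.

    class1 and class2 are lists of samples; each sample is a list of
    expression values, one per gene. Assumes at least two samples per
    class and a nonzero standard deviation sum for every gene.
    """
    params = []
    for g in range(len(class1[0])):
        x1 = [sample[g] for sample in class1]
        x2 = [sample[g] for sample in class2]
        m1, m2 = mean(x1), mean(x2)
        weight = (m1 - m2) / (stdev(x1) + stdev(x2))
        params.append((weight, (m1 + m2) / 2))
    return params


def predict(params, sample):
    """Sum the per-gene votes; a positive total means class 1, otherwise class 2."""
    score = sum(w * (x - b) for (w, b), x in zip(params, sample))
    return 1 if score > 0 else 2


# Made-up toy data: 3 samples per class, 2 genes
class1 = [[5.0, 1.0], [6.0, 2.0], [7.0, 1.5]]
class2 = [[1.0, 4.0], [2.0, 5.0], [1.5, 6.0]]
params = fit_weighted_voting(class1, class2)
print(predict(params, [6.5, 1.0]), predict(params, [1.0, 5.5]))
```

The total score is a linear function of the expression values, with one coefficient per gene and a single intercept, which is the sense in which such a classifier can be placed in a linear model framework.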
17 Eric W. Klee1,5, Dan Carlson2, Scott C. Fahrenkrug2,3,5, Steve C. Ekker4,5, Lynda B. Ellis1,5 1Laboratory Medicine and Pathology, 2Animal Science, 3Animal Biotechnology Center, 4Genetics, Cell Biology and Development, 5Beckman Center for Transposon Research Secreted proteins are high-priority targets for functional-annotation and gene-expression analysis research due to their mediation of short-range and long-range intercellular signaling during the development and growth of multi-cellular organisms. To obtain viable targets and facilitate research in this area across a broad spectrum of vertebrates, a method to identify secreted proteins from Expressed Sequence Tag (EST) databases was needed. We designed a system to overcome limitations incurred by existing prediction programs when analyzing low quality, artifactually truncated sequences. We used a hybrid of homology modeling, signal peptide prediction and homologous sequence pair alignment analysis to produce our predictions. Output from the analysis of The Institute for Genomic Research Porcine Gene Index and validation by microsome-coupled in vitro translation are presented.
18 Sean Goggins1, Amber Kocemba1, Yongqing Zhang2,3, Kevin Roberg-Perez7, Eric Klee5,6, Michael Pickart4,6, John Keele8, Greg Harhay8, Jim Wray8, Warren Snelling8, Lynda Ellis5,6, Stephen C. Ekker4,6, David Largaespada4,5,6, John Carlis1, Scott C. Fahrenkrug2,3,6 1Computer Science and Engineering, 2Animal Science, 3Animal Biotechnology Center, 4Genetics, Cell Biology and Development, 5Laboratory Medicine and Pathology, 6Beckman Center for Transposon Research, 7Cancer Center Informatics Core, 8USDA Meat Animal Research Center (MARC), Clay Center, Nebraska Vertebrate comparative and functional genomics requires relating heterogeneous types of data from distributed sources, an exercise that would greatly benefit from the development of a suitable data model and relational database. We have undertaken the development of a logical data model to meet this need. This logical data structure (LDS) connects vertebrate genetic and physical mapping data to external databases (ENSEMBL, TIGR, NCBI, and Celera) by way of gene-ontology. The LDS has also been developed as the core of a laboratory information management system (LIMS) by providing a connection between animals, samples, and sequence data with the laboratory processes generating them. Specific portals into the data model originate from research efforts aimed at annotating the vertebrate genome by forward-, reverse- and para-genetic techniques being applied to livestock, mice and zebrafish. Our current LDS derives from the "reverse engineering" and integration of data models from MARC, the mouse transposon insertion database (MTID), and the zebrafish morpholino consortium. Connections to an LDS describing proteomic and gene expression data have also been anticipated. Physical implementation of the LDS will provide for accelerated candidate gene identification for improved animal production and animal and human health.
19 Yongqing Zhang1,3, Wen Dong4, Cheryl Dvorak2,3, Kendra Hyland2, Michael Murtaugh2,3, Scott C. Fahrenkrug1,3 1Animal Science, 2Veterinary Pathobiology, 3Animal Biotechnology Center, 4Supercomputing Institute Panspecific comparison of animal phenotypes and genotypes is central to determining vertebrate gene-function. These comparisons require that we relate physical and genetic mapping data, gene-expression data, and data from forward and reverse genetic analysis of multiple vertebrate species. This poster describes a pipeline for tracking experimental animals, laboratory processes, and their connections to the generation of DNA sequence data. A laboratory information management system (LIMS) tracks the generation of DNA sequence in the context of a logical data model (see poster 18). Fields describing the genetic and experimental lineage of animals and the origin of template for DNA sequencing are captured prior to sample submission. After sequencing, raw trace files are parsed to a MySQL relational database running on a Linux computer. Trace files are automatically processed for quality and vector content and are stored in our system as SCF and FASTA files that can be viewed via an access-controlled web browser interface. Quality statistics are associated with each sample and sequencing plate for instant user feedback. Users can assemble groups of sequences and trace files for analysis or submission to public databases (dbEST, dbSTS, dbGSS, and TRACEdb) using a "shopping-cart"-based selection system (seqCart). Groomed sequence submission is an iterative and interactive process based on user-defined parameters and submission-file inspection. Tools are provided for seqCart-based design of PCR primers or overgo-oligos, analysis by BLAST or BLAT, or assembly and annotation based on the POLYPHRED/PHRAP/CONSED suite. Implementation in a relational database will provide for seamless integration of additional tools as required. The entire system is now being migrated to Unix computers at the University of Minnesota Supercomputing Center and will make use of Oracle 9i.
20 Himanshu Khandelia, Ben Anderson, Yiannis N. Kaznessis Chemical Engineering and Materials Science We have conducted extensive simulations of the N-terminus of NK-lysin in zwitterionic and anionic lipid monolayers. NK-lysin is a basic antimicrobial polypeptide and its activity has been suggested by the high degree of homology to surfactant protein B (SP-B), which binds strongly to anionic phospholipids. The simulation results quantify the interactions between the peptide and mammalian and bacterial model membranes. Questions that are being answered conclusively are: How does NK-lysin alter the interfacial properties of the membrane? Are there specific amino acid residues that have a significant role in regulating lipid-protein interactions? How does the nature of the lipids affect the answer to all of these questions? What are the differences between SP-B and NK-lysin interactions with lipid monolayers? We leverage the gained knowledge to improve the design rules for lung surfactant-active peptides with antimicrobial activities.
Page Author(s): Jeff Lande, Lynda Ellis