Bioinformatics:
Building Bridges

Posters


1
Graduate Training in Bioinformatics at the University of Minnesota

Lynda B.M. Ellis

Laboratory Medicine and Pathology

On February 8, 2002, the Regents of the University of Minnesota approved a new Graduate Program in Bioinformatics (URL = http://www.binf.umn.edu/). It offers Graduate Minors at the Master's and PhD level and includes 18 faculty members from 12 departments in 5 schools. On April 26, 2002, the Graduate Program held a daylong symposium with world-renowned speakers, a poster session, and a lunch hosted by the Graduate Faculty. The symposium was presented to an overflow (100+) audience and is being repeated today. An informal Bioinformatics Journal Club met weekly in Spring 2002. It is now a formal class (BINF 5480) offered every Fall and Spring, starting with Fall 2002. A Bioinformatics email list, open to all, began in September 2001 and now has over 130 subscribers and receives over 30 posts a month. Nine students enrolled in the first year; several have already graduated with the minor. The Graduate Program's present curriculum, administration, and structure will be presented and plans for development will be outlined.


2
Superfamily Expansion: Defining Sequence and Functional Space

Jennifer L. Seffernick, Larry P. Wackett, and Patsy C. Babbitt

Biochemistry, Molecular Biology and Biophysics

Defining the expanse of sequence space within highly divergent superfamilies is a difficult task. Previous methodologies have resulted in either few sequences or large false positive rates. Here results from a new method are presented. This method modified the program Shotgun for use with large numbers of PSI-BLAST outputs. The new program was tested by identifying sequence space for the amidohydrolase superfamily. The original program, Shotgun, had a false positive rate of 34% with this superfamily, while the new program has a reduced false positive rate of 1.2%. This procedure has allowed for expansion of the amidohydrolase superfamily to over 600 nonidentical members.


3
Genomic Signature of the Failing Heart

David Fermin1, Xinqiang Han1, Maralyssa Bann1, Mohammad-Karim Ezzat1, Robert Hebert1, Suzanne Grindle2, Soon Park1, Yingjie Chen1, Robert Bache1, Leslie Miller1, Jennifer Hall1

1Medicine, 2Cancer Center Informatics Core

Heart failure affects an estimated 4.7 million Americans. The five-year survival rate of patients with heart failure is approximately 50%. Implantation of a mechanical ventricular assist device (VAD) has been shown to significantly decrease mortality in patients with refractory heart failure. A microarray chip representing 22,283 genes was used to examine transcriptome alterations in paired human heart samples in response to ventricular unloading with a VAD. We paired this approach with a supervised learning technique based on the theory of support vector machines to rank genes in order of their ability to discriminate hearts by their underlying etiology as well as in response to ventricular unloading. Initial analyses of 26 failing heart samples prior to implantation of the VAD revealed a distinct etiology-dependent stratification of genomic profiles. Gene ranking analysis identified 16 genes out of 22,283 that best discriminated idiopathic cardiomyopathy from ischemic cardiomyopathy versus acute myocardial infarction with an error rate of less than 6%. In line with these findings, the genomic signature of the unloaded heart was highly dependent upon the underlying etiology. We identified a subset of genes that best discriminated post-VAD from pre-VAD hearts within each cohort with error rates of less than 4%. In conclusion, our studies suggest that the genomic signature of the failing heart is dictated by the underlying cause of the disease. Furthermore, we demonstrated that the genomic response of the heart to ventricular support is uniquely dependent upon the underlying etiology of heart failure.


4
Soybean SAGE Tag Data Mining Using the TableView Software

Martina Stromvik1, J. Johnson1, J. Schupp2, C. Schmidt1, J. Crow1, E. Shoop3, P. Keim2, R. Shoemaker4, L. Vodkin5, E. Retzel1

1 Center for Computational Genomics and Bioinformatics, University of Minnesota, Minneapolis, MN, 55455; 2 Department of Biology, Northern Arizona University, Flagstaff, AZ, 86011; 3 Mathematics and Computer Science, Macalester College, St. Paul, MN, 55105; 4 USDA-ARS, Iowa State University, Ames, IA, 50011; 5 Department of Crop Sciences, University of Illinois, Urbana, IL, 61801.

TableView is a portable Java application for visual data mining and exploration. Any tabular data can be loaded and shown in multiple ways, such as scatter plots, 3D scatter plots, parallel coordinates, line graphs, cluster views, and histograms. Points of interest can be selected in any of the views, and that selection is simultaneously shown in the other views.

Using TableView, we mined an aggregated set of soybean SAGE (Serial Analysis of Gene Expression) tag counts, orientation prediction scores, EST counts, EST contigs and annotation data. We show how different information can be obtained from this integrated data set. Some examples of questions we can answer are: which tags were highly abundant in particular tissues; which tags were annotated as belonging to gene families of interest; which tags did not match any consensus sequences; and which consensus sequences and tags had been annotated as unknown. http://ccgb.umn.edu/software/java/apps/TableView/


5
Microbial Pathway Prediction: A Functional Group Approach

Bo Kyeng Hou1, Lawrence P. Wackett1, and Lynda B.M. Ellis2

1BioTechnology Institute and 2Laboratory Medicine and Pathology

We have developed a system to predict microbial catabolism, using the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD, http://umbbd.ahc.umn.edu/) as a knowledge base. The present system, available on the web (http://umbbd.ahc.umn.edu/predict/), can predict biodegradation of most of the major aliphatic and aromatic organic functional groups containing C, H, N, O and halogens. It can duplicate at least one known biodegradation pathway for 60% of the compounds in an 84-member validation set; most pathways that did not completely duplicate known metabolism could plausibly occur in nature. Users are encouraged, and have begun, to submit additional biotransformation rules and comment on existing rules; the system will further develop under the direction of the scientific community.


6
Differential Gene Expression in Porcine Peyer's Patch

Cheryl M. Dvorak1, Chad R. Ramler1, Kendra A. Hyland1, Yongqing Zhang2, Scott Fahrenkrug2, and Michael P. Murtaugh1

1Veterinary Pathobiology, 2Animal Science

Peyer's patches (PP), organized lymphoid tissues of the small intestine, sample the gut environment for pathogen surveillance and nutrient tolerance. We are characterizing the molecular mechanisms involved in oral tolerance and innate responses to enteric pathogens using an unbiased genomics approach. A subtracted cDNA library of approximately 4,800 transcripts enriched for sequences expressed in Peyer's patch was constructed. 5'-end sequencing revealed a putative UniGene set of 2,433 unique genes, of which 46% have either no BLAST match in GenBank or match sequences of unassigned function. Approximately 14% of the sequences correspond to genes with functions in immune response, intracellular signaling, and protein turnover or modification. Glass slide microarrays confirm differential expression in PP and regulation by immune response modifiers. In vivo and ex vivo disease models in swine, combined with microarray technology, will be a powerful tool for the identification of genes involved in the mucosal immune response.


7
wCLUTO: A Web-Enabled Clustering Toolkit

Matthew Rasmussen1, Mukund Deshpande1, George Karypis1, James Johnson2, John A. Crow2, and Ernest F. Retzel2

1Computer Science & Engineering, 2Computational Genomics and Bioinformatics

As structural and functional genomics efforts provide the biological community with ever-broadening sets of inter-related data, the need to explore such complex information for subtle relationships expands. We present wCluto (http://cluto.ccgb.umn.edu/), a web-enabled version of the stand-alone application Cluto, designed to apply clustering methods to genomic information. Its first application is focused on clustering transcriptome data from microarrays. Data can be uploaded by the user into the clustering tool, a choice of several clustering methods can be made and configured, and data is presented to the user in a variety of visual formats, including a three-dimensional mountain view of the clusters. Parameters can be explored to rapidly examine a variety of clustering results, and the resulting clusters can be downloaded either for manipulation by other programs or saved in a format for publication.


8
The Relative Importance of Segmental and Tandem Duplications in Gene Family Evolution in Arabidopsis thaliana

Steven B. Cannon1, Andrew Baumgarten1, Georgiana May1,2, and Nevin D. Young1,3

1Plant Biology, 2Ecology, Evolution and Behavior, 3Plant Pathology

The complete sequencing of the Arabidopsis thaliana genome has revealed numerous large-scale segmental duplications. These duplications have probably occurred several times, and can be placed into duplication age classes relative to one another (Vision et al., 2000). These segmental duplication blocks can be used to provide internal relative reference points in gene family phylogenies. At the same time, tandem or local duplications (closely related genes within 250 kb of one another) are also common. What are the relative frequencies of segmental and local duplications in the evolution of large gene families? We have developed software to identify clades in gene family phylogenies that have arisen either by segmental or local duplication. In Arabidopsis thaliana, we find that contributions made by these two mechanisms differ greatly from gene family to gene family. We describe the possible biological significance of these evolutionary differences for several gene families. Further details about the project are described at http://www.tc.umn.edu/~cann0010.
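The 250 kb criterion for local duplications can be illustrated with a minimal sketch. It is not the authors' software: the input format, the gene identifiers, and the function name are hypothetical, and family membership ("closely related") is assumed to have been decided beforehand.

```python
from itertools import combinations

# Window taken from the abstract: "within 250 kb of one another".
LOCAL_WINDOW_BP = 250_000


def local_duplicate_pairs(genes, window_bp=LOCAL_WINDOW_BP):
    """Return pairs of genes on the same chromosome within window_bp.

    genes: list of (gene_id, chromosome, start_bp) tuples, all assumed to
    belong to one gene family (hypothetical input format).
    """
    pairs = []
    for (id_a, chr_a, pos_a), (id_b, chr_b, pos_b) in combinations(genes, 2):
        if chr_a == chr_b and abs(pos_a - pos_b) <= window_bp:
            pairs.append((id_a, id_b))
    return pairs


# Toy family with made-up identifiers and coordinates.
family = [
    ("geneA", "chr1", 1_000_000),
    ("geneB", "chr1", 1_120_000),  # 120 kb from geneA: local candidate
    ("geneC", "chr1", 5_000_000),  # same chromosome, too far away
    ("geneD", "chr4", 1_050_000),  # different chromosome
]
print(local_duplicate_pairs(family))
```

Pairs that fail this test but fall in known segmental duplication blocks would be candidates for the segmental class; that step needs block coordinates and is not sketched here.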


9
Bioinformatics Approaches and Databases in Cancer Center Research

Kevin Roberg-Perez, Suzanne Grindle, and Donald Connelly

Cancer Center Informatics Core

The mission of the University of Minnesota Cancer Center (UMCC) Informatics Core is to further the research of UMCC members and others at the University. Given the recent increase in bioinformatics needs among researchers, we provide bioinformatics support in the areas of database development, computational pipeline creation, and microarray analysis. Work to date has included the development of databases for the Candida microarray consortium, mouse transposon insertions, and a head and neck cancer study. Sequence analysis pipelines have been developed for mapping and annotating insertions in the mouse genome (IMAP and CIS) and for identifying low copy transposons. In addition, assistance has been provided on a variety of microarray projects. In little more than a year we have provided bioinformatics support to more than 20 laboratories and have assisted in the writing of several grants and publications.


10
Whole genome sequencing of Staphylococcus aureus isolated from bovine mastitis

Lisa Herron-Olson1, James M. Musser2 and Vivek Kapur1

1Biomedical Genomics Center and Veterinary Pathobiology, 2Rocky Mountain Laboratories, NIAID, NIH, Hamilton, MT.

Staphylococcus aureus (SA) is a leading cause of mastitis in dairy cattle worldwide despite the institution of extensive hygiene and prevention programs such as low level antimicrobial administration. In addition, SA causes significant morbidity and mortality among humans. Our previous analyses of the clonal population structure of SA recovered from animals and humans suggested that whole-genome sequencing and comparative analyses would enhance our understanding of the molecular mechanisms underlying pathogenesis and host specificity of SA. We here present the results of whole-genome sequencing of RF122, a bovine mastitis-associated strain of SA. Together with comparative genomic analyses of RF122 with the human pathogenic strains Mu50 and N315 (hospital-acquired MRSA) and MW2 (community-acquired MRSA), our analyses provide key insight into the genetic basis for pathogenicity and host specificity of SA.


11
Computational Biology at the Supercomputing Institute

Zheng Jin Tu, Patton Fast, Wayne Xu, Yuk Sham

Supercomputing Institute

The Supercomputing Institute for Digital Simulation and Advanced Computation is an interdisciplinary research program of the University of Minnesota. The Institute has a state-of-the-art bioinformatics infrastructure, large databases, a wide range of software, and technical skills to support bioinformatics and genomics research as well as large scale high-throughput bioinformatics project development.

The Supercomputing Institute has most of the popular software for bioinformatics (BLAST, GCG, EMBOSS, Phred/Phrap ...), evolution (PHYLIP, PAUP, PAML ...), and microarray data analysis (Expressionist, GeneTraffic, GeneSpring, Spotfire ...). We also have numerous local biological databases available, including GenBank and human genome sequence data. Statistical packages such as SAS and R are available and can be used for bioinformatics data analysis. For database application development, IBM DB2, Oracle, and SQL Server are all available at the Institute.

Please contact the Institute's user support staff to discuss your needs, projects, databases, and software requirements. For more information, please check the Institute's Computational Biology and Genomics web page at http://www.msi.umn.edu/user_support/compgen/.


12
Pedigraph: a Software Tool for the Graphical Visualization of Large, Complex Pedigrees

John Garbe and Yang Da

Animal Science

Graphical pedigree visualization is helpful for studying the relationships among individuals, gene flows from generation to generation, and the population structure. However, graphical visualization of large complex pedigrees is often a humanly impossible task. Pedigraph provides rapid graphical visualization of large, complex pedigrees, with options for controlling colors, drawing size, page size and margins, drawing styles, extraction and highlighting of partial pedigrees involving selected individuals. Pedigraph can display all individuals in the data set, or display the number of offspring in each family by gender and trait status such as disease or normal phenotypes. Pedigraph requires a simple text pedigree file as user input, and is able to draw pedigrees with complex inbreeding structures over multiple generations in a population with a large number of individuals, as is common in animal populations. A trial version of the program is available at http://animalgene.umn.edu.
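Extracting the partial pedigree of a selected individual can be sketched as follows. This is only an illustration, not Pedigraph's code: the abstract does not describe the input file format, so the three-column "individual sire dam" layout with 0 for an unknown parent is an assumption, as are the function names.

```python
def parse_pedigree(text):
    """Parse lines of 'individual sire dam'; '0' marks an unknown parent.

    The file layout is an assumed, commonly used convention, not the
    documented Pedigraph format.
    """
    pedigree = {}
    for line in text.strip().splitlines():
        individual, sire, dam = line.split()
        pedigree[individual] = (
            None if sire == "0" else sire,
            None if dam == "0" else dam,
        )
    return pedigree


def ancestors(pedigree, individual):
    """Collect every known ancestor of one individual (its partial pedigree)."""
    found = set()
    stack = [individual]
    while stack:
        current = stack.pop()
        for parent in pedigree.get(current, (None, None)):
            if parent is not None and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Toy pedigree with inbreeding: E is the offspring of full sibs C and D.
toy = """
A 0 0
B 0 0
C A B
D A B
E C D
"""
ped = parse_pedigree(toy)
print(sorted(ancestors(ped, "E")))
```

The visited set means shared ancestors such as A and B are collected once even though E reaches them through both parents, which is the situation inbred pedigrees create.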


13
Identification of Sequences Involved in the Regulation of mRNA Stability in Resting and Activated T Cells

Jayprakash Vasdewani1, Arvind Raghavan2, Cavan S. Reilly3, George Karypis1 and Paul R. Bohjanen2

1Computer Science and Engineering, 2Microbiology, 3Biostatistics

The 5' and 3' untranslated regions (UTRs) of certain mRNA transcripts contain sequences that are known to mediate mRNA stability. For example, AU-rich elements (AREs) in the 3' UTRs of cytokine and proto-oncogene transcripts mediate mRNA degradation, while 5' UTR sequences in IL-2 transcripts mediate pathway-dependent transcript stabilization. We predict that each mRNA transcript regulated at the level of mRNA stability contains one or more cis-regulatory sequences that interact with the cellular mRNA degradation machinery, and that coordinately regulated transcripts may contain conserved cis-regulatory sequences.

Using probabilistic approaches to motif prediction, we compare the outputs of the Gibbs sampler and MEME (Multiple EM for Motif Elicitation) algorithms over a range of motif lengths and mRNA stability profiles. By clustering T cell transcripts based on sequence similarity scores and mRNA stability parameters, and then comparing these sets of clusters, we wish to test our hypothesis that the occurrence of cis-regulatory sequences can be correlated with mRNA stability profiles.
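As a point of contrast with probabilistic motif discovery, a known element can simply be counted. The sketch below scans a UTR for the AUUUA pentamer commonly described as the ARE core. It is a deterministic scan, not the Gibbs sampler or MEME, and the function name and example sequences are hypothetical.

```python
def count_motif(sequence, motif="AUUUA"):
    """Count occurrences of motif in an RNA sequence, overlaps included.

    DNA-style input is accepted: T is converted to U before scanning.
    """
    sequence = sequence.upper().replace("T", "U")
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so overlapping copies are also counted.
        start = sequence.find(motif, start + 1)
    return count


# Made-up 3' UTR fragments, for illustration only.
print(count_motif("GGAUUUAUUUACC"))
print(count_motif("GGCCGGCC"))
```

Counts like these could serve as one sequence feature to set beside stability measurements; finding previously unknown motifs is what the probabilistic methods in the abstract are for.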


14
The Biology Student Workbench -- Bringing Bioinformatics to the Classroom, Enriching Biology Education

Kevin Messner

Bio-Medical and Magrath Libraries

The Biology Student Workbench project (http://bsw.ncsa.uiuc.edu), located primarily at the University of Illinois at Urbana-Champaign, provides training and curricular materials for biology teachers to include bioinformatics in their curricula. The project supports the Biology Workbench (http://workbench.sdsc.edu) and the Student Interface to the Biology Workbench (http://bsw.ncsa.uiuc.edu/cgi-bin/sib.py) as primary toolkits for investigation. The goals of the project are to 1) create a strong suite of tools usable by non-experts to conduct bioinformatics inquiries; 2) provide inquiry-based curricular materials for classroom use; and 3) establish a community of scientists and educators to support the use of computational research tools in education.


15
Statistical Significance Analysis of Longitudinal Gene Expression Data

Xu Guo1, Huilin Qi2, Catherine M. Verfaillie2, and Wei Pan1

1Biostatistics and 2Medicine

Longitudinal gene expression data arise from time-course microarray experiments, which are designed to study biological processes in a temporal fashion by taking samples from the same subject at different time points to measure gene expression levels. We apply generalized estimating equation techniques to construct a robust statistic, which is a variant of the robust Wald statistic, for longitudinal gene expression data to detect genes with temporal changes in expression. We associate significance levels to the proposed statistic by either incorporating the idea of the Significance Analysis of Microarrays (SAM) method (Tusher et al., 2001) or using the mixture model method (MMM) (Pan et al., 2002) to identify significant genes. The utility of the statistic is demonstrated through its application to an important study of osteoblast lineage-specific differentiation.


16
Linear Regression and Two-Class Classification with Gene Expression Data

Xiaohong Huang and Wei Pan

Biostatistics

Using gene expression data to classify (or predict) tumor types has received much research attention recently. Due to special features of gene expression data, several new methods have been proposed, including the weighted voting scheme of Golub et al (1999), the compound covariate method of Hedenfalk et al (2001) (originally proposed by Tukey (1993)), and the shrunken centroids method of Tibshirani et al (2002). These methods look different and are more or less ad hoc. Here we point out a close connection of the three methods with a linear regression model. Casting the classification problem in the general framework of linear regression naturally leads to new alternatives, such as modified partial least squares (PLS) methods and penalized PLS (PPLS) methods. Using two real data sets, we show the competitive performance of our new methods when compared with the other three methods.
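The weighted voting scheme mentioned above can be sketched briefly to show why a linear-model view is natural. This is a simplified illustration, not the authors' implementation: it uses every gene instead of a selected subset, the function name is hypothetical, and the expression values are made up.

```python
from statistics import mean, stdev


def weighted_vote(class1, class2, sample):
    """Classify one sample by weighted voting.

    class1, class2: training samples, each a list of per-gene expression values.
    sample: the new sample, a list of per-gene expression values.
    Returns (predicted_class, total_vote), with predicted_class 1 or 2.
    """
    total = 0.0
    for g in range(len(sample)):
        x1 = [s[g] for s in class1]
        x2 = [s[g] for s in class2]
        # Signal-to-noise weight and midpoint between the two class means.
        weight = (mean(x1) - mean(x2)) / (stdev(x1) + stdev(x2))
        midpoint = (mean(x1) + mean(x2)) / 2
        total += weight * (sample[g] - midpoint)
    return (1 if total > 0 else 2), total


# Made-up training data: two genes, three samples per class.
tumor_a = [[5.0, 1.0], [6.0, 2.0], [7.0, 1.5]]
tumor_b = [[1.0, 4.0], [2.0, 5.0], [1.5, 6.0]]
print(weighted_vote(tumor_a, tumor_b, [6.5, 1.2])[0])
print(weighted_vote(tumor_a, tumor_b, [1.2, 5.5])[0])
```

The total vote is a sum of fixed weights times the sample's expression values plus a constant, that is, a linear function of the expression profile.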


17
Mining the Vertebrate Transcriptome for Secreted Proteins

Eric W. Klee1,5, Dan Carlson2, Scott C. Fahrenkrug2,3,5, Steve C. Ekker4,5, Lynda B. Ellis1,5

1Laboratory Medicine and Pathology, 2Animal Science, 3Animal Biotechnology Center, 4Genetics, Cell Biology and Development, 5Beckman Center for Transposon Research

Secreted proteins are high-priority targets for functional-annotation and gene-expression analysis research due to their mediation of short-range and long-range intercellular signaling during the development and growth of multi-cellular organisms. To obtain viable targets and facilitate research in this area across a broad spectrum of vertebrates, a method to identify secreted proteins from Expressed Sequence Tag (EST) databases was needed. We designed a system to overcome limitations incurred by existing prediction programs when analyzing low quality, artifactually truncated sequences. We used a hybrid of homology modeling, signal peptide prediction and homologous sequence pair alignment analysis to produce our predictions. Output from the analysis of The Institute for Genomic Research Porcine Gene Index and validation by microsome-coupled in vitro translation are presented.


18
A Logical Data Model for Comparative Vertebrate Genome Research

Sean Goggins1, Amber Kocemba1, Yongqing Zhang2,3, Kevin Roberg-Perez7, Eric Klee5,6, Michael Pickart4,6, John Keele8, Greg Harhay8, Jim Wray8, Warren Snelling8, Lynda Ellis5,6, Stephen C. Ekker4,6, David Largaespada4,5,6, John Carlis1, Scott C. Fahrenkrug2,3,6

1Computer Science and Engineering, 2Animal Science, 3Animal Biotechnology Center, 4Genetics, Cell Biology and Development, 5Laboratory Medicine and Pathology, 6Beckman Center for Transposon Research, 7Cancer Center Informatics Core, 8USDA Meat Animal Research Center (MARC), Clay Center, Nebraska

Vertebrate comparative and functional genomics requires relating heterogeneous types of data from distributed sources, an exercise that would greatly benefit from the development of a suitable data model and relational database. We have undertaken the development of a logical data model to meet this need. This logical data structure (LDS) connects vertebrate genetic and physical mapping data to external databases (ENSEMBL, TIGR, NCBI, and Celera) by way of gene-ontology. The LDS has also been developed as the core of a laboratory information management system (LIMS) by providing a connection between animals, samples, and sequence data with the laboratory processes generating them. Specific portals into the data model originate from research efforts aimed at annotating the vertebrate genome by forward-, reverse-, and para-genetic techniques being applied to livestock, mice and zebrafish. Our current LDS derives from the "reverse engineering" and integration of data models from MARC, the mouse transposon insertion database (MTID), and the zebrafish morpholino consortium. Connections to an LDS describing proteomic and gene expression data have also been anticipated. Physical implementation of the LDS will provide for accelerated candidate gene identification for improved animal production and animal & human health.


19
A Workbench for Managing and Analyzing DNA Sequence Data for Vertebrate Comparative and Functional Genomics

Yongqing Zhang1,3, Wen Dong4, Cheryl Dvorak2,3, Kendra Hyland2, Michael Murtaugh2,3, Scott C. Fahrenkrug1,3

1Animal Science, 2Veterinary Pathobiology, 3Animal Biotechnology Center, 4Supercomputing Institute

Panspecific comparison of animal phenotypes and genotypes is central to determining vertebrate gene-function. These comparisons require that we relate physical and genetic mapping data, gene-expression data, and data from forward and reverse genetic analysis of multiple vertebrate species. This poster describes a pipeline for tracking experimental animals, laboratory processes, and their connections to the generation of DNA sequence data. A laboratory information management system (LIMS) tracks the generation of DNA sequence in the context of a logical data model (see poster 18). Fields describing the genetic and experimental lineage of animals and the origin of template for DNA sequencing are captured prior to sample submission. After sequencing, raw trace files are parsed to a MySQL relational database running on a Linux computer. Trace files are automatically processed for quality and vector content and are stored in our system as SCF and FASTA files that can be viewed via an access-controlled web browser interface. Quality statistics are associated with each sample and sequencing plate for instant user feedback. Users can assemble groups of sequences and trace files for analysis or submission to public databases (dbEST, dbSTS, dbGSS, and TRACEdb) using a "shopping-cart"-based selection system (seqCart). Groomed sequence submission is an iterative and interactive process based on user-defined parameters and submission-file inspection. Tools are provided for seqCart-based design of PCR primers or overgo-oligos, analysis by BLAST or BLAT, or assembly and annotation based on the POLYPHRED/PHRAP/CONSED suite. Implementation in a relational database will provide for seamless integration of additional tools as required. The entire system is now being migrated to Unix computers at the University of Minnesota Supercomputing Center and will make use of Oracle 9i.
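The per-sample quality statistics could take many forms; the abstract does not say which ones the system reports. As a minimal sketch under that caveat, the code below summarizes a read from its Phred-style base quality scores. The Q20 cutoff, the function names, and the example scores are assumptions.

```python
def error_probability(q):
    """Convert a Phred quality score Q to a base-call error probability.

    Uses the standard Phred definition Q = -10 * log10(P).
    """
    return 10 ** (-q / 10)


def quality_summary(phred_scores, threshold=20):
    """Summarize one read: length, mean quality, fraction of bases >= threshold.

    The default threshold of 20 (1% error) is an assumed convention, not a
    value stated in the abstract.
    """
    n = len(phred_scores)
    if n == 0:
        return {"length": 0, "mean_q": 0.0, "frac_high": 0.0}
    high = sum(1 for q in phred_scores if q >= threshold)
    return {"length": n, "mean_q": sum(phred_scores) / n, "frac_high": high / n}


# Made-up scores for a four-base read.
print(quality_summary([10, 20, 30, 40]))
```

Averaging such summaries over the reads of one sequencing plate would give a plate-level figure of the kind the abstract mentions.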


20
Interaction of NK-Lysin with Lipid Monolayers

Himanshu Khandelia, Ben Anderson, Yiannis N. Kaznessis

Chemical Engineering and Materials Science

We have conducted extensive simulations of the N-terminus of NK-lysin in zwitterionic and anionic lipid monolayers. NK-lysin is a basic antimicrobial polypeptide and its activity has been suggested by the high degree of homology to surfactant protein B (SP-B), which binds strongly to anionic phospholipids. The simulation results quantify the interactions between the peptide and mammalian and bacterial model membranes. Questions that are being answered conclusively are:

How does NK-lysin alter the interfacial properties of the membrane?
Are there specific amino acid residues that have a significant role in regulating lipid-protein interactions?
How does the nature of the lipids affect the answer to all of these questions?
What are the differences between SP-B and NK-lysin interactions with lipid monolayers?

We leverage the gained knowledge to improve the design rules for lung surfactant-active peptides with antimicrobial activities.


Page Author(s): Jeff Lande, Lynda Ellis