Searching WormBase for Information about Caenorhabditis elegans

Abstract

WormBase is the major public biological database for the nematode Caenorhabditis elegans. It is meant to be useful to any biologist who wants to use C. elegans, whatever his or her specialty. WormBase contains information about the genomic sequence of C. elegans, its genes and their products, and its higher-level traits such as gene expression patterns and neuronal connectivity. WormBase also contains the genomic sequences and gene structures of C. briggsae and C. remanei, two closely related worms. These data are interconnected, so that a search beginning with one object (such as a gene) can be directed to related objects of a different type (e.g., the DNA sequence of the gene or the cells in which the gene is active). One can also perform searches that retrieve complex data sets. The WormBase developers actively invite suggestions for improvements from the database's users. WormBase's source code and underlying database are freely available for local installation and modification.

Keywords: Caenorhabditis elegans; WormBase; nematode; genomic annotation; gene expression pattern; RNAi; neuronal connectivity

The full protocol is available in PDF or HTML format at Wiley Online Library.

Table of Contents

Basic Protocol 1: Navigating the WormBase Home Page
Basic Protocol 2: Performing a Database Search
Basic Protocol 3: Examining a Gene in C. elegans
Basic Protocol 4: Examining a Molecular Sequence in C. elegans
Basic Protocol 5: Finding Protein Features
Basic Protocol 6: Searching for Gene Products with Particular Sequence Motifs
Basic Protocol 7: Using the Genome Browser
Basic Protocol 8: Viewing the C. briggsae Genome and its Synteny with C. elegans
Basic Protocol 9: Finding Sequence Similarities with Blast
Basic Protocol 10: Mining Gene Data with WormMart
Basic Protocol 11: Downloading a Batch of Sequences
Basic Protocol 12: Examining the Genomic Content of a Classical Genetic Interval
Basic Protocol 13: Using Other WormBase Searches
Alternate Protocol 1: Installing and Running WormBase Locally
Guidelines for Understanding Results
Commentary
Literature Cited
Figures

Materials

Figures

Figure 1.8.1 The Home page of the WormBase Web site, showing a general database search for zyg‐1 and the Web Site Directory. This page gives several different entry points for WormBase's diverse data. An example is shown of the simplest and broadest search (for Anything) with a single keyword. A menu of the most‐used database searches lines the top of the page, while a list of more specialized data fills the Web Site Directory on the page's left side.

Figure 1.8.2 Results of the database search in Figure 1.8.1. Having searched the entire database for anything matching zyg‐1, one sees a plethora of disparate results: genes with zyg‐1* names, protein‐coding sequences (CDSes), expression patterns, and archived research papers. The advantage of this sort of search is that it lowers the chance of missing a wanted item, but it necessarily requires picking and choosing among this sort of data slurry. Alternatively, one could pick a specific data class in the Find pull‐down menu (e.g., “Any Gene” or “Cell”; Fig. 1.8.1) and get narrower but better‐focused results.

Figure 1.8.3 The top of the Gene Summary page for zyg‐1. WormBase organizes its data around a few key hubs. Gene Summary pages are perhaps the most important single such hub; they are intended to give a compact but full summary of everything known about a given gene in C. elegans. Even in this excerpt, one can get summarized gene function and orthology, a list of transcripts and their experimental evidence, links to DNA and protein sequences, a C. briggsae ortholog, and external database records.

Figure 1.8.4 Genetic and genomic information from the Gene Summary page for zyg‐1. Further down the same page as in Figure 1.8.3 is a small but detailed diagram of the gene's DNA structure, with links to transcripts, sequenced clones, and alleles. Alongside it are the exact nucleotide coordinates and the meiotic gene map position.

Figure 1.8.5 The Sequence Summary page of F59E12.2 (reached from the zyg‐1 Gene Summary page by the link under sequence name). Most data about the exact nucleotide sequence are too detailed to be of immediate interest on a Gene Summary page, so they are given their own Sequence Summary page instead (linked to the Gene and Protein pages). These data are most useful in designing cloning experiments or direct perturbations of DNA function such as RNAi. Further down this page are another schematic diagram, a BLAST search launcher, exact coordinates of exons and introns in the genomic sequence, and a list of available cDNA clones.

Figure 1.8.6 Part of the CE28571 (zyg‐1) Protein page, with a schematic diagram of CE28571's exons, protein motifs, low‐complexity domains (defined by the SEG program; Wootton, ), and similarities to proteins in other eukaryotic species. As with nucleotide sequences, proteins have enough detailed information to require their own specialized pages. WormBase's Protein pages give both text and diagrams to let a user map individual sequence features with respect to one another and to the protein's exonic coding sequences. The sequence features shown range from very generic (signal, low‐complexity, and predicted transmembrane) to broadly distributed but specific motifs (e.g., “tyrosine protein kinase”) and then to individual BLAST matches with highly similar proteins in other organisms. Diagramming all of these allows the user to quickly see what parts of the protein are likely to have distinct functions.

Figure 1.8.7 Protein motifs identified by a “ribonucleoprotein” search term. WormBase has an extensive catalog of protein motifs, taken from both the PFAM and the InterPro compilations. Keyword searches of these motifs are one way to subdivide a general protein type into several types with detailed functional differences.

Figure 1.8.8 Proteins identified as sharing a single motif. Motifs are evolutionarily mobile; they can be spread among homologous proteins or transferred horizontally between nonhomologous ones. Accordingly, each motif in WormBase is listed with the full set of proteins that contain it. This gives one way of identifying every gene product in C. elegans likely to participate in a shared biochemical function.
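
Listing every protein that shares a motif amounts to inverting a protein-to-motif table. The Python sketch below assumes a hypothetical in-memory mapping from protein IDs to motif names (placeholders, not WormBase records or WormBase's own code) and builds the reverse index, so that all proteins carrying a given motif can be looked up at once.

```python
from collections import defaultdict


def motifs_to_proteins(protein_motifs):
    """Invert a protein -> motifs mapping into motif -> sorted list of proteins."""
    index = defaultdict(set)
    for protein, motifs in protein_motifs.items():
        for motif in motifs:
            index[motif].add(protein)
    # Convert to plain sorted lists so the result is stable and easy to print.
    return {motif: sorted(proteins) for motif, proteins in index.items()}


# Hypothetical annotations: IDs and motif names are placeholders, not WormBase data.
annotations = {
    "PROT_A": {"MOTIF_KINASE", "MOTIF_SIGNAL"},
    "PROT_B": {"MOTIF_KINASE"},
    "PROT_C": {"MOTIF_RRM"},
}
motif_index = motifs_to_proteins(annotations)
print(motif_index["MOTIF_KINASE"])  # ['PROT_A', 'PROT_B']
```
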

Figure 1.8.9 A view of the entire mitochondrial chromosome (mtDNA) in the Genome Browser. Like the Gene Summary page, the Genome Browser provides a central hub around which complex data can be economically organized. Here we see its view expanded to an entire chromosome. The view is customizable with many different user‐selected tracks (a few of which are visible).

Figure 1.8.10 An expanded Genome Browser view of the F59E12.2 (zyg‐1) sequence, with added tracks for ESTs, mRNAs, and C. briggsae homologies. Where the Gene Summary page gives a text‐oriented, human‐readable summary of zyg‐1, the Genome Browser here gives a view rooted in its DNA structure. Picking just a few tracks allows this view to link gene coexpression (through operons), likely regulatory sequences (i.e., noncoding DNA highly conserved in C. briggsae), direct evidence for gene activity (ESTs and a cDNA), a genomic clone (archived in GenBank), and complexities of the gene's structure (including a nested gene with an entirely dissimilar mutant phenotype).

Figure 1.8.11 A view of 1 Mb of genomic DNA, centered on the F59E12.2 (zyg‐1) sequence. Genome Browser views are customizable not only in their contents but in their size. Shown here is a tracked view spanning 1 Mb of genomic DNA. As the view grows, fine details are merged into a general map; this works best when one is looking for features that vary over a scale of tens or hundreds of thousands of nucleotides.
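
Choosing a view size around a feature is simple coordinate arithmetic. As an illustration only, the sketch below computes a fixed-width window centered on a feature and shifts it so that it stays on the chromosome. The 1-based inclusive coordinate convention and the function name are assumptions, not the Genome Browser's own code, and the coordinates in the example are made up.

```python
def centered_window(feature_start, feature_end, width, chrom_length):
    """Return (start, end), 1-based inclusive, of a `width`-bp window centered on a
    feature and shifted as needed to stay within 1..chrom_length."""
    if width >= chrom_length:
        return 1, chrom_length  # window covers the whole chromosome
    center = (feature_start + feature_end) // 2
    start = center - width // 2
    end = start + width - 1
    if start < 1:  # ran off the left end: pin to position 1
        start, end = 1, width
    elif end > chrom_length:  # ran off the right end: pin to the last base
        start, end = chrom_length - width + 1, chrom_length
    return start, end


# Made-up coordinates: a 10-kb feature on a 20-Mb chromosome, viewed at 1 Mb.
print(centered_window(5_000_000, 5_010_000, 1_000_000, 20_000_000))
# (4505000, 5504999)
```
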

Figure 1.8.12 A view of 100 bp of genomic DNA immediately to the 5′ side of F59E12.2 (zyg‐1). The opposite extreme of size selection is this 100‐nucleotide view of zyg‐1's 5′ flank. This view lists individual nucleotides and is ideal for fine‐scale resolution of transgenic constructs or cis‐regulatory sites. As in larger views, multiple tracks can be chosen to make easy comparisons of diverse features (e.g., cDNAs versus predicted start sites).
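
Which coordinates count as a gene's 5′ flank depends on the strand the gene lies on. A minimal sketch, assuming 1-based inclusive coordinates and a hypothetical helper name: for a plus-strand gene the 5′ flank lies just before its start coordinate, while for a minus-strand gene it lies just after its end coordinate.

```python
def five_prime_flank(start, end, strand, length):
    """Return (flank_start, flank_end), 1-based inclusive, for the `length` bp
    immediately 5' of a feature spanning start..end on the given strand.
    Only the left chromosome end is clipped; the right end is not checked."""
    if strand == "+":
        # Plus strand: 5' is toward lower coordinates, just before `start`.
        return max(1, start - length), start - 1
    if strand == "-":
        # Minus strand: 5' is toward higher coordinates, just after `end`.
        return end + 1, end + length
    raise ValueError("strand must be '+' or '-'")


print(five_prime_flank(1_000, 2_000, "+", 100))  # (900, 999)
print(five_prime_flank(1_000, 2_000, "-", 100))  # (2001, 2100)
```
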

Figure 1.8.13 The Genome Browser showing the C. briggsae ortholog of zyg‐1. C. briggsae's genome is also available through the Genome Browser. This view of zyg‐1 confirms that its complex structure is indeed conserved in C. briggsae, while also showing small differences in intron size.

Figure 1.8.14 The Synteny Viewer showing the zyg‐1/bli‐2 cluster in C. elegans and C. briggsae. Here the zyg‐1 loci from two Caenorhabditis species are shown in syntenic alignment, making their precise similarities and differences obvious. Like the Genome Browser, this view can be expanded to take in large chromosomal spans or contracted to single DNA sites. A particularly good use of this viewer is in working out the clearest possible view of an evolutionarily complex syntenic region.

Figure 1.8.15 A BLASTP search of WormPep release 147 with the human dymeclin (DYM) protein, which when mutated leads to Dyggve‐Melchior‐Clausen or Smith‐McCort dysplasia. BLAST searches in WormBase not only give hit results, but also give hyperlinks to their database records, making it easy to go from a positive search result to its Gene Summary page or to a view of its genomic region. Both strong and weak hits can be informative, since they can identify both orthologs and paralogs of a query sequence. Searches have a default cut‐off E‐value of 0.01, but the user can adjust it to make the search more stringent (fewer hits) or less stringent (more hits).
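
The E-value cut-off acts as a simple threshold on each hit. The sketch below is not WormBase's BLAST interface; it assumes hits saved in the NCBI BLAST+ tabular format (`-outfmt 6`, whose 11th and 12th columns are the E-value and bit score), for example from a local search, and the subject IDs and scores in the example are invented placeholders.

```python
def filter_hits(tabular_lines, max_evalue=0.01):
    """Keep hits with E-value <= max_evalue from BLAST+ `-outfmt 6` lines.

    Returns (subject_id, evalue, bitscore) tuples, best (lowest E-value) first.
    """
    hits = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        subject, evalue, bitscore = fields[1], float(fields[10]), float(fields[11])
        if evalue <= max_evalue:
            hits.append((subject, evalue, bitscore))
    hits.sort(key=lambda hit: (hit[1], -hit[2]))
    return hits


# Placeholder hits for a query called DYM; subject IDs and scores are invented.
blast_lines = [
    "DYM\tPROT_Y\t22.0\t120\t90\t5\t200\t320\t10\t130\t0.005\t40.1",
    "DYM\tPROT_X\t35.2\t600\t380\t10\t1\t600\t1\t610\t1e-80\t300.5",
    "DYM\tPROT_Z\t20.0\t80\t60\t3\t400\t480\t5\t85\t0.5\t25.0",
]
print([hit[0] for hit in filter_hits(blast_lines)])       # ['PROT_X', 'PROT_Y']
print(len(filter_hits(blast_lines, max_evalue=1.0)))      # 3
```

Raising `max_evalue` admits weaker hits, which is how a less stringent search surfaces more distant paralogs at the cost of more noise.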

Figure 1.8.16 The Filter menu of WormMart, with filters set to select for pqn‐* genes in C. elegans with uncoordinated RNAi phenotypes. WormMart gives the user a menu with which one or more of a great many different conditions can be imposed on data. Each condition is itself simple, but the freedom of users to choose and mix them with a graphical interface makes highly complex searches practical. This particular search started by choosing the WS140 data release (shown in the Summary on the right‐hand side) and its Gene data set. This still leaves the user with over 40,000 objects to sort through. In this simple search, the user has selected only those genes falling into the pqn class, which includes ∼100 genes encoding prion‐like proteins with domains highly enriched for glutamine (Q) or asparagine (N).
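
Combining several simple conditions, as the Filter menu does, can be modeled as a logical AND of predicates. The sketch below uses hypothetical gene records and helper names (`in_class`, `has_phenotype`, `apply_filters`) to illustrate the idea; it is not WormMart's implementation, and the genes are placeholders rather than real WormBase annotations.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    gene_class: str
    rnai_phenotypes: set = field(default_factory=set)


def in_class(gene_class):
    """Condition: gene belongs to the given gene class (e.g. 'pqn')."""
    return lambda gene: gene.gene_class == gene_class


def has_phenotype(term):
    """Condition: gene has the given RNAi phenotype term."""
    return lambda gene: term in gene.rnai_phenotypes


def apply_filters(genes, *conditions):
    """Keep only genes that satisfy every condition (logical AND)."""
    return [gene for gene in genes if all(cond(gene) for cond in conditions)]


# Placeholder records, not real WormBase annotations.
genes = [
    Gene("GENE_1", "pqn", {"uncoordinated"}),
    Gene("GENE_2", "pqn", set()),
    Gene("GENE_3", "other", {"uncoordinated"}),
]
selected = apply_filters(genes, in_class("pqn"), has_phenotype("uncoordinated"))
print([gene.name for gene in selected])  # ['GENE_1']
```
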

Figure 1.8.17 The Output menu for selecting sequence attributes, showing several different choices of gene substructure. After filtering, data in WormMart need to be exported, and again, many different choices of output contents and format exist. One particularly useful form is sequence output in which the user picks some type of gene structure (e.g., 5′ flanks, introns, or exons) for mass export from a selected gene set (selected by choices like those shown in Fig. 1.8.16). As a given option for sequence export is picked, a small schematic diagram of the gene is marked in red to clarify what the option means in practice. Since the sequences are exported in FASTA format, the headers for these FASTA records can themselves be loaded with user‐selected data (e.g., gene names).
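
FASTA output with user-selected header fields can be sketched as follows. The record layout, the `|` separator between header fields, and the 60-character line width are assumptions made for illustration; WormMart's actual export format may differ. A small parser is included to show that headers and sequences round-trip.

```python
def to_fasta(records, header_fields, width=60):
    """Format records (dicts with a 'seq' key) as FASTA text, joining the
    chosen attribute values with '|' to form each header line."""
    lines = []
    for record in records:
        lines.append(">" + "|".join(str(record[name]) for name in header_fields))
        seq = record["seq"]
        # Wrap the sequence at `width` characters per line.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    entries, header, chunks = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


# Placeholder record: the ID, name, and sequence are invented.
records = [{"gene_id": "GENE_1", "name": "geneA", "seq": "ACGT" * 20}]
fasta = to_fasta(records, ["gene_id", "name"])
print(fasta.splitlines()[0])                                   # >GENE_1|geneA
print(parse_fasta(fasta) == [("GENE_1|geneA", "ACGT" * 20)])   # True
```
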

Figure 1.8.18 Final results of the search in Figure 1.8.16. Another option for user‐selected output is to have tables listing gene features rather than nucleotide sequences. This output was generated from the pqn‐* search shown in Figure 1.8.16 by selecting (in addition to the pqn gene class) for molecular and classical gene names, RNAi phenotypes, and conserved orthologous protein groups (KOGs). As with the Genome Browser, a strength of these user‐selected outputs is the ability to quickly compare disparate data sets in an easily scanned, well‐aligned format.

Figure 1.8.19 The graphical output from a search for genetic markers in the vicinity of hid‐3. Classical genetics in C. elegans remains crucial for finding new biological functions. Here the user has a gene map for the region around the uncloned hid‐3 gene that integrates cloned genes, uncloned loci, predicted genes, and STS markers. Such a view makes it straightforward to design fine‐scale STS mapping and to identify other loci that might be allelic to hid‐3.

Figure 1.8.20 Part of the tabular output from a search for hid‐3 markers. Graphic and tabular gene maps have complementary uses. The graphical map in Figure 1.8.19 lets the user take in a genetic region intuitively at a glance; this table lists the exact identity and details of its contents. Details include the meiotic map position, alleles, and laboratory strains for each gene in a region.
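
A tabular search for markers near a locus reduces to selecting loci whose map positions fall within an interval. In this sketch the locus names, chromosome labels, and map positions (in cM) are hypothetical, and the fixed plus-or-minus interval is an assumption rather than the rule WormBase's own search uses.

```python
def loci_in_interval(loci, chromosome, center, half_width):
    """Return loci on `chromosome` whose map position (cM) lies within
    center +/- half_width, sorted by position."""
    low, high = center - half_width, center + half_width
    hits = [locus for locus in loci
            if locus["chrom"] == chromosome and low <= locus["pos"] <= high]
    return sorted(hits, key=lambda locus: locus["pos"])


# Invented loci and positions for illustration only.
loci = [
    {"name": "locA", "chrom": "IV", "pos": 1.5},
    {"name": "locB", "chrom": "IV", "pos": 1.2},
    {"name": "locC", "chrom": "IV", "pos": 2.8},
    {"name": "locD", "chrom": "V", "pos": 1.4},
]
print([locus["name"] for locus in loci_in_interval(loci, "IV", 1.5, 0.5)])  # ['locB', 'locA']
```
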

Figure 1.8.21 Results of a Gene Ontology (GO) search for “RNA splicing”. GO allows genes to be classified by their shared biochemical or biological roles whether or not their products have any similarity to each other. While this classification is powerful, it can be difficult to decipher because there are a great many GO terms, most with complex meanings. To help make sense of this complexity, searches in WormBase for GO terms give tables listing not only the names of terms, but also their definitions and the genes associated with them. Searching with a simple phrase such as “RNA splicing” can give many different results with highly detailed meanings.

Figure 1.8.22 Summary of the “RNA splicing” GO term in WormBase, with its connections to genes and protein motifs. Each GO term has its own summary page, accessible either through a term search (as in Fig. 1.8.21) or through gene or protein motif pages. The broadly defined “RNA splicing” term is seen here to encompass two different genes and three different protein motifs. One link on this page leads to a browsable version of the entire Gene Ontology system.

Figure 1.8.23 A detailed view of the “RNA splicing, via transesterification reactions with bulged adenosine as nucleophile” GO term in WormBase, showing its definition, its context among other GO terms, and its connections to genes and protein motifs. This is another view of a GO term, this time with a browsable context. As in Figure 1.8.22, links are given for associated genes and protein motifs, but here one can also see how this rather specialized term fits into the overall Gene Ontology. Note that this GO term is not actually the narrowest one possible, but is itself a parent term for three even more specialized terms (at the end of the derivation).
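
Because the Gene Ontology is a directed acyclic graph, a term's broader ancestors and narrower descendants can be collected with a breadth-first walk over its links. The sketch below uses a toy graph with placeholder term labels (T1 to T5), not real GO identifiers, and treats every parent link alike regardless of relationship type, which is a simplification.

```python
from collections import deque


def reachable(term, links):
    """Breadth-first walk: every term reachable from `term` via `links`
    (a dict mapping a term to the terms it points to)."""
    seen, queue = set(), deque([term])
    while queue:
        for nxt in links.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(parents):
    """Turn a child -> parents mapping into parent -> children."""
    children = {}
    for child, parent_terms in parents.items():
        for parent in parent_terms:
            children.setdefault(parent, set()).add(child)
    return children


# Toy DAG with placeholder labels; a term may have more than one parent.
parents = {"T2": ["T1"], "T3": ["T2"], "T4": ["T2", "T5"], "T5": ["T1"]}
print(sorted(reachable("T4", parents)))          # ['T1', 'T2', 'T5']  (ancestors)
print(sorted(reachable("T2", invert(parents))))  # ['T3', 'T4']  (descendants)
```
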

Figure 1.8.24 An expanded view of neuronal lineages. WormBase gives a graphically browsable diagram of C. elegans' entire developmental lineage, from the fertilized egg to the adult body. Here is shown a small subset of that lineage, starting from the progenitor cell P1. Each node can be either collapsed or expanded by clicking on it to give simplified or elaborated views; here all the nodes have been expanded. Each cell type is given a hypertext link to its own Cell page.
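
The collapse-and-expand behavior of the lineage diagram can be modeled as a tree in which a collapsed node hides its descendants. This sketch uses placeholder cell names and a plain-text rendering of our own choosing; it is not WormBase's lineage viewer, and the names do not correspond to real C. elegans cells.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    children: list = field(default_factory=list)
    expanded: bool = True


def render(cell, depth=0):
    """Return indented text lines for a lineage tree. Collapsed nodes are
    marked '+' and their descendants are hidden; other nodes are marked '-'."""
    collapsed = bool(cell.children) and not cell.expanded
    lines = ["  " * depth + ("+ " if collapsed else "- ") + cell.name]
    if not collapsed:
        for child in cell.children:
            lines.extend(render(child, depth + 1))
    return lines


# Placeholder lineage: names do not correspond to real C. elegans cells.
tree = Cell("ROOT", [
    Cell("A", [Cell("A1"), Cell("A2")]),
    Cell("B", [Cell("B1")], expanded=False),
])
print("\n".join(render(tree)))
```
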

Figure 1.8.25 The Cell Report page for AS1, a neuron in the P1 lineage seen in Figure 1.8.24. Clicking on the AS1 link in P1's lineage leads to this report, summarizing developmental and functional traits for this cell type. A single cell can belong to more than one group, defined either by cell class or by organ or tissue. Cells can be major progenitors of a lineage branch (blast cells), intermediates during development, or terminally differentiated. They can also have many different gene expression patterns associated with them, either generically (e.g., a gene expressed in neurons wi
