Bioinformatics


Sequence Analysis

DNA sequences and their conceptual translations were compared in-house with known nucleotide and protein sequences using the BLAST (Basic Local Alignment Search Tool) algorithm: blastn for searches against nucleotide databases and blastx for searches against protein databases. Six publicly accessible databases were searched: SwissProt, GenBank non-redundant (nr) protein, GenBank nr nucleotide, dbEST (expressed sequence tags), and the murine and human DoTS databases of transcribed sequences. The DoTS databases have since been merged into Allgenes. Sequences were also compared with those in StroCDB itself as a measure of internal redundancy.

Potential open reading frames (ORFs) were located using ORF Finder. Protein motif searches were performed with four motif identification programs: PROSITE, Pfam, ProDom, and SMART. Transmembrane helices were predicted with the TMpred server, and potential signal peptides with SignalP.

Each sequence was assigned to one of five homology categories according to the following criteria:
1) Exact Match: identity to a published mouse protein;
2) Homolog: near-identity to a published protein from a species other than mouse;
3) Family Member: homology indicating relatedness to a described protein family;
4) EST Only: no extensive homology to any published or characterized protein, but identity to expressed sequence tags from mouse or another species;
5) No Match: no extensive homology to any nucleotide or protein sequence in the public databases.
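The five homology categories amount to a simple decision procedure. The Python sketch below is illustrative only: the methods give no numeric cut-offs, so the identity thresholds (`EXACT_IDENTITY`, `NEAR_IDENTITY`), the `ProteinHit` record, and the assumption that any hit passed in already represents "extensive homology" (for example, after an upstream E-value filter) are all hypothetical. It also simplifies by modelling only protein hits and EST hits.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMED cut-offs: the methods text gives no numeric thresholds.
EXACT_IDENTITY = 100.0  # "identity" to a published mouse protein
NEAR_IDENTITY = 90.0    # hypothetical cut-off for "near-identity"
MOUSE = "Mus musculus"


@dataclass
class ProteinHit:
    """Best hit to a published protein (hypothetical record).

    Assumes the hit already passed an upstream significance filter,
    i.e. it represents "extensive homology".
    """
    species: str
    percent_identity: float


def classify(protein_hit: Optional[ProteinHit], has_est_hit: bool) -> str:
    """Assign one of the five homology categories (simplified sketch)."""
    if protein_hit is not None:
        is_mouse = protein_hit.species == MOUSE
        if is_mouse and protein_hit.percent_identity >= EXACT_IDENTITY:
            return "Exact Match"
        if not is_mouse and protein_hit.percent_identity >= NEAR_IDENTITY:
            return "Homolog"
        # Significant but weaker similarity (including non-identical mouse
        # proteins, e.g. paralogs) is read as membership of a protein family.
        return "Family Member"
    if has_est_hit:
        return "EST Only"
    return "No Match"


# Usage
print(classify(ProteinHit("Homo sapiens", 96.5), has_est_hit=True))  # Homolog
print(classify(None, has_est_hit=False))                               # No Match
```

The checks run from most to least specific, so a sequence falls into the first category whose criterion it meets.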

Database Architecture

StroCDB is a relational database managed with the MySQL database management system. It stores all primary sequence data, the results of the bioinformatic analyses, and the array expression results. Its architecture allows new expression data, or the results of other experiments, to be incorporated as they become available.
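As a rough illustration of how a relational layout lets new experiments be added without restructuring, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL (column types and auto-increment syntax differ in MySQL), and every table and column name (`clone`, `analysis_result`, `experiment`, `expression`) and the helper `add_experiment` are hypothetical, not StroCDB's actual schema.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL in this sketch.
SCHEMA = """
CREATE TABLE clone (
    clone_id  INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    sequence  TEXT NOT NULL              -- primary DNA sequence
);
CREATE TABLE analysis_result (
    result_id INTEGER PRIMARY KEY,
    clone_id  INTEGER NOT NULL REFERENCES clone(clone_id),
    program   TEXT NOT NULL,             -- e.g. 'blastx', 'Pfam', 'SignalP'
    summary   TEXT
);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    description   TEXT NOT NULL
);
CREATE TABLE expression (
    experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
    clone_id      INTEGER NOT NULL REFERENCES clone(clone_id),
    value         REAL NOT NULL,
    PRIMARY KEY (experiment_id, clone_id)
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_experiment(conn: sqlite3.Connection, description: str,
                   values: dict[int, float]) -> int:
    """Record a new experiment and its per-clone values as new rows."""
    cur = conn.execute("INSERT INTO experiment (description) VALUES (?)",
                       (description,))
    exp_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO expression (experiment_id, clone_id, value) VALUES (?, ?, ?)",
        [(exp_id, clone_id, v) for clone_id, v in values.items()],
    )
    return exp_id


# Usage: one clone, two expression experiments added over time.
conn = build_db()
conn.execute("INSERT INTO clone (clone_id, name, sequence) "
             "VALUES (1, 'clone_A', 'ATGGCC')")
add_experiment(conn, "array run 1", {1: 2.5})
add_experiment(conn, "array run 2", {1: 0.8})
print(conn.execute("SELECT experiment_id, value FROM expression "
                   "ORDER BY experiment_id").fetchall())
# [(1, 2.5), (2, 0.8)]
```

Because each expression value is stored as a row keyed by experiment and clone, a later experiment adds rows rather than columns, so existing tables and queries are left unchanged.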