MEME job 14367 results: Dean's Sequences


From:	John W. Fondon III (Trey)
Subject:	MEME job 14367 results: Dean's Sequences
Date:	Thu, 12 Feb 1998 15:06:24 -0600
>From: address@hidden
>Date: Tue, 10 Feb 1998 14:03:31 +0100 (MET)
>To: address@hidden
>Subject: MEME job 14367 results: Dean's Sequences
>
>Note: Save these results to a file if you intend to use MAST
>
>********************************************************************************
>MEME - Motif discovery tool
>********************************************************************************
>MEME version 2.0 (Release date: 1996/11/17 00:39:06)
>
>For further information on how to interpret these results or to get
>a copy of the MEME software please access http://www.sdsc.edu/MEME.
>
>This file may be used as input to the MAST algorithm for searching
>sequence databases for matches to groups of motifs.  MAST is available
>for interactive use and downloading at http://www.sdsc.edu/MEME.
>********************************************************************************
>
>
>********************************************************************************
>REFERENCE
>********************************************************************************
>If you use this program in your research, please cite:
>
>Timothy L. Bailey and Charles Elkan,
>"Fitting a mixture model by expectation maximization to discover
>motifs in biopolymers", Proceedings of the Second International
>Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
>AAAI Press, Menlo Park, California, 1994.
>********************************************************************************
>
>
>********************************************************************************
>TRAINING SET
>********************************************************************************
>DATAFILE= meme.14367.data (deleted by web version of MEME)
>ALPHABET= ACDEFGHIKLMNPQRSTVWY
>Sequence name           Length         Sequence name           Length
>-------------           ------         -------------           ------
>PBP-1                      148         PBP-2                      150
>PBP-3                      154         PBP-5                      143
>LUSH                       153
>********************************************************************************
>
>
>********************************************************************************
>EXPLANATION OF RESULTS
>********************************************************************************
>For each motif that it discovers in the training set, MEME prints the
>following information:
>
>    Summary Line
>
>    This line gives the width (`width') and expected number of occurrences in
>    the training set (`sites') of the motif. MEME numbers the motifs
>    consecutively from one as it finds them. MEME usually finds the most
>    statistically significant motifs first. Each motif describes a pattern of
>    a fixed width--no gaps are allowed in MEME motifs. MEME estimates
>    the number of places the motif occurs in the training set. This need
>    not be an integer value.
>
>    Simplified Motif Letter-probability Matrix
>
>    MEME motifs are represented by letter-probability matrices that
>    specify the probability of each possible letter appearing at each
>    possible position in an occurrence of the motif. In order to make it
>    easier to see which letters are most likely in each of the columns of
>    the motif, the simplified motif shows the letter probabilities multiplied
>    by 10 rounded to the nearest integer. Zeros are replaced by ":" (the
>    colon) for readability.
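The simplification described above can be sketched in a few lines of Python. This is only an illustration: the function name `simplify` is made up, round-half-up is an assumed rounding rule, and showing 10 as "a" is inferred from the C row of motif 1 further down, not from the explanation itself.

```python
def simplify(prob):
    """Map one letter probability to its character in the simplified matrix."""
    digit = int(prob * 10 + 0.5)  # multiply by 10, round half up (assumed rule)
    if digit == 0:
        return ":"  # zeros are shown as a colon for readability
    # Assumption: a value of 10 is printed as "a", as seen in the C row of motif 1.
    return "a" if digit == 10 else str(digit)


# Position 3 of motif 1: K and R probabilities copied from the
# letter-probability matrix, then C at position 4.
print(simplify(0.901987), simplify(0.042352), simplify(0.959815))
```

The three values correspond to the "9" in the K row, the ":" in the R row, and the "a" in the C row of the simplified matrix for motif 1.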
>
>    Information Content Diagram
>
>    The information content diagram provides an idea of which positions
>    in the motif are most highly conserved. Each column (position) in a
>    motif can be characterized by the amount of information it contains
>    (measured in bits). Highly conserved positions in the motif have high
>    information; positions where all letters are equally likely have low
>    information. The diagram is printed so that each column lines up with
>    the same column in the simplified motif letter-probability matrix above
>    it. Summing the information content for each position in the motif
>    gives the total information content of the motif (shown in parentheses
>    to the left of the diagram). This gives a measure of the usefulness of
>    the motif for database searches. For a motif to be useful for database
>    searches, it must as a rule contain at least log_2(N) bits of
>    information where N is the number of sequences in the database
>    being searched. For example, to effectively search a database
>    containing 100,000 sequences for occurrences of a single motif, the
>    motif should have an IC of at least 16.6 bits. Motifs with lower
>    information content are still useful when a family of sequences shares
>    more than one motif since they can be combined in multiple motif
>    searches (using MAST).
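The log_2(N) rule of thumb above is easy to check numerically. A minimal sketch, with hypothetical helper names `min_bits` and `is_searchable`:

```python
import math


def min_bits(num_sequences):
    """Bits of information a single motif needs to search a database of this size."""
    return math.log2(num_sequences)


def is_searchable(motif_bits, num_sequences):
    """True if the motif meets the log2(N) rule of thumb for this database size."""
    return motif_bits >= min_bits(num_sequences)


print(round(min_bits(100_000), 1))   # the 100,000-sequence example from the text
print(is_searchable(13.6, 100_000))  # motif 1 below reports 13.6 bits
print(is_searchable(13.6, 10_000))
```

By this rule the 13.6-bit motif 1 reported below falls short for a 100,000-sequence database on its own but would suffice for one of about 10,000 sequences.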
>
>    Multilevel Consensus Sequence
>
>    The multilevel consensus sequence corresponding to the motif is an
>    aid in remembering and understanding the motif. It is calculated from
>    the motif letter-probability matrix as follows. Separately for each
>    column of the motif, the letters in the alphabet are sorted in
>    decreasing order by the probability with which they are expected to
>    occur in that position of motif occurrences. The sorted letters are
>    then printed vertically with the most probable letter on top. Only
>    letters with probabilities of 0.2 or higher at that position in the motif
>    are printed. As an example, the multilevel consensus sequence of
>    motif 2 in the sample output is:
>
>    Multilevel       LITGAASGIG
>    consensus         V  GS
>    sequence              G
>
>    This multilevel consensus sequence says several things about the
>    motif. First, the most likely form of the motif can be read from the top
>    line as LITGAASGIG. Second, only the letter L has a probability of
>    0.2 or higher in position 1 of the motif, both I and V reach that
>    threshold in position 2, and so on. Third, a rough approximation of the
>    motif can be made by converting the multilevel consensus sequence
>    into the Prosite signature
>    L-[IV]-T-G-[AG]-[ASG]-S-G-I-G. The multilevel
>    consensus sequence is printed so that each column lines up with
>    the same column in the simplified motif and information content
>    diagrams above it.
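The consensus procedure above (sort each column by probability, keep letters at 0.2 or higher, then read off a Prosite-style signature) might look like this in Python. The function names are hypothetical, and writing "x" for a position where no letter reaches the threshold is an assumption based on the "ExKCYxAC" consensus printed for motif 1 below.

```python
def consensus_levels(column, threshold=0.2):
    """Letters of one motif column with probability >= threshold, most probable first."""
    ranked = sorted(column.items(), key=lambda item: item[1], reverse=True)
    return [letter for letter, prob in ranked if prob >= threshold]


def prosite_signature(columns, threshold=0.2):
    """Approximate a motif as a Prosite-style signature."""
    parts = []
    for column in columns:
        letters = consensus_levels(column, threshold)
        if not letters:
            parts.append("x")  # assumption: no letter reaches the threshold
        elif len(letters) == 1:
            parts.append(letters[0])
        else:
            parts.append("[" + "".join(letters) + "]")
    return "-".join(parts)


# Positions 1, 2 and 5 of motif 1; only the three largest probabilities
# of each column are copied from the letter-probability matrix.
col1 = {"E": 0.283811, "D": 0.105864, "K": 0.083778}
col2 = {"V": 0.175487, "L": 0.162565, "I": 0.152670}
col5 = {"Y": 0.427782, "F": 0.207514, "L": 0.048297}
print(prosite_signature([col1, col2, col5]))
```

For position 5 the sketch puts Y above F, which agrees with the two-level consensus printed for motif 1.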
>
>    Possible Examples of the Motif
>
>    As a further aid in understanding the motif, MEME displays a list of
>    possible occurrences of the motif in the training set. This list is made
>    by converting the motif letter-probability matrix into a
>    position-dependent scoring matrix (log-odds matrix) and using that
>    to compute a match score between each position in the training set
>    and the motif. All positions which score above a threshold score are
>    listed. (The threshold score is chosen by MEME such that the
>    expected number of non-motif positions listed in error will equal the
>    number of actual motif positions not listed.) The format of the list is
>    sequence name, starting position of the (putative) occurrence, match
>    score of the position, and the actual sequence including the ten
>    positions before and after the motif occurrence (`site').
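As a rough illustration of the scoring step above, the match score of a site can be reproduced by summing one log-odds entry per motif position. The sketch below uses hypothetical names (`match_score`, `scan`), stores each matrix row as a letter-to-score dict, and assumes 1-based start positions. Only the entries needed for the PBP-3 site KLKCYMNC are copied from the motif 1 log-odds matrix further down.

```python
def match_score(matrix, window):
    """Sum the log-odds entry of each letter at its position in the motif."""
    return sum(row[letter] for row, letter in zip(matrix, window))


def scan(matrix, sequence, threshold):
    """List (start, score, site) for every window scoring above the threshold.

    Needs complete rows (one entry per alphabet letter); starts are 1-based
    here, which is an assumption about the output convention.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = match_score(matrix, window)
        if score > threshold:
            hits.append((start + 1, score, window))
    return hits


# Partial rows: just the letters of the PBP-3 site KLKCYMNC.
pbp3_rows = [
    {"K": 0.518}, {"L": 0.826}, {"K": 3.947}, {"C": 5.724},
    {"Y": 3.710}, {"M": 0.952}, {"N": 0.199}, {"C": 5.724},
]
print(round(match_score(pbp3_rows, "KLKCYMNC"), 2))
```

The sum comes to 21.6, which matches the score of 21.60 listed for PBP-3 in the table of possible examples.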
>
>    Position-dependent Scoring Matrix
>
>    The position-dependent scoring matrix corresponding to the motif is
>    printed for use by database search programs such as MAST. This
>    matrix is a log-odds matrix calculated by taking the log (base 2) of
>    the ratio p/f at each position in the motif where p is the probability
>    of a particular letter at that position in the motif, and f is the average
>    frequency of that letter in the training set. The scoring matrix is
>    printed "sideways"--columns correspond to the letters in the
>    alphabet (in the same order as shown in the simplified motif) and
>    rows correspond to the positions of the motif, position one first.
>    The scoring matrix is preceded by a line starting with "log-odds
>    matrix:" and containing the length of the alphabet, width of the motif,
>    number of characters in the training set and the scoring threshold
>    used in the list of possible motif examples.
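The log-odds definition above is a one-liner, and it can be inverted to see what background frequency a printed score implies. A small sketch with hypothetical function names; the p and score values are the cysteine entries at position 4 of motif 1, copied from the two matrices further down.

```python
import math


def log_odds(p, f):
    """log2(p / f): p is the motif probability, f the training-set frequency."""
    return math.log2(p / f)


def implied_background(p, score):
    """Invert the definition: the frequency f that turns p into the given score."""
    return p / 2 ** score


# Cysteine at position 4 of motif 1: p = 0.959815, log-odds score = 5.724.
f_cys = implied_background(0.959815, 5.724)
print(f_cys)                       # roughly 0.018
print(log_odds(0.959815, f_cys))   # recovers the printed score
```

An implied frequency near 0.018 means cysteine makes up just under 2% of the training set, which is why a near-certain C at a motif position scores so highly.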
>
>    Motif Letter-probability Matrix
>
>    The motif itself is a position-dependent letter-probability matrix
>    giving, for each position in the pattern, the probabilities of each
>    possible letter occurring there. The letter-probability matrix is printed
>    "sideways"--columns correspond to the letters in the alphabet (in
>    the same order as shown in the simplified motif) and rows
>    correspond to the positions of the motif, position one first. The
>    motif is preceded by a line starting with "letter-probability matrix:" and
>    containing the length of the alphabet, width of the motif and number of
>    characters in the training set.
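Since mail software tends to wrap each matrix row over several lines, as happened in this message, one way to read either matrix back is to ignore line breaks and regroup the numbers by the alength value in the header. The parser below is only a sketch: `parse_matrix` is a made-up name, and the handling of ">" quote prefixes is specific to quoted mail like this one.

```python
import re

HEADER = re.compile(
    r"(log-odds|letter-probability) matrix:\s*alength=\s*(\d+)\s+w=\s*(\d+)\s+n=\s*(\d+)"
)


def parse_matrix(lines):
    """Return (kind, alength, w, n, rows) for the first matrix block in lines."""
    it = iter(lines)
    for line in it:
        match = HEADER.search(line)
        if match:
            break
    else:
        raise ValueError("no matrix header found")
    kind = match.group(1)
    alength, w, n = (int(g) for g in match.groups()[1:])
    values = []
    for line in it:
        text = line.lstrip("> \t").strip()  # drop mail quote markers
        if not text:
            break  # a blank line ends the block
        try:
            numbers = [float(tok) for tok in text.split()]
        except ValueError:
            break  # a non-numeric line also ends the block
        values.extend(numbers)
    complete = len(values) - len(values) % alength
    rows = [values[i:i + alength] for i in range(0, complete, alength)]
    return kind, alength, w, n, rows[:w]


sample = [
    ">letter-probability matrix: alength= 4 w= 2 n= 10",
    "> 0.1 0.2 0.3",
    ">0.4",
    "> 0.25 0.25 0.25 0.25",
    ">",
]
kind, alength, w, n, rows = parse_matrix(sample)
print(kind, alength, w, n, rows)
```

If fewer than w rows come back, the block was cut short. The letter-probability matrix at the end of this message, for instance, has only 5 of its 8 rows.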
>********************************************************************************
>
>
>********************************************************************************
>MOTIF  1                width =   8     sites =  6.5
>********************************************************************************
>Simplified     A  11:::12:
>motif letter-  C  :::a:::a
>probability    D  1:::::::
>matrix         E  3:::::1:
>               F  ::::2:::
>               G  ::::::::
>               H  ::::::::
>               I  :2:::1::
>               K  1:9::11:
>               L  :2:::1::
>               M  :1::::::
>               N  ::::::1:
>               P  ::::::::
>               Q  1:::::1:
>               R  :::::11:
>               S  1::::11:
>               T  :1:::11:
>               V  :2:::1::
>               W  ::::::::
>               Y  ::::4:::
>
>         bits 7.0
>              6.3
>              5.6
>              4.9
>Information   4.2
>content       3.5    *   *
>(13.6 bits)   2.8   **   *
>              2.1   **   *
>              1.4   ***  *
>              0.7   ***  *
>              0.0 --------
>
>Multilevel        ExKCYxAC
>consensus             F
>sequence
>
>
>---------------------------------------------------------------------
>        Possible examples of motif 1 in the training set
>---------------------------------------------------------------------
>Sequence name             Start  Score              Site
>-------------             -----  -----            --------
>PBP-1                        67  20.91 VKNRILPTDP EIKCFLYC MFDMFGLIDS
>PBP-1                       127  12.23 GKDGCDTAYE TVKCYIAV NGKFIWEEII
>PBP-2                        65  18.68 LMSHDLPERH EAKCLRAC VMKKLQIMDE
>PBP-3                        79  21.60 FSDGEIHEDE KLKCYMNC FFHEIEVVDD
>PBP-5                        62  14.63 MVKKQPASTY AGKCLRAC VMKNIGILDA
>LUSH                         72  14.17 VGDFNFPPSQ DLMCYTKC VSLMAGTVNK
>LUSH                        138   7.08 FKESCERVYQ TAKCFSEN ADGQFMWP
>---------------------------------------------------------------------
>
>log-odds matrix: alength= 20 w= 8 n= 713 bayes= 6.75752
> -0.072  -2.294   1.033   2.187  -2.229  -1.216  -0.057  -1.836   0.518
>-1.720  -0.846   0.054  -1.223   0.883  -0.232  -0.279  -0.363  -1.286
>-2.274  -1.483
>  0.244   0.158  -2.349  -1.737   0.193  -1.555  -0.768   1.440  -1.425
>0.826   1.219  -1.334  -1.781  -1.078  -1.294  -0.734   0.068   1.447
>-0.613  -0.311
> -3.389  -3.411  -4.513  -4.110  -5.101  -4.489  -3.361  -3.797   3.947
>-4.566  -3.588  -3.325  -4.324  -3.534  -0.293  -4.194  -3.654  -4.408
>-3.882  -4.466
> -3.699   5.724  -5.166  -4.871  -4.666  -5.364  -4.502  -4.113  -5.300
>-4.737  -3.774  -4.857  -5.382  -4.971  -4.733  -4.639  -3.896  -4.592
>-5.347  -5.263
> -1.589  -1.141  -2.342  -2.261   2.365  -2.606   1.052  -1.346  -2.047
>-0.925  -0.707  -1.528  -2.433  -1.535  -1.691  -1.614  -1.865  -1.349
>1.223   3.710
>  0.107  -0.137  -1.233  -0.561  -0.103  -1.335  -0.122   0.773   0.348
>0.368   0.952  -0.556  -1.476   0.015   0.199  -0.385   0.163   0.820
>-0.691  -0.331
>  1.584  -0.866  -0.072   0.400  -1.475  -0.786   0.184  -1.322   0.578
>-1.292  -0.420   0.199  -1.226   0.544   0.095   0.215  -0.040  -0.724
>-1.616  -0.840
> -3.699   5.724  -5.166  -4.871  -4.666  -5.363  -4.502  -4.113  -5.300
>-4.737  -3.773  -4.854  -5.382  -4.971  -4.733  -4.639  -3.896  -4.588
>-5.347  -5.263
>
>letter-probability matrix: alength= 20 w= 8 n= 713
> 0.069599  0.003705  0.105864  0.283811  0.008591  0.029844  0.021561
>0.015762  0.083778  0.027846  0.012834  0.047837  0.021702  0.075184
>0.044203  0.060840  0.046201  0.026387  0.002759  0.011691
> 0.086652  0.020260  0.010154  0.018700  0.046033  0.023599  0.013173
>0.152670  0.021776  0.162565  0.053708  0.018279  0.014747  0.019310
>0.021158  0.044371  0.062288  0.175487  0.008726  0.026342
> 0.006983  0.001707  0.002266  0.003610  0.001173  0.003087  0.002183
>0.004049  0.901987  0.003872  0.001918  0.004599  0.002530  0.003518
>0.042352  0.004031  0.004718  0.003031  0.000905  0.001479
> 0.005632  0.959815  0.001441  0.002130  0.001586  0.001684  0.000990
>0.003252  0.001485  0.003439  0.001687  0.001590  0.001215  0.001299
>0.001952  0.002961  0.003991  0.002669  0.000328  0.000851
> 0.024319  0.008235  0.010208  0.013002  0.207514  0.011384  0.046495
>0.022139  0.014156  0.048297  0.014133  0.015974  0.009386  0.014063
>0.016076  0.024114  0.016311  0.025266  0.031147  0.427782
>


