MEME job 14367 results: Dean's Sequences
From: John W. Fondon III (Trey)
Subject: MEME job 14367 results: Dean's Sequences
Date: Thu, 12 Feb 1998 15:06:24 -0600
>From: address@hidden
>Date: Tue, 10 Feb 1998 14:03:31 +0100 (MET)
>To: address@hidden
>Subject: MEME job 14367 results: Dean's Sequences
>
>Note: Save these results to a file if you intend to use MAST
>
>********************************************************************************
>MEME - Motif discovery tool
>********************************************************************************
>MEME version 2.0 (Release date: 1996/11/17 00:39:06)
>
>For further information on how to interpret these results or to get
>a copy of the MEME software please access http://www.sdsc.edu/MEME.
>
>This file may be used as input to the MAST algorithm for searching
>sequence databases for matches to groups of motifs. MAST is available
>for interactive use and downloading at http://www.sdsc.edu/MEME.
>********************************************************************************
>
>
>********************************************************************************
>REFERENCE
>********************************************************************************
>If you use this program in your research, please cite:
>
>Timothy L. Bailey and Charles Elkan,
>"Fitting a mixture model by expectation maximization to discover
>motifs in biopolymers", Proceedings of the Second International
>Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
>AAAI Press, Menlo Park, California, 1994.
>********************************************************************************
>
>
>********************************************************************************
>TRAINING SET
>********************************************************************************
>DATAFILE= meme.14367.data (deleted by web version of MEME)
>ALPHABET= ACDEFGHIKLMNPQRSTVWY
>Sequence name            Length  Sequence name            Length
>-------------            ------  -------------            ------
>PBP-1                       148  PBP-2                       150
>PBP-3                       154  PBP-5                       143
>LUSH                        153
>********************************************************************************
>
>
>********************************************************************************
>EXPLANATION OF RESULTS
>********************************************************************************
>For each motif that it discovers in the training set, MEME prints the
>following information:
>
> Summary Line
>
> This line gives the width (`width') and expected number of occurrences in
> the training set (`sites') of the motif. MEME numbers the motifs
> consecutively from one as it finds them. MEME usually finds the most
> statistically significant motifs first. Each motif describes a pattern of
> a fixed width--no gaps are allowed in MEME motifs. MEME estimates
> the number of places the motif occurs in the training set. This need
> not be an integer value.
>
> Simplified Motif Letter-probability Matrix
>
> MEME motifs are represented by letter-probability matrices that
> specify the probability of each possible letter appearing at each
> possible position in an occurrence of the motif. In order to make it
> easier to see which letters are most likely in each of the columns of
> the motif, the simplified motif shows the letter probabilities multiplied
> by 10 rounded to the nearest integer. Zeros are replaced by ":" (the
> colon) for readability.
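The rounding rule just described can be sketched in a few lines of Python. This is only an illustration, not MEME's own code: the function name is hypothetical, and the use of "a" for a rounded value of 10 is an assumption inferred from the motif 1 matrix further down, where the cysteine columns print "a".

```python
def simplify_column(probs):
    """Render one motif position as single characters, one per letter.

    Each probability is multiplied by 10 and rounded to the nearest
    integer; zeros become ":". Printing 10 as "a" is an assumption
    based on the sample output, not on the explanation text.
    """
    chars = []
    for p in probs:
        digit = int(p * 10 + 0.5)  # round half up, avoids banker's rounding
        if digit == 0:
            chars.append(":")
        elif digit >= 10:
            chars.append("a")
        else:
            chars.append(str(digit))
    return "".join(chars)


# First four entries (A, C, D, E) of position 1 of motif 1
print(simplify_column([0.069599, 0.003705, 0.105864, 0.283811]))
```
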
>
> Information Content Diagram
>
> The information content diagram provides an idea of which positions
> in the motif are most highly conserved. Each column (position) in a
> motif can be characterized by the amount of information it contains
> (measured in bits). Highly conserved positions in the motif have high
> information; positions where all letters are equally likely have low
> information. The diagram is printed so that each column lines up with
> the same column in the simplified motif letter-probability matrix above
> it. Summing the information content for each position in the motif
> gives the total information content of the motif (shown in parentheses
> to the left of the diagram). This gives a measure of the usefulness of
> the motif for database searches. For a motif to be useful for database
> searches, it must as a rule contain at least log_2(N) bits of
> information where N is the number of sequences in the database
> being searched. For example, to effectively search a database
> containing 100,000 sequences for occurrences of a single motif, the
> motif should have an IC of at least 16.6 bits. Motifs with lower
> information content are still useful when a family of sequences shares
> more than one motif since they can be combined in multiple motif
> searches (using MAST).
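As a quick check of the log_2(N) rule of thumb above, the arithmetic can be written out as follows. The function names are made up for this sketch.

```python
import math


def min_bits_for_database(num_sequences):
    """Minimum information content (bits) suggested by the log2(N) rule."""
    return math.log2(num_sequences)


def max_database_size(bits):
    """Largest database the same rule would allow for a motif of `bits` bits."""
    return 2 ** bits


# The 100,000-sequence example from the text
print(round(min_bits_for_database(100_000), 1))

# Motif 1 in this job is reported with 13.6 bits
print(int(max_database_size(13.6)))
```

By this rule, a single 13.6-bit motif would only be expected to search well in a database of somewhat over ten thousand sequences, which is why combining several motifs with MAST is suggested.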
>
> Multilevel Consensus Sequence
>
> The multilevel consensus sequence corresponding to the motif is an
> aid in remembering and understanding the motif. It is calculated from
> the motif letter-probability matrix as follows. Separately for each
> column of the motif, the letters in the alphabet are sorted in
> decreasing order by the probability with which they are expected to
> occur in that position of motif occurrences. The sorted letters are
> then printed vertically with the most probable letter on top. Only
> letters with probabilities of 0.2 or higher at that position in the motif
> are printed. As an example, the multilevel consensus sequence of
> motif 2 in the sample output is:
>
> Multilevel   LITGAASGIG
> consensus     V  GS
> sequence           G
>
> This multilevel consensus sequence says several things about the
> motif. First, the most likely form of the motif can be read from the top
> line as LITGAASGIG. Second, that only letter L has probability
> more than 0.2 in position 1 of the motif, both I and V have probability
> greater than 0.2 in position 2, etc. Third, a rough approximation of the
> motif can be made by converting the multilevel consensus sequence
> into the Prosite signature
> L-[IV]-T-G-[AG]-[ASG]-S-G-I-G. The multilevel
> consensus sequence is printed so that each column lines up with
> the same column in the simplified motif and information content
> diagrams above it.
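The procedure described above (sort each column by probability, keep letters at 0.2 or higher, print them vertically) can be sketched as follows. All function names are hypothetical, and the "x" placeholder for a column with no letter at or above the cutoff is an assumption taken from the motif 1 output later in this message, not from the explanation itself.

```python
def consensus_levels(matrix, alphabet, cutoff=0.2):
    """For each motif position (row), list letters with p >= cutoff,
    most probable first."""
    levels = []
    for row in matrix:
        ranked = sorted(zip(row, alphabet), key=lambda pair: -pair[0])
        levels.append([letter for p, letter in ranked if p >= cutoff])
    return levels


def consensus_lines(levels):
    """Print the levels vertically; 'x' marks an empty top-level column
    (assumed convention)."""
    depth = max([1] + [len(col) for col in levels])
    lines = []
    for i in range(depth):
        chars = []
        for col in levels:
            if i < len(col):
                chars.append(col[i])
            else:
                chars.append("x" if i == 0 else " ")
        lines.append("".join(chars).rstrip())
    return lines


def prosite_signature(levels):
    """Rough Prosite-style pattern built from the same levels."""
    parts = []
    for col in levels:
        if not col:
            parts.append("x")
        elif len(col) == 1:
            parts.append(col[0])
        else:
            parts.append("[" + "".join(col) + "]")
    return "-".join(parts)


# Toy three-position motif over a four-letter alphabet (made-up numbers)
toy = [[0.90, 0.05, 0.03, 0.02],
       [0.05, 0.60, 0.30, 0.05],
       [0.10, 0.10, 0.10, 0.70]]
levels = consensus_levels(toy, "LIVT")
print("\n".join(consensus_lines(levels)))
print(prosite_signature(levels))
```
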
>
> Possible Examples of the Motif
>
> As a further aid in understanding the motif, MEME displays a list of
> possible occurrences of the motif in the training set. This list is made
> by converting the motif letter-probability matrix into a
> position-dependent scoring matrix (log-odds matrix) and using that
> to compute a match score between each position in the training set
> and the motif. All positions which score above a threshold score are
> listed. (The threshold score is chosen by MEME such that the
> expected number of non-motif positions listed in error will equal the
> number of actual motif positions not listed.) The format of the list is
> sequence name, starting position of the (putative) occurrence, match
> score of the position, and the actual sequence including the ten
> positions before and after the motif occurrence (`site').
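A minimal sketch of the scan described above, assuming the match score of a window is the sum of the log-odds entries for its letters. The text does not spell out the scoring, but summing the motif 1 entries for the site EIKCFLYC gives about 20.91, which agrees with the first listed score. The function name and the toy matrix are illustrative only.

```python
def scan_sequence(sequence, pssm, alphabet, threshold):
    """Score every window of the motif width and keep those above threshold.

    pssm has one row per motif position and one column per alphabet letter.
    Returns (1-based start, score, site) tuples.
    """
    column = {letter: i for i, letter in enumerate(alphabet)}
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        # Assumed scoring: sum of per-position log-odds values
        score = sum(pssm[pos][column[ch]] for pos, ch in enumerate(site))
        if score > threshold:
            hits.append((start + 1, score, site))
    return hits


# Toy two-letter alphabet and width-2 matrix (made-up values)
toy_pssm = [[1.0, -1.0],
            [-1.0, 2.0]]
print(scan_sequence("AACA", toy_pssm, "AC", threshold=1.0))
```
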
>
> Position-dependent Scoring Matrix
>
> The position-dependent scoring matrix corresponding to the motif is
> printed for use by database search programs such as MAST. This
> matrix is a log-odds matrix calculated by taking the log (base 2) of
> the ratio p/f at each position in the motif where p is the probability
> of a particular letter at that position in the motif, and f is the average
> frequency of that letter in the training set. The scoring matrix is
> printed "sideways"--columns correspond to the letters in the
> alphabet (in the same order as shown in the simplified motif) and
> rows corresponding to the positions of the motif, position one first.
> The scoring matrix is preceded by a line starting with "log-odds
> matrix:" and containing the length of the alphabet, width of the motif,
> number of characters in the training set and the scoring threshold
> used in the list of possible motif examples.
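The log-odds calculation described above, log (base 2) of p/f, can be sketched like this. The function name is hypothetical, and the example numbers are made up rather than taken from this job. Note that a probability of exactly zero would make the logarithm undefined; the probabilities MEME prints here are all nonzero, so the sketch does not handle that case.

```python
import math


def log_odds_matrix(prob_matrix, background):
    """Convert a letter-probability matrix into a log-odds scoring matrix.

    prob_matrix: one row per motif position, one column per letter.
    background:  average frequency f of each letter in the training set.
    """
    return [[math.log2(p / f) for p, f in zip(row, background)]
            for row in prob_matrix]


# Toy two-letter example: a letter four times more likely than background
# scores +2 bits, one half as likely as background scores -1 bit.
print(log_odds_matrix([[0.5, 0.25]], [0.125, 0.5]))
```
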
>
> Motif Letter-probability Matrix
>
> The motif itself is a position-dependent letter-probability matrix
> giving, for each position in the pattern, the probabilities of each
> possible letter occurring there. The letter-probability matrix is printed
> "sideways"--columns correspond to the letters in the alphabet (in
> the same order as shown in the simplified motif) and rows
> corresponding to the positions of the motif, position one first. The
> motif is preceded by a line starting with "letter-probability matrix:" and
> containing the length of the alphabet, width of the motif and number of
> characters in the training set.
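For anyone saving these results and reading the matrices back in, here is a rough sketch of parsing the header lines and regrouping the values into rows. It is not part of MEME or MAST; the function names are invented, and it assumes the values may arrive wrapped across several mail-quoted lines, so it regroups them by the alphabet length instead of trusting line breaks.

```python
import re


def parse_matrix_header(line):
    """Split e.g. 'log-odds matrix: alength= 20 w= 8 n= 713 bayes= 6.75752'
    into the matrix name and a dict of its fields."""
    name, _, rest = line.partition(":")
    fields = {}
    for key, value in re.findall(r"(\w+)=\s*(-?\d+(?:\.\d+)?)", rest):
        fields[key] = float(value) if "." in value else int(value)
    return name.strip(), fields


def read_matrix(body_lines, alength, w):
    """Collect all numbers, then cut them into rows of `alength` values.

    A truncated matrix simply yields fewer than `w` rows.
    """
    tokens = []
    for line in body_lines:
        tokens.extend(line.lstrip(">").split())  # drop mail quote marker
    values = [float(t) for t in tokens]
    rows = [values[i:i + alength] for i in range(0, len(values), alength)]
    return rows[:w]


name, fields = parse_matrix_header(
    "log-odds matrix: alength= 20 w= 8 n= 713 bayes= 6.75752")
print(name, fields)
```
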
>********************************************************************************
>
>
>********************************************************************************
>MOTIF 1 width = 8 sites = 6.5
>********************************************************************************
>Simplified        A  11:::12:
>motif letter-     C  :::a:::a
>probability       D  1:::::::
>matrix            E  3:::::1:
>                  F  ::::2:::
>                  G  ::::::::
>                  H  ::::::::
>                  I  :2:::1::
>                  K  1:9::11:
>                  L  :2:::1::
>                  M  :1::::::
>                  N  ::::::1:
>                  P  ::::::::
>                  Q  1:::::1:
>                  R  :::::11:
>                  S  1::::11:
>                  T  :1:::11:
>                  V  :2:::1::
>                  W  ::::::::
>                  Y  ::::4:::
>
>bits            7.0
>                6.3
>                5.6
>                4.9
>Information     4.2
>content         3.5     *   *
>(13.6 bits)     2.8    **   *
>                2.1    **   *
>                1.4    ***  *
>                0.7    ***  *
>                0.0  --------
>
>Multilevel           ExKCYxAC
>consensus                F
>sequence
>
>
>---------------------------------------------------------------------
> Possible examples of motif 1 in the training set
>---------------------------------------------------------------------
>Sequence name  Start  Score             Site
>-------------  -----  -----             --------
>PBP-1             67  20.91  VKNRILPTDP EIKCFLYC MFDMFGLIDS
>PBP-1            127  12.23  GKDGCDTAYE TVKCYIAV NGKFIWEEII
>PBP-2             65  18.68  LMSHDLPERH EAKCLRAC VMKKLQIMDE
>PBP-3             79  21.60  FSDGEIHEDE KLKCYMNC FFHEIEVVDD
>PBP-5             62  14.63  MVKKQPASTY AGKCLRAC VMKNIGILDA
>LUSH              72  14.17  VGDFNFPPSQ DLMCYTKC VSLMAGTVNK
>LUSH             138   7.08  FKESCERVYQ TAKCFSEN ADGQFMWP
>---------------------------------------------------------------------
>
>log-odds matrix: alength= 20 w= 8 n= 713 bayes= 6.75752
> -0.072 -2.294  1.033  2.187 -2.229 -1.216 -0.057 -1.836  0.518 -1.720 -0.846  0.054 -1.223  0.883 -0.232 -0.279 -0.363 -1.286 -2.274 -1.483
>  0.244  0.158 -2.349 -1.737  0.193 -1.555 -0.768  1.440 -1.425  0.826  1.219 -1.334 -1.781 -1.078 -1.294 -0.734  0.068  1.447 -0.613 -0.311
> -3.389 -3.411 -4.513 -4.110 -5.101 -4.489 -3.361 -3.797  3.947 -4.566 -3.588 -3.325 -4.324 -3.534 -0.293 -4.194 -3.654 -4.408 -3.882 -4.466
> -3.699  5.724 -5.166 -4.871 -4.666 -5.364 -4.502 -4.113 -5.300 -4.737 -3.774 -4.857 -5.382 -4.971 -4.733 -4.639 -3.896 -4.592 -5.347 -5.263
> -1.589 -1.141 -2.342 -2.261  2.365 -2.606  1.052 -1.346 -2.047 -0.925 -0.707 -1.528 -2.433 -1.535 -1.691 -1.614 -1.865 -1.349  1.223  3.710
>  0.107 -0.137 -1.233 -0.561 -0.103 -1.335 -0.122  0.773  0.348  0.368  0.952 -0.556 -1.476  0.015  0.199 -0.385  0.163  0.820 -0.691 -0.331
>  1.584 -0.866 -0.072  0.400 -1.475 -0.786  0.184 -1.322  0.578 -1.292 -0.420  0.199 -1.226  0.544  0.095  0.215 -0.040 -0.724 -1.616 -0.840
> -3.699  5.724 -5.166 -4.871 -4.666 -5.363 -4.502 -4.113 -5.300 -4.737 -3.773 -4.854 -5.382 -4.971 -4.733 -4.639 -3.896 -4.588 -5.347 -5.263
>
>letter-probability matrix: alength= 20 w= 8 n= 713
> 0.069599 0.003705 0.105864 0.283811 0.008591 0.029844 0.021561 0.015762 0.083778 0.027846 0.012834 0.047837 0.021702 0.075184 0.044203 0.060840 0.046201 0.026387 0.002759 0.011691
> 0.086652 0.020260 0.010154 0.018700 0.046033 0.023599 0.013173 0.152670 0.021776 0.162565 0.053708 0.018279 0.014747 0.019310 0.021158 0.044371 0.062288 0.175487 0.008726 0.026342
> 0.006983 0.001707 0.002266 0.003610 0.001173 0.003087 0.002183 0.004049 0.901987 0.003872 0.001918 0.004599 0.002530 0.003518 0.042352 0.004031 0.004718 0.003031 0.000905 0.001479
> 0.005632 0.959815 0.001441 0.002130 0.001586 0.001684 0.000990 0.003252 0.001485 0.003439 0.001687 0.001590 0.001215 0.001299 0.001952 0.002961 0.003991 0.002669 0.000328 0.000851
> 0.024319 0.008235 0.010208 0.013002 0.207514 0.011384 0.046495 0.022139 0.014156 0.048297 0.014133 0.015974 0.009386 0.014063 0.016076 0.024114 0.016311 0.025266 0.031147 0.427782
>
==================================
Swarm-Support is for discussion of the technical details of the day
to day usage of Swarm. For list administration needs (esp.
[un]subscribing), please send a message to <address@hidden>
with "help" in the body of the message.
==================================