  • What is the difference between the two versions of CAP3?

    Recently I have used two versions of CAP3: the older one was built on 04/15/05, and the latest has a VersionDate of 10/15/07.

    When I assembled the dataset below with both versions using default parameters, I got different results: the old version produces one contig, whereas the latest version produces two singletons.

    So I compared the parameters of the two versions. Four parameters differ in their default value or allowed range (-o, -p, -s and -y), and the new version adds three parameters (-i, -j and -k).

    My question is: can anyone tell me what the three newly added parameters mean?

    Another puzzle: I tried to pass the old default values to the new version with the command "cap3 ./temp.fas -o 40 -p 80 -s 900 -y 250", but the result is still two singletons. Why? Do the three new parameters affect the assembly?

    Best regards!

    old parameters (04/15/05)
    -a N specify band expansion size N > 10 (20)
    -b N specify base quality cutoff for differences N > 15 (20)
    -c N specify base quality cutoff for clipping N > 5 (12)
    -d N specify max qscore sum at differences N > 20 (200)
    -e N specify clearance between no. of diff N > 10 (30)
    -f N specify max gap length in any overlap N > 1 (20)
    -g N specify gap penalty factor N > 0 (6)
    -h N specify max overhang percent length N > 2 (20)
    -m N specify match score factor N > 0 (2)
    -n N specify mismatch score factor N < 0 (-5)
    -o N specify overlap length cutoff > 20 (40)
    -p N specify overlap percent identity cutoff N > 65 (80)

    -r N specify reverse orientation value N >= 0 (1)
    -s N specify overlap similarity score cutoff N > 400 (900)
    -t N specify max number of word matches N > 30 (300)
    -u N specify min number of constraints for correction N > 0 (3)
    -v N specify min number of constraints for linking N > 0 (2)
    -w N specify file name for clipping information (none)
    -x N specify prefix string for output file names (cap)
    -y N specify clipping range N > 5 (250)
    -z N specify min no. of good reads at clip pos N > 0 (3)

    new parameters (10/15/07)
    -a N specify band expansion size N > 10 (20)
    -b N specify base quality cutoff for differences N > 15 (20)
    -c N specify base quality cutoff for clipping N > 5 (12)
    -d N specify max qscore sum at differences N > 20 (200)
    -e N specify clearance between no. of diff N > 10 (30)
    -f N specify max gap length in any overlap N > 1 (20)
    -g N specify gap penalty factor N > 0 (6)
    -h N specify max overhang percent length N > 2 (20)
    -i N specify segment pair score cutoff N > 20 (40)
    -j N specify chain score cutoff N > 30 (80)
    -k N specify end clipping flag N >= 0 (1)

    -m N specify match score factor N > 0 (2)
    -n N specify mismatch score factor N < 0 (-5)
    -o N specify overlap length cutoff > 15 (40)
    -p N specify overlap percent identity cutoff N > 65 (90)

    -r N specify reverse orientation value N >= 0 (1)
    -s N specify overlap similarity score cutoff N > 250 (900)
    -t N specify max number of word matches N > 30 (300)
    -u N specify min number of constraints for correction N > 0 (3)
    -v N specify min number of constraints for linking N > 0 (2)
    -w N specify file name for clipping information (none)
    -x N specify prefix string for output file names (cap)
    -y N specify clipping range N > 5 (100)
    -z N specify min no. of good reads at clip pos N > 0 (3)
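
The comparison of the two option lists can also be done mechanically. The following is a minimal sketch, assuming the help text of each version has been pasted into a string; the names `parse_help` and `compare` are made up for this illustration and are not part of CAP3.

```python
import re

OLD_HELP = """\
-a N specify band expansion size N > 10 (20)
-b N specify base quality cutoff for differences N > 15 (20)
-c N specify base quality cutoff for clipping N > 5 (12)
-d N specify max qscore sum at differences N > 20 (200)
-e N specify clearance between no. of diff N > 10 (30)
-f N specify max gap length in any overlap N > 1 (20)
-g N specify gap penalty factor N > 0 (6)
-h N specify max overhang percent length N > 2 (20)
-m N specify match score factor N > 0 (2)
-n N specify mismatch score factor N < 0 (-5)
-o N specify overlap length cutoff > 20 (40)
-p N specify overlap percent identity cutoff N > 65 (80)
-r N specify reverse orientation value N >= 0 (1)
-s N specify overlap similarity score cutoff N > 400 (900)
-t N specify max number of word matches N > 30 (300)
-u N specify min number of constraints for correction N > 0 (3)
-v N specify min number of constraints for linking N > 0 (2)
-w N specify file name for clipping information (none)
-x N specify prefix string for output file names (cap)
-y N specify clipping range N > 5 (250)
-z N specify min no. of good reads at clip pos N > 0 (3)
"""

NEW_HELP = """\
-a N specify band expansion size N > 10 (20)
-b N specify base quality cutoff for differences N > 15 (20)
-c N specify base quality cutoff for clipping N > 5 (12)
-d N specify max qscore sum at differences N > 20 (200)
-e N specify clearance between no. of diff N > 10 (30)
-f N specify max gap length in any overlap N > 1 (20)
-g N specify gap penalty factor N > 0 (6)
-h N specify max overhang percent length N > 2 (20)
-i N specify segment pair score cutoff N > 20 (40)
-j N specify chain score cutoff N > 30 (80)
-k N specify end clipping flag N >= 0 (1)
-m N specify match score factor N > 0 (2)
-n N specify mismatch score factor N < 0 (-5)
-o N specify overlap length cutoff > 15 (40)
-p N specify overlap percent identity cutoff N > 65 (90)
-r N specify reverse orientation value N >= 0 (1)
-s N specify overlap similarity score cutoff N > 250 (900)
-t N specify max number of word matches N > 30 (300)
-u N specify min number of constraints for correction N > 0 (3)
-v N specify min number of constraints for linking N > 0 (2)
-w N specify file name for clipping information (none)
-x N specify prefix string for output file names (cap)
-y N specify clipping range N > 5 (100)
-z N specify min no. of good reads at clip pos N > 0 (3)
"""

# One help line: flag, the literal "N specify", a description that may
# end with the allowed range, and the default value in parentheses.
LINE = re.compile(r"^\s*(-\w)\s+N\s+specify\s+(.*?)\s*\(([^()]*)\)\s*$")


def parse_help(text):
    """Map each flag to (description including range, default value)."""
    options = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            flag, description, default = match.groups()
            options[flag] = (description, default)
    return options


def compare(old, new):
    """Return flags added, flags removed, and flags whose entry changed."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = {f: (old[f], new[f])
               for f in sorted(set(old) & set(new)) if old[f] != new[f]}
    return added, removed, changed


added, removed, changed = compare(parse_help(OLD_HELP), parse_help(NEW_HELP))
print("added:", added)
print("removed:", removed)
for flag, (before, after) in changed.items():
    print(flag, before, "->", after)
```

On the two lists above this reports -i, -j and -k as added, and -o, -p, -s and -y as changed. For -o and -s only the allowed lower bound differs; the defaults (40 and 900) are the same in both versions.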




    >lcl|Seq274836 No definition line found
    TCAGCCGCGCAGGTATACTGACAGTGATATCACTTCCTACTAGCTAGCTGCTACTTGAAA
    CTAAAGTTTTACTCTTAAGGTTCTGAAAGATTTAATAGGAACAGTATGTGGTCCTCCATA
    GGATGAATTGGTTGCAATTGAGCAATAGTGTCAAAATCATCAGTTGATCAATTCTTCTGC
    ATAACCATTTATGTGAAATTGACTAGAAAACAAGTTGCAAGAGAAAAATAAAGCTTTCTG
    GTTTAGCTTGTTGTGTTAGCCCTTTCAGACACAGGTCAGTGTTGAACATATTTCTAAGAT
    AATTAGGTTAGCTAAGATGAGAGGCAAACTCTATTTATTTGTGACCCTAAAAATGGTAGA
    CTTACAAACGCCTAACTTAATCATACTCAATCTTCATGTCTACTTCAGTTAAAGAGAATA
    CAATTACAACAAGTACCCAACCCGCAATCACCAATAAAAACTAAACAATCTAACAGAGAT
    ATTGTTTACACTAAAGAAACAAAAACATTAAGTAAATTGACCAATGACTCCCATCGTACT
    ACTGTCG
    >lcl|Seq158084 No definition line found
    TCAGCCGCGCAGGTATCTTCTACTACAGTGATGACATATCATTTCCATGTCTTGCAGACT
    CTCTCGCATATACTGACAGTGATATCACTTCCTACTAGCTAGCTGCTACTTGAAACTAAA
    GTTTTACTCTTAAGGTTCTGAAAGATTTAATAGGAACAGTATGTGATTCTCCATAGGATG
    AATTGGTTGCTAATTGAGTCAAATAGTGTCAAAATCAATCAGTTGATCATTCTTCTGCAT
    AACCATTTATGTAGAAATTGACTAAGAAAACAAGTTGCAAGAAGAAAAATAAAGACTTTT
    ACTGGTTTAAGCTTTGTTAGTGTTAGCCCTTTACAAGACACAGGTCTAGTGTTGAACATA
    TTTACTAAGATAATTAGGTTAGACTAAAGATGAGTAGCAAACTCTATTTATTGTGTACCC
    AAAAATGGTAGACTTACAAACGTACCCTAATTAATCATACTCATTCTTCATGTCCTACTT
    ACAAGGTTAAAGAAGTAATAACAATTACAAACTAAACGTAACCTAACCCGACAACTACAA
    CCAATAAAAACTAAAAC
    Last edited by robertorun; 06-25-2010, 12:32 AM. Reason: to make it easier to read

  • #2
    Re: What is the difference between the two versions of CAP3?

    I have the same problem.

    The "version of CAP3 for a 64-bit Linux system with an Intel processor" comes with a new doc file.

    However, I don't know how to get a contig from your sequences with the new version of cap3...

    $ diff old_doc new_doc

    204a208,211
    > If the option -k 0 is given, then no read end is clipped and
    > the whole read is used in assembly. Otherwise, the following procedure
    > is used to determine and clip poor read ends.
    >
    349a357,391
    > Short reads
    >
    > The default values for some of the parameters are selected for
    > assembly of regular reads of lengths 500 to 1000 bp.
    > For assembly of short reads of lengths 20 bp, the following options
    > should be used to change the values for those parameters accordingly.
    >
    > -i 30 -j 31 -o 18 -s 300
    >
    > Note that using short reads increases the likelihood of producing assemblies
    > with false joins. Below we explain the options for short reads.
    > Overlaps between reads are quickly computed by finding segment pairs
    > (ungapped alignments) and combining segment pairs into chains.
    > The -i option is used to specify a score cutoff on segment pairs.
    > The score of a segment pair with 19 base matches and 1 base mismatch
    > is 2 * 19 + (-5) * 1 = 33, where each base match is given a score of 2
    > and each mismatch is given a score of -5.
    > The -j option is used to specify a score cutoff on chains of segment pairs,
    > where the score of a chain is the sum of scores of each segment pair
    > minus penalties for gaps between segment pairs.
    > The score of a chain consisting of one segment pair is simply the score of
    > the segment pair.
    >
    > After a high scoring chain of segment pairs between two reads is computed,
    > an overlap between the reads is computed as an optimal local alignment
    > between the reads, where the chain is used to limit the
    > computation to a small area of the dynamic programming matrix.
    > Unlike the scores of segment pairs and chains, the score of an overlap
    > is weighted by base quality values. Thus, an overlap
    > with 19 base matches, 1 base mismatch, and 0 gap has a score
    > of 10 * [2 * 19 + (-5) * 1] = 330, assuming that each base
    > has a quality value of 10.
    > The -o option is used to specify a length cutoff on overlaps,
    > whereas the -s option is used to specify a score cutoff on overlaps.
    >
    409a452
    > Sanzhen Liu and Pat Schnable for suggesting the options -i, -j, -k.
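
The score arithmetic quoted from the new doc can be reproduced in a few lines. This is only a sketch of the two worked examples in the doc text (gap-free, with one uniform quality value), not of CAP3's actual scoring code; the function names are invented here. The defaults 2 and -5 are the -m and -n values from the option list.

```python
def segment_pair_score(matches, mismatches, match_factor=2, mismatch_factor=-5):
    """Score of an ungapped segment pair, not weighted by base quality."""
    return match_factor * matches + mismatch_factor * mismatches


def overlap_score(matches, mismatches, quality=10,
                  match_factor=2, mismatch_factor=-5):
    """Score of a gap-free overlap in which every base has the same quality.

    Simplifying assumption taken from the doc's example: the unweighted
    score is multiplied by one uniform quality value.
    """
    return quality * segment_pair_score(
        matches, mismatches, match_factor, mismatch_factor)


sp = segment_pair_score(19, 1)   # 2 * 19 + (-5) * 1
ov = overlap_score(19, 1)        # 10 * [2 * 19 + (-5) * 1]

# Cutoffs suggested in the doc for short reads: -i 30 -j 31 -s 300.
# Whether CAP3 compares with > or >= is an assumption; it makes no
# difference for these particular numbers.
print("segment pair score:", sp, "passes -i 30:", sp >= 30)
print("single-pair chain score:", sp, "passes -j 31:", sp >= 31)
print("overlap score:", ov, "passes -s 300:", ov >= 300)
```

A chain made of a single segment pair has the same score as that pair, which is why the same value is checked against both -i and -j.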



    • #3
      Hi, your sequences can be assembled with:

      $ cap3 tmp.fas -y 250 -p 80 -m 4

      Has the default "-m" value been changed?
      I don't know, because I have never seen the new cap3.c source.



      • #4
        Sorry, the -m value of the old cap3 is actually 2.
        There must be other, more complex causes...
