
  • crossmatch/phrap: improper assembly

    I've been using phred/phrap/consed for a few months now. I am now encountering a strange failure of the package to do an assembly/alignment.

    In my (mini)assembly, I have three contigs (100Kbp, 75Kbp, 900bp) as well as four Sanger sequencing reads (from ab1 files). Consed is matching a 152bp region between the two largest contigs, somewhat near their ends (55bp from the end of the 75Kb contig, 1766bp from the end of the 100Kb contig); this region has only 2 mismatches between the two contigs. It seems somewhat rational to join there, but no prior assembly or miniassembly attempted this join (pending verification, I believe the contigs are separated by just under 4Kb). However, the 900bp contig is placed as aligned to only 10 consecutive bases of the 75Kb contig, yet is embedded within it (near the junction with the largest contig). Three of the Sanger reads are "aligned" to the 75Kbp contig where I don't see ANY significant congruence between the reads and the contig. Further, one of these three reads is offset 16bp from the other two (screenshot). The fourth Sanger read is "overlapping" part of the 2Kb mismatch region between the two largest contigs, but shows no sequence similarity to either at this location. Additionally, no gaps exist in any of the sequences or in the "consensus" in the assembly. In short, sequences are not aligning, nor are gaps being inserted in reads to establish an alignment, yet all are being assembled into a single contig.

    I've used the miniassembly option many times in the past without issue. One thing that has changed is that I've recently started using assembly view to merge contigs semi-manually. However, the contigs that I'm attempting to join here are unadulterated by the assembly view.

    The phrap command-line the miniassembly runs is, by default,
    /usr/bin/genome/bin/phrap.longreads mini.120907.103603.fasta.screen -new_ace -view -retain_duplicates -trim_qual 14 -trim_start 0 -repeat_stringency .95 -forcelevel 0 -bypasslevel 0 -maxgap 30 -minmatch 14 -minscore 35 -maxmatch 40 -vector_bound 30 -max_subclone_size 8000

    During crossmatch I get this error: NO QUALITY FILE blahOld.contigs.qual WAS FOUND. REMAINING INPUT QUALITIES SET TO 15. Done
    despite all contigs having associated quality values in the screen.ace file and all Sanger sequences converted by p/p/c from .ab1 to both phd and scf.

  • #2
    Assembly seems to be logical and aligned if I leave out the 100Kb contig.

    Note: I do not have a cross_match.longreads. I re-ran "make manyreads" in the original download directory, which generated a phred, crossmatch, phred.longreads, phred.manyreads and a cross_match.manyreads. There does not appear to be a make rule for "make longreads". Should I symlink a cross_match.longreads to the .manyreads version, or would that not be used or be improper? Should I adjust my phred_phrap_longreads script to call an alternate cross_match?

    The odd thing is this has not come up before now. If it's a longread problem, the 75Kb contig should also be longer than the standard (non-longread) executable accepts. Nor has the 100Kb contig ever aligned with the 75Kb contig in any prior assembly attempt.



    • #3
      edited the makefile and adjusted this section
      manyreads:
      touch swat.h;
      make CFLAGS="-O2 -DMANYREADS" phrap cross_match;
      mv phrap phrap.manyreads;
      mv cross_match cross_match.manyreads;
      touch swat.h;
      make CFLAGS="-O2 -DLONGREADS" phrap cross_match;
      mv phrap phrap.longreads;
      mv cross_match cross_match.longreads;
      touch swat.h;
      make phrap cross_match;

      Did a diff to confirm that cross_match.longreads differed from the other two, despite the same file size, then copied it to my bin directory.
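The "same size but different content" check can also be done by hashing the executables. The following is only a minimal sketch: the file names follow the makefile section quoted in this post, while the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_identical(paths):
    """Group paths by content digest; each group holds byte-identical files."""
    groups = {}
    for p in paths:
        groups.setdefault(file_digest(p), []).append(str(p))
    return sorted(groups.values())


if __name__ == "__main__":
    # Names taken from the makefile targets above; run this in the build directory.
    names = ("cross_match", "cross_match.manyreads", "cross_match.longreads")
    existing = [Path(n) for n in names if Path(n).is_file()]
    for group in group_identical(existing):
        print(group)
```

If all three builds really differ, each name is printed in its own group; two names in the same group would mean the CFLAGS define had no effect on that build.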

      edited my phred_phrap_crossmatch script to have

      $cross_matchExe = $szConsedHome . "/bin/cross_match.longreads";

      ...still have the problem

      notice the following in my std error:
      UNPOSITIONED READ: 12I12_1108_3511R2_A9_Sep-4-2012.ab1
      UNPOSITIONED READ: 12I12_1108_35ER2_A8_Sep-4-2012.ab1
      UNPOSITIONED READ: Cy6_0335F1800_c35F_C8_May-30-2012.ab1
      UNPOSITIONED READ: contig00235.scf Done
      Total space allocated: 61.488 Mbytes; currently free: 11.130 Mbytes in 61 blocks

      and although the first time cross_match is called, it uses the longreads version, I see further down:
      /usr/bin/genome/bin/cross_match mini.120907.124005.120907_124007.contigs /usr/bin/genome/lib/screenLibs/repeats.fasta -tags -minmatch 10

      and

      /usr/bin/genome/bin/cross_match blahOld.contigs mini.120907.124005New.contigs -minmatch 50 -tags -discrep_lists

      so...had to change consed.fullPathnameOfCrossMatch and additionally edit
      addReads2Consed.perl
      amplifyTranscripts.perl
      findSequenceMatchesForConsed.perl
      phaster2Miniassembly.perl
      phaster2Ace.perl
      tagRepeats.perl
      transferConsensusTags.perl

      Now all the calls say .longreads where appropriate. However, the alignment is still broken and I still get the unpositioned read error.
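Rather than opening each script by hand, the remaining plain cross_match calls could be located with a small scan. This is a sketch under assumptions: the regular expression and function name are illustrative, and it presumes the helper scripts end in .perl as the ones listed above do.

```python
import re
from pathlib import Path

# "cross_match" that is neither part of a longer identifier (e.g. $cross_matchExe)
# nor followed by a ".suffix" such as ".longreads".
PLAIN_CROSS_MATCH = re.compile(r"(?<![\w$])cross_match(?![\w.])")


def find_plain_calls(text):
    """Return (line_number, stripped_line) pairs that mention a bare cross_match."""
    hits = []
    for n, line in enumerate(text.splitlines(), start=1):
        if PLAIN_CROSS_MATCH.search(line):
            hits.append((n, line.strip()))
    return hits


if __name__ == "__main__":
    # Scans the current directory; point this at the consed scripts directory instead.
    for script in sorted(Path(".").glob("*.perl")):
        for n, line in find_plain_calls(script.read_text(errors="replace")):
            print(f"{script.name}:{n}: {line}")
```

Lines that already reference cross_match.longreads are skipped, so whatever is printed still needs attention.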

      I ran: sudo grep "UNPOSITIONED READ" *
      The phrase appears only in binary files (every binary version of phrap, cross_match and cluster), not in any script, so I can't really diagnose the context.
      Last edited by pag; 09-07-2012, 09:19 AM.
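The grep result above (message present in the executables, absent from the scripts) can be reproduced with a short scan that also labels each hit as binary or text. This is a rough sketch; treating any file with a NUL byte near its start as binary is a common heuristic and an assumption here, not something taken from the package.

```python
from pathlib import Path

PHRASE = b"UNPOSITIONED READ"


def classify_bytes(data, probe=8192):
    """Return None if PHRASE is absent, else 'binary' or 'text'.

    A file is called binary when a NUL byte occurs in its first `probe` bytes.
    """
    if PHRASE not in data:
        return None
    return "binary" if b"\0" in data[:probe] else "text"


def scan_directory(directory):
    """Map file name -> 'binary'/'text' for files in `directory` containing PHRASE."""
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            kind = classify_bytes(path.read_bytes())
            if kind is not None:
                results[path.name] = kind
    return results


if __name__ == "__main__":
    # Scans the current directory, as the grep command above does.
    for name, kind in scan_directory(".").items():
        print(f"{kind}\t{name}")
```

Only "binary" hits would confirm that the message is produced inside the compiled programs rather than by any of the Perl wrappers.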



      • #4
        phrap/phred/consed

        Hi everyone,
        I'm a newcomer. I have a bacterial genome and want to do gap finishing, so I need the phrap/phred/consed software packages. Could anyone help me? My email is [email protected]
        Thanks.



        • #5


          You have to e-mail the people concerned with information that certifies you as an academic user. I think that commercial/industrial users can still get access, but the license then costs money.

          Additionally, polyphred isn't part of the package, so would have to be obtained elsewhere, although the code to use polyphred is already in the phredPhrap script.



          • #6
            Originally posted by pag
            Assembly seems to be logical and aligned if I leave out the 100Kb contig.

            Note: I do not have a cross_match.longreads. I re-ran "make manyreads" in the original download directory, which generated a phred, crossmatch, phred.longreads, phred.manyreads and a cross_match.manyreads. There does not appear to be a make rule for "make longreads". Should I symlink a cross_match.longreads to the .manyreads version, or would that not be used or be improper? Should I adjust my phred_phrap_longreads script to call an alternate cross_match?

            The odd thing is this has not come up before now. If it's a longread problem, the 75Kb contig should also be longer than the standard (non-longread) executable accepts. Nor has the 100Kb contig ever aligned with the 75Kb contig in any prior assembly attempt.
            AFAIK there have been no longreads/manyreads options for years.
            What version of Consed/Phrap/Crossmatch are you running?

            Sven



            • #7
              Originally posted by pag
              I've been using phred/phrap/consed for a few months now. I am now encountering a strange failure of the package to do an assembly/alignment.

              In my (mini)assembly, I have three contigs (100Kbp, 75Kbp, 900bp) as well as four Sanger sequencing reads (from ab1 files). Consed is matching a 152bp region between the two largest contigs, somewhat near their ends (55bp from the end of the 75Kb contig, 1766bp from the end of the 100Kb contig); this region has only 2 mismatches between the two contigs. It seems somewhat rational to join there, but no prior assembly or miniassembly attempted this join (pending verification, I believe the contigs are separated by just under 4Kb). However, the 900bp contig is placed as aligned to only 10 consecutive bases of the 75Kb contig, yet is embedded within it (near the junction with the largest contig). Three of the Sanger reads are "aligned" to the 75Kbp contig where I don't see ANY significant congruence between the reads and the contig. Further, one of these three reads is offset 16bp from the other two (screenshot). The fourth Sanger read is "overlapping" part of the 2Kb mismatch region between the two largest contigs, but shows no sequence similarity to either at this location. Additionally, no gaps exist in any of the sequences or in the "consensus" in the assembly. In short, sequences are not aligning, nor are gaps being inserted in reads to establish an alignment, yet all are being assembled into a single contig.

              I've used the miniassembly option many times in the past without issue. One thing that has changed is that I've recently started using assembly view to merge contigs semi-manually. However, the contigs that I'm attempting to join here are unadulterated by the assembly view.

              The phrap command-line the miniassembly runs is, by default,
              /usr/bin/genome/bin/phrap.longreads mini.120907.103603.fasta.screen -new_ace -view -retain_duplicates -trim_qual 14 -trim_start 0 -repeat_stringency .95 -forcelevel 0 -bypasslevel 0 -maxgap 30 -minmatch 14 -minscore 35 -maxmatch 40 -vector_bound 30 -max_subclone_size 8000

              During crossmatch I get this error: NO QUALITY FILE blahOld.contigs.qual WAS FOUND. REMAINING INPUT QUALITIES SET TO 15. Done
              despite all contigs having associated quality values in the screen.ace file and all Sanger sequences converted by p/p/c from .ab1 to both phd and scf.
              You might want to ask on consed's mailing list (see their webpage) to get help.

              That is not an error, just a warning: consed uses temporary FASTA-converted data
              without a qual file. Nothing to worry about.

              Sven
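To see ahead of time which inputs would trigger this warning, one can check for the quality file next to each FASTA file. A minimal sketch, assuming only what the warning text itself shows: for an input named blahOld.contigs, the file looked for is blahOld.contigs.qual. The function names are hypothetical.

```python
from pathlib import Path


def expected_qual_path(fasta_path):
    """Quality file name implied by the warning: the input name plus '.qual'."""
    return Path(str(fasta_path) + ".qual")


def missing_qual_files(fasta_paths):
    """Return the inputs for which no matching .qual file exists on disk."""
    return [str(p) for p in fasta_paths if not expected_qual_path(p).is_file()]


if __name__ == "__main__":
    # File name copied from the warning quoted above.
    print(missing_qual_files(["blahOld.contigs"]))
```

Any file listed would have its qualities set to the default reported in the warning (15), which matches the explanation that the temporary FASTA data has no qual file.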



              • #8
                Originally posted by jianweil
                Hi everyone,
                I'm a newcomer. I have a bacterial genome and want to do gap finishing, so I need the phrap/phred/consed software packages. Could anyone help me? My email is [email protected]
                Thanks.
                Please, ... http://lmgtfy.com/?q=get+consed

                Sven



                • #9
                  Edit: a question on version info: does it follow yymmdd, such that 071220 > 020425 > 980806? It's possible I thought that 98 and 99 were much larger than 02, 04 and 07, so I went with the 90s version for phrap.

                  phrap version 0.990329
                  Consed Version 23.0 (120514)
                  phred version: 0.020425.c

                  My phredPhrap script is a modified version of one designed for
                  $szVersion = "030415"; further modified to bring it up to date with
                  $szVersion = "120312";

                  (see last section for more)

                  ===========
                  From Brent Ewing
                  Subject phred distribution
                  Date Tue, Apr 17, 2012 05:51 PM

                  distribution file name
                  ------------ ---------
                  phred phred-dist-020425.c-acd.tar.Z
                  phd2fasta phd2fasta-acd-dist.tar.Z
                  =======
                  From Phil Green
                  Subject phrap/cross_match/swat ver 0.990329 (PROGRAM CODE)
                  Date Fri, Apr 13, 2012 04:42 PM

                  begin 664 distrib.tar.Z
                  =====

                  I just noticed this in one of David Gordon's e-mails that escaped my notice earlier:
                  WORK WITH CONSED

                  You will need to specially request the most recent version of
                  phrap--not the one that you get with a normal request. To request the
                  most recent version of phrap and cross_match (cross_match comes with
                  phrap), send an email to <phil green>, with a Subject line
                  that says "phrap new version request", and an email body that consists
                  of the following two lines (it should be in exactly this format, to be
                  computer readable):

                  Request: phrap ver 1.080721 or later
                  Registered phrap email address:
                  ======
                  So, it looks like I'll need to request the current phrap specifically. I'll make sure to do that before posting my question to the listserv.

                  Edit: phred-dist-071220.c-acd.tar.gz exists in my downloaded programs directory. I have no idea why I'm not using that one. Marked beta? Noted as not stable for my system-type? In any case, I'll attempt to switch to it.
                  Last edited by pag; 09-10-2012, 11:38 AM.



                  • #10
                    Originally posted by pag
                    Edit: a question on version info: does it follow yymmdd, such that 071220 > 020425 > 980806? It's possible I thought that 98 and 99 were much larger than 02, 04 and 07, so I went with the 90s version for phrap.

                    phrap version 0.990329
                    Consed Version 23.0 (120514)
                    phred version: 0.020425.c
                    Well, yes, update phrap/cross_match; that's what we are using:

                    phrap version 1.090518
                    cross_match version 1.090518

                    And yes, you should use "phred-dist-071220.c-acd", which is the most current version (though marked as "beta"). It has been running stably here for years ...

                    cheers, Sven



                    • #11
                      The versioning on this really threw me for a loop. I wouldn't expect version numbers to decrease, especially in the immediate aftermath of the Y2K hullabaloo. I would have assumed they would move to 1.000101 at the start of the new millennium if I had realized they were using coded dates. I just naively thought they were actual build/version numbers.

                      The other thing they appear to lack is a list of their various versions and associated dates (and compatibility with other portions of the package), as well as perhaps how many downloads/distributions of that version there have been. Consed has some info on its homepage about past versions, but the other parts of the package do not.
                      Last edited by pag; 09-11-2012, 06:52 AM.
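Since the six digits in these version strings are coded yymmdd dates, comparing them as plain numbers gives the wrong order across the year 2000. The sketch below sorts them as dates instead. It relies on Python's `%y` rule for two-digit years (69 to 99 map to 1969 to 1999, 00 to 68 to 2000 to 2068) and, as an assumption, ignores the leading "0." or "1." and any trailing letter.

```python
import re
from datetime import datetime

# A run of exactly six digits, e.g. the "990329" in "0.990329".
SIX_DIGITS = re.compile(r"(?<!\d)(\d{6})(?!\d)")


def version_date(version):
    """Interpret the six-digit part of a version string as a yymmdd date."""
    match = SIX_DIGITS.search(version)
    if match is None:
        raise ValueError(f"no yymmdd field in {version!r}")
    return datetime.strptime(match.group(1), "%y%m%d").date()


if __name__ == "__main__":
    # Version strings quoted earlier in this thread.
    versions = ["0.990329", "1.090518", "0.020425.c", "0.071220.c", "980806"]
    for v in sorted(versions, key=version_date):
        print(version_date(v).isoformat(), v)
```

Sorted this way, 980806 and 0.990329 come first and 1.090518 last, which is the chronological order rather than the numeric one.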



                      • #12
                        Received the current version of phrap/crossmatch today. Using my old scripts with that executable, my two long contigs co-align, but the other reads are placed in a separate alignment and are properly positioned.

                        That works for me.



                        • #13
                          gah. spoke too soon. Realized I left out one of the Sanger reads - as soon as I put that read in, all the reads were lumped into a single assembly without proper sub-alignment as before.

                          Making contig sequences ...
                          UNPOSITIONED READ: 12I12_1108_3511R2_A9_Sep-4-2012.ab1
                          UNPOSITIONED READ: 12I12_1108_35ER2_A8_Sep-4-2012.ab1
                          UNPOSITIONED READ: Cy6_0335F1800_c35F_C8_May-30-2012.ab1
                          UNPOSITIONED READ: contig00235.scf Done
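When comparing runs with different read sets, it helps to pull the unpositioned read names out of the captured standard error. A minimal sketch, assuming the message format is exactly as shown above; the function name is made up for illustration.

```python
import re

# Captures the read name after the marker; a trailing " Done" on the same line is ignored.
UNPOSITIONED = re.compile(r"UNPOSITIONED READ:\s+(\S+)")


def unpositioned_reads(log_text):
    """Return the read names reported as unpositioned, in order of appearance."""
    return UNPOSITIONED.findall(log_text)


if __name__ == "__main__":
    # Lines copied from the output above.
    log = """Making contig sequences ...
UNPOSITIONED READ: 12I12_1108_3511R2_A9_Sep-4-2012.ab1
UNPOSITIONED READ: 12I12_1108_35ER2_A8_Sep-4-2012.ab1
UNPOSITIONED READ: Cy6_0335F1800_c35F_C8_May-30-2012.ab1
UNPOSITIONED READ: contig00235.scf Done
"""
    for name in unpositioned_reads(log):
        print(name)
```

Running this on the log from each attempt makes it easy to see whether adding or removing a single read changes which sequences end up unpositioned.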

                          phred version: 0.071220.c
                          phrap version 1.090518
                          cross_match version 1.090518
                          swat version 1.090518
                          cluster version 1.090518

                          phd2fasta version: 0.990622.f
                          tagRepeats.perl -V: 090209
                          determineReadTypes.perl -V version: 001205
                          transferConsensusTags.perl Version 120312
                          Last edited by pag; 09-13-2012, 08:57 AM.

