  • Assembly using phrap

    Hi!

    I am working on data my group received from MWG. As far as I know, they used 454 sequencing to sequence short 3' fragments of cDNA from two populations. My task is to assemble the data and compare transcript abundance.
    I am a beginner, so please excuse me if my questions are silly. Here they are:
    1. I have FASTA and quality files. I feed them to phrap and get output, so far so good. Now, how can I find out which reads were merged into each resulting contig? I need this to compare transcript abundance. Phrap gives me a huge report file, but I am not sure where to find this information in it. Ideally I want to automate the process with Python scripts: run the assembly in phrap and parse the output into a table with each contig sequence and the number of reads used to build it.
    2. Is this even possible? Should I perhaps use some other tools?
    Thanks in advance.

    Best regards
    Marian Plaszczyca

  • #2
    Since you said you would like to use Python, I'll just point out that Biopython can parse PHRED and ACE files.

    The ACE contig files tell you which reads went into each contig, which sounds like what you want to know.

    Peter
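
To illustrate what the ACE file holds, here is a minimal sketch that uses only the Python standard library rather than Biopython. It assumes phrap was asked to write an ACE file and that each contig record starts with a line of the form `CO <name> <bases> <reads> <segments> <U|C>`, followed by the padded consensus and a blank line. The function name `read_ace_contigs` is made up for this example.

```python
def read_ace_contigs(lines):
    """Yield (contig_name, declared_read_count, consensus) per CO record."""
    lines = iter(lines)
    for line in lines:
        if not line.startswith("CO "):
            continue
        # Assumed layout: CO <name> <bases> <reads> <segments> <U|C>
        fields = line.split()
        name, n_reads = fields[1], int(fields[3])
        consensus = []
        # The consensus follows the CO line and ends at the first blank line.
        for seq_line in lines:
            seq_line = seq_line.strip()
            if not seq_line:
                break
            consensus.append(seq_line)
        # "*" marks a pad (gap) in the consensus; drop it for a plain sequence.
        yield name, n_reads, "".join(consensus).replace("*", "")
```

Passing an open file handle, for example `read_ace_contigs(open("reads.fasta.ace"))` (the file name is only a guess), should give one tuple per contig, which is already close to the contig/read-count table asked for. Biopython's ACE parser additionally exposes the individual reads.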


    • #3
      Hi Marian,
      I have a Perl script here that I hope will help you.
      Command: perl phraplist.pl phrap.out > phrap.list
      Code:
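      #!/usr/bin/perl
      # phraplist.pl: print the per-contig read listings from a phrap output file
      use strict;
      use warnings;

      die "Usage: $0 phrap.out\n" if @ARGV != 1;
      open(my $phrap_out, '<', $ARGV[0]) or die "could not open $ARGV[0]: $!";

      my $in_contig = 0;    # true while inside a contig's read listing
      while (my $line = <$phrap_out>) {
              # A header such as "Contig 1.  7 reads; 685 bp" starts a listing
              $in_contig = 1 if $line =~ /^Contig\s\d+\.\s+\d+\s\w+;\s\d+\sbp/;
              # The contig quality and overall discrepancy sections end the listings
              $in_contig = 0 if $line =~ /Contig quality (.*):$/ || $line =~ /Overall/;
              print $line if $in_contig;
      }
      close($phrap_out);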
      The phrap.list file then contains information like this:

      Code:
      Contig 1.  7 reads; 685 bp (untrimmed), 653 (trimmed).  Isolated contig.
           -1   682 15_A8-9.ab1   604 (  0)  1.55 0.31 0.00   15 ( 58)   23 ( 23) 
            1   679 22_A8-9.ab1   635 (  0)  0.15 0.30 0.15    0 (  6)   23 ( 19) 
            2   673 11_A8-9_R.ab1  580 (  0)  0.67 0.00 0.17   65 ( 65)    6 ( 15) 
            5   686 10_A8-9.ab1   662 (  0)  0.44 0.15 0.00    2 (  2)    1 ( 27) 
            4   684 21_A8-9.ab1   648 (  0)  0.59 0.15 0.15    7 (  7)    1 ( 24) 
      C   139   522 A8-9.ref.scf  381 (  0)  0.00 0.00 0.00    0 (  0)    0 (  0) 
      C   352   641 23_A8-9.ab1   120 (  0)  0.00 0.00 0.79  147 (147)   16 ( 16)
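
Since the original question asked for Python, a listing like the one above could be turned into a per-contig read table roughly as follows. This is only a sketch under assumptions drawn from the sample: each contig starts with a `Contig N.  M reads; L bp` header, each read line begins with two integers followed by the read name, and a leading `C` (assumed here to flag a complemented read) can be ignored. The names `summarize_phrap_list` and `CONTIG_HEADER` are made up for this example.

```python
import re

# Matches headers such as "Contig 1.  7 reads; 685 bp (untrimmed), ..."
CONTIG_HEADER = re.compile(r"^\s*Contig\s+(\d+)\.\s+(\d+)\s+reads?;\s+(\d+)\s+bp")


def _is_int(text):
    """Return True if text parses as an integer (negative values allowed)."""
    try:
        int(text)
    except ValueError:
        return False
    return True


def summarize_phrap_list(lines):
    """Map contig number -> declared read count, length in bp, and read names."""
    contigs = {}
    current = None
    for line in lines:
        header = CONTIG_HEADER.match(line)
        if header:
            current = {
                "declared_reads": int(header.group(2)),
                "length_bp": int(header.group(3)),
                "reads": [],
            }
            contigs[int(header.group(1))] = current
            continue
        fields = line.split()
        # Assumption: a leading "C" flags a complemented read and can be skipped.
        if fields and fields[0] == "C":
            fields = fields[1:]
        # Assumption: read lines look like "<start> <end> <read name> ...".
        if (current is not None and len(fields) >= 3
                and _is_int(fields[0]) and _is_int(fields[1])):
            current["reads"].append(fields[2])
    return contigs
```

Calling `summarize_phrap_list(open("phrap.list"))` returns a dictionary keyed by contig number; comparing `len(info["reads"])` with `info["declared_reads"]` is a cheap sanity check that the listing was parsed completely.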
