  • Assembly using phrap

    Hi!

    I am working on data my group received from MWG. As far as I know, they used 454 sequencing to sequence short 3' fragments of cDNA from two populations. My task is to assemble the data and compare transcript abundance between the two populations.
    I am a beginner, so please excuse me if my questions are silly. Here they are:
    1. I have FASTA and quality files. I feed them to phrap and get output, so far so good. Now, how can I find out which reads were merged into each resulting contig? I need this information to compare transcript abundance. Phrap gives me a huge report file, but I am not sure where to find this information in it. Ideally I want to automate the process with Python scripts, that is, run the assembly in phrap and parse the output into a table with each contig sequence and the number of reads used to build it.
    2. Is this even possible? Should I perhaps use other tools?
    Thanks in advance.

    Best regards
    Marian Plaszczyca

  • #2
    Since you said you would like to use Python, I'll just point out that Biopython can parse PHRED and ACE files.

    The ACE contig files tell you which reads went into each contig, which sounds like what you want to know.

    Peter
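
To illustrate the point about ACE files, here is a minimal sketch using only the Python standard library. It assumes phrap was run so that it wrote an ACE file, and it relies only on two ACE record types: `CO` lines (contig name, base count, read count, segment count, orientation) and `AF` lines (read name, orientation, padded start). The function name `reads_per_contig` and the tiny sample are made up for illustration; Biopython's ACE parser covers the format far more completely.

```python
import io


def reads_per_contig(ace_handle):
    """Return {contig name: [read names]} from the CO and AF lines of an ACE file."""
    contigs = {}
    current = None
    for line in ace_handle:
        fields = line.split()
        # "CO <name> <bases> <reads> <segments> <U|C>" opens a new contig.
        if len(fields) == 6 and fields[0] == "CO":
            current = fields[1]
            contigs[current] = []
        # "AF <read> <U|C> <padded start>" assigns a read to the open contig.
        elif len(fields) == 4 and fields[0] == "AF" and current is not None:
            contigs[current].append(fields[1])
    return contigs


# Hypothetical, heavily shortened ACE content, for illustration only.
SAMPLE_ACE = """AS 1 2

CO Contig1 10 2 1 U
ACGTACGTAC

AF read1 U 1
AF read2 C 3
"""

if __name__ == "__main__":
    for name, reads in reads_per_contig(io.StringIO(SAMPLE_ACE)).items():
        print(name, len(reads), ",".join(reads))
```

The read count per contig is then simply the length of each list, which is the number needed for an abundance table.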



    • #3
      In reply to the original post by yarri:
      Hi Marian,
      I have a Perl script that may help you. It extracts the per-contig read listings from the phrap output file.
      Command: perl phraplist.pl phrap.out > phrap.list
      Code:
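      #!/usr/bin/perl
      # phraplist.pl: print the per-contig read listings from a phrap output file
      use strict;
      use warnings;

      die "Usage: $0 phrap.out\n" if @ARGV != 1;
      open(my $phrap_out, '<', $ARGV[0]) or die "Could not open $ARGV[0]: $!\n";

      my $in_contig = 0;    # true while inside a contig's read listing
      while (my $line = <$phrap_out>) {
              # A header such as "Contig 1.  7 reads; 685 bp" starts a listing
              $in_contig = 1 if $line =~ /^Contig\s\d+\.\s+\d+\s\w+;\s\d+\sbp/;
              # The contig quality section and the overall statistics end it
              $in_contig = 0 if $line =~ /Contig quality (.*):$/ || $line =~ /Overall/;
              print $line if $in_contig;
      }
      close($phrap_out);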
      The resulting phrap.list contains information like this:

      Code:
      Contig 1.  7 reads; 685 bp (untrimmed), 653 (trimmed).  Isolated contig.
           -1   682 15_A8-9.ab1   604 (  0)  1.55 0.31 0.00   15 ( 58)   23 ( 23) 
            1   679 22_A8-9.ab1   635 (  0)  0.15 0.30 0.15    0 (  6)   23 ( 19) 
            2   673 11_A8-9_R.ab1  580 (  0)  0.67 0.00 0.17   65 ( 65)    6 ( 15) 
            5   686 10_A8-9.ab1   662 (  0)  0.44 0.15 0.00    2 (  2)    1 ( 27) 
            4   684 21_A8-9.ab1   648 (  0)  0.59 0.15 0.15    7 (  7)    1 ( 24) 
      C   139   522 A8-9.ref.scf  381 (  0)  0.00 0.00 0.00    0 (  0)    0 (  0) 
      C   352   641 23_A8-9.ab1   120 (  0)  0.00 0.00 0.79  147 (147)   16 ( 16)
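
Since the original question asked for Python, a listing like the one above could be turned into a table roughly as follows. This is a sketch under assumptions: the layout is inferred only from the sample shown here (a `Contig N.  M reads; L bp` header, then one line per read with an optional leading `C`, two coordinates, and the read name), and the names `parse_phrap_list` and `SAMPLE` are hypothetical.

```python
import re

# Header layout inferred from the sample listing, e.g. "Contig 1.  7 reads; 685 bp".
HEADER = re.compile(r"^Contig\s+(\d+)\.\s+(\d+)\s+reads?;\s+(\d+)\s+bp")


def parse_phrap_list(lines):
    """Return one dict per contig with its number, read count, length and read names."""
    table = []
    for line in lines:
        match = HEADER.match(line.strip())
        if match:
            table.append({
                "contig": int(match.group(1)),
                "n_reads": int(match.group(2)),
                "length_bp": int(match.group(3)),
                "reads": [],
            })
            continue
        fields = line.split()
        if not table or not fields:
            continue
        # Assumption: a leading "C" flags the read's orientation and is not data.
        if fields[0] == "C":
            fields = fields[1:]
        # After the two coordinates, the third field is the read name.
        if len(fields) >= 3:
            table[-1]["reads"].append(fields[2])
    return table


SAMPLE = """Contig 1.  7 reads; 685 bp (untrimmed), 653 (trimmed).  Isolated contig.
     -1   682 15_A8-9.ab1   604 (  0)  1.55 0.31 0.00   15 ( 58)   23 ( 23)
      1   679 22_A8-9.ab1   635 (  0)  0.15 0.30 0.15    0 (  6)   23 ( 19)
      2   673 11_A8-9_R.ab1  580 (  0)  0.67 0.00 0.17   65 ( 65)    6 ( 15)
      5   686 10_A8-9.ab1   662 (  0)  0.44 0.15 0.00    2 (  2)    1 ( 27)
      4   684 21_A8-9.ab1   648 (  0)  0.59 0.15 0.15    7 (  7)    1 ( 24)
C   139   522 A8-9.ref.scf  381 (  0)  0.00 0.00 0.00    0 (  0)    0 (  0)
C   352   641 23_A8-9.ab1   120 (  0)  0.00 0.00 0.79  147 (147)   16 ( 16)
"""

if __name__ == "__main__":
    for row in parse_phrap_list(SAMPLE.splitlines()):
        print(row["contig"], row["n_reads"], row["length_bp"], sep="\t")
```

Comparing `n_reads` from the header with the number of read names collected is a cheap sanity check that the parser's layout assumptions hold for a given phrap version.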
