  • Assembly using phrap

    Hej!

    I am working on data my group received from MWG. As far as I know, they used 454 sequencing to sequence short 3' fragments of cDNA from two populations. My task is to assemble the data and compare transcript abundance.
    I am a beginner, so please excuse me if my questions are silly. Here they are:
    1. I have FASTA and quality files. I feed them to phrap and get output, so far so good. Now, how can I find out which reads were merged into each resulting contig? I need this information to compare transcript abundance. Phrap gives me a huge report file, but I am not sure where to find this information in it. Ideally I want to automate the process with Python scripts: run the assembly in phrap and parse the output into a table with each contig sequence and the number of reads used to build it.
    2. Is this even possible? Should I perhaps use other tools?
    Thanks in advance.

    Best regards
    Marian Plaszczyca

  • #2
    Since you said you would like to use Python, I'll just point out that Biopython can parse PHRED and ACE files.

    The ACE contig files tell you which reads went into each contig, which sounds like what you want to know.

    Peter
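
For readers who want to see what the ACE route looks like, here is a minimal sketch in plain Python that does not depend on Biopython. It assumes the standard ACE layout, in which each contig starts with a `CO <name> <bases> <reads> <segments> <U|C>` line followed by the padded consensus, and each member read is listed on an `AF <read name> ...` line. The function name `reads_per_contig` and the tiny `SAMPLE_ACE` text are made up for illustration; a real phrap ACE file is far larger and should be checked against this assumption.

```python
import io

# Hypothetical, hand-written ACE fragment used only to exercise the parser.
SAMPLE_ACE = """AS 1 2

CO Contig1 8 2 1 U
ACGT*ACG

BQ
 20 20 20 20 20 20 20

AF read1 U 1
AF read2 C 3
BS 1 8 read1

RD read1 8 0 0
ACGT*ACG
"""

def reads_per_contig(handle):
    """Map contig name -> dict with read count, read names and consensus."""
    contigs = {}
    current = None
    in_consensus = False
    for raw in handle:
        line = raw.strip()
        fields = line.split()
        if raw.startswith("CO "):
            # CO <name> <padded bases> <number of reads> <base segments> <U|C>
            current = {"n_reads": int(fields[3]), "reads": [], "consensus": ""}
            contigs[fields[1]] = current
            in_consensus = True
        elif in_consensus:
            if not line:
                in_consensus = False  # a blank line ends the consensus block
            else:
                # '*' marks a pad (gap) in the consensus; drop it
                current["consensus"] += line.replace("*", "")
        elif raw.startswith("AF ") and current is not None:
            # AF <read name> <U|C> <padded start position>
            current["reads"].append(fields[1])
    return contigs

contigs = reads_per_contig(io.StringIO(SAMPLE_ACE))
for name, info in contigs.items():
    print(name, info["n_reads"], info["consensus"], ",".join(info["reads"]))
```

With a real file, `reads_per_contig(open("your_assembly.ace"))` would give one entry per contig, which is the "contig sequence plus number of reads" table asked for in the first post.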

    • #3
      Hi, Marian
      I have a Perl script here; I hope it helps.
      Command: perl phraplist.pl phrap.out > phrap.list
      Code:
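      #!/usr/bin/perl
      # phraplist.pl: print the per-contig read listings from a phrap output file.
      use strict;
      use warnings;

      die "Usage: $0 phrap.out\n" if @ARGV != 1;
      open(my $phrap_out, '<', $ARGV[0]) or die "could not open $ARGV[0]: $!\n";

      my $in_listing = 0;    # true while we are inside a contig's read listing
      while (my $line = <$phrap_out>) {
              # A contig header, e.g. "Contig 1.  7 reads; 685 bp (untrimmed), ..."
              $in_listing = 1 if $line =~ /^Contig\s\d+\.\s+\d+\s\w+;\s\d+\sbp/;
              # The listing ends at "Contig quality ...:" or at any line containing "Overall"
              $in_listing = 0 if $line =~ /Contig quality (.*):$/ || $line =~ /Overall/;
              print $line if $in_listing;
      }
      close($phrap_out);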
      The resulting phrap.list contains information like this:

      Code:
      Contig 1.  7 reads; 685 bp (untrimmed), 653 (trimmed).  Isolated contig.
           -1   682 15_A8-9.ab1   604 (  0)  1.55 0.31 0.00   15 ( 58)   23 ( 23) 
            1   679 22_A8-9.ab1   635 (  0)  0.15 0.30 0.15    0 (  6)   23 ( 19) 
            2   673 11_A8-9_R.ab1  580 (  0)  0.67 0.00 0.17   65 ( 65)    6 ( 15) 
            5   686 10_A8-9.ab1   662 (  0)  0.44 0.15 0.00    2 (  2)    1 ( 27) 
            4   684 21_A8-9.ab1   648 (  0)  0.59 0.15 0.15    7 (  7)    1 ( 24) 
      C   139   522 A8-9.ref.scf  381 (  0)  0.00 0.00 0.00    0 (  0)    0 (  0) 
      C   352   641 23_A8-9.ab1   120 (  0)  0.00 0.00 0.79  147 (147)   16 ( 16)
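
Since the original question asked for Python, the listing above can also be turned into a table with a short script. This is only a sketch: it assumes the input looks exactly like the phrap.list excerpt shown here (a `Contig N.  M reads; L bp` header followed by one line per read, optionally prefixed with `C`), and the function name `contig_table` is made up for illustration.

```python
import re

# Header such as "Contig 1.  7 reads; 685 bp (untrimmed), 653 (trimmed)."
HEADER = re.compile(r"^Contig (\d+)\.\s+(\d+) reads?; (\d+) bp")
# Read line: optional "C" flag, two integer positions, then the read name.
READ = re.compile(r"^(?:C\s+)?-?\d+\s+\d+\s+(\S+)")

def contig_table(lines):
    """Return one dict per contig: id, declared read count, length, read names."""
    table = []
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            table.append({
                "contig": int(header.group(1)),
                "n_reads": int(header.group(2)),
                "length_bp": int(header.group(3)),
                "reads": [],
            })
            continue
        read = READ.match(line)
        if read and table:
            table[-1]["reads"].append(read.group(1))
    return table

# The excerpt from this post, used as test input.
SAMPLE_LIST = """Contig 1.  7 reads; 685 bp (untrimmed), 653 (trimmed).  Isolated contig.
     -1   682 15_A8-9.ab1   604 (  0)  1.55 0.31 0.00   15 ( 58)   23 ( 23)
      1   679 22_A8-9.ab1   635 (  0)  0.15 0.30 0.15    0 (  6)   23 ( 19)
      2   673 11_A8-9_R.ab1  580 (  0)  0.67 0.00 0.17   65 ( 65)    6 ( 15)
      5   686 10_A8-9.ab1   662 (  0)  0.44 0.15 0.00    2 (  2)    1 ( 27)
      4   684 21_A8-9.ab1   648 (  0)  0.59 0.15 0.15    7 (  7)    1 ( 24)
C   139   522 A8-9.ref.scf  381 (  0)  0.00 0.00 0.00    0 (  0)    0 (  0)
C   352   641 23_A8-9.ab1   120 (  0)  0.00 0.00 0.79  147 (147)   16 ( 16)
"""

for row in contig_table(SAMPLE_LIST.splitlines()):
    print(row["contig"], row["n_reads"], row["length_bp"], len(row["reads"]))
```

Comparing the declared `n_reads` with the number of read lines actually collected is a cheap sanity check that the listing was parsed completely.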
