Assembly using phrap

**maubp** · 09-16-2009, 04:23 AM

Since you said you would like to use Python, I'll just point out that Biopython can parse PHRED and ACE files.

The ACE contig files tell you which reads went into each contig, which sounds like what you want to know.
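To make the ACE suggestion concrete, here is a minimal, stdlib-only Python sketch (not Biopython's API). It reads an ACE file of the kind phrap/consed produce and collects two things for each contig: the consensus sequence from the `CO` record and the read names from the `AF` lines. It assumes the usual layout: `CO <name> <bases> <reads> <segments> <U|C>`, a consensus terminated by a blank line, and one `AF <read> <U|C> <start>` line per read. The `Contig` class and `parse_ace` function are hypothetical names. For real data, Biopython's `Bio.Sequencing.Ace` module parses the complete format.

```python
"""Sketch: list each contig in an ACE file with its consensus and its reads."""
import io
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Contig:
    name: str
    declared_reads: int  # read count stated on the CO line
    sequence: str = ""  # padded consensus ("*" marks pads)
    reads: List[str] = field(default_factory=list)  # names taken from AF lines


def parse_ace(lines: Iterable[str]) -> List[Contig]:
    contigs: List[Contig] = []
    in_sequence = False
    for line in lines:
        fields = line.split()
        if not fields:
            in_sequence = False  # a blank line ends the consensus block
            continue
        if fields[0] == "CO":
            # CO <name> <num bases> <num reads> <num segments> <U|C>
            contigs.append(Contig(name=fields[1], declared_reads=int(fields[3])))
            in_sequence = True
        elif in_sequence:
            contigs[-1].sequence += fields[0]
        elif fields[0] == "AF" and contigs:
            # AF <read name> <U|C> <padded start>
            contigs[-1].reads.append(fields[1])
        # Everything else (BQ, BS, RD, QA, DS, tags) is ignored in this sketch.
    return contigs


if __name__ == "__main__":
    # Tiny hand-made ACE fragment, for illustration only.
    example = io.StringIO(
        "AS 2 3\n\n"
        "CO Contig1 8 2 1 U\n"
        "ACG*TACG\n\n"
        "BQ\n20 20 20 20 20 20 20\n\n"
        "AF readA U 1\n"
        "AF readB C 3\n\n"
        "CO Contig2 4 1 1 U\n"
        "TTGA\n\n"
        "AF readC U 1\n"
    )
    for contig in parse_ace(example):
        print(contig.name, len(contig.reads), contig.sequence.replace("*", ""), sep="\t")
```

Each printed row gives the contig name, the number of reads placed in it and the unpadded consensus. Comparing `len(contig.reads)` with `declared_reads` is a cheap sanity check on the parse.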

Peter

**BENM** · 09-16-2009, 04:34 AM

Originally posted by yarri:

> Hi!
>
> I am working on data my group received from MWG. As far as I know, they used 454 sequencing to sequence short 3' cDNA fragments from two populations. My task is to assemble the data and compare transcript abundance.
> I am a beginner, so please excuse me if my questions are silly. Here they are:
> 1. I have FASTA and quality files. I feed them to phrap and get output; so far, so good. Now, how can I find out which reads were merged into each resulting contig? I need this information to compare transcript abundance. Phrap gives me a huge report file, but I am not sure where to find this information in it. Ideally I want to automate the process with Python scripts, that is, run the assembly in phrap and parse the output into a table listing each contig sequence and the number of reads used to build it.
> 2. Is this even possible? Should I perhaps use some other tools?
>
> Thanks in advance.
>
> Best regards
> Marian Plaszczyca

Hi Marian,
here is a Perl script that may help.
Usage: `perl phraplist.pl phrap.out > phrap.list`

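```perl
#!/usr/bin/perl
# phraplist.pl: print the read listing of every contig in a phrap report
use strict;
use warnings;

die "Usage: $0 phrap.out\n" unless @ARGV == 1;
open(my $in, '<', $ARGV[0]) or die "could not open $ARGV[0]: $!\n";

my $in_contig = 0;    # true while we are inside a contig's read listing
while (my $line = <$in>) {
    # A listing starts at a header such as
    # "Contig 1.  7 reads; 685 bp (untrimmed), 653 (trimmed)."
    $in_contig = 1 if $line =~ /^Contig\s+\d+\.\s+\d+\s+\w+;\s+\d+\s+bp/;
    # ...and ends at the "Contig quality ...:" section or any line containing "Overall".
    $in_contig = 0 if $line =~ /Contig quality .*:$/ || $line =~ /Overall/;
    print $line if $in_contig;
}
close($in);
```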
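Since the original question asked for Python, here is a rough Python port of the Perl script. It is a sketch that reuses the same start and stop patterns; `contig_listing` is a hypothetical helper name. The stop rule (a `Contig quality ...:` line or any line containing `Overall`) is carried over from the Perl script rather than checked against phrap's documentation.

```python
"""Rough Python port of phraplist.pl: print each contig's read listing from a phrap report."""
import re
import sys
from typing import Iterable, Iterator

# Same start/stop rules as the Perl script.
HEADER = re.compile(r"^Contig\s+\d+\.\s+\d+\s+\w+;\s+\d+\s+bp")  # "Contig 1.  7 reads; 685 bp ..."
STOP = re.compile(r"Contig quality .*:$|Overall")


def contig_listing(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines from each contig header up to (not including) the next stop line."""
    inside = False
    for line in lines:
        if HEADER.search(line):
            inside = True
        if STOP.search(line.rstrip("\r\n")):
            inside = False
        if inside:
            yield line


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as handle:
            sys.stdout.writelines(contig_listing(handle))
    else:
        print(f"Usage: {sys.argv[0]} phrap.out > phrap.list", file=sys.stderr)
```

Because `contig_listing` is a generator over any iterable of lines, it can also be fed directly into further parsing without writing phrap.list to disk first.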

phrap.list then contains the read listing for each contig, for example:

```
Contig 1.  7 reads; 685 bp (untrimmed), 653 (trimmed).  Isolated contig.
     -1   682 15_A8-9.ab1   604 (  0)  1.55 0.31 0.00   15 ( 58)   23 ( 23)
      1   679 22_A8-9.ab1   635 (  0)  0.15 0.30 0.15    0 (  6)   23 ( 19)
      2   673 11_A8-9_R.ab1  580 (  0)  0.67 0.00 0.17   65 ( 65)    6 ( 15)
      5   686 10_A8-9.ab1   662 (  0)  0.44 0.15 0.00    2 (  2)    1 ( 27)
      4   684 21_A8-9.ab1   648 (  0)  0.59 0.15 0.15    7 (  7)    1 ( 24)
C   139   522 A8-9.ref.scf  381 (  0)  0.00 0.00 0.00    0 (  0)    0 (  0)
C   352   641 23_A8-9.ab1   120 (  0)  0.00 0.00 0.79  147 (147)   16 ( 16)
```
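Going one step further toward the abundance comparison in the original question, here is a Python sketch that turns a listing like this into a per-contig table of read counts, split by population. Assumptions: each contig block starts with a `Contig N.  R reads; ...` header, and each following row has the read name in the third column, after an optional leading `C` (taken here to mark a read used in its complemented orientation). `population_of` is a hypothetical placeholder: replace it with whatever rule tells your two populations apart, for example a read-name prefix.

```python
"""Sketch: turn phrap.list into a per-contig read-count table, split by population."""
import re
import sys
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Header line, e.g. "Contig 1.  7 reads; 685 bp (untrimmed), 653 (trimmed)."
HEADER = re.compile(r"Contig\s+(\d+)\.\s+(\d+)\s+\w+;")


def population_of(read_name: str) -> str:
    """HYPOTHETICAL rule: reads named 'popA_*' are population A, all others B.
    Replace this with however your read names actually encode the sample."""
    return "A" if read_name.startswith("popA_") else "B"


def parse_listing(lines: Iterable[str]) -> Dict[str, Tuple[int, List[str]]]:
    """Return {contig number: (read count from the header, [read names listed])}."""
    contigs: Dict[str, Tuple[int, List[str]]] = {}
    current: Optional[str] = None
    for line in lines:
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            contigs[current] = (int(header.group(2)), [])
            continue
        fields = line.split()
        if fields and fields[0] == "C":  # drop the leading "C" marker
            fields = fields[1:]
        if current is None or len(fields) < 3:
            continue  # blank or unrecognised line
        contigs[current][1].append(fields[2])  # start, end, then the read name
    return contigs


def abundance_table(
    contigs: Dict[str, Tuple[int, List[str]]]
) -> List[Tuple[str, int, Counter]]:
    """One row per contig: (contig number, total reads, reads per population)."""
    rows = []
    for contig_id, (declared, reads) in contigs.items():
        # Sanity check: the header's read count should match the rows parsed.
        if declared != len(reads):
            raise ValueError(
                f"Contig {contig_id}: header says {declared} reads, parsed {len(reads)}"
            )
        rows.append((contig_id, len(reads), Counter(population_of(r) for r in reads)))
    return rows


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as handle:
            print("contig\treads\tpopA\tpopB")
            for contig_id, total, by_pop in abundance_table(parse_listing(handle)):
                print(f"Contig{contig_id}\t{total}\t{by_pop['A']}\t{by_pop['B']}")
    else:
        print(f"Usage: {sys.argv[0]} phrap.list", file=sys.stderr)
```

The header-versus-rows check is deliberately strict: if phrap.list ever contains extra lines inside a contig block, the script stops with an error instead of silently miscounting reads.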
