SEQanswers

09-16-2009, 03:39 AM   #1
yarri
Junior Member
 
Location: Stockholm, Sweden

Join Date: Sep 2009
Posts: 1
Assembly using phrap

Hej!

I am working on data my group received from MWG. As far as I know, they used 454 sequencing to sequence short 3' fragments of cDNA from two populations. My task is to assemble the data and compare transcript abundance between the two populations.
I am a beginner, so please excuse me if my questions are silly. Here they are:
1. I have FASTA and quality files. I feed them to phrap and get output, so far so good. Now, how can I find out which reads were merged into each resulting contig? I need this to compare transcript abundance. Phrap gives me a huge report file, but I am not sure where to find this information in it. Ideally I want to automate the process with Python scripts: run the assembly in phrap, then parse the output into a table with each contig sequence and the number of reads used to build it.
2. Is this even possible? Should I perhaps use some other tools?
Thanks in advance.

Best regards
Marian Plaszczyca
09-16-2009, 04:23 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Since you said you would like to use Python, I'll just point out that Biopython can parse PHRED and ACE files.

The ACE contig files tell you which reads went into each contig, which sounds like what you want to know.
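
To make that concrete, here is a minimal standard-library sketch of pulling contig names, read counts, consensus sequences and read names out of an ACE file. It assumes phrap was asked to write an ACE file (I believe the option is `-ace` or `-new_ace`, but check your version's documentation) and relies only on the `CO` and `AF` record layouts of the ACE format. The function name `read_ace_contigs` and the dictionary keys are made up for illustration; Biopython's `Bio.Sequencing.Ace` module is the more complete route and exposes the same information per contig.

```python
def read_ace_contigs(handle):
    """Collect basic per-contig information from the lines of an ACE file.

    Returns a list of dicts with keys: name, n_reads, sequence, reads.
    """
    contigs = []
    current = None
    in_consensus = False
    for raw in handle:
        line = raw.strip()
        if line.startswith("CO "):
            # CO <contig name> <padded bases> <reads> <segments> <U|C>
            fields = line.split()
            current = {
                "name": fields[1],
                "n_reads": int(fields[3]),
                "sequence": "",
                "reads": [],
            }
            contigs.append(current)
            in_consensus = True
        elif in_consensus:
            if line:
                # '*' marks an alignment pad, not a base, so drop it.
                current["sequence"] += line.replace("*", "")
            else:
                # A blank line ends the consensus sequence.
                in_consensus = False
        elif current is not None and line.startswith("AF "):
            # AF <read name> <U|C> <padded start position>
            current["reads"].append(line.split()[1])
    return contigs
```

Something like `with open("assembly.ace") as handle: contigs = read_ace_contigs(handle)` (the file name is a placeholder) would then give one entry per contig, which can be written out as a table.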

Peter
09-16-2009, 04:34 AM   #3
BENM
Member
 
Location: PRC

Join Date: May 2009
Posts: 33

Hi Marian,
Here is a Perl script that I hope will help. It pulls the contig headers and their read lines out of the phrap report.
Command: perl phraplist.pl phrap.out > phrap.list
Code:
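#!/usr/bin/perl
# phraplist.pl: print the contig headers and per-read lines from a phrap report.
use strict;
use warnings;

die "Usage: $0 phrap.out\n" if @ARGV != 1;
open(my $phrap_out, '<', $ARGV[0]) or die "Could not open $ARGV[0]: $!\n";

my $in_contig = 0;    # true while inside a contig's read listing
while (my $line = <$phrap_out>) {
    # A contig header looks like "Contig 1.  7 reads; 685 bp ..."
    $in_contig = 1 if $line =~ /^Contig\s\d+\.\s+\d+\s\w+;\s\d+\sbp/;
    # The listing ends at the contig quality section or the overall summary.
    $in_contig = 0 if $line =~ /Contig quality (.*):$/ || $line =~ /Overall/;
    print $line if $in_contig;
}
close($phrap_out);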
The resulting phrap.list contains information like this:

Code:
Contig 1.  7 reads; 685 bp (untrimmed), 653 (trimmed).  Isolated contig.
     -1   682 15_A8-9.ab1   604 (  0)  1.55 0.31 0.00   15 ( 58)   23 ( 23) 
      1   679 22_A8-9.ab1   635 (  0)  0.15 0.30 0.15    0 (  6)   23 ( 19) 
      2   673 11_A8-9_R.ab1  580 (  0)  0.67 0.00 0.17   65 ( 65)    6 ( 15) 
      5   686 10_A8-9.ab1   662 (  0)  0.44 0.15 0.00    2 (  2)    1 ( 27) 
      4   684 21_A8-9.ab1   648 (  0)  0.59 0.15 0.15    7 (  7)    1 ( 24) 
C   139   522 A8-9.ref.scf  381 (  0)  0.00 0.00 0.00    0 (  0)    0 (  0) 
C   352   641 23_A8-9.ab1   120 (  0)  0.00 0.00 0.79  147 (147)   16 ( 16)
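
Since the original question asked for Python, the listing above can be turned into a table of contigs and read counts with a short script. This is only a sketch: it assumes the input looks exactly like the phrap.list excerpt shown here (a `Contig N.  M reads; L bp` header followed by one line per read, optionally prefixed with `C` for a complemented read), and the name `parse_phrap_list` and the dictionary keys are invented for illustration.

```python
import re

# Matches e.g. "Contig 1.  7 reads; 685 bp (untrimmed), 653 (trimmed)."
CONTIG_HEADER = re.compile(r"^Contig\s+(\d+)\.\s+(\d+)\s+reads?;\s+(\d+)\s+bp")
INTEGER = re.compile(r"-?\d+")


def parse_phrap_list(lines):
    """Return one dict per contig: contig number, read count, length, read names."""
    contigs = []
    current = None
    for line in lines:
        match = CONTIG_HEADER.match(line)
        if match:
            number, n_reads, length_bp = (int(g) for g in match.groups())
            current = {
                "contig": number,
                "n_reads": n_reads,
                "length_bp": length_bp,
                "reads": [],
            }
            contigs.append(current)
            continue
        fields = line.split()
        if current is None or not fields:
            continue
        if fields[0] == "C":
            # Leading "C" flags a complemented read; the columns shift by one.
            fields = fields[1:]
        # A read line starts with two integers (start, end), then the read name.
        if (len(fields) >= 3
                and INTEGER.fullmatch(fields[0])
                and INTEGER.fullmatch(fields[1])):
            current["reads"].append(fields[2])
    return contigs
```

Reading the file with `with open("phrap.list") as handle: table = parse_phrap_list(handle)` gives the read count per contig directly from the header, and the collected read names let you check that count or split the reads by population if the read names encode it.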

Tags
454, assembly, transcript abundance
