SEQanswers

Old 04-17-2010, 10:47 AM   #1
dmurdock
Junior Member
 
Location: texas

Join Date: Mar 2010
Posts: 9
bfast postprocess -U option

In the latest version of bfast (0.6.4b) does anyone have any experience with the -U option in the postprocess command? In the previous version it seems this wasn't present and I just used the standard -a 3 -O 3 options. How is the output different using -U? Also the bfast guide example commands need to be updated in that -O 3 should be replaced with -O 1 for sam output. Thanks!

David Murdock
Baylor College of Medicine
Old 04-17-2010, 11:43 AM   #2
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by dmurdock View Post
In the latest version of bfast (0.6.4b) does anyone have any experience with the -U option in the postprocess command? In the previous version it seems this wasn't present and I just used the standard -a 3 -O 3 options. How is the output different using -U? Also the bfast guide example commands need to be updated in that -O 3 should be replaced with -O 1 for sam output. Thanks!

David Murdock
Baylor College of Medicine
Great catch in the manual! The typo is fixed in the latest source and thus will be there in the next release.

You caught the "silent" upgrade. If you use the "-U" option, the output will be the same as in prior versions. Without the "-U" option (now the default), the alignments for each end of a paired-end read (mate pair) are selected so that the empirical insert size distribution, as well as the inversion ratio, is taken into account. This allows ambiguous reads (two or more equally likely alignments) to be anchored by their mate. I have seen that this helps improve both power and accuracy.

A next step is to add a feature that for unpaired reads (one end maps, the other doesn't) will examine the nearby region implied by the mate pair. This may be quite expensive, especially for color space and/or gapped alignment, but I have seen it successfully used in Novoalign and BWA to improve mapping power while preserving accuracy.
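To make the mate-anchoring idea above concrete, here is a minimal Python sketch. It is not BFAST's actual implementation: it assumes a normal insert-size model, ignores strand and the inversion ratio, and the names `choose_pair`, `mean_insert` and `sd_insert` are hypothetical. It only shows how an ambiguous end can be resolved by scoring each candidate pair against the expected insert size.

```python
import math
from itertools import product


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution (stand-in insert-size model)."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def choose_pair(end1, end2, mean_insert, sd_insert):
    """Pick one candidate alignment per end.

    end1, end2: lists of (position, alignment_score) on the same contig.
    Returns the (candidate1, candidate2) pair with the best combined score.
    """
    best, best_score = None, -math.inf
    for a, b in product(end1, end2):
        insert = abs(b[0] - a[0])
        # Combine both alignment scores with the insert-size log likelihood.
        score = a[1] + b[1] + log_normal_pdf(insert, mean_insert, sd_insert)
        if score > best_score:
            best, best_score = (a, b), score
    return best


# End 1 is ambiguous (two equally scored hits); end 2 maps uniquely.
end1 = [(1000, 50.0), (90000, 50.0)]
end2 = [(2500, 48.0)]
pair = choose_pair(end1, end2, mean_insert=1500, sd_insert=200)
print(pair)
```

In this toy case the hit at position 1000 wins, because only that candidate gives an insert size close to the assumed mean of 1500.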
Old 04-19-2010, 04:46 AM   #3
eyalbd
Member
 
Location: Hebrew University in Jerusalem

Join Date: Apr 2010
Posts: 11

Hi, I have a related comment and also a question. First, the entire postprocess command example in the manual in the SOLiD section 7.1.2 is outdated, and it would be great if you could update it.
Second, I noticed that postprocess has a -A 1 option for color space (which is not even mentioned in the book but exists in the help output). Should this be used in SOLiD alignments, or is the output of the align command already in NT space?

Thanks for this awesome tool,
Eyal
Old 04-19-2010, 09:07 AM   #4
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by eyalbd View Post
Hi, I have a related comment and also a question. First, the entire postprocess command example in the manual in the SOLiD section 7.1.2 is outdated, and it would be great if you could update it.
Second, I noticed that postprocess has a -A 1 option for color space (which is not even mentioned in the book but exists in the help output). Should this be used in SOLiD alignments, or is the output of the align command already in NT space?

Thanks for this awesome tool,
Eyal
Here's where "release early, release often" allows documentation to be out of date. The latest git master branch has an updated manual.

The "-A" option should be set wherever possible. It now is included in the "postprocess" step (version 0.6.4* and onwards). I apologize for its sudden inclusion and ambiguity.
Old 04-20-2010, 11:51 PM   #5
eyalbd
Member
 
Location: Hebrew University in Jerusalem

Join Date: Apr 2010
Posts: 11

Thanks Nils.

I performed the postprocess as written in the old manual (except for -A 1), I hope there aren't more changes.
The output SAM file is about twice as large as what I get with BWA or bowtie, even though the number of alignments is similar. Is BFAST also writing nonaligned reads to the SAM file?
What could cause this size difference?
Old 04-21-2010, 12:16 AM   #6
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by eyalbd View Post
Thanks Nils.

I performed the postprocess as written in the old manual (except for -A 1), I hope there aren't more changes.
The output SAM file is about twice as large as what I get with BWA or bowtie, even though the number of alignments is similar. Is BFAST also writing nonaligned reads to the SAM file?
What could cause this size difference?
All reads, aligned and unaligned, should be present in the SAM/BAM file (let me know if they are not). There are also a fair number of optional SAM tags in each alignment record that BWA/bowtie may or may not also produce. These tags are used to annotate each alignment and can help downstream tools. If you don't want to store the optional tags, you can always rip them out using awk/perl (they always appear as the last N columns etc.). Remember to convert your SAM file to BAM for good compaction and compression, as well as fast record retrieval.
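As an illustration of stripping the optional tags mentioned above, here is a small Python sketch (used here in place of awk/perl). It relies only on the SAM layout of 11 mandatory tab-separated fields followed by optional tags. The function name `strip_optional_tags` and the example records, including the tag values, are made up for illustration.

```python
def strip_optional_tags(sam_lines):
    """Yield SAM lines with optional tags removed; header lines pass through."""
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            # Header lines are kept unchanged.
            yield line
        else:
            # Keep only the 11 mandatory SAM fields.
            yield "\t".join(line.split("\t")[:11])


# Made-up example: one header line and one record carrying two optional tags.
example = [
    "@SQ\tSN:chrM\tLN:16571",
    "read1\t0\tchrM\t2\t255\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0\tXA:i:1",
]
stripped = list(strip_optional_tags(example))
print(stripped[1])
```

This only shrinks the text file. Converting the result to BAM is still what provides the compression and fast retrieval.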
Old 04-22-2010, 06:14 AM   #7
eyalbd
Member
 
Location: Hebrew University in Jerusalem

Join Date: Apr 2010
Posts: 11

Thanks, I think I managed to get it to work. I think the alignment worked well, although I'm having trouble calling SNPs with pileup on the results. The problem is that every base is called as a deletion. This could also explain why I got such a huge SAM file, as well as why pileup is now taking hours to run.
Thanks again for all your help!

Last edited by eyalbd; 04-22-2010 at 06:22 AM.
Old 04-22-2010, 08:46 AM   #8
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by eyalbd View Post
Thanks, I think I managed to get it to work. I think the alignment worked well, although I'm having trouble calling SNPs with pileup on the results. The problem is that every base is called as a deletion. This could also explain why I got such a huge SAM file, as well as why pileup is now taking hours to run.
Thanks again for all your help!
Did you try running the "samtools.pl varFilter" found in the "misc" directory of samtools (remember to adjust based on coverage)? Try filtering based on SNP quality.
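For readers who want to see what filtering on SNP quality amounts to, here is a rough Python sketch. It is not a reimplementation of "samtools.pl varFilter". It assumes the consensus pileup layout in which the sixth tab-separated column is the SNP quality and the eighth is the read depth. The thresholds and the function name `filter_by_snp_quality` are hypothetical and should be adjusted to your coverage.

```python
def filter_by_snp_quality(pileup_lines, min_snp_qual=20, max_depth=100):
    """Keep pileup lines with SNP quality >= min_snp_qual and depth <= max_depth.

    Assumed columns (0-based): 5 = SNP quality, 7 = read depth.
    """
    kept = []
    for line in pileup_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # skip lines that do not match the assumed layout
        snp_qual = int(fields[5])
        depth = int(fields[7])
        if snp_qual >= min_snp_qual and depth <= max_depth:
            kept.append(line)
    return kept


# Made-up consensus pileup lines.
lines = [
    "chrM\t150\tT\tC\t60\t60\t37\t40\t....\tIIII",  # passes both thresholds
    "chrM\t152\tT\tY\t5\t5\t37\t40\t..\tII",        # SNP quality too low
    "chrM\t200\tA\tG\t60\t60\t37\t500\t..\tII",     # depth too high
]
kept = filter_by_snp_quality(lines)
print(len(kept))
```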
Old 04-22-2010, 11:32 AM   #9
eyalbd
Member
 
Location: Hebrew University in Jerusalem

Join Date: Apr 2010
Posts: 11

Quote:
Originally Posted by nilshomer View Post
Did you try running the "samtools.pl varFilter" found in the "misc" directory of samtools (remember to adjust based on coverage)? Try filtering based on SNP quality.
Thanks, I'll try that. However, as every base, even those called like the reference, is called as a deletion (meaning, if I understand correctly, that it aligned it to the correct spot on the reference but as if it came sooner), the problem would seem to be more profound. My coverage, for some reason, starts from base 2 in the reference, not base 1. As the mitochondria genome is circular, could this lead to an artifact which also confuses the alignment? It's highly improbable I really don't have coverage for this base, as I have a lot of coverage for the adjacent bases and the mitochondrial genome is, again, circular.
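One way to check the two observations above directly in the SAM file (coverage starting at base 2, deletions everywhere) is sketched below in Python. It assumes only the standard SAM columns: FLAG in column 2, the 1-based POS in column 4 and CIGAR in column 6. The function name `summarize_sam` and the example records are made up.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_sam(sam_lines):
    """Return (lowest mapped POS, aligned records, records with a deletion)."""
    min_pos, aligned, with_del = None, 0, 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        aligned += 1
        if min_pos is None or pos < min_pos:
            min_pos = pos
        if any(op == "D" for _, op in CIGAR_OP.findall(cigar)):
            with_del += 1
    return min_pos, aligned, with_del


# Made-up records: two aligned reads (one with a deletion) and one unmapped.
records = [
    "@SQ\tSN:chrM\tLN:16571",
    "r1\t0\tchrM\t2\t255\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tchrM\t10\t255\t2M1D3M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
]
summary = summarize_sam(records)
print(summary)
```

If the lowest POS is 2 and most aligned records contain a "D" operation, the deletions are already present in the alignments themselves and are not introduced by pileup.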

Another problem with the SAM output I get from BFAST is that tview can't display it, for some reason (it works for me on Bowtie or BWA output with the same reference).

Thanks
Eyal
Old 04-22-2010, 12:22 PM   #10
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by eyalbd View Post
Thanks, I'll try that. However, as every base, even those called like the reference, is called as a deletion (meaning, if I understand correctly, that it aligned it to the correct spot on the reference but as if it came sooner), the problem would seem to be more profound. My coverage, for some reason, starts from base 2 in the reference, not base 1. As the mitochondria genome is circular, could this lead to an artifact which also confuses the alignment? It's highly improbable I really don't have coverage for this base, as I have a lot of coverage for the adjacent bases and the mitochondrial genome is, again, circular.
I would be happy to take a look if you want to give me the reads and/or the SAM/BAM file.


Quote:
Originally Posted by eyalbd View Post
Another problem with the SAM output I get from BFAST is that tview can't display it, for some reason (it works for me on Bowtie or BWA output with the same reference).

Thanks
Eyal
Don't use tview from samtools, since it doesn't fully support the specification (with respect to indels). You can use IGV from the Broad Institute, which I use daily and remotely so as not to have to download each BAM from our servers.