SEQanswers

Old 01-05-2012, 12:14 AM   #1
SOLiDance
Member
 
Location: CAS

Join Date: Jun 2010
Posts: 27
bwa: how to align color space reads

Hi everybody,
This has puzzled me for days: I tried to use bwa on SOLiD sequencing results, but after reading through the manual I couldn't find a detailed workflow for aligning color-space reads. Following some posts, I took the steps below:
1 solid2fastq: used the script in the bwa suite (color to double-encoded: ACGTN);
2 index the fasta reference, with -c;
3 bwa aln;
4 bwa samse (my SOLiD reads are from a fragment library);
5 parse the SAM output.
I found that all the beads were unmapped. But when I used the same reads and reference with other tools, such as Bioscope and BFAST, the results were just fine, with thousands of mapped reads.
Then I tried with a color-space fastq (meaning the sequence line consists of 0123.), and all reads were unmapped too.
Maybe this workflow is not suitable? Could anyone please show me how to deal with color-space reads with bwa?
Many thanks!
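
For readers unfamiliar with the "double encoding" in step 1, here is a minimal sketch of the idea: each color is rewritten as a letter (0 to A, 1 to C, 2 to G, 3 to T, "." to N) so that tools expecting ACGTN can read the file. The function name double_encode is hypothetical, and dropping the primer base plus the first color is an assumption about the convention used, so check it against the solid2fastq script itself.

```shell
# Hypothetical helper: double-encode one csfasta read such as T01230.1
# Assumption: the leading primer base and the first color are dropped.
double_encode() {
    printf '%s\n' "$1" | cut -c3- | tr '0123.' 'ACGTN'
}
```

With these assumptions, double_encode T01230.1 prints CGTANC.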
Old 01-05-2012, 01:59 AM   #2
NestorNotabilis
Member
 
Location: Cardiff

Join Date: Dec 2011
Posts: 19

I see you used the -c flag to indicate color-space whilst generating the reference database but did you also use the -c flag with the bwa aln command?

e.g.

bwa aln -c -f <sai output> <ref> <fastq input>

Both the indexing and the aligning require the -c flag. bwa samse, in contrast, does not.
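
Put together, the three steps might look like the sketch below. It assumes a bwa release that still supports color space is on the PATH; cs_align and its arguments are hypothetical names, and the input is the double-encoded fastq from solid2fastq.

```shell
# Sketch of a color-space run; assumes a color-space-capable bwa on PATH.
# cs_align and its arguments are hypothetical names.
# usage: cs_align <ref.fa> <reads.fastq> <out_prefix>
cs_align() {
    ref=$1 fq=$2 out=$3
    bwa index -c "$ref" &&                          # build the color-space index
    bwa aln -c -f "$out.sai" "$ref" "$fq" &&        # -c is needed again here
    bwa samse "$ref" "$out.sai" "$fq" > "$out.sam"  # samse takes no -c
}
```

For example, cs_align ref.fa reads.fastq sample would write sample.sai and sample.sam next to the index files.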


Incidentally, as of release 0.6, BWA has dropped color-space support (although the online documentation makes no mention of this), so BWA may no longer be the best mapper to invest time in for the longer term. This is unfortunate given its usefulness.
Old 01-05-2012, 04:10 AM   #3
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390

Quote:
Originally Posted by NestorNotabilis View Post
Incidentally, as of release 0.6, BWA has dropped color-space support (although the online documentation makes no mention of this), so BWA may no longer be the best mapper to invest time in for the longer term. This is unfortunate given its usefulness.
But this is mentioned in the NEWS file of the release.

Release 0.6.1 (28 November, 2011)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Notable changes to BWA-short:

* Bugfix: duplicated alternative hits in the XA tag.

* Bugfix: when trimming enabled, bwa-aln trims 1bp less.

* Disabled the color-space alignment. 0.6.x is not working with SOLiD reads at
present.


Which is a timely reminder to read all the documentation, and not just what is on potentially infrequently updated web pages
Old 01-05-2012, 04:46 AM   #4
kexin
Junior Member
 
Location: China

Join Date: Jan 2012
Posts: 1

Hi everyone. As we know, Bowtie is software that we need to edit. I want to know if there is software we don't need to edit for mapping billions of short reads onto genomes. Thank you.
Old 01-05-2012, 05:53 PM   #5
SOLiDance
Member
 
Location: CAS

Join Date: Jun 2010
Posts: 27

Quote:
Originally Posted by NestorNotabilis View Post
I see you used the -c flag to indicate color-space whilst generating the reference database but did you also use the -c flag with the bwa aln command?

e.g.

bwa aln -c -f <sai output> <ref> <fastq input>

Both the indexing and the aligning require the -c flag. bwa samse, in contrast, does not.


Incidentally, as of release 0.6, BWA has dropped color-space support (although the online documentation makes no mention of this), so BWA may no longer be the best mapper to invest time in for the longer term. This is unfortunate given its usefulness.
Thanks for your help! Actually, I did use -c, and even tried -n 3 and -n 4 when running bwa aln. Sorry for forgetting to mention it.
I checked my bwa version and it's 0.6.1, so maybe that is the reason. What a shame.
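
A small guard like the sketch below could catch this before a long run. The function name bwa_has_colorspace is hypothetical. The NEWS file only confirms that 0.6.x disables color-space alignment; treating every later 0.x release the same way is an assumption.

```shell
# Hypothetical check: succeed only for bwa versions before 0.6,
# e.g. "0.5.9" or "0.5.9-r16". Assumes a 0.x version string.
bwa_has_colorspace() {
    ver=${1%%-*}        # drop a suffix such as -r104
    rest=${ver#*.}      # "6.1" from "0.6.1"
    major=${ver%%.*}
    minor=${rest%%.*}
    [ "$major" -eq 0 ] && [ "$minor" -lt 6 ]
}
```

It could be called as bwa_has_colorspace 0.6.1 || echo "this bwa will not align SOLiD reads". The version string has to be supplied by hand or read from bwa's own usage message; the exact format of that message is not covered here.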
Old 01-05-2012, 06:06 PM   #6
SOLiDance
Member
 
Location: CAS

Join Date: Jun 2010
Posts: 27

Quote:
Originally Posted by Bukowski View Post
But this is mentioned in the NEWS file of the release.

Release 0.6.1 (28 November, 2011)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Notable changes to BWA-short:

* Bugfix: duplicated alternative hits in the XA tag.

* Bugfix: when trimming enabled, bwa-aln trims 1bp less.

* Disabled the color-space alignment. 0.6.x is not working with SOLiD reads at
present.


Which is a timely reminder to read all the documentation, and not just what is on potentially infrequently updated web pages
Thanks for the info. I hadn't noticed there's a NEWS file. Shoot, my fault!
Old 01-09-2012, 04:31 AM   #7
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

I'm not too sure bwa was working too well with colour space data.

Just a brief result from some trial 60bp exome data alignments to hg19 with default settings:

bioscope - 85 % reads mapped (albeit with iterative read trimming)
bwa ~ 40 %
bowtie ~ 33%
NovoalignCS ~59%

Now I know there is a lot of optimisation to be done but the raw results are extremely diverse
Old 01-11-2012, 05:14 PM   #8
SOLiDance
Member
 
Location: CAS

Join Date: Jun 2010
Posts: 27

Quote:
Originally Posted by colindaven View Post
I'm not too sure bwa was working too well with colour space data.

Just a brief result from some trial 60bp exome data alignments to hg19 with default settings:

bioscope - 85 % reads mapped (albeit with iterative read trimming)
bwa ~ 40 %
bowtie ~ 33%
NovoalignCS ~59%

Now I know there is a lot of optimisation to be done but the raw results are extremely diverse
Hmm, same here: Bioscope always gets an obviously higher mapping rate. I suspect it may include more false-positive mapped reads.
Old 06-08-2012, 09:41 AM   #9
kbhit
Junior Member
 
Location: Philadelphia

Join Date: Sep 2011
Posts: 9

Be careful, bioscope/lifescope can be misleading on the mapping rate if you're not careful what you look at. If you look at the main summary, it always seems really high. But look at the SAM file it generates and do your own calculation. Most of the time, it maps about half as much as BWA does. Lifescope does give the 'real' stats but you have to dig much deeper to get it - it's highly misleading.
Old 09-04-2012, 07:28 PM   #10
gigigou
Member
 
Location: Nanjing,CHINA

Join Date: May 2012
Posts: 31

Quote:
Originally Posted by kexin View Post
Hi everyone. As we know, Bowtie is software that we need to edit. I want to know if there is software we don't need to edit for mapping billions of short reads onto genomes. Thank you.
What do you mean by "edit"?
In my opinion, bowtie is the easiest to use of all the alignment tools I have tried.
Old 01-07-2013, 12:04 PM   #11
JeremyDay
Registered Vendor
 
Location: San Diego

Join Date: Feb 2012
Posts: 25
Solid mapping

Quote:
Originally Posted by kbhit View Post
Be careful, bioscope/lifescope can be misleading on the mapping rate if you're not careful what you look at. If you look at the main summary, it always seems really high. But look at the SAM file it generates and do your own calculation. Most of the time, it maps about half as much as BWA does. Lifescope does give the 'real' stats but you have to dig much deeper to get it - it's highly misleading.
kbhit, do you mind elaborating on this? I have searched and searched for a better way to map Solid data. When we use Lifescope compared to something like Bowtie, it's a difference of 90% versus 60%. No one seems to be getting better than 60% mappability with Solid colorspace, and Lifescope always reports higher. Do you believe Lifescope is misrepresenting its metrics somehow?

Does anyone have suggestions for the best way to Map Solid data without tossing tons of reads?
Old 01-07-2013, 01:41 PM   #12
kbhit
Junior Member
 
Location: Philadelphia

Join Date: Sep 2011
Posts: 9

Hi Jeremy,
I found that the stats that Lifescope reports can be misleading. Instead, when I compare it with other aligners like BWA and Shrimp (I like Shrimp2 a lot), I calculate the Lifescope mapping percentage manually, as (uniquely mapped reads / total starting reads). To get the numerator, I look at the raw SAM file that Lifescope produces rather than at their automatic report.

Something like:

cat <Lifescope output SAM file> |
grep -v "^@" | # remove header lines
awk '{if (and($2, 4) == 0) print}' | # keep mapped reads (and() needs gawk)
wc -l # get the total count

I can't remember offhand, but you may want to remove the ones with a mapping quality of 0.
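
To go from that count to the percentage described above, a sketch along these lines may help. The function name mapping_rate and its arguments are hypothetical; the total number of starting reads has to be supplied from the original read file. Skipping secondary and supplementary alignments so that each read is counted once is an added assumption, and a MAPQ cutoff of 1 is only a rough stand-in for "uniquely mapped".

```shell
# Hypothetical helper: percent of starting reads with a primary mapped
# alignment at or above a MAPQ cutoff (default 0 keeps everything mapped).
# usage: mapping_rate <file.sam> <total_starting_reads> [min_mapq]
mapping_rate() {
    awk -F '\t' -v total="$2" -v minq="${3:-0}" '
        /^@/ { next }                       # skip header lines
        int($2 / 4) % 2 == 1 { next }       # FLAG 0x4: read unmapped
        int($2 / 256) % 2 == 1 { next }     # FLAG 0x100: secondary alignment
        int($2 / 2048) % 2 == 1 { next }    # FLAG 0x800: supplementary alignment
        $5 + 0 >= minq { mapped++ }         # column 5 is MAPQ
        END { printf "%.1f\n", 100 * mapped / total }
    ' "$1"
}
```

The FLAG tests use plain arithmetic instead of and(), so this should also run on awk implementations other than gawk.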

If you need more information please let me know and I'll dig a little more
Old 01-07-2013, 02:55 PM   #13
JeremyDay
Registered Vendor
 
Location: San Diego

Join Date: Feb 2012
Posts: 25

Thanks kbhit! I'll have to take a look at this.

Do you mind if I ask what you are getting for mappability using Shrimp2?

I appreciate your input. Finding a proper pipeline for Solid data is becoming a daunting task. If we use Lifescope (and we haven't looked thoroughly), our bioinformaticians' initial thoughts are similar to yours, and/or they believe that it is low-quality mapping. If we use something like Bowtie, it brings our mapping to 65% and below. That's a lot of wasted reads that potentially could be meaningful data. With Wildfire data we are down in the 40s with Bowtie, and Lifescope is still almost 90%.
Old 01-08-2013, 06:27 AM   #14
kbhit
Junior Member
 
Location: Philadelphia

Join Date: Sep 2011
Posts: 9

Hi Jeremy,
We normally get about 55% mappability on good-quality long RNA using Shrimp2. Prior to calling Shrimp2 I use the latest version of cutadapt to do quality trimming (q of 15 normally), which helps boost the quality. For us, when compared to Lifescope (calculated manually), Shrimp usually performs better with regard to unique mapping percentage.

Also, for COLORSPACE, be careful when using BWA & Bowtie: they don't handle color space correctly (which might be why their latest versions may be abandoning support for it). It's definitely trickier to handle color space and it takes more brainpower to get it right. For example, they aren't able to work with the first and last nt of the read, which lowers specificity. Crossover handling can also be problematic there. Shrimp and Lifescope don't have those problems.
Tags
bwa align color solid
