SEQanswers




11-09-2009, 07:28 AM   #1
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58
BOWTIE input

Does bowtie need reads with standard Sanger (Phred+33) quality scores, or can it accept the default files produced by the Illumina 1.4 pipeline, whose quality scores are not in the standard Sanger format?

Cheers

L
11-09-2009, 07:36 AM   #2
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

If you specify the '--phred64-quals' or '--solexa1.3-quals' option, you can use those Illumina reads without conversion.

d
11-09-2009, 07:43 AM   #3
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58

Just seen it!

Thanks

L
11-10-2009, 11:23 PM   #4
tujchl
Member
 
Location: BEIJING, CHINA

Join Date: Sep 2009
Posts: 74

I tried data taken directly from the Solexa pipeline without the '--phred64-quals' or '--solexa1.3-quals' option, and the output looks fine.
11-11-2009, 12:02 AM   #5
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

Quote:
Originally Posted by tujchl
I tried data taken directly from the Solexa pipeline without the '--phred64-quals' or '--solexa1.3-quals' option, and the output looks fine.
Of course you can, but in that case the base qualities are probably being interpreted the wrong way... I guess low-quality bases are overestimated by a ~1000x factor (in terms of error probability)...
11-11-2009, 12:52 AM   #6
tujchl
Member
 
Location: BEIJING, CHINA

Join Date: Sep 2009
Posts: 74

hi dawe
thank you for your reply, I have two more questions:
1. What do you mean by "overestimated by a ~1000x factor"? Could you explain in more detail?
2. I just tested bowtie, and my impression is that bowtie does NOT use qualities while running, so quality control could be done before bowtie.
Thank you in advance
11-11-2009, 01:03 AM   #7
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

Quote:
Originally Posted by tujchl
hi dawe
thank you for your reply, I have two more questions:
1. What do you mean by "overestimated by a ~1000x factor"? Could you explain in more detail?
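Phred+33 and Phred+64 encodings differ by an offset of 31 in the ASCII code. Since the score is Q = -10 log10(p) (plus the offset), a difference of 31 in Q is a factor of 10^3.1 (roughly 1000x, more precisely ~1260x) in the error probability. The lowest Illumina score is "@" (ASCII 64), which in Phred+64 means Q = 0, i.e. p = 1. Read in a Sanger (Phred+33) framework, the same ASCII 64 means Q = 31, i.e. p ~ 0.0008, which is more than 1000x smaller.
For qualities in the mid-to-high range the difference matters much less in practice, since both readings already correspond to a low error probability.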
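The arithmetic behind the ~1000x figure, worked through for the "@" character (ASCII 64) under both offsets:

```latex
\begin{aligned}
Q &= -10\log_{10} p \quad\Longleftrightarrow\quad p = 10^{-Q/10}\\
\text{Phred+64: } Q(\texttt{@}) &= 64 - 64 = 0 \;\Rightarrow\; p = 10^{0} = 1\\
\text{Phred+33: } Q(\texttt{@}) &= 64 - 33 = 31 \;\Rightarrow\; p = 10^{-3.1} \approx 7.9\times 10^{-4}\\
\frac{p_{+64}}{p_{+33}} &= 10^{31/10} = 10^{3.1} \approx 1259
\end{aligned}
```

The same factor of 10^3.1 applies to every character, since the offset shifts all scores by 31.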

Quote:
Originally Posted by tujchl
2. I just tested bowtie, and my impression is that bowtie does NOT use qualities while running, so quality control could be done before bowtie.
That's probably because you have lots of good-quality reads. AFAIK bowtie does use qualities (otherwise I wonder why Ben would have included the phred33/phred64 options at all).
11-11-2009, 02:28 AM   #8
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58

Looking at Bowtie's defaults, --phred33-quals is "on", so bowtie assumes you are providing reads in the standard Sanger format (Phred+33). If your quality scores are Phred+64, you should specify --phred64-quals, which is "off" by default. --solexa1.3-quals is also a good option; it assumes you are providing unconverted data from the Solexa GA pipeline version 1.3 or later.

Alternatively, you could use maq to convert the reads from Phred+64 to Phred+33 and then simply run them through bowtie with its default settings!
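If maq isn't at hand, the conversion itself is just a shift of every quality character down by 31. Below is a minimal Perl sketch (not maq's implementation), assuming plain four-line FASTQ records with no line wrapping and genuine Phred+64 input; older Solexa-scaled qualities, which can be negative, need a different, non-linear conversion. The script and file names are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Shift a Phred+64 quality string down by 31 so it becomes Phred+33:
# ASCII 64..126 ('@'..'~') is mapped onto 33..95 ('!'..'_').
sub phred64_to_phred33 {
    my ($qual) = @_;
    (my $out = $qual) =~ tr/\x40-\x7e/\x21-\x5f/;
    return $out;
}

# Usage: perl phred64to33.pl reads_phred64.fq > reads_phred33.fq
# Assumes 4-line FASTQ records, so every 4th line is a quality line.
if (@ARGV) {
    while (my $line = <>) {
        if ($. % 4 == 0) {
            chomp $line;
            $line = phred64_to_phred33($line) . "\n";
        }
        print $line;
    }
}
```

Using tr/// keeps the conversion to one pass per quality line with no per-character arithmetic.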

Hope this helps

L

P.S. A slight digression: I cannot unzip the hg18 version of the pre-built index, h_sapiens_asm.ebwt.zip. I tried part 1, part 2 and the entire genome, but I get an error saying:
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive.

Any ideas?
11-11-2009, 04:53 PM   #9
tujchl
Member
 
Location: BEIJING, CHINA

Join Date: Sep 2009
Posts: 74

Thank you, dawe.
Two more questions:
1. From what you say, can I assume that bowtie does use the qualities and filters out reads that do not pass?
2. Where can I find the ASCII codes for Phred+64 and Phred+33?

And thanks, Layla, for your suggestions and for starting this thread.
I built my human genome index myself: my computer is not powerful enough for the whole genome, so I build the index chromosome by chromosome and run bowtie chromosome by chromosome...
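Building per-chromosome indexes starts from one FASTA file per chromosome. If the genome came as a single multi-FASTA file, a minimal Perl sketch for splitting it might look like this; naming each output file after the first word of the header line is an assumption, and the input/output names are hypothetical. Each output file could then be indexed on its own, e.g. bowtie-build chr1.fa chr1.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Split a multi-FASTA file into one file per sequence, named after the
# first word of each header line (e.g. ">chr1 ..." -> chr1.fa).
# Returns the list of files written.
sub split_fasta {
    my ($in_path, $out_dir) = @_;
    open my $in, '<', $in_path or die "Cannot open $in_path: $!";
    my ($out, @written);
    while (my $line = <$in>) {
        if ($line =~ /^>(\S+)/) {
            close $out if $out;
            my $path = "$out_dir/$1.fa";
            open $out, '>', $path or die "Cannot write $path: $!";
            push @written, $path;
        }
        print {$out} $line if $out;   # lines before the first header are skipped
    }
    close $out if $out;
    close $in;
    return @written;
}

# Usage: perl split_fasta.pl genome.fa outdir
if (@ARGV == 2) {
    print "$_\n" for split_fasta(@ARGV);
}
```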
11-12-2009, 01:02 AM   #10
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

Quote:
Originally Posted by Layla
P.S. A slight digression: I cannot unzip the hg18 version of the pre-built index, h_sapiens_asm.ebwt.zip. I tried part 1, part 2 and the entire genome, but I get an error saying:
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive.

Any ideas?
Try indexing your own genome. I'm downloading the ebwt right now, but (at least here) it will take longer than indexing.
BTW, you should ask the bowtie webmaster for the md5sums of the zip files.
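Once checksums are available, a download can be verified before unzipping; a truncated download is one common cause of the "End-of-central-directory signature not found" error, since zip stores that record at the end of the file. The md5sum utility does this directly; a Perl equivalent using the core Digest::MD5 module might look like this (pass whatever files you downloaded):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Digest::MD5;

# Return the hex MD5 digest of a file, read in binary mode.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Print "<md5>  <file>" for each file named on the command line,
# in the same layout as the md5sum utility.
print md5_of_file($_), "  $_\n" for @ARGV;
```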
11-12-2009, 01:08 AM   #11
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

Quote:
Originally Posted by tujchl
1. From what you say, can I assume that bowtie does use the qualities and filters out reads that do not pass?
You should ask the bowtie developers, but AFAIK bowtie doesn't apply quality filters *before* the alignment. Base qualities are used at alignment time to score mismatches.
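To illustrate how qualities enter the alignment: as I understand the Bowtie 1 manual, in the default -n mode the sum of the quality values at all mismatched read positions must not exceed -e/--maqerr (default 70), with qualities first rounded to the nearest 10 and capped at 30, Maq-style. The Perl sketch below reimplements only that rule (it is not bowtie's code); the read qualities and mismatch positions are hypothetical. It shows how reading Phred+64 qualities as Phred+33 can turn an acceptable alignment into a rejected one.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Round a Phred score to the nearest 10 and cap it at 30
# (Maq-style rounding, as described for Bowtie's -n mode).
sub maq_round {
    my ($q) = @_;
    my $r = int(($q + 5) / 10) * 10;
    return $r > 30 ? 30 : $r;
}

# Sum the rounded qualities at the mismatched read positions.
# $offset is 33 for Sanger/Phred+33 and 64 for Phred+64 input.
sub mismatch_quality_sum {
    my ($qual, $offset, @mm_pos) = @_;
    my $sum = 0;
    for my $pos (@mm_pos) {
        my $q = ord(substr($qual, $pos, 1)) - $offset;
        $sum += maq_round($q);
    }
    return $sum;
}

my $maqerr = 70;          # documented default for -e/--maqerr
my $qual   = 'BBBBhhhh';  # Phred+64 string: 'B' = Q2, 'h' = Q40
my @mm     = (0, 1, 2);   # hypothetical mismatches at three low-quality bases

for my $offset (64, 33) {
    my $sum = mismatch_quality_sum($qual, $offset, @mm);
    printf "offset %d: sum = %d -> %s\n", $offset, $sum,
        $sum <= $maqerr ? 'alignment allowed' : 'alignment rejected';
}
```

With the correct offset the three Q2 mismatches cost nothing after rounding; read as Phred+33 they become Q33, each saturates at 30, and the total of 90 exceeds the limit.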

Quote:
Originally Posted by tujchl
2. Where can I find the ASCII codes for Phred+64 and Phred+33?
Code:
man ascii
look at the decimal set.
11-12-2009, 01:17 AM   #12
svl
Member
 
Location: Netherlands

Join Date: Sep 2009
Posts: 43
perl script comparison table

Quote:
Originally Posted by tujchl
2. Where can I find the ASCII codes for Phred+64 and Phred+33?
If you run the perl code below, you'll see a table with a comparison.


Code:
#!/usr/bin/env perl
################################################
# prints a table with phred, the Phred+33 (Sanger) and Phred+64
# (Illumina 1.3+) ASCII codes and characters, and the error probability p
################################################
use strict;
use warnings;

my @phreds = (0..62);
my $step = 2;   # print every second score to keep the table short

printf "%7s  %7s  %7s  %7s  %7s  %10s\n",
   'phred', 'ASCII33', 'chr33', 'ASCII64', 'chr64', 'p';

for (my $i = 0; $i < @phreds; $i += $step) {
   my $phred = $phreds[$i];
   printf "%7d  %7d  %7s  %7d  %7s  %10.2e\n",
      $phred, $phred + 33, chr($phred + 33), $phred + 64, chr($phred + 64), phred2p($phred);
}

# error probability for a Phred score Q: p = 10^(-Q/10)
sub phred2p {
   my ($q) = @_;
   return 10 ** (-$q / 10.0);
}
11-12-2009, 11:05 PM   #13
tujchl
Member
 
Location: BEIJING, CHINA

Join Date: Sep 2009
Posts: 74

Thank you all, I learned a lot from you.
Two more questions:
1. When I use data directly from Solexa as bowtie input, should I specify "--phred64-quals" or "--solexa1.3-quals", or both?
2. When I use the "--concise" option to save disk space, the output looks like this:
1-:<0,2852852,1>
There is a 0 instead of my ref_index name! (I built my ref_index chromosome by chromosome and also run bowtie chromosome by chromosome.) Could you tell me how to get the ref_index name?
(The ref_index name, such as "chr1", is reported when I run bowtie without --concise.)
11-12-2009, 11:34 PM   #14
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

Quote:
Originally Posted by tujchl
Thank you all, I learned a lot from you.
Two more questions:
1. When I use data directly from Solexa as bowtie input, should I specify "--phred64-quals" or "--solexa1.3-quals", or both?
As stated in the bowtie help

Code:
--phred64-quals    input quals are Phred+64 (same as --solexa1.3-quals)
They are synonyms.

Quote:
Originally Posted by tujchl
2. When I use the "--concise" option to save disk space, the output looks like this:
1-:<0,2852852,1>
Sorry, I can't help. To save space and get valuable information from my results I keep all in BAM format (directly from bowtie output).
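For reference: as far as I recall from the Bowtie 1 manual, each --concise line has the form read_idx{-|+}:<ref_idx,ref_off,mms>, where ref_idx is the 0-based index of the reference sequence rather than its name. The 0 in 1-:<0,2852852,1> would then simply be the first (and, for a one-chromosome index, only) sequence in that index. The Perl sketch below parses such lines and maps ref_idx back to a name using a list of reference names in index order; that list is a hypothetical input you would supply yourself (bowtie-inspect has a names option, -n, that I believe prints them one per line; check your version's help).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Parse one --concise alignment line of the (assumed) form
#   read_idx{-|+}:<ref_idx,ref_off,mms>
# Returns a hash ref, or undef if the line does not match.
sub parse_concise {
    my ($line) = @_;
    return unless $line =~ /^(\d+)([+-]):<(\d+),(\d+),(\d+)>\s*$/;
    return { read => $1, strand => $2, ref_idx => $3, ref_off => $4, mms => $5 };
}

# Hypothetical list of reference names in index order
# (e.g. saved from the index's name listing, one per line).
my @ref_names = ('chr1');

my $hit = parse_concise('1-:<0,2852852,1>');
printf "read %d strand %s ref %s offset %d mismatches %d\n",
    $hit->{read}, $hit->{strand}, $ref_names[$hit->{ref_idx}],
    $hit->{ref_off}, $hit->{mms};
```

When running chromosome by chromosome, keeping the index name alongside each output file is enough to recover the chromosome, since ref_idx is always 0 there.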
All times are GMT -8.

