SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Illumina/Solexa (http://seqanswers.com/forums/forumdisplay.php?f=6)
-   -   bowtie command line for Illumina Hiseq 2000 with Illumina 1.5+ quality encoding files (http://seqanswers.com/forums/showthread.php?t=14404)

rworthi 09-28-2011 08:25 AM

bowtie command line for Illumina Hiseq 2000 with Illumina 1.5+ quality encoding files
 
I am trying to align some paired end exome reads with bowti, but despite using the bow tie -q" switch to recognize fast file, I keep getting the error:

./bowtie -q -n -r -S -t -p 2 -I 0 -X 100 -ff --solexa1.3-quals indexes/hg19 -1 ~/Otogenetics_final_exome-original-110707_I327_FCB0AMFABXX_L3_index11_1.fastq.gz -2 ~/Otogenetics_final_exome-original-110707_I327_FCB0AMFABXX_L3_index11_2.fastq.gz bowtie.exome.sam
Time loading reference: 00:00:10
Time loading forward index: 00:00:18
Time loading mirror index: 00:00:14
Error: reads file does not look like a FASTA file
Seeded quality full-index search: 00:00:00
Time searching: 00:00:42
Overall time: 00:00:42

Same error when the files are unzipped. The fastq data look like:


@FCB0AMFABXX:3:1101:1497:2086#GGCTAAAT/1
TTACTTGCAAAGGAAAGACAATTTTGCATTACATGGGAACCAGCACATTTTCTGAATACACAGTTGTGGCTGGTATCTCTGTTGCTAAAA
+
gggggggdggdggdggggggggggggdggggggggggeggggggggegggegggeggggggeggggdefga`BBBBBBBBBBBBBBBBBB
@FCB0AMFABXX:3:1101:1378:2107#GGCTACAT/1
AATGCAGGAAGTGCTTCAAGGGAAATTTCTCACATTAAAAGCATATATTAGAACATTCAGGAGATCTAAAAGTCAATGCTCTAAGTTTCC
+
gggggggggggdggggggggggdggggggggdggggggggggggegdggdggfgegggefgcgeeegdacegbggfggfagggdgbfceT

Any ideas why bowtie will not accept the fastq format from Illumina Hiseq 2000 output?

fkrueger 09-28-2011 08:59 AM

There seem to be quite a few things going wrong here. From the error message "Error: reads file does not look like a FASTA file" you can see that Bowtie expects Fasta sequences, and not FastQ. My guess is that the switch "-ff" does that. Did you mean to use "--ff"?

A few other things:
-q is default already, no need to specify
-n expects a value between 0 and 3, default is 2
-r and --solexa1.3-quals are conflicting. Your reads are not in Raw format, so simply use --solexa1.3-quals
-I 0 is default so not needed
-X 100 seems to be a very low value, what was the read length in your experiment?
- Did you really mean to use the --ff option?
From the Bowtie manual:
--ff The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate1 appears upstream of the reverse complement of mate2 and the insert length constraints are met, that alignment is valid. Also, if mate2 appears upstream of the reverse complement of mate1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate1 be reverse-complemented and a downstream mate2 be forward-oriented. --ff requires both an upstream mate1 and a downstream mate2 to be forward-oriented. Default: --fr when -C (colorspace alignment) is not specified, --ff when -C is specified.

-Bowtie does not read input in .gz format (at least not yet)
And as a final remark, if you want to align mRNA reads you might want to use an aligner which can cope with exon-spanning reads, such as TopHat.

Hope this helps.

rworthi 09-28-2011 09:25 AM

thanks very much for all that great advice! The read length is 90 bp. I made your modifications and shortened the read file names (and unzipped them) and now get this error, which I am sure is progress:


pgx:bowtie-0.12.7 rworthi$ ./bowtie -n 2 -S -t -p 2 -X 100 --solexa1.3-quals indexes/hg19 -1 ~/oto-L3_1.fq -2 ~/oto-L3_2.fq bowtie.exome.sam
Time loading reference: 00:00:09
Time loading forward index: 00:00:14
Time loading mirror index: 00:00:14
Warning: Exhausted best-first chunk memory for read FCB0AMFABXX:3:1101:16474:2240#GGCTACAT/1 (patid 400); skipping read
... continuing indefinitely for each line

Do I need a memory switch in the line?

fkrueger 09-28-2011 12:15 PM

This is much better aleady, this time Bowtie doesn't die but merely produces warnings. You can avoid these by giving it more memory for each alignment with the --chunkmbs option. Try --chunkmbs 512 to start with, this normally fixes everything (consult the Bowtie manual for more info on this parameter).

Other than that:
-n 2 is not needed as it is the default
-X 100: I still think this is likely not what you meant.

From the Bowtie Manual:
-X/--maxins <int>
The maximum insert size for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates. Default: 250.

If you specify -X 100 at a read length of 90bp then you will only get reads which overlap over nearly the entire length. Remember length(read1)+lenght(insert)+length(read2) need to be smaller than X in order to yield a valid alignment. So in your case something like 400-600 would seem more realistic.

Good luck (and still think about using a splice mapper instead)!

rworthi 09-28-2011 12:25 PM

excellent! bow tie is purring away building the sam file. Thanks so much!

What splice mapper program(s) would you recommend, and is there anything I can use to recalibrate quality scores on these Illumina reads (these are exome capture reads from gDNA, not RNA-seq)? Stampy does not seem to work so far.


All times are GMT -8. The time now is 12:58 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.