SEQanswers

Go Back   SEQanswers > Sequencing Technologies/Companies > Illumina/Solexa



Similar Threads
Thread Thread Starter Forum Replies Last Post
Help to analyze Illumina HiSeq 2000 Human data kiradi Bioinformatics 4 12-09-2011 05:30 AM
Illumina HiSeq 2000 publication vinay052003 Illumina/Solexa 1 11-07-2011 02:28 PM
Setting Bowtie options from the Tophat command line GiladZil RNA Sequencing 2 08-02-2011 02:42 PM
BWA mapping fastq files with Illumina quality maricu Bioinformatics 3 11-19-2010 12:18 PM
HiSeq 2000 File Sizes E_Klee Illumina/Solexa 3 09-28-2010 12:16 PM

Reply
 
Thread Tools
Old 09-28-2011, 08:25 AM   #1
rworthi
Member
 
Location: St. Louis, MO

Join Date: May 2011
Posts: 14
Default bowtie command line for Illumina Hiseq 2000 with Illumina 1.5+ quality encoding files

I am trying to align some paired end exome reads with bowti, but despite using the bow tie -q" switch to recognize fast file, I keep getting the error:

./bowtie -q -n -r -S -t -p 2 -I 0 -X 100 -ff --solexa1.3-quals indexes/hg19 -1 ~/Otogenetics_final_exome-original-110707_I327_FCB0AMFABXX_L3_index11_1.fastq.gz -2 ~/Otogenetics_final_exome-original-110707_I327_FCB0AMFABXX_L3_index11_2.fastq.gz bowtie.exome.sam
Time loading reference: 00:00:10
Time loading forward index: 00:00:18
Time loading mirror index: 00:00:14
Error: reads file does not look like a FASTA file
Seeded quality full-index search: 00:00:00
Time searching: 00:00:42
Overall time: 00:00:42

Same error when the files are unzipped. The fastq data look like:


@FCB0AMFABXX:3:1101:1497:2086#GGCTAAAT/1
TTACTTGCAAAGGAAAGACAATTTTGCATTACATGGGAACCAGCACATTTTCTGAATACACAGTTGTGGCTGGTATCTCTGTTGCTAAAA
+
gggggggdggdggdggggggggggggdggggggggggeggggggggegggegggeggggggeggggdefga`BBBBBBBBBBBBBBBBBB
@FCB0AMFABXX:3:1101:1378:2107#GGCTACAT/1
AATGCAGGAAGTGCTTCAAGGGAAATTTCTCACATTAAAAGCATATATTAGAACATTCAGGAGATCTAAAAGTCAATGCTCTAAGTTTCC
+
gggggggggggdggggggggggdggggggggdggggggggggggegdggdggfgegggefgcgeeegdacegbggfggfagggdgbfceT

Any ideas why bowtie will not accept the fastq format from Illumina Hiseq 2000 output?
__________________
Professor of Pharmaceutical Sciences at Southern Illinois University. Subject of the Personal Genomes Project with CGI genome sequencing, funded by a university seed grant. Annotated human and mouse genomes at the Washington University (St. Louis) GSC during Bob Waterston's tenure. Worked in nematode gene discovery at Divergence, Inc. Recent NSF grant in yeast recombinant genetics. Teach pharmacogenomics, human genomics and pharmaceutical biotechnology.

Last edited by rworthi; 09-28-2011 at 08:49 AM.
rworthi is offline   Reply With Quote
Old 09-28-2011, 08:59 AM   #2
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 594
Default

There seem to be quite a few things going wrong here. From the error message "Error: reads file does not look like a FASTA file" you can see that Bowtie expects Fasta sequences, and not FastQ. My guess is that the switch "-ff" does that. Did you mean to use "--ff"?

A few other things:
-q is default already, no need to specify
-n expects a value between 0 and 3, default is 2
-r and --solexa1.3-quals are conflicting. Your reads are not in Raw format, so simply use --solexa1.3-quals
-I 0 is default so not needed
-X 100 seems to be a very low value, what was the read length in your experiment?
- Did you really mean to use the --ff option?
From the Bowtie manual:
--ff The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate1 appears upstream of the reverse complement of mate2 and the insert length constraints are met, that alignment is valid. Also, if mate2 appears upstream of the reverse complement of mate1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate1 be reverse-complemented and a downstream mate2 be forward-oriented. --ff requires both an upstream mate1 and a downstream mate2 to be forward-oriented. Default: --fr when -C (colorspace alignment) is not specified, --ff when -C is specified.

-Bowtie does not read input in .gz format (at least not yet)
And as a final remark, if you want to align mRNA reads you might want to use an aligner which can cope with exon-spanning reads, such as TopHat.

Hope this helps.
fkrueger is offline   Reply With Quote
Old 09-28-2011, 09:25 AM   #3
rworthi
Member
 
Location: St. Louis, MO

Join Date: May 2011
Posts: 14
Default

thanks very much for all that great advice! The read length is 90 bp. I made your modifications and shortened the read file names (and unzipped them) and now get this error, which I am sure is progress:


pgx:bowtie-0.12.7 rworthi$ ./bowtie -n 2 -S -t -p 2 -X 100 --solexa1.3-quals indexes/hg19 -1 ~/oto-L3_1.fq -2 ~/oto-L3_2.fq bowtie.exome.sam
Time loading reference: 00:00:09
Time loading forward index: 00:00:14
Time loading mirror index: 00:00:14
Warning: Exhausted best-first chunk memory for read FCB0AMFABXX:3:1101:16474:2240#GGCTACAT/1 (patid 400); skipping read
... continuing indefinitely for each line

Do I need a memory switch in the line?
__________________
Professor of Pharmaceutical Sciences at Southern Illinois University. Subject of the Personal Genomes Project with CGI genome sequencing, funded by a university seed grant. Annotated human and mouse genomes at the Washington University (St. Louis) GSC during Bob Waterston's tenure. Worked in nematode gene discovery at Divergence, Inc. Recent NSF grant in yeast recombinant genetics. Teach pharmacogenomics, human genomics and pharmaceutical biotechnology.
rworthi is offline   Reply With Quote
Old 09-28-2011, 12:15 PM   #4
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 594
Default

This is much better aleady, this time Bowtie doesn't die but merely produces warnings. You can avoid these by giving it more memory for each alignment with the --chunkmbs option. Try --chunkmbs 512 to start with, this normally fixes everything (consult the Bowtie manual for more info on this parameter).

Other than that:
-n 2 is not needed as it is the default
-X 100: I still think this is likely not what you meant.

From the Bowtie Manual:
-X/--maxins <int>
The maximum insert size for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates. Default: 250.

If you specify -X 100 at a read length of 90bp then you will only get reads which overlap over nearly the entire length. Remember length(read1)+lenght(insert)+length(read2) need to be smaller than X in order to yield a valid alignment. So in your case something like 400-600 would seem more realistic.

Good luck (and still think about using a splice mapper instead)!
fkrueger is offline   Reply With Quote
Old 09-28-2011, 12:25 PM   #5
rworthi
Member
 
Location: St. Louis, MO

Join Date: May 2011
Posts: 14
Default

excellent! bow tie is purring away building the sam file. Thanks so much!

What splice mapper program(s) would you recommend, and is there anything I can use to recalibrate quality scores on these Illumina reads (these are exome capture reads from gDNA, not RNA-seq)? Stampy does not seem to work so far.
__________________
Professor of Pharmaceutical Sciences at Southern Illinois University. Subject of the Personal Genomes Project with CGI genome sequencing, funded by a university seed grant. Annotated human and mouse genomes at the Washington University (St. Louis) GSC during Bob Waterston's tenure. Worked in nematode gene discovery at Divergence, Inc. Recent NSF grant in yeast recombinant genetics. Teach pharmacogenomics, human genomics and pharmaceutical biotechnology.

Last edited by rworthi; 09-28-2011 at 12:28 PM.
rworthi is offline   Reply With Quote
Reply

Tags
bowtie, illumina hiseq 2000 reads

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:15 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO