SEQanswers

Old 02-29-2012, 09:26 PM   #1
rfrancis
Junior Member
 
Location: Perth, Australia

Join Date: Jul 2011
Posts: 7
Bowtie truncates ID line if it has spaces

Dear all,
Has anyone seen this before? I am using bowtie v0.12.7 to align reads from the short read archive which have IDs as follows:

SRR064286.51418 HWI-EAS418:1:5:1357:1070 length=50

In the resulting SAM file, wherever bowtie finds a match, the ID is for some reason truncated at the first space:

SRR064286.51418

However, when no match is found, the ID is reported in full.

This seems odd, so I would appreciate someone trying to replicate it for me. Below are a couple of reads and a very short sequence to use as a reference. The first read should match but the second should not. Could someone try aligning these with bowtie and let me know what you get?

Many thanks in advance.

Reads: Save as test.fq
@SRR064286.10 HWI-EAS418:1:4:1:147 length=50
TGGCTTCTTCTGTCTTCATAAGTTTTTCCAGGCGGTCTTCCAAGTCCAAA
+SRR064286.10 HWI-EAS418:1:4:1:147 length=50
BCBCCCCCCCCA8::>:?:>8!/@:1&7>6@BCBA@CACCA6>!<BB<BA
@SRR064286.11 HWI-EAS418:1:4:1:119 length=50
GGTTGTAGGACAGCATTTCAAGAACTAAACAGAGATGGTTTCGGAACATA
+SRR064286.11 HWI-EAS418:1:4:1:119 length=50
BBABA@BAABB:3707::9</!.B>:76:8;B9BAAAB>BBC<!<BCBB?

Ref: Save as ref.fa and run "bowtie-build ref.fa ref" to build the index
>testref
ATTTCGATGCGAGCTTATTCGAGGCGTATCGTAGCGAGTGCTAGGGCTAT
TGGCTTCTTCTGTCTTCATAAGTTTTTCCAGGCGGTCTTCCAAGTCCAAA
GCGGATTGCTGATGCGAGCGTAGTCGTAGTGTGCGTATTGCGATTCGATG

Run bowtie with "bowtie --sam ref test.fq test.sam" and check out the SAM file test.sam.

Thanks for your help
Rich
Old 02-29-2012, 10:15 PM   #2
rfrancis
Junior Member
 
Location: Perth, Australia

Join Date: Jul 2011
Posts: 7

I also posted this in another lengthy thread, where Xi Wang replied advising the use of the --fullref parameter. I think that option only applies to the reference sequence name, not the read ID: it had no effect on my test data, and the read that matches still gets a truncated ID. If someone can confirm this is happening on their system I would very much appreciate it.
Regards,
Rich
Old 03-01-2012, 01:07 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

Quote:
Originally Posted by rfrancis View Post
Dear all,
Has anyone seen this before? I am using bowtie v0.12.7 to align reads from the short read archive which have IDs as follows:

SRR064286.51418 HWI-EAS418:1:5:1357:1070 length=50

Given that text on a FASTA > line or FASTQ @ line, most tools would say the identifier is just SRR064286.51418 and the rest is free-form description text.

In this case, with SRA reads, you might want to remove the SRR ID, leaving the original Illumina ID (e.g. HWI-EAS418:1:4:1:147) on its own.
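
A minimal sketch of that header rewrite, in Python. The function names are made up for illustration, and it assumes strict four-line FASTQ records (no wrapped sequences) and that the Illumina ID is always the second space-separated field, as in the test reads in this thread:

```python
def strip_sra_prefix(header: str) -> str:
    """Drop the leading SRA accession from a FASTQ '@' or '+' line.

    Lines without a second space-separated field (including a bare '+')
    are returned unchanged, minus any trailing newline.
    """
    marker, rest = header[0], header[1:].rstrip("\n")
    fields = rest.split()
    if len(fields) < 2:
        return header.rstrip("\n")
    return marker + fields[1]


def rewrite_fastq(lines):
    """Yield FASTQ lines with the header and '+' lines rewritten.

    Lines are classified by position within the record rather than by
    their first character, because a quality string may start with '@'.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (0, 2):     # '@' header line or '+' separator line
            yield strip_sra_prefix(line)
        else:                   # sequence or quality line, left untouched
            yield line
```

Something like `print("\n".join(rewrite_fastq(open("test.fq"))))` would then write the renamed reads to standard output.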
Old 03-01-2012, 01:20 AM   #4
rfrancis
Junior Member
 
Location: Perth, Australia

Join Date: Jul 2011
Posts: 7

Thanks maubp. I agree that anything after the first space is description; I'm just concerned that bowtie is not consistently returning either the full ID or a truncated one. I'd rather not have to edit all my reads, so I hope someone knows a solution to this. Unless it's a bug, of course!
Thanks again for your reply.
Rich
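
One possible workaround that avoids editing the reads is to post-process the SAM output instead, so that every record carries the same short ID. This is only a sketch: normalize_qname is a hypothetical helper, not part of bowtie, and it assumes the ID of interest is everything before the first space in the first column:

```python
def normalize_qname(sam_line: str) -> str:
    """Truncate the QNAME (first tab-separated field) at the first space.

    Header lines (@HD, @SQ, @PG, ...) are passed through unchanged.
    """
    if sam_line.startswith("@"):
        return sam_line
    qname, sep, rest = sam_line.partition("\t")
    return qname.split(" ", 1)[0] + sep + rest
```

Applied line by line to test.sam, this should leave the mapped record as it is and shorten the unmapped one to SRR064286.11, so both follow the same convention.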
Old 03-01-2012, 08:04 PM   #5
rfrancis
Junior Member
 
Location: Perth, Australia

Join Date: Jul 2011
Posts: 7

For anyone following this thread: I've just submitted this as a bug on the bowtie SourceForge site (ID: 3496148). There's a similar report there already, so I know it's not just me having this problem!
Hopefully they can fix this easily.
Regards,
Rich
Old 03-02-2012, 07:47 AM   #6
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

For bowtie version 0.12.7 I confirm what you see:

1) For non-SAM output, the full ID of the mapped sequence is given.

2) For SAM output, only a partial ID of the mapped sequence is given while the full ID is given for a non-mapped sequence.

Don't know if it is a bug or not but it does seem like strange and unexpected behavior.
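
To check a larger SAM file for the same pattern, something along these lines could tally mapped and unmapped records by whether the QNAME still contains a space. It is a rough sketch: qname_space_report is an illustrative name, and it relies only on the SAM convention that FLAG bit 0x4 marks an unmapped read:

```python
from collections import Counter


def qname_space_report(sam_lines):
    """Count records by (mapped/unmapped, full/short QNAME).

    'full' means the QNAME column still contains a space,
    'short' means it does not.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue                      # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        status = "unmapped" if int(fields[1]) & 0x4 else "mapped"
        shape = "full" if " " in fields[0] else "short"
        counts[(status, shape)] += 1
    return counts
```

If the behavior described above holds, a 0.12.7 SAM file should show counts only under ("mapped", "short") and ("unmapped", "full").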
Old 03-02-2012, 07:56 AM   #7
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

bowtie version 0.11.3 does not have the problem. See output below:

Code:
@HD	VN:1.0	SO:unsorted
@SQ	SN:testref	LN:150
@PG	ID=Bowtie	VN=0.11.3	CL="bowtie --fullref --sam ref test.fq test.sam"
SRR064286.10 HWI-EAS418:1:4:1:147 length=50	0	testref	51	255	50M	*	0	0	TGGCTTCTTCTGTCTTCATAAGTTTTTCCAGGCGGTCTTCCAAGTCCAAA	BCBCCCCCCCCA8::>:?:>8!/@:1&7>6@BCBA@CACCA6>!<BB<BA	XA:i:0	MD:Z:50	NM:i:0
SRR064286.11 HWI-EAS418:1:4:1:119 length=50	4	*	0	0	*	*	0	0	GGTTGTAGGACAGCATTTCAAGAACTAAACAGAGATGGTTTCGGAACATA	BBABA@BAABB:3707::9</!.B>:76:8;B9BAA

Compare the above to the output from bowtie version 0.12.7 below:

Code:
@HD	VN:1.0	SO:unsorted
@SQ	SN:testref	LN:150
@PG	ID:Bowtie	VN:0.12.7	CL:"bowtie --fullref --sam ref test.fq test.sam"
SRR064286.10	0	testref	51	255	50M	*	0	0	TGGCTTCTTCTGTCTTCATAAGTTTTTCCAGGCGGTCTTCCAAGTCCAAA	BCBCCCCCCCCA8::>:?:>8!/@:1&7>6@BCBA@CACCA6>!<BB<BA	XA:i:0	MD:Z:50	NM:i:0
SRR064286.11 HWI-EAS418:1:4:1:119 length=50	4	*	0	0	*	*	0	0	GGTTGTAGGACAGCATTTCAAGAACTAAACAGAGATGGTTTCGGAACATA	BBABA@BAABB:3707::9</!.B>:76:8;B9BAA
Old 03-02-2012, 08:44 AM   #8
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 614

Quote:
Originally Posted by westerman View Post
For bowtie version 0.12.7 I confirm what you see:

1) For non-SAM output, the full ID of the mapped sequence is given.

2) For SAM output, only a partial ID of the mapped sequence is given while the full ID is given for a non-mapped sequence.

Don't know if it is a bug or not but it does seem like strange and unexpected behavior.

Don't know if it helps, but I found that tab characters within the read IDs also truncate the non-SAM output, while spaces don't elicit this behavior (the question is of course: why would someone put tabs into a read ID...?)
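
A quick way to see whether a FASTQ file is exposed to either behavior is to count the headers that contain spaces or tabs. A sketch, assuming four-line records; whitespace_in_headers is a hypothetical helper name:

```python
def whitespace_in_headers(fastq_lines):
    """Return (headers_with_space, headers_with_tab) for a FASTQ file.

    Only the first line of each four-line record is inspected.
    """
    spaces = tabs = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:
            continue                      # not an '@' header line
        header = line.rstrip("\n")
        spaces += " " in header           # bool counts as 0 or 1
        tabs += "\t" in header
    return spaces, tabs
```

For the test.fq posted at the top of this thread, both headers contain spaces and neither contains a tab.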