SEQanswers

Old 08-12-2009, 01:51 PM   #1
samt
Member
 
Location: NYC

Join Date: Aug 2009
Posts: 14
Filtering short reads from .fastq

I am getting an error from Bowtie as follows:

Error: Read (SRR015250.2
S0014_20071116_1_ES_EStranscriptome_1_39_146_F3 length=35) is less
than 2 characters long

Is there a program, or an option in Bowtie, to filter these reads out of the .fastq file?
Old 08-13-2009, 03:25 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Has your file already been processed?

It looks like the record header says it is length 35, but Bowtie says it is less than 2. Why not search the file to see what this read looks like? It is possible the file has been corrupted or truncated...
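One way to do that search is a few lines of plain Python. This is only a sketch: the function name `find_read` is made up for illustration, it assumes a strict four-lines-per-record FASTQ (no wrapped sequence lines), and the sample record is one that samt posts later in this thread.

```python
import io


def find_read(handle, read_id):
    """Return the four lines of the first record whose ID matches, else None.

    Assumption: every record occupies exactly four lines.
    """
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return None  # reached end of file without a match
        # The ID is the first whitespace-delimited token after the '@'
        fields = record[0][1:].split()
        if fields and fields[0] == read_id:
            return [line.rstrip("\n") for line in record]


# Demo on an in-memory copy of one record from the thread
sample = io.StringIO(
    "@SRR015253.2 LIZ_20071025_2_GrimmondsMES_SS7747_13_23_119_F3 length=35\n"
    "T0101233211103200232333.2111211002.1\n"
    "+SRR015253.2 LIZ_20071025_2_GrimmondsMES_SS7747_13_23_119_F3 length=35\n"
    "!,.+'+')'390%%%%%%%'%%%!-<++++<99%!%\n"
)
for line in find_read(sample, "SRR015253.2") or []:
    print(line)
```

For a real file, pass `open("reads.fastq")` instead of the `StringIO` object; the file name here is a placeholder.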

[If you have a valid FASTQ file I could share a short Biopython script to filter out short reads]
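For reference, a length filter of this kind can be sketched in plain Python. This is not the Biopython script offered above, just a minimal illustration. The function name `filter_short_reads` is hypothetical, the default threshold of 2 is taken from the wording of the Bowtie error, and the code assumes strict four-line records. The reads in the demo are made up.

```python
import io


def filter_short_reads(in_handle, out_handle, min_length=2):
    """Copy records whose sequence has at least min_length characters.

    Assumption: every record occupies exactly four lines.
    Returns a (kept, dropped) tuple of record counts.
    """
    kept = dropped = 0
    while True:
        record = [in_handle.readline() for _ in range(4)]
        if not record[0]:
            break  # end of input
        if len(record[1].strip()) >= min_length:
            out_handle.writelines(record)
            kept += 1
        else:
            dropped += 1
    return kept, dropped


# Demo with two made-up reads: one of length 4, one of length 1
source = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nA\n+\nI\n")
target = io.StringIO()
print(filter_short_reads(source, target))
print(target.getvalue(), end="")
```

A filter like this only helps if the file really contains short reads, so it is worth inspecting the offending record first.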
Old 08-13-2009, 06:58 PM   #3
samt
Member
 
Location: NYC

Join Date: Aug 2009
Posts: 14

Hi,

I took this FASTQ file straight from NCBI's Short Read Archive. The contents look like:
@SRR015253.1 LIZ_20071025_2_GrimmondsMES_SS7747_13_23_38_F3 length=35
T1011122220100230032132.2111111002.1
+SRR015253.1 LIZ_20071025_2_GrimmondsMES_SS7747_13_23_38_F3 length=35
!)+%.*%*+2'0%%%-%+%*5'%!%9+'%+<+0%!%
@SRR015253.2 LIZ_20071025_2_GrimmondsMES_SS7747_13_23_119_F3 length=35
T0101233211103200232333.2111211002.1
+SRR015253.2 LIZ_20071025_2_GrimmondsMES_SS7747_13_23_119_F3 length=35
!,.+'+')'390%%%%%%%'%%%!-<++++<99%!%
@SRR015253.3 LIZ_20071025_2_GrimmondsMES_SS7747_13_23_146_F3 length=35
T0312202213101213131111.1110131102.1
+SRR015253.3 LIZ_20071025_2_GrimmondsMES_SS7747_13_23_146_F3 length=35
!93<*/18+%:9%+075*%:;+6!3<26%/<%-%!%

I'm not sure how to make sense of this, but Bowtie seems to think the reads are less than 2 bp long.
Old 08-13-2009, 08:21 PM   #4
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

These are color space data. Bowtie does not support ABI SOLiD data. You could try aligners that do, including BFAST (http://genome.ucla.edu/bfast), BWA, MAQ, or SHRiMP.
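The giveaway in the records above is the sequence line: a single primer base (`T`) followed by digits 0-3, with `.` where a color call is missing. A rough check for this pattern could look like the sketch below. The name `looks_like_color_space` and the exact regular expression are assumptions for illustration, not part of any aligner.

```python
import re

# One primer base, then one or more color calls (0-3) or '.' for a missing call
COLOR_SPACE = re.compile(r"[ACGT][0-3.]+")


def looks_like_color_space(sequence):
    """Return True if the whole sequence line matches the color space pattern."""
    return COLOR_SPACE.fullmatch(sequence.strip()) is not None


# Demo: the first read from the thread, then an ordinary base space read
for seq in ("T1011122220100230032132.2111111002.1", "ACGTACGT"):
    print(seq, looks_like_color_space(seq))
```

Running a check like this on the first few records of a downloaded file is a quick way to tell whether an aligner without color space support will be able to read it.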
All times are GMT -8.

