SEQanswers

Old 12-21-2012, 09:43 PM   #1
a_mt
Member
 
Location: C:/Program files/Google/Chrome

Join Date: Jul 2012
Posts: 34
Question fastq with reads missing 3rd line

Hi all,
I have a set of fastq files. Some of the fastq files have reads which are missing the 3rd line (which begins with +).

Code:
@HWI-ST750:151:C1C6AACXX:5:2316:17997:100881 1:N:0:
AGCGGTNCGCAATATTTTAGTAGCTCGTTACAGTCCGGTGCGTTTTTGGTTTTTTGAAAGTGCGTCTTCAGAGCGCTTTTGGTTTTCAAAAGCGCTCTGAAGTT+
1:+B+0#2<DDDDIIIIIIFIIIIIIIIIIIIIEEIID?DDDBBADIII@DDDDD@@AAAAAAA??A?ADAAAA?????A>?8<>?>AAAAA8>>?><AAAA:4
@HWI-ST750:151:C1C6AACXX:5:2316:19129:100793 1:N:0:
AGCGCTNCTTGATATCAATCAACTGCTAGACAAATCCAATAGTAAATTGGGTAAACCAAATCTCGATATCGACAGCAAAGTATCACAATATGCCTATAACTACA+
;1=DD?#2ACDDDIDEEIIIEIEIEIEEIIIIDEIEIIIDEDDCEDDDEID?BBDIDIIIIEECDIDA@DD=A?D@DAAAA@DDDA>AAABE>AAAA>A>A>AA
@HWI-ST750:151:C1C6AACXX:5:2316:19695:100854 1:N:0:
AGCGCTNCACCGCGGTAAGCTTTAGCAGATCTCACTTTGTCTAGCGTTTGAACCATGTTTTCAAGGATATTGGCTCTAAGTTGTGGGTATTTTTCGATCACTTC+
@<1DDD#2<DFDDGI@CEEGHIIIIIIIEGIIHCGHIIIHGGIGIIAFFHFHAHHIG?CCHFHEEBBC@CDCCCCACCCC5>CCBBB'>ACDECCBBDB7?CC>
The sequence line also has a + at the end, so I guess the 3rd line has been concatenated onto the end of the 2nd line.
Any thoughts on how to proceed with this kind of data? Are there any scripts to convert it into the proper format?
Old 12-21-2012, 11:04 PM   #2
jiaco
Member
 
Location: GMT +1

Join Date: May 2010
Posts: 33

If this problem is consistent throughout the file

Code:
sed "s/+$/\\`echo -e '\n\r'`+/g" bad.fastq > good.fastq
should do the trick.

EDIT: I don't remember adding a newline with sed being this complicated, but I just tested on a Mac and this was required. Maybe it is simpler on Linux, but I do not have a system here to test.
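
For reference, a minimal sketch of a more portable way to write the same substitution. POSIX sed accepts a newline in the replacement when it is written as a backslash followed by a literal line break, which should behave the same in BSD (macOS) and GNU sed. The helper name split_trailing_plus is hypothetical. As written, the one-liner above seems to leave a carriage return in front of the +, which this form avoids. Like the one-liner, it still splits every line that ends in +, including quality lines.

```shell
# Hypothetical helper: move a trailing "+" onto its own line.
# The replacement is a backslash followed by a real line break, which is
# the POSIX way to emit a newline from a sed substitution.
split_trailing_plus() {
  sed 's/+$/\
+/' "$@"
}
```

Usage would be along the lines of `split_trailing_plus bad.fastq > good.fastq`; with no file argument the function reads standard input.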

Last edited by jiaco; 12-21-2012 at 11:16 PM.
Old 12-21-2012, 11:57 PM   #3
a_mt
Member
 
Location: C:/Program files/Google/Chrome

Join Date: Jul 2012
Posts: 34

Thanks for the snippet, jiaco.
I had tried this before as well, but it also messes up quality score lines that end with +.

There are also reads with empty lines in between. My question is: what is the source of this kind of output? Is this some sort of sequencing error?

Code:
@@1D4A#2AFHHFIHIIIIIIIIIIIIIIIIIBHHIIIIIIIIIIIIIIIIIIIIIIIDEHIIIHFFHFEEBDEEECCCBBBCC?CB?CCCBBBBB@BBBBBBB/1

@HWI-ST750:151:C1C6AACXX:5:2316:9996:50328/1
GGCCCCNATACATTTACTGATTCATCCTCAGCGGACTCTGATATGACATCCACTAAAAAATATGTCAGACCACCACCAATGTTAACCTCACCTAATGACTTTCC+
=71?A@#23CDCD@E@ED?FEFCEI<ECFEA>CDDD6?BDEEC9<DBEEIC<BEEIE3@8?;=>?BA>A:(;;@;=???3:>>D####################/1

@HWI-ST750:151:C1C6AACXX:5:2316:9999:44022/1
GGCCACNATCTCGATAATTATAAGATATCTTTAGCACAGGCAAATTGGAACGCAAGCGAAGTTTCGAAAAAGCTAGTAAATATTCAAACAGATGGGTCTATTTC+
???D;B#2ADDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIEIIIIIIIIIIID?CDEDDD@AAA?AAAADEAEEDDDEDBA?AAAAA?A>?ADDD3/1
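
One way to avoid touching quality lines is to repair by position rather than by pattern. The sketch below is an illustration under assumptions, not a tested fix for this data set: it assumes the first non-empty line is a header, that every sequence and quality string sits on a single line, and that the only damage is the fused + and the empty lines. The helper name repair_fastq is hypothetical. The quality lines in the sample above also appear to end in /1, which this sketch does not address.

```shell
# Hypothetical helper: rebuild 4-line FASTQ records by position.
# Only the line in the sequence slot is ever modified, so a quality
# string that happens to end in "+" is left alone.
repair_fastq() {
  awk '
    /^[[:space:]]*$/ { next }                  # drop empty lines
    state == 0 { print; state = 1; next }      # header line
    state == 1 {                               # sequence line
      if (sub(/\+$/, "")) {                    # fused separator found
        print; print "+"; state = 3
      } else {                                 # record was already intact
        print; state = 2
      }
      next
    }
    state == 2 { print; state = 3; next }      # existing "+" separator
    state == 3 { print; state = 0; next }      # quality line, unchanged
  ' "$@"
}
```

Usage would be along the lines of `repair_fastq bad.fastq > good.fastq`. Because records that already have their own + line pass through unchanged, the same function can be run over a mix of damaged and intact reads.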
Old 12-22-2012, 12:20 AM   #4
jiaco
Member
 
Location: GMT +1

Join Date: May 2010
Posts: 33

You could expand the expression to match
Code:
/^[ACGT].*+$/
to avoid quality lines, but I have no idea where you got the file, let alone how it got corrupted.
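
A possible tightening of that idea, as a sketch only: quality strings can begin with A, C or G too, so checking just the first character may not be enough. The variant below requires the whole line to consist of A, C, G, T or N followed by the final +. The helper name split_plus_on_base_lines is hypothetical, and lowercase or other IUPAC base codes are assumed not to occur.

```shell
# Hypothetical helper: split the trailing "+" only on lines that are made
# up entirely of A, C, G, T or N before it. [+] is a bracket expression,
# so the plus is matched literally in any sed dialect.
split_plus_on_base_lines() {
  sed '/^[ACGTN][ACGTN]*[+]$/s/[+]$/\
+/' "$@"
}
```

A quality line made up only of those five letters and ending in + would still be split, which is unlikely but not impossible.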

EDIT: saw your new example just now, there is an issue with this file. Maybe someone else has seen it before.
But I would not try to fix this mess. You need to re-acquire the data.
Old 12-22-2012, 12:32 AM   #5
a_mt
Member
 
Location: C:/Program files/Google/Chrome

Join Date: Jul 2012
Posts: 34

Yes, the sequence files were given to me by our sequencing provider, and I demultiplexed them. This is the result after demultiplexing, so maybe there is an issue with that step. Anyway, I will contact them. Thanks for the suggestion.
Old 12-28-2012, 01:59 AM   #6
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 623

What do the original files look like? What format are they in?
How did you demultiplex? With what program?

Sven

Tags
fastq reads 3rd line

All times are GMT -8. The time now is 11:47 AM.

