**jiaco** · 12-22-2012, 12:04 AM

If this problem is consistent throughout the file

Code:

sed "s/+$/\\`echo -e '\n\r'`+/g" bad.fastq > good.fastq

should do the trick.

EDIT: I don't remember adding a newline with sed ever being this complicated, but I just tested on a Mac and this was required. It may be simpler on Linux, but I do not have a system here to test.
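
For reference, a more portable way to write the same substitution is a backslash followed by a literal newline in the replacement, which both BSD sed (macOS) and GNU sed accept. The sketch below wraps it in a hypothetical helper, `split_trailing_plus`, which is not part of the original post. As far as I can tell, the `\r` in the command above is only there to keep the shell from stripping the trailing newline out of the backtick substitution, and it leaves a carriage return in front of the new `+`. Like the original, this rewrites every line ending in `+`, so it is only safe if no quality line ends in that character.

```shell
# Hypothetical helper: move a trailing "+" onto its own line.
# Reads the named files, or standard input when none are given.
split_trailing_plus() {
  # In a POSIX sed replacement, backslash + newline inserts a newline.
  sed 's/+$/\
+/' "$@"
}

# Usage (file names taken from the post above):
#   split_trailing_plus bad.fastq > good.fastq
```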

**a_mt** · 12-22-2012, 12:57 AM

Thanks for the snippet, jiaco.
I had tried this before as well, but it also breaks quality score lines that end with +.

There are also reads with empty lines in between. My question is: what is the source of this kind of output? Is this some sort of sequencing error?

Code:

@@1D4A#2AFHHFIHIIIIIIIIIIIIIIIIIBHHIIIIIIIIIIIIIIIIIIIIIIIDEHIIIHFFHFEEBDEEECCCBBBCC?CB?CCCBBBBB@BBBBBBB/1

@HWI-ST750:151:C1C6AACXX:5:2316:9996:50328/1
GGCCCCNATACATTTACTGATTCATCCTCAGCGGACTCTGATATGACATCCACTAAAAAATATGTCAGACCACCACCAATGTTAACCTCACCTAATGACTTTCC+
=71?A@#23CDCD@E@ED?FEFCEI<ECFEA>CDDD6?BDEEC9<DBEEIC<BEEIE3@8?;=>?BA>A:(;;@;=???3:>>D####################/1

@HWI-ST750:151:C1C6AACXX:5:2316:9999:44022/1
GGCCACNATCTCGATAATTATAAGATATCTTTAGCACAGGCAAATTGGAACGCAAGCGAAGTTTCGAAAAAGCTAGTAAATATTCAAACAGATGGGTCTATTTC+
???D;B#2ADDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIEIIIIIIIIIIID?CDEDDD@AAA?AAAADEAEEDDDEDBA?AAAAA?A>?ADDD3/1
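
Before attempting any repair, it may help to measure how widespread the two symptoms are. The following is a minimal sketch using a hypothetical helper, `count_symptoms`; it only counts blank lines and lines longer than one character that end in `+`. Because a quality line can legitimately end in `+`, the second number is an upper bound on fused sequence lines, not an exact count.

```shell
# Hypothetical helper: tally blank lines and lines that end in "+"
# (excluding a bare "+" separator line) in a FASTQ file.
count_symptoms() {
  awk '
    /^$/                      { blank++ }   # empty line between records
    length($0) > 1 && /\+$/   { fused++ }   # "+" stuck to the end of a line
    END {
      printf "blank_lines=%d\n", blank
      printf "lines_ending_in_plus=%d\n", fused
    }
  ' "$@"
}

# Usage:
#   count_symptoms demultiplexed.fastq
```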

**jiaco** · 12-22-2012, 01:20 AM

You could expand the expression to match

Code:

/^[ACGT].*+$/

to avoid quality lines, but I have no idea where you got the file, let alone how it got corrupted.
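
Matching on the first character is a heuristic: a quality string can itself begin with A, C, G or T, and a read can begin with N. An alternative sketch is to track the position of each line within its record and strip the `+` only from the line that follows a header. The helper name `repair_fused_plus` is hypothetical. It assumes Unix line endings and that the lines are otherwise in order (header, sequence, optional `+` line, quality). It drops blank lines, and it trusts line order, so it will not detect or recover truncated records.

```shell
# Hypothetical helper: split a fused "+" off sequence lines only,
# using the line's position in the record instead of its content.
repair_fused_plus() {
  awk '
    /^$/ { next }                          # drop blank lines
    state == 0 { print; state = 1; next }  # header line
    state == 1 {                           # sequence line
      if (sub(/\+$/, "")) {                # fused "+": emit it separately
        print; print "+"; state = 3
      } else {                             # intact record: "+" line follows
        print; state = 2
      }
      next
    }
    state == 2 { print; state = 3; next }  # existing "+" separator line
    state == 3 { print; state = 0; next }  # quality line, left untouched
  ' "$@"
}

# Usage:
#   repair_fused_plus bad.fastq > good.fastq
```

Because the quality line is never edited, a quality string ending in `+` or starting with `@` passes through unchanged.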

EDIT: I just saw your new example; there is an issue with this file. Maybe someone else has seen it before.
But I would not try to fix this mess. You need to re-acquire the data.

**a_mt** · 12-22-2012, 01:32 AM

Yes, the sequence files were given to me by our sequence provider, and I demultiplexed them. This is the result after demultiplexing, so maybe there is an issue with this. Anyway, I will contact them. Thanks for the suggestion.

**sklages** · 12-28-2012, 02:59 AM

What do the original files look like? What format?
How did you demultiplex? With what program?

Sven

fastq with reads missing 3rd line
