SEQanswers


archana 04-28-2014 01:06 AM

NGS users
 
Hi All,

I am a new NGS user. I received my NGS data from the Illumina platform in the following format:

HWUSI-EAS481_8291_FC30KTA:3:1:825:1199:GTTAAGTTTATAGATCAGGTGTAGTCGTATGCCGT:hhhhhhhhhhhhhhhhhhGhhV^hhhhhhhhhhhh
HWUSI-EAS481_8291_FC30KTA:3:1:1027:394:GATGGTCATTATAAAACTTCAATCGTATGCCGTCT:hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Can anybody please tell me how to convert it to FASTQ, which is the input format for pre-processing with the FastQ toolkit?

Thank you,
Archana

dpryan 04-28-2014 02:41 AM

That's an annoying format that you have to deal with. You can split each line and output things in fastq format with:

Code:

awk -F ":" '{printf("@%s:%s:%s:%s:%s\n",$1,$2,$3,$4,$5);print $6; print"+"; print $7}' foo.txt > foo.fastq
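You'll need to replace "foo.txt" with whatever your files are actually called. Note that this assumes each record sits on a single line with exactly seven colon-separated fields.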
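
If you would rather do the same conversion in a script, here is a minimal Python sketch of the idea. It is only an illustration, not a tested pipeline: the function names `scarf_to_fastq` and `convert_file` are made up for this example, and it assumes every read name has exactly five colon-separated fields, as in the lines posted above. Splitting at most six times from the left means any colon that happens to occur in the quality string stays in the quality field.

```python
def scarf_to_fastq(line):
    """Convert one SCARF line into a four-line FASTQ record (as a string).

    Assumption: the read name is the first five colon-separated fields,
    followed by the sequence and then the quality string.
    """
    # Split on the first six colons only; anything after that (including
    # a ':' inside the quality string) remains in the last field.
    fields = line.rstrip("\n").split(":", 6)
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields: %r" % line)
    name = ":".join(fields[:5])
    seq, qual = fields[5], fields[6]
    if len(seq) != len(qual):
        raise ValueError("sequence/quality length mismatch: %r" % line)
    return "@%s\n%s\n+\n%s\n" % (name, seq, qual)


def convert_file(in_path, out_path):
    """Read SCARF records from in_path and write FASTQ records to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(scarf_to_fastq(line))


# Example call (file names are placeholders):
# convert_file("foo.txt", "foo.fastq")
```

The quality characters are copied unchanged, so the output keeps whatever quality encoding the input had.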

Brian Bushnell 04-28-2014 09:06 AM

That's called "scarf" format. BBTools can translate it to other formats with the "reformat" tool:

reformat.sh in=reads.scarf extin=.scarf out=reads.fq

archana 05-14-2014 12:34 AM

Thank you, sir. It worked well.

