SEQanswers

Old 07-23-2008, 08:16 AM   #1
kwebb
Member
 
Location: Washington, DC

Join Date: Jul 2008
Posts: 21
Default What does Illumina raw data look like?

Hi

I'm trying to work through some of the various assembler programs before actually collecting my own Illumina data. I've found some test datasets here:

http://sharcgs.molgen.mpg.de/download.shtml

but I'm not sure if the file formats are the same as raw data from the Genome Analyzer.

The files are s_4_seq.txt and s_4_prb.txt and the first few lines look like this:
s_4_seq.txt
4 1 56 910 AACTTACAATTGAAAATATAAACTCAT
4 1 64 716 AAGATGATTATATGTCTTCCTTTTCGA
4 1 890 894 TCAAACCAATCAGACCTATGTTTCATA

s_4_prb.txt
40 -40 -40 -40 40 -40 -40 -40 -40 40 -40 -40 -40 -40 -40 40 -40 -40 -40 40 40 -40 -40 -40 -40 40 -40 -40 40 -40 -40 -40 40 -40 -40 -40 -40 -40 -40 40

So my questions are
1. Is this the raw data format from the machine?
2. How do I get these files into FASTQ format? The Maq converter and Sanger Perl scripts mentioned previously do not seem to work.

Thank you!
Old 07-23-2008, 10:17 AM   #2
kwebb
Member
 
Location: Washington, DC

Join Date: Jul 2008
Posts: 21
Default

Update

I've managed to convert my data using the solexa2fasta.pl script. However, the sol2sanger tool included with Maq does not work with my data. Can someone please explain why?

Thank you!
Old 07-23-2008, 09:31 PM   #3
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

There's a really good tool in the maq package (latest release) called fq_all2std.pl

See below:


Usage:   fq_all2std.pl <command> <in.txt>

Command: scarf2std      Convert SCARF format to the standard/Sanger FASTQ
         fqint2std      Convert FASTQ-int format to the standard/Sanger FASTQ
         sol2std        Convert Solexa/Illumina FASTQ to the standard FASTQ
         fa2std         Convert FASTA to the standard FASTQ
         seqprb2std     Convert .seq and .prb files to the standard FASTQ
         fq2fa          Convert various FASTQ-like format to FASTA
         export2sol     Convert Solexa export format to Solexa FASTQ
         export2std     Convert Solexa export format to Sanger FASTQ
         csfa2std       Convert AB SOLiD read format to Sanger FASTQ
         instruction    Explanation to different format
         example        Show examples of various formats

Note:    Read/quality sequences MUST be presented in one line.
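
For anyone curious what a .seq/.prb conversion involves, here is a minimal Python sketch. It assumes each _prb.txt line holds four Solexa-scaled scores per cycle, matching the read on the same line of _seq.txt. It keeps the highest of the four and converts it to a Phred score written with ASCII offset 33. The function names are made up for illustration, and this is not claimed to reproduce fq_all2std.pl exactly.

```python
import math


def solexa_to_phred(sol):
    """Convert a Solexa-scaled score to the Phred scale."""
    return 10 * math.log10(10 ** (sol / 10) + 1)


def seqprb_to_fastq(seq_line, prb_line):
    """Build one Sanger-style FASTQ record from matching seq/prb lines.

    Assumption: seq_line is "lane tile x y bases" and prb_line holds
    four integer scores per base, whitespace-separated.
    """
    lane, tile, x, y, bases = seq_line.split()
    scores = [int(v) for v in prb_line.split()]
    if len(scores) != 4 * len(bases):
        raise ValueError("expected four prb values per base")
    quals = []
    for i in range(0, len(scores), 4):
        best = max(scores[i:i + 4])               # score of the called base
        phred = int(solexa_to_phred(best) + 0.5)  # round to nearest integer
        quals.append(chr(phred + 33))             # Sanger FASTQ: ASCII offset 33
    return "@{}:{}:{}:{}\n{}\n+\n{}\n".format(
        lane, tile, x, y, bases, "".join(quals))
```

Reading the two files in parallel, for example with zip(open(seq_path), open(prb_path)), and writing each returned record to an output file would convert a whole lane.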
Old 07-24-2008, 05:35 AM   #5
kwebb
Member
 
Location: Washington, DC

Join Date: Jul 2008
Posts: 21
Default

Great tool!

Thanks for the info!
Old 01-26-2009, 02:23 AM   #6
hannat
Member
 
Location: Germany

Join Date: Jan 2009
Posts: 16
Default

I have similar data, seq.txt and prb.txt, just like yours:
seq.txt
........................................................................
6 1 914 893 GCTACTGCCGTGACCTCATTTCTCTTA
6 1 898 905 GAAAAAGAGAAAGTTTAGGAGATCGAT
.....................................................................................
prb.txt
.....................................................................................
-30 -30 30 -30 -30 30 -30 -30 -30 -30 -30 30 30 -30 -30 -30 -30 30 -30 -30 -30 -30 -30 30 -30 -30 30 -30 -30 30 -30 -30 -30 30 -30 -30 -30 -30 30 -30 -30 -30 -30 30 -30 -30 30 -30 30 -30 -30 -30 -30 30 -30 -30 -30 30 -30 -30 -30 -30 -30...
.............................................................................

But when I run
fq_all2std.pl seqprb2std seq.txt prb.txt
the output looks like this:
...........................................
@6:1:914:893
GCTACTGCCGTGACCTCATTTCTCTTA
+
???????????????????????????
..................................................

I also got lots of warning messages like this:
Argument "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0..." isn't numeric in numeric gt (>) at /usr/local/bin/fq_all2std.pl line 152, <$fhq> line 6609.
...


I wonder whether this kind of warning is happening to others too and, if so, what you think the problem is.
I am now checking my prb.txt; I guess some of its lines were not accepted.
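
The \0 characters in that warning suggest that some line of prb.txt holds NUL bytes rather than numbers. A rough Python sketch for locating such lines follows. The function name and the three checks (NUL bytes, non-integer fields, field count not a multiple of four) are assumptions about what a clean prb line should look like, not something taken from the Maq script.

```python
def find_bad_prb_lines(lines):
    """Return (line_number, reason) for every line that is not a clean prb row.

    Assumption: a clean row is a non-empty run of whitespace-separated
    integers whose count is a multiple of four (four scores per cycle).
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if "\0" in line:
            problems.append((number, "contains NUL bytes"))
            continue
        fields = line.split()
        try:
            for field in fields:
                int(field)
        except ValueError:
            problems.append((number, "non-numeric value"))
            continue
        if not fields or len(fields) % 4 != 0:
            problems.append((number, "field count is not a multiple of 4"))
    return problems
```

Calling it as find_bad_prb_lines(open("prb.txt", errors="replace")) would list the suspect line numbers, which can then be compared with the <$fhq> line number reported in the warning.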

Last edited by hannat; 01-26-2009 at 03:00 AM.
Old 03-19-2009, 06:40 PM   #7
alig
Member
 
Location: adelaide

Join Date: Sep 2008
Posts: 43
Default

Hi,

Re: There's a really good tool in the maq package (latest release) called fq_all2std.pl

I tried to use

fq2fa Convert various FASTQ-like format to FASTA

to convert my Illumina sequence data from FASTQ to FASTA, as I want the quality in FASTA format to run Mosaik's GigaBayes program.

But the Maq Perl script command fq_all2std.pl fq2fa <in.txt>

just seemed to print the results to the screen and not place them in a FASTA file.

Am I doing something really silly here?

I've got a 1.8 Gb Illumina sequence text file, so this process takes a while, and I need the output in a file, not printed to the screen.

Thanks, alig
Old 03-23-2009, 06:37 AM   #8
lparsons
Member
 
Location: NJ

Join Date: Nov 2008
Posts: 28
Default

You simply need to redirect the standard output (which is printing to your screen) to a file:

fq_all2std.pl fq2fa in.txt > out.fasta

See http://www.december.com/unix/tutor/redirect.html for more info.
Old 03-25-2009, 03:49 PM   #9
alig
Member
 
Location: adelaide

Join Date: Sep 2008
Posts: 43
Default convert fastq to fasta

To lparsons,

Thank you. Yes I realised that later after I'd sent my post.

Also in case anyone else is looking to separate a fastq file into seq.fasta & qual.fasta files you actually need the other command within Maq

fq_all2std.pl std2qual <out.prefix> <in.fastq>

Thanks again

alig
Old 08-05-2009, 07:01 AM   #10
subram28@msu.edu
Junior Member
 
Location: Michigan

Join Date: Jul 2009
Posts: 1
Default Bowtie

Has anybody used Bowtie for mapping?
Old 09-27-2009, 03:38 PM   #11
spadejac
Junior Member
 
Location: Newark, Delaware

Join Date: Sep 2009
Posts: 4
Default Bowtie for alignment

Quote:
Originally Posted by subram28@msu.edu View Post
Has anybody used Bowtie for mapping?
Oh yeah! We have. And that is the best that I've come across in my career for alignment of short reads. Just too fast - Great for expression data.

Spade
Old 09-29-2009, 12:19 PM   #12
der_eiskern
Member
 
Location: California

Join Date: Jul 2009
Posts: 46
Default

Quote:
Originally Posted by spadejac View Post
Oh yeah! We have. And that is the best that I've come across in my career for alignment of short reads. Just too fast - Great for expression data.

Spade
I heard Bowtie is great for mapping chromatin IP and RNA reads back to a reference but isn't as good as MAQ for finding SNPs. Is this accurate?
Old 11-27-2009, 04:20 AM   #13
lhw_genome
Junior Member
 
Location: Toky

Join Date: Nov 2009
Posts: 3
Default

Hi everyone,
I am a new user of BWA and would greatly appreciate any help.
I have paired-end Solexa data (in two files, s_2_1.export.txt and s_2_2_export.txt) in the following format (SCARF ASCII with mapping information):

HWI-EAS433 16 3 11 255 71 0 2 TGAAAGGGAATATCTTCATATAAAATCTAGACAAAAGCATTCTCAGAATC abbb``b_`aaab_bb``babaa_`a^b_a__aaa`aa`aa`_`aa[^a_
chr9.fa 66572916 F G32G3A10G1 33 0 chr7.fa 61087451 R Y

I would like to convert the Solexa export files to FASTQ format so that I can use BWA. I tried the fq_all2std.pl export2std command, but it doesn't work. I also tried the scarf2std command; it converted my file, but the output was not in FASTQ format: other information (such as the ELAND mapping position) was still included in the output file.

I don't have any experience writing Perl or other scripts.
Could you please help me?
Many thanks!
Old 11-30-2009, 02:28 AM   #14
federica torri
Junior Member
 
Location: Milan, Italy

Join Date: Nov 2009
Posts: 2
Default script format converter

Quote:
Originally Posted by alig View Post
To lparsons,

Thank you. Yes I realised that later after I'd sent my post.

Also in case anyone else is looking to separate a fastq file into seq.fasta & qual.fasta files you actually need the other command within Maq

fq_all2std.pl std2qual <out.prefix> <in.fastq>

Thanks again

alig
Hi,

I checked my maq 0.7.1 installation for this script but didn't find it. Do you know if it is still available, or did you find it as a supplementary Maq script? Thanks
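
If the std2qual command is missing from a given copy of fq_all2std.pl, the split it describes can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the input is assumed to be four-line FASTQ records with Sanger (offset 33) qualities, and the qual output is assumed to be the usual FASTA-like layout with space-separated integer scores.

```python
def split_fastq(fastq_lines, offset=33):
    """Yield (fasta_record, qual_record) for each four-line FASTQ record.

    Assumptions: one line each for header, sequence, '+' and quality;
    qualities use the given ASCII offset (33 for Sanger FASTQ).
    A truncated final record is silently dropped.
    """
    it = iter(fastq_lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        header = header.rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError("record does not start with '@': " + header)
        name = header[1:]
        scores = " ".join(str(ord(c) - offset) for c in qual.rstrip("\r\n"))
        fasta = ">{}\n{}\n".format(name, seq.rstrip("\r\n"))
        yield fasta, ">{}\n{}\n".format(name, scores)
```

Looping over split_fastq(open("in.fastq")) and writing the first element of each pair to seq.fasta and the second to qual.fasta gives the two files; for Illumina-scaled FASTQ the offset would be 64 instead.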
Old 02-08-2010, 02:20 AM   #15
niazi84@hotmail.com
Member
 
Location: Uppsala

Join Date: Jan 2010
Posts: 25
Default

@hannat:

Did you find a solution to your problem? I have the same kind of problem with my data.

Thanks
__________________
~Adnan~
Old 07-15-2010, 02:14 PM   #16
vjimenez
Junior Member
 
Location: Cuernavaca, Morelos, Mexico

Join Date: Oct 2009
Posts: 4
Default

If you are working on Linux, you can use awk as follows:

awk '$12 ~ /Y/{print "@"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$9"\n+"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$10}' s_1_export.txt > s_1_sequence.txt
__________________
Veronica Jimenez Jacinto
UUSM.
Old 08-11-2010, 12:44 AM   #17
Asifullah
Junior Member
 
Location: Pakistan

Join Date: Aug 2010
Posts: 5
Default

Dear all,
I am a new user analyzing Illumina NGS data. I downloaded Bowtie onto a 32-bit Linux system for reference-based assembly and successfully followed its tutorial, aligning the example data already provided in the software folder. But I am stuck at the SAMtools step for visualizing the alignment. Could someone please help me get past that step? I think I can't compile SAMtools correctly. Could you please provide a ready-to-run compiled version of SAMtools for a 32-bit SUSE Linux system? I would be highly obliged. My email address for correspondence is (asifullah111@gmail.com).
Thanks all, and sorry if my question is too silly, as I am a new user of Bowtie.

Asif
Old 08-11-2010, 12:49 AM   #18
Asifullah
Junior Member
 
Location: Pakistan

Join Date: Aug 2010
Posts: 5
Default

Quote:
Originally Posted by kwebb View Post
Hi

I'm trying to work through some of the various assembler programs before actually collecting my own Illumina data. I've found some test datasets here:

http://sharcgs.molgen.mpg.de/download.shtml

but I'm not sure if the file formats are the same as raw data from the Genome Analyzer.

The files are s_4_seq.txt and s_4_prb.txt and the first few lines look like this:
s_4_seq.txt
4 1 56 910 AACTTACAATTGAAAATATAAACTCAT
4 1 64 716 AAGATGATTATATGTCTTCCTTTTCGA
4 1 890 894 TCAAACCAATCAGACCTATGTTTCATA

s_4_prb.txt
40 -40 -40 -40 40 -40 -40 -40 -40 40 -40 -40 -40 -40 -40 40 -40 -40 -40 40 40 -40 -40 -40 -40 40 -40 -40 40 -40 -40 -40 40 -40 -40 -40 -40 -40 -40 40

So my questions are
1. Is this the raw data format from the machine?
2. How do I get these files into FASTQ format? The Maq converter and Sanger Perl scripts mentioned previously do not seem to work.

Thank you!
Hi,

I am facing the same format in my Illumina sequencing file as the one you have shown here. Could you please provide a Perl script for converting such data into FASTA or FASTQ format? I would be highly obliged for any guidance. My email address for correspondence is (asifullah111@gmail.com).

regards
asif
Old 08-25-2010, 11:38 AM   #19
husamia
Member
 
Location: cinci

Join Date: Apr 2010
Posts: 66
Default

Quote:
Originally Posted by vjimenez View Post
If you are working in LINUX, you can use awk as follows:

awk '$12 ~ /Y/{print "@"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$9"\n+"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$10}' s_1_export.txt > s_1_sequence.txt
Just to clarify, is this meant to convert the SCARF ASCII format mentioned above? Is any quality trimming done? I ask because I got a file that was smaller than I expected: I started with a file of 43,236,910 reads and ended up with a file of 8,081,040 lines. Here is a sample of my input, which I take to be the same format as in the post above:
HWI-EAS393 0031 5 1 1295 9710 0 3 AGACGTGTGTCTGAGTAAGGAACCCGCGGGGAAGGG ]PLLPU\]Z_`^`L`aL^`LYb^bbc`^^cH``TL^ c10.fa 130687332 F 3A26T3T1 70 188 128 R Y

Last edited by husamia; 08-25-2010 at 11:44 AM.
Old 09-02-2010, 06:58 AM   #20
vschulz
Junior Member
 
Location: northeast

Join Date: Apr 2009
Posts: 8
Default

The awk line only outputs sequences with Y in the 12th (QC??) field. If you want all sequences in the FASTQ output, you can do

awk '{print "@"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$9"\n+"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$10}' s_1_export.txt > s_1_sequence.txt

Caveat: I don't know awk, but the output seems correct.
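
A Python sketch of the same conversion, for those who prefer not to rely on awk's whitespace splitting. It assumes the export file has 22 tab-separated columns with the pass-filter flag (Y/N) in the last one. If that assumption holds, empty columns would shift awk's default field numbers, so field 12 is not always the filter flag. The function name and the read-name layout are illustrative only, and the quality string is copied through unchanged (no rescaling to Sanger qualities).

```python
def export_to_fastq(export_line, pass_filter_only=False):
    """Return a FASTQ record for one export line, or None if filtered out.

    Assumption: 22 tab-separated columns; columns 1-8 identify the read,
    column 9 is the sequence, column 10 the quality string and column 22
    the pass-filter flag ("Y" or "N").
    """
    fields = export_line.rstrip("\r\n").split("\t")
    if len(fields) < 22:
        raise ValueError("expected 22 tab-separated columns")
    machine, run, lane, tile, x, y, index, read_no, seq, qual = fields[:10]
    if pass_filter_only and fields[21] != "Y":
        return None
    name = "{}_{}:{}:{}:{}:{}#{}/{}".format(
        machine, run, lane, tile, x, y, index, read_no)
    # Quality characters are passed through as found in the export file.
    return "@{}\n{}\n+\n{}\n".format(name, seq, qual)
```

Applying it to every line of s_1_export.txt and writing out the non-None results gives one FASTQ record per read; setting pass_filter_only=True keeps only reads flagged Y.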