SEQanswers

05-02-2012, 07:59 PM   #1
snape_ar
Junior Member
Location: Seattle

SRA Toolkit and Conversion to Illumina Fastq Format

Hi Seqers,

I am trying to convert an SRA ChIP-Seq file (SRA archive) to Illumina FASTQ format. I ran illumina-dump -A <Accession Number> <filename> and got more than 100 qcal and seq files. Now I would like to know what my input file should be for the ELAND_standalone.pl aligner.

Do I have to concatenate all 100 seq files into one file and then run ELAND_standalone.pl?

Any help/hints/suggestions/advice would highly be appreciated.

Thanks.

05-03-2012, 12:32 AM   #2
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK

You could use EMBOSS seqret (or BioPerl, Biopython, ...) to convert from Sanger FASTQ encoding to the old Illumina encoding, but you might also need to massage the record names to suit ELAND.
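To make concrete what such a conversion changes, here is a minimal standard-library Python sketch (the function name is hypothetical, not from EMBOSS or Biopython). It re-encodes a single quality line from Sanger (PHRED + 33) to the old Illumina 1.3+ style (PHRED + 64); the read name and sequence are untouched, so any ELAND-specific renaming would still be a separate step. If you already use Biopython, Bio.SeqIO.convert(in_file, "fastq-sanger", out_file, "fastq-illumina") does the whole-file version.

```python
SANGER_OFFSET = 33    # Sanger FASTQ: PHRED score + 33, so '!' is Q0
ILLUMINA_OFFSET = 64  # Illumina 1.3+ FASTQ: PHRED score + 64, so '@' is Q0
MAX_Q = 62            # highest score still printable at offset 64 ('~')


def sanger_to_illumina13(qual: str) -> str:
    """Re-encode one Sanger quality line as Illumina 1.3+ (PHRED+64)."""
    out = []
    for ch in qual:
        q = ord(ch) - SANGER_OFFSET
        if q < 0:
            raise ValueError(f"{ch!r} is not a valid Sanger quality character")
        # Scores above 62 would fall outside printable ASCII, so cap them.
        out.append(chr(min(q, MAX_Q) + ILLUMINA_OFFSET))
    return "".join(out)


print(sanger_to_illumina13("#1=DDD"))  # prints BP\ccc
```

Each character simply moves up 31 ASCII positions (64 - 33), apart from the cap at the top of the range.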

05-17-2012, 05:13 AM   #3
mathew
Member
Location: Australia

EMBOSS seqret command for converting Sanger FASTQ to old Illumina FASTQ

I am trying to find which Unix command I can use to convert FASTQ from the new QC format (Sanger) to the old Illumina QC format. I have PE data. Thanks.

Last edited by mathew; 05-17-2012 at 05:55 AM.

05-18-2012, 12:45 AM   #4
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK

Quote:
Originally Posted by mathew
I am trying to find which Unix command I can use to convert FASTQ from the new QC format (Sanger) to the old Illumina QC format. I have PE data. Thanks.
What do you mean by QC?

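EMBOSS seqret (mentioned earlier) can interconvert all three FASTQ variants: Sanger FASTQ (PHRED scores with ASCII offset 33, also used by Illumina 1.8+), the original Solexa to Illumina 1.2 FASTQ (Solexa scores rather than PHRED scores, offset 64), and the Illumina 1.3 to 1.7 FASTQ variant (PHRED scores, offset 64).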
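The extra wrinkle with the old Solexa variant is the score itself. PHRED defines Q = -10 log10(p) for an error probability p, while Solexa uses Q = -10 log10(p / (1 - p)). The standard conversion between the two scales is sketched below in Python (function names are my own); any converter that handles the Solexa variant has to apply something like this, plus rounding, to each base.

```python
import math


def solexa_to_phred(q_solexa: float) -> float:
    """Solexa score, -10*log10(p/(1-p)), to PHRED score, -10*log10(p)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred: float) -> float:
    """PHRED score to Solexa score (inverse of solexa_to_phred)."""
    if q_phred <= 0:
        # p = 1 maps to -infinity on the Solexa scale; a real converter must
        # clamp here, for example to the Solexa minimum of -5.
        raise ValueError("PHRED 0 has no finite Solexa equivalent")
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


# The scales agree closely at high quality and diverge at the low end.
for q in (-5, 0, 10, 40):
    print(f"Solexa {q:>3} -> PHRED {solexa_to_phred(q):.2f}")
```

This is why a plain ASCII shift is enough between Sanger and Illumina 1.3+, but not to or from the Solexa variant.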

05-18-2012, 06:06 AM   #5
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK

For example, if your input Sanger-encoded FASTQ file is called input.fastq and you want to turn it into Illumina 1.3+ encoded FASTQ, try:

seqret -sequence=input.fastq -sformat=fastq-sanger -osformat=fastq-illumina -outseq=output.fastq
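If you want to see what that seqret call amounts to, or need it without EMBOSS installed, here is a rough standard-library Python equivalent. It is a sketch that assumes unwrapped four-line FASTQ records (seqret itself is more forgiving) and only re-encodes the quality lines; the function name is hypothetical.

```python
import io
from typing import TextIO


def convert_sanger_to_illumina13(src: TextIO, dst: TextIO) -> int:
    """Copy simple 4-line FASTQ records from src to dst, re-encoding each
    quality line from PHRED+33 (Sanger) to PHRED+64 (Illumina 1.3+).
    Returns the number of records written."""
    count = 0
    while True:
        title = src.readline().rstrip("\r\n")
        if not title:
            return count  # end of input
        seq = src.readline().rstrip("\r\n")
        plus = src.readline().rstrip("\r\n")
        qual = src.readline().rstrip("\r\n")
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record: {title!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {title!r}")
        if any(ord(c) < 33 for c in qual):
            raise ValueError(f"invalid Sanger quality character in {title!r}")
        # Shift by 31 (64 - 33), capping at Q62 so the output stays printable ('~').
        new_qual = "".join(chr(min(ord(c) - 33, 62) + 64) for c in qual)
        dst.write(f"{title}\n{seq}\n{plus}\n{new_qual}\n")
        count += 1


# Usage on an in-memory record; for real files, pass open("input.fastq")
# and open("output.fastq", "w") instead.
out = io.StringIO()
n = convert_sanger_to_illumina13(io.StringIO("@read1\nNACG\n+\n#1=D\n"), out)
print(n, out.getvalue().splitlines()[3])  # prints 1 BP\c
```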

05-19-2012, 05:49 AM   #6
mathew
Member
Location: Australia

Conversion to Illumina 1.3 FASTQ

Hi maubp,

Thanks for your help. I ran the command and it did not give me any error. Here is part of the file before running the command (Before, Sanger) and after running it (After, Illumina). I don't see a difference. Have I missed something or done something wrong?
I just inserted the input and output file names. Any advice, please?

(Before, Sanger)


@HWI-ST413:193:D092FACXX:1:1101:1180:1912 1:N:0:
NATGTACCTGACGAAGCAGCTACCATCTCAGCAGTTGCTGGTCACTGTGCAGTGGAAAAGAGAGAAGTGCATGAAGTCAGCAATTATACTTGGCCTGGAAG
+
#1=DDDFFHGHHGHAHGIIIIEIGHGHGIIGDIACDHIIIIHBDHHGIEHIIFHIGCHG@@FGIEHI=CHGEFCB?DCDFAECCECCDACCCC>>A@BBCC
@HWI-ST413:193:D092FACXX:1:1101:1225:1915 1:N:0:
NAACAGAATAAAGATTATAATTACATTTGATTTAGTTCCAAAAACGGAGTCAAAAATCTTAACCTTTGACAAGACCTGTGTAAAGAAGCTGAGGTAAGCAT
+
#1:BDAAB:DFDB9E:E:<?IECFEABA<EEF@@C?B?<FFGII<0:)?09BGFEC>8=)=B8B4=)7.)7=2=D))7)=@B@BBB96>6;ABB5;>@BBB
@HWI-ST413:193:D092FACXX:1:1101:1201:1926 1:N:0:
NAGGCTTCCTTCATCTCTCCTCTACACAATCTCTTCCTAGTCTTGCTATAGCCAAATTTGTCTCCTTGCTGTTTGTGAAGAAGCCAAACATATTTCTACCT
+
#1=DADDFHHHHDGIIFCHIGGHGIDHEIJIGEIGIIIFGGHIJIJIJJJDIIEFGIIJJCHIIIIJGGHCHJJIGGIIGHGFEE>CEFECCEEEEECCC>
@HWI-ST413:193:D092FACXX:1:1101:1176:1929 1:N:0:
NCATCTCCAAGTTGCTAAAGCCTAATGAGAAAAAAAAATGGTAAATATCCATATCATCTCTTATGATGAAAAGCTATTATGTTTTCAAAACTTAACTAAAC
+
#11AB;BDDDBDBEEEBEAEEBEFIIIEEIIIIIIDIIIEIEEEEEEIEEECCEEII;7?;;ACCDDD;?D@A96(;>D>A>AD?A>A>:9AAAAAAAAA9
@HWI-ST413:193:D092FACXX:1:1101:1249:1946 1:N:0:
NAATTTAACCAACAAGGTGAAATATCTGTTATACCAAAAATTATAAAACATTGAGGAAATTGCCGATGACACAAATAAGTGGAAAGGTATCCCATGTTCAT
+
#11ADDDDHDHDHDAFA2<?EDFFF<BHHE@EFEEFEDEHHFBFFCFCGIIIFII9BHCGHGECGGIEDHCCEHDBEEEEDAC@CAB>@@CCCAAC>CD@>
@HWI-ST413:193:D092FACXX:1:1101:1227:1952 1:N:0:
NTCTGCCTTTACCTTCAAAGTCTGAGCAAATATGATTTTATATCTTTTTAATTAGAGATTCTTTTAAAGACCAAGTTACTGCAGTCCTGTCTTGTTCTTCT
+
#1=DDDFFHHGGHJJGIJGHEHHHDFHFHFHGIIFFIIIJJCHEIIJIGICHEIIFGIJJJIGGGIGCHCGIIJIHEHCHIGEHHCEHFBE>@DEECDAC@
@HWI-ST413:193:D092FACXX:1:1101:1157:1988 1:N:0:
CNAGAAGCGCTAACAATTATTTTGTATGATCAATAGAGAATTGCAACAGTTTTTGTTGTGTTGATACTCAATGACTTATGATGCTGAAAAACTAGTGAGGA
+
@#1ADDDDGFFHHJJJJJIJIIJICGIIHIGIEGCGGHHEHGIDHIEHI@FIHJIIIJGHGGIEGICHEHEEHCB;CFEF@CEECCACDCDDCDDCCCCAA
@HWI-ST413:193:D092FACXX:1:1101:1225:2000 1:N:0:
TTTGTTTACATTCTATTCGATTCCATTCCATTTGAATCAATTATATTGCAATTTATTGCATTGGAGTCCGTTCAAATGCACTCCATACCGTTCCATTCCAT
+

###########################################
(After, Illumina)

@HWI-ST413:193:D092FACXX:1:1101:1180:1912 1:N:0:
NATGTACCTGACGAAGCAGCTACCATCTCAGCAGTTGCTGGTCACTGTGCAGTGGAAAAGAGAGAAGTGCATGAAGTCAGCAATTATACTTGGCCTGGAAG
+
BP\ccceegfggfg`gfhhhhdhfgfgfhhfch`bcghhhhgacggfhdghheghfbgf__efhdgh\bgfdeba^cbce`dbbdbbc`bbbb]]`_aabb
@HWI-ST413:193:D092FACXX:1:1101:1225:1915 1:N:0:
NAACAGAATAAAGATTATAATTACATTTGATTTAGTTCCAAAAACGGAGTCAAAAATCTTAACCTTTGACAAGACCTGTGTAAAGAAGCTGAGGTAAGCAT
+
BPYac``aYcecaXdYdY[^hdbed`a`[dde__b^a^[eefhh[OYH^OXafedb]W\H\aWaS\HVMHV\Q\cHHVH\_a_aaaXU]UZ`aaTZ]_aaa
@HWI-ST413:193:D092FACXX:1:1101:1201:1926 1:N:0:
NAGGCTTCCTTCATCTCTCCTCTACACAATCTCTTCCTAGTCTTGCTATAGCCAAATTTGTCTCCTTGCTGTTTGTGAAGAAGCCAAACATATTTCTACCT
+
BP\c`cceggggcfhhebghffgfhcgdhihfdhfhhheffghihihiiichhdefhhiibghhhhiffgbgiihffhhfgfedd]bdedbbdddddbbb]
@HWI-ST413:193:D092FACXX:1:1101:1176:1929 1:N:0:
NCATCTCCAAGTTGCTAAAGCCTAATGAGAAAAAAAAATGGTAAATATCCATATCATCTCTTATGATGAAAAGCTATTATGTTTTCAAAACTTAACTAAAC
+
BPP`aZacccacadddad`ddadehhhddhhhhhhchhhdhddddddhdddbbddhhZV^ZZ`bbcccZ^c_`XUGZ]c]`]`c^`]`]YX`````````X
@HWI-ST413:193:D092FACXX:1:1101:1249:1946 1:N:0:
NAATTTAACCAACAAGGTGAAATATCTGTTATACCAAAAATTATAAAACATTGAGGAAATTGCCGATGACACAAATAAGTGGAAAGGTATCCCATGTTCAT
+
BPP`ccccgcgcgc`e`Q[^dceee[aggd_deddedcdggeaeebebfhhhehhXagbfgfdbffhdcgbbdgcaddddc`b_b`a]__bbb``b]bc_]
@HWI-ST413:193:D092FACXX:1:1101:1227:1952 1:N:0:
NTCTGCCTTTACCTTCAAAGTCTGAGCAAATATGATTTTATATCTTTTTAATTAGAGATTCTTTTAAAGACCAAGTTACTGCAGTCCTGTCTTGTTCTTCT
+
BP\ccceeggffgiifhifgdgggcegegegfhheehhhiibgdhhihfhbgdhhefhiiihfffhfbgbfhhihgdgbghfdggbdgead]_cddbc`b_
@HWI-ST413:193:D092FACXX:1:1101:1157:1988 1:N:0:
CNAGAAGCGCTAACAATTATTTTGTATGATCAATAGAGAATTGCAACAGTTTTTGTTGTGTTGATACTCAATGACTTATGATGCTGAAAAACTAGTGAGGA
+
_BP`ccccfeeggiiiiihihhihbfhhghfhdfbffggdgfhcghdgh_ehgihhhifgffhdfhbgdgddgbaZbede_bddbb`bcbccbccbbbb``
@HWI-ST413:193:D092FACXX:1:1101:1225:2000 1:N:0:
TTTGTTTACATTCTATTCGATTCCATTCCATTTGAATCAATTATATTGCAATTTATTGCATTGGAGTCCGTTCAAATGCACTCCATACCGTTCCATTCCAT
+
bbbeceeegggggiiiiiiiiiiiiiihiiiiiiifiiihiifhiiiihiiiiihiiihiiiiiiiiiighhiiiiiiiiiiiihgggggeeeedceeddd
@HWI-ST413:193:D092FACXX:1:1101:1361:1913 1:N:0:
NTCACAGTCCCAGTGGGCCTTGTCTGTCACTGAGTTACAAGCCACACTCAATCCCTGGAGATGCTGAGTGCTGTTAATGGACACGTGATGCCGGCTAAACA
+
BP\accdeac`gagfafff`fdg`eadghhh]df^b[cffh`g_efdfhbcgfhffbgggefffgghfbbgggeg_db]ababdb_aabbbb__aY[G]]b
@HWI-ST413:193:D092FACXX:1:1101:1439:1915 1:N:0:
NCATGTCAACTACTTGTGATGAGTTTCTGAGTCTAGCAAAGTCCGTAAACCCTAGTATTTCTCTCCTTTTTTCCCTGCAGAAAGGATCTTGCTCTGTGGCC
+
BPYc^caccaea^ef``ba[ba[[^ecgbagd`eadc[ee_fdfdedeccede_^e^_ecaaeffedfdefede_V^^_a_cR]`a`aaa``aaa`]`[_^
@HWI-ST413:193:D092FACXX:1:1101:1383:1918 1:N:0:
NAGTGATCCTCTTAACTAATGCTTAAGCTCCAATTTCTTGCCATAGTGCTTATCACAGATTGTACTCCTAAGACTGACCTCCAGATTTATCTCCTGAAGCA
+

05-19-2012, 06:08 AM   #7
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK

Quote:
Originally Posted by mathew
Hi maubp,

Thanks for your help. I ran the command and it did not give me any error. Here is part of the file before running the command (Before, Sanger) and after running it (After, Illumina). I don't see a difference. Have I missed something or done something wrong?
I just inserted the input and output file names. Any advice, please?
That has changed the data. Look at the first record, for instance:
(Before, Sanger)
Code:
@HWI-ST413:193:D092FACXX:1:1101:1180:1912 1:N:0:
NATGTACCTGACGAAGCAGCTACCATCTCAGCAGTTGCTGGTCACTGTGCAGTGGAAAAGAGAGAAGTGCATGAAGTCAGCAATTATACTTGGCCTGGAAG
+
#1=DDDFFHGHHGHAHGIIIIEIGHGHGIIGDIACDHIIIIHBDHHGIEHIIFHIGCHG@@FGIEHI=CHGEFCB?DCDFAECCECCDACCCC>>A@BBCC
(After, Illumina)
Code:
@HWI-ST413:193:D092FACXX:1:1101:1180:1912 1:N:0:
NATGTACCTGACGAAGCAGCTACCATCTCAGCAGTTGCTGGTCACTGTGCAGTGGAAAAGAGAGAAGTGCATGAAGTCAGCAATTATACTTGGCCTGGAAG
+
BP\ccceegfggfg`gfhhhhdhfgfgfhhfch`bcghhhhgacggfhdghheghfbgf__efhdgh\bgfdeba^cbce`dbbdbbc`bbbb]]`_aabb
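The fourth line, which holds the qualities, has changed; the name and sequence lines stay the same, as they should, since only the quality encoding differs. I've not double-checked, but it looks OK.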
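The "looks OK" can be checked mechanically: decode the old quality line with offset 33 and the new one with offset 64, and the PHRED scores should be identical. A small Python sketch using the first record's quality lines quoted above (the helper name is hypothetical):

```python
from typing import List


def phred_scores(qual: str, offset: int) -> List[int]:
    """Decode a FASTQ quality line into PHRED scores with the given ASCII offset."""
    return [ord(ch) - offset for ch in qual]


# Quality lines of the first record in this thread, before and after seqret.
before = r"#1=DDDFFHGHHGHAHGIIIIEIGHGHGIIGDIACDHIIIIHBDHHGIEHIIFHIGCHG@@FGIEHI=CHGEFCB?DCDFAECCECCDACCCC>>A@BBCC"
after = r"BP\ccceegfggfg`gfhhhhdhfgfgfhhfch`bcghhhhgacggfhdghheghfbgf__efhdgh\bgfdeba^cbce`dbbdbbc`bbbb]]`_aabb"

sanger = phred_scores(before, 33)    # Sanger: PHRED + 33
illumina = phred_scores(after, 64)   # Illumina 1.3+: PHRED + 64
print(sanger == illumina)  # prints True
print(sanger[:5])          # prints [2, 16, 28, 35, 35]
```

Raw strings are used because the Illumina line contains backslashes.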

07-25-2012, 06:59 PM   #8
qqtwee
Member
Location: Beijing

SRA database fastq format

Hello, I want to ask a question: when I download FASTQ directly from the SRA database, it looks like the following. How can I convert it into usable data that I can analyse directly? I have no idea how to deal with it. Can anybody help me? Thank you!

@SRR031126.1.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:41.1 length=76
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR031126.1.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:41.1 length=76
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@SRR031126.2.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:69.1 length=76
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR031126.2.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:69.1 length=76
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@SRR031126.3.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:129.1 length=76
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR031126.3.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:129.1 length=76
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@SRR031126.4.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:154.1 length=76
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR031126.4.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:154.1 length=76
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@SRR031126.5.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:171.1 length=76
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR031126.5.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:171.1 length=76
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@SRR031126.6.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:273.1 length=76
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR031126.6.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:273.1 length=76
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@SRR031126.7.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:374.1 length=76
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR031126.7.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:374.1 length=76
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@SRR031126.8.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:404.1 length=76
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

07-26-2012, 12:26 AM   #9
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK

Something is very wrong with that data: all the bases are N and all the qualities are zero (the "!" is ASCII 33, which encodes PHRED 0). Perhaps this is just an edge effect: the first and last reads on a Solexa/Illumina run are not as good as those in the middle of the slide.

How exactly did you get this data from the SRA?
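If a file does contain reads like these, a quick way to count (and drop) them is a small filter. The following is a minimal standard-library Python sketch with hypothetical function names; it assumes unwrapped four-line records with Sanger qualities. The "+" lines in the download above repeat the read title, which FASTQ allows, so the code only checks that they start with "+". If every read looks like this, filtering will not help and the problem lies in how the data were obtained.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (title, seq, plus, qual) tuples from simple 4-line FASTQ."""
    while True:
        title = handle.readline().rstrip("\r\n")
        if not title:
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {title!r}")
        yield title, seq, plus, qual


def is_placeholder_read(seq: str, qual: str) -> bool:
    """True if every base is N and every Sanger quality is '!' (PHRED 0)."""
    return set(seq.upper()) <= {"N"} and set(qual) <= {"!"}


def drop_placeholder_reads(src: TextIO, dst: TextIO) -> Tuple[int, int]:
    """Copy records from src to dst, skipping all-N/all-Q0 reads.
    Returns (kept, dropped)."""
    kept = dropped = 0
    for title, seq, plus, qual in read_fastq(src):
        if is_placeholder_read(seq, qual):
            dropped += 1
            continue
        dst.write(f"{title}\n{seq}\n{plus}\n{qual}\n")
        kept += 1
    return kept, dropped


# One record shaped like the download above, plus one ordinary read.
name = "SRR031126.1.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:41.1 length=76"
sample = f"@{name}\n{'N' * 76}\n+{name}\n{'!' * 76}\n@read2\nACGT\n+\nIIII\n"
print(drop_placeholder_reads(io.StringIO(sample), io.StringIO()))  # prints (1, 1)
```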

07-26-2012, 06:32 PM   #10
qqtwee
Member
Location: Beijing

Quote:
Originally Posted by maubp
Something is very wrong with that data: all the bases are N and all the qualities are zero (the "!" is ASCII 33, which encodes PHRED 0). Perhaps this is just an edge effect: the first and last reads on a Solexa/Illumina run are not as good as those in the middle of the slide.

How exactly did you get this data from the SRA?
I downloaded the data by selecting the filtered download and then choosing FASTQ format. BTW, there are two ways to get FASTQ: one is to download FASTQ directly from the SRA, as above; the other is to first download the .sra files and then convert them to FASTQ. Does anyone know the difference between the FASTQ files obtained in these two ways? Thank you!

07-29-2012, 07:45 PM   #11
ymc
Senior Member
Location: Hong Kong

Can anything go wrong if I run the following?

fastq-dump --split-3 --gzip SRR012345.sra
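One thing worth checking after a --split-3 run on paired data: the two mate files should contain the same number of reads, with any orphan reads going to a third file. The sketch below assumes the output names SRR012345_1.fastq.gz and SRR012345_2.fastq.gz (the usual --split-3 naming with --gzip; treat this as an assumption) and uses hypothetical helper names. It only compares record counts, not read names.

```python
import gzip
import os
import tempfile


def count_fastq_records(path: str) -> int:
    """Count records in a 4-line FASTQ file, gzipped or plain."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        lines = sum(1 for _ in handle)
    if lines % 4:
        raise ValueError(f"{path}: {lines} lines is not a multiple of 4")
    return lines // 4


def mates_in_sync(prefix: str) -> bool:
    """Compare record counts of <prefix>_1.fastq.gz and <prefix>_2.fastq.gz
    (assumed --split-3 output names)."""
    n1 = count_fastq_records(f"{prefix}_1.fastq.gz")
    n2 = count_fastq_records(f"{prefix}_2.fastq.gz")
    return n1 == n2


# Demo on two tiny gzipped files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "SRR012345")
    for mate in ("1", "2"):
        with gzip.open(f"{prefix}_{mate}.fastq.gz", "wt") as fh:
            fh.write("@SRR012345.1\nACGT\n+\nIIII\n")
    print(mates_in_sync(prefix))  # prints True
```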
All times are GMT -8.

