SEQanswers

03-26-2014, 07:39 AM   #1
JonB
Member
 
Location: Norway

Join Date: Jan 2010
Posts: 83
Illumina paired-end SRA data in three separate files - what next?

Hi,

I have used fastq-dump to split paired-end Illumina data. I get three files: one for each read of the pair and one file with barcodes. This is transcriptome data and I want to do de novo assembly. I have two questions:

First, the SRA website where I got the data mentions only one barcode, while the barcodes file contains several different ones. Should I use only the sequences with the barcode given on the website?

Second, how can I split the files according to the different barcodes while keeping the pairs? I looked at the FASTX-Toolkit and QIIME's split_libraries, but I don't think my Illumina barcodes are included in the sequences themselves.

Examples of the files:

Code:
-bash-4.1$ head SRR343051_1.fastq 
@SRR343051.1.1 B0A05ABXX110604:3:1101:18610:1087 length=101
NTCTTCTTGCGTACGCATTTGGACTTAATCCTAATCTTGGATTTGTTTCTTCTAAATATGTACCAATCACAATGCTTGAATCTCTTATTATAATATATTTA
+SRR343051.1.1 B0A05ABXX110604:3:1101:18610:1087 length=101
#####################################################################################################
@SRR343051.2.1 B0A05ABXX110604:3:1101:14471:1088 length=101
NCGAAGGGCAATGTAATAAAGTTTATTATTATGTGTGTACAATGCAAAAAAAAGGGACTCGACTCTAATCCTGGTCGAAGCACAGGGCAAGACCACCAATG
+SRR343051.2.1 B0A05ABXX110604:3:1101:14471:1088 length=101
#####################################################################################################
@SRR343051.3.1 B0A05ABXX110604:3:1101:20187:1088 length=101
NATCATAATCTTCAATTTTCAAATTACTCTTGTTGCCTTTGGAAAGATCGTTAGTTTTCGGGTCTTTTATATTTTACTATTGCTTTATACTTGTTTTCACT

-bash-4.1$ head SRR343051_2.fastq 
@SRR343051.1.2 B0A05ABXX110604:3:1101:18610:1087 length=8
TTGAGCCT
+SRR343051.1.2 B0A05ABXX110604:3:1101:18610:1087 length=8
CCCFFFFF
@SRR343051.2.2 B0A05ABXX110604:3:1101:14471:1088 length=8
TTGAGCCT
+SRR343051.2.2 B0A05ABXX110604:3:1101:14471:1088 length=8
CCCFFFFF
@SRR343051.3.2 B0A05ABXX110604:3:1101:20187:1088 length=8
TTGAGCCT

-bash-4.1$ head SRR343051_3.fastq 
@SRR343051.1.3 B0A05ABXX110604:3:1101:18610:1087 length=101
GAGAAAATAAAATATGAGAAAATAGTAAAGAAGAAATTAACTGATATAATTACAGAAGAGAATGAATAATTGAAACAATTAAAAAATCATTAAATGAAGAT
+SRR343051.1.3 B0A05ABXX110604:3:1101:18610:1087 length=101
CCCFFFFFGHHHHJJJIJIJJIJJJHJIJJJJJJJJJJJJJJJJJJJJHIGIIIIGHHIJIJJJJJJIJJJJJEGIIJJJJGFHHFFCEEEECCDDDCCCC
@SRR343051.2.3 B0A05ABXX110604:3:1101:14471:1088 length=101
CTGATGGTGTACGTTGAACTTGGTCTGGTGGTGCTGATTCTGAGCAACAGTCTGCGTCGCGCCGCCTCCTTCTTCCTGATTCTCTCGCTGGCCGTGTCGCT
+SRR343051.2.3 B0A05ABXX110604:3:1101:14471:1088 length=101
BCCFFFFDHHHHHJJIIGIJJJJHIJJIIJJFHIJJIJJJJIIJJJJJJJJIIJJIGIJJHFFDDDBDDDDDDDDDDDCDDDDCDD<BD39??&09B?9A<
@SRR343051.3.3 B0A05ABXX110604:3:1101:20187:1088 length=101
AGGTGATTCATCATCTTCAAAATATTAATAAAAAGTATATTAATATAAAGACAATTATATATCGAAAGTGAATAGTACTGTGAAGGAAAGTAGGAAATATT
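
To see how many distinct barcodes the _2 file really contains, and how often each occurs, the index reads can simply be tallied. Below is a minimal Python sketch of that idea. The function name `tally_barcodes` is made up for illustration, and the third barcode in the example data is invented to show what a one-base read error would look like; it is not taken from SRR343051.

```python
from collections import Counter
from itertools import islice


def tally_barcodes(fastq_lines):
    """Count each distinct sequence in a FASTQ stream (4 lines per record)."""
    # The sequence is the 2nd line of every 4-line record.
    sequences = islice(fastq_lines, 1, None, 4)
    return Counter(seq.strip() for seq in sequences)


# Tiny in-memory stand-in for SRR343051_2.fastq. The last barcode is
# invented (assumption) to mimic a sequencing error in the index read.
example = """\
@SRR343051.1.2 B0A05ABXX110604:3:1101:18610:1087 length=8
TTGAGCCT
+SRR343051.1.2 B0A05ABXX110604:3:1101:18610:1087 length=8
CCCFFFFF
@SRR343051.2.2 B0A05ABXX110604:3:1101:14471:1088 length=8
TTGAGCCT
+SRR343051.2.2 B0A05ABXX110604:3:1101:14471:1088 length=8
CCCFFFFF
@SRR343051.3.2 B0A05ABXX110604:3:1101:20187:1088 length=8
TTGAGCAT
+SRR343051.3.2 B0A05ABXX110604:3:1101:20187:1088 length=8
CCCFFFFF
""".splitlines()

counts = tally_barcodes(example)
for barcode, n in counts.most_common():
    print(barcode, n)
```

On the real data the same function can be fed an open file handle, e.g. `with open("SRR343051_2.fastq") as fh: tally_barcodes(fh)`. If one barcode dominates and the rest are rare near-matches, that would be consistent with a single sample plus read errors in the index read, but the tally alone cannot prove it.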

03-26-2014, 07:52 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,138

Hopefully you have the information about barcode <--> sample.

Try this script for demultiplexing: http://qiime.org/scripts/split_libraries_fastq.html
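
For anyone wondering what demultiplexing with a separate barcode file involves in principle, here is a minimal Python sketch. It is not how the QIIME script works internally; it only illustrates the general idea of walking the three files in lockstep so that both mates follow their barcode. The function names, the `"unassigned"` bin, and the `barcode_to_sample` table are assumptions made up for this example, and only exact barcode matches are considered.

```python
from collections import defaultdict
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return
        yield record[0], record[1], record[3]


def demultiplex(read1, barcodes, read2, barcode_to_sample):
    """Group read pairs by the sample their barcode maps to.

    The three inputs must list the same spots in the same order.
    Pairs whose barcode is not in the table go to "unassigned".
    """
    bins = defaultdict(list)
    triples = zip(read_fastq(read1), read_fastq(barcodes), read_fastq(read2))
    for r1, bc, r2 in triples:
        sample = barcode_to_sample.get(bc[1], "unassigned")
        bins[sample].append((r1, r2))  # both mates stay together
    return bins


# Made-up toy records; "sampleA" is a hypothetical sample name.
r1 = ["@spot1", "ACGT", "+", "IIII", "@spot2", "GGCC", "+", "IIII"]
bc = ["@spot1", "TTGAGCCT", "+", "CCCFFFFF",
      "@spot2", "AAAAAAAA", "+", "CCCFFFFF"]
r2 = ["@spot1", "TTTT", "+", "IIII", "@spot2", "CCCC", "+", "IIII"]

bins = demultiplex(r1, bc, r2, {"TTGAGCCT": "sampleA"})
for sample, pairs in bins.items():
    print(sample, len(pairs))
```

The sketch keeps everything in memory for brevity. For real files one would write each pair to per-sample output files instead, and probably allow a mismatch or two in the barcode.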

03-26-2014, 07:57 AM   #3
GenoMax

@JonB: You have not used the

Quote:
-F | --origfmt Defline contains only original sequence name.

option with fastq-dump, so you have the SRR* prefix in the names. Just keep that in mind.
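
If the files are already dumped and only the original instrument names are wanted, the deflines can also be rewritten afterwards. The Python sketch below strips the leading SRR token and the trailing `length=` field from a defline like the ones shown in the first post. The helper `original_defline` is hypothetical, and its output only approximates what `-F` would produce; re-running fastq-dump with the option is the safer route.

```python
def original_defline(defline):
    """Drop the leading SRR accession token and any length=N field."""
    marker = defline[0]                      # "@" or "+"
    fields = defline[1:].split()
    kept = [f for f in fields[1:] if not f.startswith("length=")]
    # Fall back to the unchanged line if there is no original name to keep.
    return marker + " ".join(kept) if kept else defline


line = "@SRR343051.1.1 B0A05ABXX110604:3:1101:18610:1087 length=101"
print(original_defline(line))
```

Note that the per-read suffix (.1, .2, .3) is lost this way, so mates end up with identical names. Some downstream tools expect that, others want /1 and /2 added.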

03-27-2014, 12:05 AM   #4
JonB

Quote:
Originally Posted by GenoMax
@JonB: You have not used the -F | --origfmt option with fastq-dump so you have the SRR* in the names. Just keep that in mind.

Thanks! I didn't see that option.

03-27-2014, 12:07 AM   #5
JonB

Quote:
Originally Posted by GenoMax
Hopefully you have the information about barcode <--> sample.

Try this script for demultiplexing: http://qiime.org/scripts/split_libraries_fastq.html

GenoMax, would you mind telling me how I could use this script? I looked at it before, but I don't understand how it assigns my reads to files based on the barcodes, or how it deals with the two reads of each pair. Can I still use it on my data with the pairs in separate files?

Thanks

03-27-2014, 04:45 AM   #6
GenoMax

Jon: This appears to be a single sample even though the barcode read is included as a separate file in the SRA archive. See the corresponding ENA record (http://www.ebi.ac.uk/ena/data/view/SRR343051).

In short, demultiplexing is not needed for this sample. You can use the _1 and _3 files as the R1/R2 read pair.
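
Before handing the _1 and _3 files to an assembler, it may be worth confirming that they list the same spots in the same order, since most paired-end tools assume this. A minimal Python sketch of such a check follows. The function names are made up for illustration, and the ID parsing assumes read names of the form SRR343051.<spot>.<read> as in the examples earlier in the thread.

```python
from itertools import islice, zip_longest


def spot_ids(fastq_lines):
    """Yield the spot ID of each record: '@SRR343051.1.1 ...' -> 'SRR343051.1'."""
    for header in islice(fastq_lines, 0, None, 4):
        read_name = header.split()[0].lstrip("@")
        yield read_name.rsplit(".", 1)[0]   # drop the per-read suffix (.1, .3)


def first_mismatch(read1_lines, read2_lines):
    """Return the 1-based index of the first out-of-sync record, or None."""
    pairs = zip_longest(spot_ids(read1_lines), spot_ids(read2_lines))
    for index, (a, b) in enumerate(pairs, start=1):
        if a != b:   # also triggers when one file runs out first
            return index
    return None


# Toy records with shortened headers; sequences and qualities are made up.
r1 = ["@SRR343051.1.1 x length=4", "ACGT", "+", "IIII",
      "@SRR343051.2.1 x length=4", "ACGT", "+", "IIII"]
r3 = ["@SRR343051.1.3 x length=4", "TTTT", "+", "IIII",
      "@SRR343051.2.3 x length=4", "TTTT", "+", "IIII"]

print(first_mismatch(r1, r3))       # files in sync
print(first_mismatch(r1, r3[:4]))   # second file is one record short
```

With real files the two arguments would be open file handles for SRR343051_1.fastq and SRR343051_3.fastq. A return value of None means every record lined up.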

03-28-2014, 11:03 AM   #7
JonB

Quote:
Originally Posted by GenoMax
Jon: This appears to be a single sample even though the barcode read is included as a separate file in the SRA archive. See the corresponding ENA record (http://www.ebi.ac.uk/ena/data/view/SRR343051).

In short, demultiplexing is not needed for this sample. You can use the _1 and _3 files as the R1/R2 read pair.

Thank you!
Tags
barcode, fastq, illumina, split fastq, sra
