Hi,
I have used fastq-dump to split paired-end Illumina data. I get three files: one for each read of the pair and one file with barcodes. This is transcriptome data and I want to do de novo assembly. I have two questions:
First, the SRA website where I got the data mentions only one barcode, while the barcodes file contains several different ones. Should I use only the sequences with the barcode given on the website?
Second, how can I split the files according to the different barcodes while keeping the pairs together? I looked at the FASTX-Toolkit and QIIME's split_libraries, but I don't think my Illumina barcodes are included in the sequences themselves.
Examples of the files:
Code:
-bash-4.1$ head SRR343051_1.fastq
@SRR343051.1.1 B0A05ABXX110604:3:1101:18610:1087 length=101
NTCTTCTTGCGTACGCATTTGGACTTAATCCTAATCTTGGATTTGTTTCTTCTAAATATGTACCAATCACAATGCTTGAATCTCTTATTATAATATATTTA
+SRR343051.1.1 B0A05ABXX110604:3:1101:18610:1087 length=101
#####################################################################################################
@SRR343051.2.1 B0A05ABXX110604:3:1101:14471:1088 length=101
NCGAAGGGCAATGTAATAAAGTTTATTATTATGTGTGTACAATGCAAAAAAAAGGGACTCGACTCTAATCCTGGTCGAAGCACAGGGCAAGACCACCAATG
+SRR343051.2.1 B0A05ABXX110604:3:1101:14471:1088 length=101
#####################################################################################################
@SRR343051.3.1 B0A05ABXX110604:3:1101:20187:1088 length=101
NATCATAATCTTCAATTTTCAAATTACTCTTGTTGCCTTTGGAAAGATCGTTAGTTTTCGGGTCTTTTATATTTTACTATTGCTTTATACTTGTTTTCACT

-bash-4.1$ head SRR343051_2.fastq
@SRR343051.1.2 B0A05ABXX110604:3:1101:18610:1087 length=8
TTGAGCCT
+SRR343051.1.2 B0A05ABXX110604:3:1101:18610:1087 length=8
CCCFFFFF
@SRR343051.2.2 B0A05ABXX110604:3:1101:14471:1088 length=8
TTGAGCCT
+SRR343051.2.2 B0A05ABXX110604:3:1101:14471:1088 length=8
CCCFFFFF
@SRR343051.3.2 B0A05ABXX110604:3:1101:20187:1088 length=8
TTGAGCCT

-bash-4.1$ head SRR343051_3.fastq
@SRR343051.1.3 B0A05ABXX110604:3:1101:18610:1087 length=101
GAGAAAATAAAATATGAGAAAATAGTAAAGAAGAAATTAACTGATATAATTACAGAAGAGAATGAATAATTGAAACAATTAAAAAATCATTAAATGAAGAT
+SRR343051.1.3 B0A05ABXX110604:3:1101:18610:1087 length=101
CCCFFFFFGHHHHJJJIJIJJIJJJHJIJJJJJJJJJJJJJJJJJJJJHIGIIIIGHHIJIJJJJJJIJJJJJEGIIJJJJGFHHFFCEEEECCDDDCCCC
@SRR343051.2.3 B0A05ABXX110604:3:1101:14471:1088 length=101
CTGATGGTGTACGTTGAACTTGGTCTGGTGGTGCTGATTCTGAGCAACAGTCTGCGTCGCGCCGCCTCCTTCTTCCTGATTCTCTCGCTGGCCGTGTCGCT
+SRR343051.2.3 B0A05ABXX110604:3:1101:14471:1088 length=101
BCCFFFFDHHHHHJJIIGIJJJJHIJJIIJJFHIJJIJJJJIIJJJJJJJJIIJJIGIJJHFFDDDBDDDDDDDDDDDCDDDDCDD<BD39??&09B?9A<
@SRR343051.3.3 B0A05ABXX110604:3:1101:20187:1088 length=101
AGGTGATTCATCATCTTCAAAATATTAATAAAAAGTATATTAATATAAAGACAATTATATATCGAAAGTGAATAGTACTGTGAAGGAAAGTAGGAAATATT
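To make the second question concrete, here is a rough Python sketch of the kind of splitting I have in mind. It is only an illustration under some assumptions: that SRR343051_2.fastq holds the 8-base barcode read, that all three files list the same spots in the same order, and that exact barcode matching is good enough. The names read_fastq and split_pairs_by_barcode and the output file naming are made up for this example and do not come from any existing tool.

```python
import os


def read_fastq(path):
    """Yield (header, sequence, quality) from a FASTQ file with 4 lines per record."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:  # end of file (or a blank line)
                break
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, not needed
            qual = handle.readline().rstrip("\n")
            yield header, seq, qual


def split_pairs_by_barcode(read1_path, barcode_path, read2_path, out_dir):
    """Write each read pair to <barcode>_1.fastq and <barcode>_2.fastq in out_dir.

    Hypothetical helper. Assumes the three inputs are in the same spot order;
    zip() stops silently at the shortest file, so truncated inputs go unnoticed.
    Returns a dict mapping barcode sequence -> number of pairs written.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # barcode -> (read 1 output, read 2 output)
    counts = {}
    try:
        records = zip(read_fastq(read1_path),
                      read_fastq(barcode_path),
                      read_fastq(read2_path))
        for read1, barcode_read, read2 in records:
            barcode = barcode_read[1]
            if barcode not in handles:
                # One pair of open files per distinct barcode; with very many
                # distinct (e.g. error-containing) barcodes this could hit the
                # operating system's open-file limit.
                handles[barcode] = (
                    open(os.path.join(out_dir, barcode + "_1.fastq"), "w"),
                    open(os.path.join(out_dir, barcode + "_2.fastq"), "w"),
                )
                counts[barcode] = 0
            for out, (header, seq, qual) in zip(handles[barcode], (read1, read2)):
                out.write(f"{header}\n{seq}\n+\n{qual}\n")
            counts[barcode] += 1
    finally:
        for pair in handles.values():
            for out in pair:
                out.close()
    return counts
```

With the files above, this would be called as split_pairs_by_barcode("SRR343051_1.fastq", "SRR343051_2.fastq", "SRR343051_3.fastq", "by_barcode"), where "by_barcode" is just a placeholder output directory. The returned counts would also show how many pairs carry each barcode, which relates to my first question.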