Hello,
I am working with Illumina RNA-seq data files in Galaxy (http://main.g2.bx.psu.edu/). The data are 100 bp paired-end reads, multiplexed with barcodes to distinguish samples. There are two files, one for each end of the pairs (the first three reads of each file are pasted below). The barcodes are the first four bases of the sequences in the s_7_1_sequence.txt file.
Would the following Galaxy workflow be correct?
1. Upload both s_7_1_sequence.txt and s_7_2_sequence.txt to Galaxy with the reference genome selected
2. Run NGS: QC and manipulation --> FASTQ Groomer on each file to convert to Sanger FASTQ
3. Run NGS: QC and manipulation --> FASTQ joiner to combine the data from the two files
4. Run FASTX-TOOLKIT FOR FASTQ DATA --> Barcode Splitter to generate separate FASTQ files for each barcode group
5. Run NGS: RNA Analysis --> Tophat to map the reads from each group to the reference genome
The problem I am having is that when I select paired-end as the library type in Tophat, it asks for two FASTQ files. Would I have to use FASTQ Splitter to separate the joined FASTQ files again after barcode splitting? I would appreciate any suggestions.
Thanks very much in advance,
jjw
File 1: s_7_1_sequence.txt
@HWI-ST538_0096:7:1:1443:1917#0/1
CGTTNCAGACTCTGCAACGACAGCCTGCCCCCCGGCACCGTGAAGCTCTAGGCACGGCCTGCTCGCCGCCCGGGGACAAGGACTCCTGCCGCTGCCCCCG
+HWI-ST538_0096:7:1:1443:1917#0/1
aaa`BcccccggggggfgggfgagggggdaggggggcegedeaadaggdegeeggdebgdZccc]Z`Z^c`S__[^_`aO_Zc^cd`Y`dBBBBBBBBBB
@HWI-ST538_0096:7:1:1468:1938#0/1
ACGTNGTCTGTGATGCCCTTAGATGTCCGGGGCTGCACGCGCGCTACACTGACTGGCTCAGCGTGTGCCTACCCTACGCCGGCAGGGGCGGGGAACCCCC
+HWI-ST538_0096:7:1:1468:1938#0/1
`_`^Bb_babegggggggggceeeedeggddeggeggegeeedeeeeegddeddccVacVX\ZSXXSX_Xb_XbBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST538_0096:7:1:1484:1983#0/1
AAAGTAGCAAATACACAGCATGAGAAATCGGCATCGGATGTCACAGGGAAAGTAGCAAACACACAGCATGAGAAATCAGCATCGGTTGTCACAGAGAAAG
+HWI-ST538_0096:7:1:1484:1983#0/1
gggggfggegfggeggfgggggegggggeggefggggegg\dd^dadc]dda\dcddecZb[b`e_^]_\bbbee`TdY^_Y^BBBBBBBBBBBBBBBBB
File 2: s_7_2_sequence.txt
@HWI-ST538_0096:7:1:1443:1917#0/2
CGTTGGCAGCAGGCAGAGGTGGTGCAGTGGCAGCGGCAGGGGGCCTTGTCCCCGGGCGGCGGGCAGGCGCGGCCCCAGGCGTTACGGGGGCCGGGGGGGG
+HWI-ST538_0096:7:1:1443:1917#0/2
ggggggggggggdgegcgaahebfefebbeecfabcaX`cBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST538_0096:7:1:1468:1938#0/2
ACGTTGGGAATTCCTCGTTCATGGGGAATAATTGCAATCCCCGATCCCCATCACGAATGGGGTTCAACGGGTTACCCGCGCCTGCCGGCGTAGGGTAGGA
+HWI-ST538_0096:7:1:1468:1938#0/2
gggggggggggggggggggdgggggeedegggadbffedb[dedddcecdgaehefegfdeebcdfadecacafPbb`Lbbd_ZdUX^BBBBBBBBBBBB
@HWI-ST538_0096:7:1:1484:1983#0/2
TTTCCCCATGACATCCGATGCTGATTTCTCATGCTGTGCGTTTGCTACTTTCTCTGTGACAACCGATGCTGATTTCTCATGCTGTGTGTTTGCTACTTTC
+HWI-ST538_0096:7:1:1484:1983#0/2
gggggggggggggggggggggggbgggedggggggeagefegbgdgaeeggggfgcgfegagggggegg`aad_fee_bgdddd^cX`caQX[M[bce`c
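To make the goal of steps 3 to 5 concrete outside of Galaxy: the idea is to sort read pairs by the barcode at the start of read 1 while keeping the two mates together, so that each barcode group ends up with two matching files for Tophat. Below is a minimal Python sketch of that idea, not what the Galaxy tools do internally. The function names and the sample name in the barcode table are hypothetical (my actual barcode list is not shown here), and it assumes both files list the pairs in the same order, as in the excerpts above.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def pair_id(header):
    """Drop the trailing /1 or /2 so the two mates of a pair can be compared."""
    return header.rsplit("/", 1)[0]


def demultiplex_pairs(reads1, reads2, barcodes, barcode_len=4, trim=True):
    """Group read pairs by the barcode at the start of read 1.

    barcodes maps barcode sequence -> sample name (hypothetical names).
    Returns {sample: ([read 1 records], [read 2 records])}. Pairs whose
    barcode is not in the table are collected under "unmatched".
    Note: zip() stops at the shorter input, so a length mismatch between
    the two files is not detected here.
    """
    groups = {}
    for r1, r2 in zip(reads1, reads2):
        if pair_id(r1[0]) != pair_id(r2[0]):
            raise ValueError(f"mates out of sync: {r1[0]} vs {r2[0]}")
        sample = barcodes.get(r1[1][:barcode_len], "unmatched")
        if trim and sample != "unmatched":
            # Remove the barcode bases and their quality scores from read 1 only.
            r1 = (r1[0], r1[1][barcode_len:], r1[2], r1[3][barcode_len:])
        out1, out2 = groups.setdefault(sample, ([], []))
        out1.append(r1)
        out2.append(r2)
    return groups
```

With real data the two inputs would be `read_fastq(open("s_7_1_sequence.txt"))` and `read_fastq(open("s_7_2_sequence.txt"))`. This version holds every record in memory, so for a full lane it would be better to write each pair straight to per-sample output files instead of collecting lists.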