04-19-2011, 01:58 PM
jjw14
Location: Missouri
Join Date: Apr 2010

RNA-seq Galaxy workflow for PE barcoded samples?

Hello,

I am working with Illumina RNA-seq data files in Galaxy (http://main.g2.bx.psu.edu/). The files contain 100 bp paired-end reads, multiplexed with barcodes to distinguish samples. There are two files, one for each end of the paired-end reads (the first three reads of each file are pasted below). The barcodes are the first four bases of each sequence in s_7_1_sequence.txt.
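
Before building the workflow, it may help to confirm that the two files list mates in the same order and to see which 4-base prefixes occur in read 1. The sketch below is a plain-Python check, not a Galaxy tool. The function names (`read_fastq`, `mate_id`, `check_pairs_and_count_barcodes`) are hypothetical, and it assumes mate headers differ only in their trailing `/1` or `/2`, as in the pasted reads.

```python
import io
from collections import Counter
from itertools import zip_longest
from typing import IO, Iterator, Tuple

# (header, sequence, plus line, quality)
Record = Tuple[str, str, str, str]


def read_fastq(handle: IO[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records, skipping blank lines between records."""
    while True:
        header = handle.readline()
        if not header:            # end of input
            return
        if not header.strip():    # tolerate blank separator lines
            continue
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header.strip()!r}")
        yield header.rstrip("\n"), seq, plus, qual


def mate_id(header: str) -> str:
    """Drop the leading '@' and a trailing '/1' or '/2' mate suffix."""
    name = header[1:]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_pairs_and_count_barcodes(r1: IO[str], r2: IO[str],
                                   barcode_len: int = 4) -> Counter:
    """Verify mates appear in the same order and tally read-1 prefixes."""
    counts: Counter = Counter()
    for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("the two files contain different numbers of reads")
        if mate_id(rec1[0]) != mate_id(rec2[0]):
            raise ValueError(f"mates out of order: {rec1[0]} vs {rec2[0]}")
        counts[rec1[1][:barcode_len]] += 1
    return counts


if __name__ == "__main__":
    # Shortened copies of two pasted read pairs (sequences truncated to 8 bp).
    r1 = io.StringIO("@HWI-ST538_0096:7:1:1443:1917#0/1\nCGTTNCAG\n+\naaa`Bccc\n"
                     "@HWI-ST538_0096:7:1:1468:1938#0/1\nACGTNGTC\n+\n`_`^Bb_b\n")
    r2 = io.StringIO("@HWI-ST538_0096:7:1:1443:1917#0/2\nCGTTGGCA\n+\ngggggggg\n"
                     "@HWI-ST538_0096:7:1:1468:1938#0/2\nACGTTGGG\n+\ngggggggg\n")
    print(check_pairs_and_count_barcodes(r1, r2))
```

With the real files, the handles would come from `open("s_7_1_sequence.txt")` and `open("s_7_2_sequence.txt")`. Prefixes that show up with high counts are candidate barcodes; rare ones are likely sequencing errors.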

Would the following Galaxy workflow be correct?

1. Upload both s_7_1_sequence.txt and s_7_2_sequence.txt to Galaxy with the reference genome selected
2. Run NGS: QC and manipulation --> FASTQ Groomer on each file to convert to Sanger FASTQ
3. Run NGS: QC and manipulation --> FASTQ joiner to combine the data from the two files
4. Run FASTX-TOOLKIT FOR FASTQ DATA --> Barcode Splitter to generate separate FASTQ files for each barcode group
5. Run NGS: RNA Analysis --> Tophat to map the reads from each group to the reference genome
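
One way to avoid joining and re-splitting is to demultiplex the pairs directly: look up the barcode at the start of read 1 and write both mates to per-barcode outputs, which yields the two files per sample that Tophat asks for. This is a hedged sketch of that alternative, not a description of what the Galaxy Barcode Splitter does. The function `demultiplex_pairs`, the `unmatched` bin, and trimming the barcode from read 1 only are assumptions, and the outputs are in-memory buffers to keep the example short.

```python
import io
from itertools import zip_longest
from typing import Dict, IO, Iterable, Iterator, Tuple

# (header, sequence, plus line, quality)
Record = Tuple[str, str, str, str]


def read_fastq(handle: IO[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records, skipping blank lines between records."""
    while True:
        header = handle.readline()
        if not header:
            return
        if not header.strip():
            continue
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header.strip()!r}")
        yield header.rstrip("\n"), seq, plus, qual


def write_fastq(out: IO[str], rec: Record) -> None:
    """Write one record back out as four lines."""
    out.write("\n".join(rec) + "\n")


def demultiplex_pairs(r1: IO[str], r2: IO[str], barcodes: Iterable[str],
                      barcode_len: int = 4, trim_barcode: bool = True
                      ) -> Dict[str, Tuple[io.StringIO, io.StringIO]]:
    """Route each read pair by the barcode at the start of read 1.

    Returns {barcode: (mate-1 buffer, mate-2 buffer)} plus an 'unmatched' bin.
    Both mates of a pair always go to the same bin, in input order, so the
    two buffers of each bin stay synchronized for a paired-end aligner.
    """
    wanted = set(barcodes)
    outputs = {key: (io.StringIO(), io.StringIO())
               for key in wanted | {"unmatched"}}
    for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("the two files contain different numbers of reads")
        header, seq, plus, qual = rec1
        tag = seq[:barcode_len]
        key = tag if tag in wanted else "unmatched"
        if trim_barcode and key != "unmatched":
            # Assumption: the barcode is only on read 1, so only read 1 is trimmed.
            rec1 = (header, seq[barcode_len:], plus, qual[barcode_len:])
        write_fastq(outputs[key][0], rec1)
        write_fastq(outputs[key][1], rec2)
    return outputs
```

With real data, each bin's two buffers would be written to a pair of files (for example `CGTT_1.fastq` and `CGTT_2.fastq`, hypothetical names) and given to Tophat as the two mate files. The barcode list passed in has to be the actual set used for the library, which the post does not give.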

The problem I am having is that when I select a paired-end library in Tophat, it asks for two FASTQ files, one per mate. After barcode splitting, would I need to run FASTQ Splitter to separate each joined FASTQ file back into two files? I would be very grateful for any suggestions.
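
If the joined route is kept, re-splitting only recovers the original mates when each joined read is an exact concatenation of two equal-length halves. The sketch below assumes that is what the joiner and splitter do (append mate 2 to mate 1, then cut at the midpoint) and models the round trip in plain Python rather than calling Galaxy; `join_mates` and `split_joined` are hypothetical names. Under that assumption, any trimming of the joined read, such as removing the 4-base barcode, shifts the midpoint, so it would have to happen after splitting.

```python
from typing import Tuple

# (header, sequence, plus line, quality)
Record = Tuple[str, str, str, str]


def base_id(header: str) -> str:
    """Read name without the leading '@' and any '/1' or '/2' suffix."""
    name = header[1:]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def join_mates(r1: Record, r2: Record) -> Record:
    """Append mate 2 to mate 1 (assumed joiner behavior)."""
    if base_id(r1[0]) != base_id(r2[0]):
        raise ValueError(f"not mates: {r1[0]} vs {r2[0]}")
    return ("@" + base_id(r1[0]), r1[1] + r2[1], "+", r1[3] + r2[3])


def split_joined(rec: Record) -> Tuple[Record, Record]:
    """Cut a joined read at its midpoint (assumed splitter behavior).

    An even length is necessary but not sufficient: if the mates had
    different lengths before joining, the cut lands in the wrong place.
    """
    header, seq, _, qual = rec
    if len(seq) % 2:
        raise ValueError(f"odd-length joined read {header} cannot be halved")
    mid = len(seq) // 2
    name = header[1:]
    return ((f"@{name}/1", seq[:mid], "+", qual[:mid]),
            (f"@{name}/2", seq[mid:], "+", qual[mid:]))


# Equal-length mates round-trip cleanly...
m1 = ("@r#0/1", "CGTTACGT", "+", "IIIIJJJJ")
m2 = ("@r#0/2", "TTGGCCAA", "+", "KKKKLLLL")
assert split_joined(join_mates(m1, m2)) == (m1, m2)

# ...but trimming the 4-base barcode before splitting shifts the midpoint.
h, s, p, q = join_mates(m1, m2)
first, second = split_joined((h, s[4:], p, q[4:]))
print(first[1], second[1])  # ACGTTT GGCCAA: two bases of mate 2 end up in "mate 1"
```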

Thanks very much in advance,
jjw

File 1: s_7_1_sequence.txt

@HWI-ST538_0096:7:1:1443:1917#0/1
CGTTNCAGACTCTGCAACGACAGCCTGCCCCCCGGCACCGTGAAGCTCTAGGCACGGCCTGCTCGCCGCCCGGGGACAAGGACTCCTGCCGCTGCCCCCG
+HWI-ST538_0096:7:1:1443:1917#0/1
aaa`BcccccggggggfgggfgagggggdaggggggcegedeaadaggdegeeggdebgdZccc]Z`Z^c`S__[^_`aO_Zc^cd`Y`dBBBBBBBBBB
@HWI-ST538_0096:7:1:1468:1938#0/1
ACGTNGTCTGTGATGCCCTTAGATGTCCGGGGCTGCACGCGCGCTACACTGACTGGCTCAGCGTGTGCCTACCCTACGCCGGCAGGGGCGGGGAACCCCC
+HWI-ST538_0096:7:1:1468:1938#0/1
`_`^Bb_babegggggggggceeeedeggddeggeggegeeedeeeeegddeddccVacVX\ZSXXSX_Xb_XbBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST538_0096:7:1:1484:1983#0/1
AAAGTAGCAAATACACAGCATGAGAAATCGGCATCGGATGTCACAGGGAAAGTAGCAAACACACAGCATGAGAAATCAGCATCGGTTGTCACAGAGAAAG
+HWI-ST538_0096:7:1:1484:1983#0/1
gggggfggegfggeggfgggggegggggeggefggggegg\dd^dadc]dda\dcddecZb[b`e_^]_\bbbee`TdY^_Y^BBBBBBBBBBBBBBBBB

File 2: s_7_2_sequence.txt

@HWI-ST538_0096:7:1:1443:1917#0/2
CGTTGGCAGCAGGCAGAGGTGGTGCAGTGGCAGCGGCAGGGGGCCTTGTCCCCGGGCGGCGGGCAGGCGCGGCCCCAGGCGTTACGGGGGCCGGGGGGGG
+HWI-ST538_0096:7:1:1443:1917#0/2
ggggggggggggdgegcgaahebfefebbeecfabcaX`cBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST538_0096:7:1:1468:1938#0/2
ACGTTGGGAATTCCTCGTTCATGGGGAATAATTGCAATCCCCGATCCCCATCACGAATGGGGTTCAACGGGTTACCCGCGCCTGCCGGCGTAGGGTAGGA
+HWI-ST538_0096:7:1:1468:1938#0/2
gggggggggggggggggggdgggggeedegggadbffedb[dedddcecdgaehefegfdeebcdfadecacafPbb`Lbbd_ZdUX^BBBBBBBBBBBB
@HWI-ST538_0096:7:1:1484:1983#0/2
TTTCCCCATGACATCCGATGCTGATTTCTCATGCTGTGCGTTTGCTACTTTCTCTGTGACAACCGATGCTGATTTCTCATGCTGTGTGTTTGCTACTTTC
+HWI-ST538_0096:7:1:1484:1983#0/2
gggggggggggggggggggggggbgggedggggggeagefegbgdgaeeggggfgcgfegagggggegg`aad_fee_bgdddd^cX`caQX[M[bce`c
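
For the Groomer step, the quality strings in the pasted reads run from `B` to `h` (ASCII 66 to 104), which is consistent with Phred+64 encoding: at offset 64, `B` decodes to Phred 2 and `h` to Phred 40. The sketch below guesses the encoding from the lowest character seen and re-encodes Phred+64 as Phred+33 (Sanger). The thresholds and function names are assumptions, and a short sample can be ambiguous. A `phred64` result would point toward the Groomer's Illumina 1.3+ input option rather than Sanger (the exact label may differ between Galaxy versions).

```python
from typing import Iterable


def guess_encoding(quality_strings: Iterable[str]) -> str:
    """Rough guess at the quality encoding from the lowest character seen.

    Heuristic thresholds (assumptions): below ';' (59) only fits Phred+33;
    ';' to '?' (59-63) fits Solexa scores -5..-1 at offset 64; '@' (64) and
    above is treated as Phred+64. Files whose qualities are all high can
    still be misclassified.
    """
    lowest = min(min(map(ord, q)) for q in quality_strings if q)
    if lowest < 59:
        return "phred33"
    if lowest < 64:
        return "solexa64"
    return "phred64"


def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33; the scores are unchanged."""
    out = []
    for ch in quality:
        q = ord(ch) - 64
        if not 0 <= q <= 62:
            raise ValueError(f"{ch!r} is not a Phred+64 quality character")
        out.append(chr(q + 33))
    return "".join(out)


# First 20 quality characters of the first pasted read from each file.
sample = ["aaa`Bcccccggggggfggg", "ggggggggggggdgegcgaa"]
print(guess_encoding(sample))            # phred64
print(phred64_to_phred33(sample[0][:5]))  # BBBA#
```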