SEQanswers

jjw14 04-19-2011 01:58 PM

RNA-seq Galaxy workflow for PE barcoded samples?
 
Hello,

I am working with RNA-seq Illumina data files in Galaxy (http://main.g2.bx.psu.edu/). The files are 100bp paired-end reads, multiplexed with barcoding to distinguish samples. There are two files, one for each end of the paired-end reads (first three reads of the files are pasted below). The barcodes are the first four bases of the sequences in the s_7_1_sequence.txt file.

Would the following Galaxy workflow be correct?

1. Upload both s_7_1_sequence.txt and s_7_2_sequence.txt to Galaxy with the reference genome selected
2. Run NGS: QC and manipulation --> FASTQ Groomer on each file to convert to Sanger FASTQ
3. Run NGS: QC and manipulation --> FASTQ joiner to combine the data from the two files
4. Run FASTX-TOOLKIT FOR FASTQ DATA --> Barcode Splitter to generate separate FASTQ files for each barcode group
5. Run NGS: RNA Analysis --> Tophat to map the reads from each group to the reference genome

The problem I am having is that if I select paired-end for the library type in Tophat, it requests two FASTQ files, but after steps 3 and 4 each barcode group is a single joined FASTQ file. Would I have to use FASTQ Splitter to separate the joined FASTQ files again? I would appreciate any suggestions.
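
To make the goal of steps 3 to 5 concrete, here is a minimal Python sketch of the same idea done outside Galaxy: read both files in step, take the barcode from the first four bases of read 1, and keep each mate pair together in its barcode group, so that every group can later be written out as two files for a paired-end mapper. This is not what the Galaxy tools do internally; the function names (read_fastq, split_pairs_by_barcode) are hypothetical, and it assumes both files list the reads in the same order with four lines per record.

```python
import itertools
from collections import defaultdict

BARCODE_LEN = 4  # from the post: the barcode is the first four bases of read 1


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines.

    Assumes strict four-line FASTQ records (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in itertools.islice(it, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def split_pairs_by_barcode(reads1, reads2, barcode_len=BARCODE_LEN):
    """Group mate pairs by the barcode found at the start of read 1.

    Returns a dict mapping barcode -> list of (read1_record, read2_record).
    Reads are not trimmed here; the records are stored unchanged.
    """
    groups = defaultdict(list)
    for r1, r2 in zip(read_fastq(reads1), read_fastq(reads2)):
        # Mates should share the same ID apart from the trailing /1 or /2.
        if r1[0].rsplit("/", 1)[0] != r2[0].rsplit("/", 1)[0]:
            raise ValueError(f"mate mismatch: {r1[0]} vs {r2[0]}")
        barcode = r1[1][:barcode_len]
        groups[barcode].append((r1, r2))
    return groups
```

With real data, reads1 and reads2 would be the open file handles for s_7_1_sequence.txt and s_7_2_sequence.txt, and each group would be written to its own pair of FASTQ files.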

Thanks very much in advance,
jjw

File 1: s_7_1_sequence.txt

@HWI-ST538_0096:7:1:1443:1917#0/1
CGTTNCAGACTCTGCAACGACAGCCTGCCCCCCGGCACCGTGAAGCTCTAGGCACGGCCTGCTCGCCGCCCGGGGACAAGGACTCCTGCCGCTGCCCCCG
+HWI-ST538_0096:7:1:1443:1917#0/1
aaa`BcccccggggggfgggfgagggggdaggggggcegedeaadaggdegeeggdebgdZccc]Z`Z^c`S__[^_`aO_Zc^cd`Y`dBBBBBBBBBB
@HWI-ST538_0096:7:1:1468:1938#0/1
ACGTNGTCTGTGATGCCCTTAGATGTCCGGGGCTGCACGCGCGCTACACTGACTGGCTCAGCGTGTGCCTACCCTACGCCGGCAGGGGCGGGGAACCCCC
+HWI-ST538_0096:7:1:1468:1938#0/1
`_`^Bb_babegggggggggceeeedeggddeggeggegeeedeeeeegddeddccVacVX\ZSXXSX_Xb_XbBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST538_0096:7:1:1484:1983#0/1
AAAGTAGCAAATACACAGCATGAGAAATCGGCATCGGATGTCACAGGGAAAGTAGCAAACACACAGCATGAGAAATCAGCATCGGTTGTCACAGAGAAAG
+HWI-ST538_0096:7:1:1484:1983#0/1
gggggfggegfggeggfgggggegggggeggefggggegg\dd^dadc]dda\dcddecZb[b`e_^]_\bbbee`TdY^_Y^BBBBBBBBBBBBBBBBB

File 2: s_7_2_sequence.txt

@HWI-ST538_0096:7:1:1443:1917#0/2
CGTTGGCAGCAGGCAGAGGTGGTGCAGTGGCAGCGGCAGGGGGCCTTGTCCCCGGGCGGCGGGCAGGCGCGGCCCCAGGCGTTACGGGGGCCGGGGGGGG
+HWI-ST538_0096:7:1:1443:1917#0/2
ggggggggggggdgegcgaahebfefebbeecfabcaX`cBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST538_0096:7:1:1468:1938#0/2
ACGTTGGGAATTCCTCGTTCATGGGGAATAATTGCAATCCCCGATCCCCATCACGAATGGGGTTCAACGGGTTACCCGCGCCTGCCGGCGTAGGGTAGGA
+HWI-ST538_0096:7:1:1468:1938#0/2
gggggggggggggggggggdgggggeedegggadbffedb[dedddcecdgaehefegfdeebcdfadecacafPbb`Lbbd_ZdUX^BBBBBBBBBBBB
@HWI-ST538_0096:7:1:1484:1983#0/2
TTTCCCCATGACATCCGATGCTGATTTCTCATGCTGTGCGTTTGCTACTTTCTCTGTGACAACCGATGCTGATTTCTCATGCTGTGTGTTTGCTACTTTC
+HWI-ST538_0096:7:1:1484:1983#0/2
gggggggggggggggggggggggbgggedggggggeagefegbgdgaeeggggfgcgfegagggggegg`aad_fee_bgdddd^cX`caQX[M[bce`c
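
For context on step 2: the quality strings above contain characters such as g and B, which is consistent with the older Illumina encoding that offsets Phred scores by 64, whereas Sanger FASTQ offsets them by 33. The sketch below shows only that offset shift, under the assumption that the data really are Phred+64; it is not the FASTQ Groomer implementation, and the function name illumina_to_sanger is hypothetical.

```python
def illumina_to_sanger(quality):
    """Convert a Phred+64 quality string to Phred+33 (Sanger) encoding.

    Assumes the input is Phred+64; a character below '@' (ASCII 64)
    would give a negative score and is rejected.
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # decode the Phred score from the +64 encoding
        if score < 0:
            raise ValueError(f"character {ch!r} is not valid Phred+64")
        converted.append(chr(score + 33))  # re-encode with the Sanger offset
    return "".join(converted)
```

For example, the first five quality characters of the first read, aaa`B, would become BBBA# after the shift.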

