I am having some problems with my RNA-seq analysis. In fact, I haven't even started the analysis yet; I am still converting files to the correct format for TopHat. We have a local install of TopHat that accepts color-space fastq files. These are my questions so far:
1) I now have both csfasta and qual files for all of my data. Can I convert them to csfastq in Galaxy using the "Convert SOLiD output to fastq" tool? I tried this and got:
@1_3_824
T20201110030030121130131011102223311101111120212131
+
@*@@>0*;@@@8@@@@<**/.7-7--2566-*0>.3*52*02.5-.*6-2
@1_5_4704
T03120120133300020012131133123223033031000021030102
+
@@>/;@2/;;2=@2@@@2@2@@/@<@@@?@@//-/=@22/@/6;<@@@?;
Is this valid color-space fastq format, and is it ready to put into TopHat for mapping?
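For reference, here is the kind of sanity check I have in mind, as a rough sketch only. It assumes 4-line records, a read made of one primer base followed by color calls 0-3 (with '.' for a missing call), and a quality string with one character per color call (I also allow one extra character, since I believe some converters emit a quality for the primer base). The function name check_csfastq is my own and not part of any tool. Records are read in fixed groups of four lines because a quality line can itself start with '@', as in the output above.

```python
import re

# Assumption: primer base, then color calls 0-3; '.' marks a missing call.
CS_READ = re.compile(r"^[ACGT][0-3.]+$")


def check_csfastq(lines):
    """Return a list of problems found in 4-line color-space fastq records."""
    problems = []
    lines = [ln.rstrip("\r\n") for ln in lines]
    if len(lines) % 4 != 0:
        problems.append("line count is not a multiple of 4")
    # Walk complete records only; never split on '@', which is a valid quality character.
    for i in range(0, len(lines) - len(lines) % 4, 4):
        header, seq, plus, qual = lines[i:i + 4]
        name = f"record {i // 4 + 1}"
        if not header.startswith("@"):
            problems.append(f"{name}: header does not start with '@'")
        if not CS_READ.match(seq):
            problems.append(f"{name}: read is not primer base + color calls")
        if not plus.startswith("+"):
            problems.append(f"{name}: third line does not start with '+'")
        # Assumption: one quality per color call, optionally one more for the primer base.
        if len(qual) not in (len(seq) - 1, len(seq)):
            problems.append(f"{name}: quality length does not match read length")
    return problems
```

An empty list from check_csfastq(open_file_handle) only means the layout looks consistent; it says nothing about whether TopHat will accept the quality encoding.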
2) Is there any reason to prefer converting xsq files DIRECTLY to csfastq using the python script? If so, I am working on that with another collaborator, but I wanted to know before I do any in-depth analysis with the files I can generate now.
3) For some reason, our local install of "Tophat for SOLiD Find splice junctions using RNA-seq data" does not list any reference genomes. I downloaded hg19 from iGenomes, but needed to convert it from gtf to fasta to use it. The program sees the file now, but is this the correct approach?
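Part of my doubt is that, as far as I understand, GTF is a tab-separated annotation format with 9 columns, while FASTA holds the sequence itself under '>' header lines. A minimal sketch for checking which of the two a file looks like is below; sniff_reference is a made-up helper name, and it only inspects the first line that is neither blank nor a '#' comment.

```python
def sniff_reference(lines):
    """Guess whether text lines look like FASTA, GTF, or neither."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and '#' comment lines, which GTF files may start with.
        if not line.strip() or line.startswith("#"):
            continue
        if line.startswith(">"):
            return "fasta"  # sequence header line
        if len(line.split("\t")) == 9:
            return "gtf"  # 9 tab-separated annotation columns
        return "unknown"
    return "empty"
```

It can be called on an open file handle, for example sniff_reference(open(path)), where path is whatever file the tool is being pointed at.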
Cheers,
LTR