Dear all,
I am trying to map reads from an Illumina RNA-seq run to the human genome.
I am using the command:
~/bin/tophat-1.4.1.Linux_x86_64/tophat --solexa1.3-quals --library-type fr-unstranded --bowtie-n -g 1 -N 2 -n 2 -G ~/data/GTF/ --transcriptome-index ~/data/GTF/transcriptome ~/data/indexes/genome data.fastq.gz
Basically, I am mapping to the transcriptome first, allowing up to 2 mismatches, but without putting any limit on the number of hits (-x option). This is because the GTF can contain several transcripts of the same gene that have overlapping coordinates. However, a read mapping to many transcripts of the same gene should have only one mapping location in genome coordinates, which is why I used the option "-g 1" (I don't want reads that map to many genes in the transcriptome).
However, I am not sure what exactly TopHat is doing. Is it actually doing what I described? I get the impression that it might be aligning reads non-uniquely to the transcriptome without considering the -g option, and only applying the -g restriction later, when mapping the fragments of unmapped reads to discover new junctions.
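One way I could check this myself is to count how many alignment records each read ends up with in the final output. Below is a minimal sketch, assuming single-end reads (as in my command) and that the output BAM has first been converted to SAM text, for example with samtools view. The function names are my own and purely illustrative; this is not part of TopHat.

```python
from collections import Counter


def count_hits_per_read(sam_lines):
    """Count mapped alignment records per read name in SAM-formatted lines.

    Assumes single-end reads, so each record with the same read name
    is a separate alignment of that read.
    """
    hits = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # Flag bit 0x4 marks an unmapped read; do not count it as a hit.
        if flag & 0x4:
            continue
        hits[name] += 1
    return hits


def multi_mapped(hits):
    """Return only the reads that have more than one alignment record."""
    return {name: n for name, n in hits.items() if n > 1}
```

If "-g 1" were applied the way I expect, multi_mapped(count_hits_per_read(open("accepted_hits.sam"))) should come back empty, where accepted_hits.sam is a hypothetical text dump of the accepted_hits.bam file.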
Do you have any experience with this mode?
Thanks a lot for your feedback
Julien