Hello everyone,
I've run into some ambiguity in the TopHat manual and was looking to see if anyone could help with my question.
I wanted to align my sequenced reads against the genome with a known gtf file so that reads align to junctions but without tophat identifying novel junctions.
I originally did this by running the following params:
tophat2 \
--no-novel-juncs \
-G ensembl.gtf \
--solexa1.3-quals \
--library-type fr-unstranded \
-p 8 \
-r 50 \
--mate-std-dev 100 \
-o ./s_1_tophat_out \
s_1_1_sequence.txt \
s_1_2_sequence.txt
I used --no-novel-juncs with a supplied gtf file with -G to align my reads to the genome and get back a junctions.bed file with the number of reads that overlap the known junctions found by tophat, etc. That makes sense.
Here is the log of that run:
I then noticed that there is also a -T function.
I then wasn't sure if this was what I wanted or if I wanted the --no-novel-juncs function. I also wasn't sure how --no-novel-juncs and -T interacted together, so I ran the following two runs with and with --no-novel-juncs:
tophat2 \
--no-novel-juncs \ # In one run this was included in the other it was not
-G ensembl.gtf \
-T
--solexa1.3-quals \
--library-type fr-unstranded \
-p 8 \
-r 50 \
--mate-std-dev 100 \
-o ./s_1_tophat_out \
s_1_1_sequence.txt \
s_1_2_sequence.txt
It appears like whether or not I included --no-novel-juncs that I got a similar size output (the final bam files were only different by about 10 bytes). Though, interestingly, I did not get any returned junctions.bed file or indel bed files. Here is the log file:
EDIT: I did actually get the junction and indel files back, I just had an error in where I thought they were being output, so my bad on that statement above. I'm still going to look into what the differences might be.
It looks like it only maps to the known transcriptome (what I want), but doesn't do the longer set of steps that returns the junctions.bed file (something I want, but am not getting with -T option). Any idea what the differences are here and why one returns a bed file of junctions but the other does not?
EDIT: Again, my bad, I did actually get those files back, I just had an error in where I thought they were being output. I'm still going to look into what the differences might be between the -T and --no-novel-juncs runs in output.
I've run into some ambiguity in the TopHat manual and was looking to see if anyone could help with my question.
I wanted to align my sequenced reads against the genome with a known gtf file so that reads align to junctions but without tophat identifying novel junctions.
I originally did this by running the following params:
tophat2 \
--no-novel-juncs \
-G ensembl.gtf \
--solexa1.3-quals \
--library-type fr-unstranded \
-p 8 \
-r 50 \
--mate-std-dev 100 \
-o ./s_1_tophat_out \
s_1_1_sequence.txt \
s_1_2_sequence.txt
I used --no-novel-juncs with a supplied gtf file with -G to align my reads to the genome and get back a junctions.bed file with the number of reads that overlap the known junctions found by tophat, etc. That makes sense.
--no-novel-juncs
Only look for reads across junctions indicated in the supplied GFF or junctions file. (ignored without -G/-j)
Only look for reads across junctions indicated in the supplied GFF or junctions file. (ignored without -G/-j)
[2012-10-24 00:19:16] Beginning TopHat run (v2.0.4)
-----------------------------------------------
[2012-10-24 00:19:16] Checking for Bowtie
Bowtie version: 2.0.0.7
[2012-10-24 00:19:16] Checking for Samtools
Samtools version: 0.1.18.0
[2012-10-24 00:19:16] Checking for Bowtie index files
[2012-10-24 00:19:16] Checking for reference FASTA file
[2012-10-24 00:19:16] Generating SAM header for /data/share/pulm/Deep_Seq/TopHat2/hg19
format: fastq
quality scale: phred64 (reads generated with GA pipeline version >= 1.3)
[2012-10-24 00:19:56] Reading known junctions from GTF file
[2012-10-24 00:20:22] Preparing reads
left reads: min. length=75, max. length=75, 27052490 kept reads (12433 discarded)
right reads: min. length=75, max. length=75, 27041987 kept reads (22936 discarded)
[2012-10-24 00:45:48] Creating transcriptome data files..
[2012-10-24 00:47:25] Building Bowtie index from Homo_sapiens.GRCh37.64.mod.fa
[2012-10-24 01:27:40] Mapping left_kept_reads to transcriptome Homo_sapiens.GRCh37.64.mod with Bowtie2
[2012-10-24 03:38:37] Mapping right_kept_reads to transcriptome Homo_sapiens.GRCh37.64.mod with Bowtie2
[2012-10-24 05:20:48] Resuming TopHat pipeline with unmapped reads
[2012-10-24 05:20:49] Mapping left_kept_reads.m2g_um to genome hg19 with Bowtie2
[2012-10-24 05:59:56] Mapping left_kept_reads.m2g_um_seg1 to genome hg19 with Bowtie2 (1/3)
[2012-10-24 06:06:18] Mapping left_kept_reads.m2g_um_seg2 to genome hg19 with Bowtie2 (2/3)
[2012-10-24 06:14:03] Mapping left_kept_reads.m2g_um_seg3 to genome hg19 with Bowtie2 (3/3)
[2012-10-24 06:20:19] Mapping right_kept_reads.m2g_um to genome hg19 with Bowtie2
[2012-10-24 07:17:53] Mapping right_kept_reads.m2g_um_seg1 to genome hg19 with Bowtie2 (1/3)
[2012-10-24 07:27:27] Mapping right_kept_reads.m2g_um_seg2 to genome hg19 with Bowtie2 (2/3)
[2012-10-24 07:37:15] Mapping right_kept_reads.m2g_um_seg3 to genome hg19 with Bowtie2 (3/3)
[2012-10-24 07:44:00] Retrieving sequences for splices
[2012-10-24 07:47:26] Indexing splices
[2012-10-24 07:48:27] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/3)
[2012-10-24 07:50:03] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/3)
[2012-10-24 07:51:58] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/3)
[2012-10-24 07:53:29] Joining segment hits
[2012-10-24 08:28:49] Mapping right_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/3)
[2012-10-24 08:30:46] Mapping right_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/3)
[2012-10-24 08:32:47] Mapping right_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/3)
[2012-10-24 08:34:46] Joining segment hits
[2012-10-24 09:08:06] Reporting output tracks
Warning: couldn't remove all temporary files in ./s_1_tophat_out/tmp/
-----------------------------------------------
[2012-10-24 10:42:54] Run complete: 10:23:38 elapsed
-----------------------------------------------
[2012-10-24 00:19:16] Checking for Bowtie
Bowtie version: 2.0.0.7
[2012-10-24 00:19:16] Checking for Samtools
Samtools version: 0.1.18.0
[2012-10-24 00:19:16] Checking for Bowtie index files
[2012-10-24 00:19:16] Checking for reference FASTA file
[2012-10-24 00:19:16] Generating SAM header for /data/share/pulm/Deep_Seq/TopHat2/hg19
format: fastq
quality scale: phred64 (reads generated with GA pipeline version >= 1.3)
[2012-10-24 00:19:56] Reading known junctions from GTF file
[2012-10-24 00:20:22] Preparing reads
left reads: min. length=75, max. length=75, 27052490 kept reads (12433 discarded)
right reads: min. length=75, max. length=75, 27041987 kept reads (22936 discarded)
[2012-10-24 00:45:48] Creating transcriptome data files..
[2012-10-24 00:47:25] Building Bowtie index from Homo_sapiens.GRCh37.64.mod.fa
[2012-10-24 01:27:40] Mapping left_kept_reads to transcriptome Homo_sapiens.GRCh37.64.mod with Bowtie2
[2012-10-24 03:38:37] Mapping right_kept_reads to transcriptome Homo_sapiens.GRCh37.64.mod with Bowtie2
[2012-10-24 05:20:48] Resuming TopHat pipeline with unmapped reads
[2012-10-24 05:20:49] Mapping left_kept_reads.m2g_um to genome hg19 with Bowtie2
[2012-10-24 05:59:56] Mapping left_kept_reads.m2g_um_seg1 to genome hg19 with Bowtie2 (1/3)
[2012-10-24 06:06:18] Mapping left_kept_reads.m2g_um_seg2 to genome hg19 with Bowtie2 (2/3)
[2012-10-24 06:14:03] Mapping left_kept_reads.m2g_um_seg3 to genome hg19 with Bowtie2 (3/3)
[2012-10-24 06:20:19] Mapping right_kept_reads.m2g_um to genome hg19 with Bowtie2
[2012-10-24 07:17:53] Mapping right_kept_reads.m2g_um_seg1 to genome hg19 with Bowtie2 (1/3)
[2012-10-24 07:27:27] Mapping right_kept_reads.m2g_um_seg2 to genome hg19 with Bowtie2 (2/3)
[2012-10-24 07:37:15] Mapping right_kept_reads.m2g_um_seg3 to genome hg19 with Bowtie2 (3/3)
[2012-10-24 07:44:00] Retrieving sequences for splices
[2012-10-24 07:47:26] Indexing splices
[2012-10-24 07:48:27] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/3)
[2012-10-24 07:50:03] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/3)
[2012-10-24 07:51:58] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/3)
[2012-10-24 07:53:29] Joining segment hits
[2012-10-24 08:28:49] Mapping right_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/3)
[2012-10-24 08:30:46] Mapping right_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/3)
[2012-10-24 08:32:47] Mapping right_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/3)
[2012-10-24 08:34:46] Joining segment hits
[2012-10-24 09:08:06] Reporting output tracks
Warning: couldn't remove all temporary files in ./s_1_tophat_out/tmp/
-----------------------------------------------
[2012-10-24 10:42:54] Run complete: 10:23:38 elapsed
I then noticed that there is also a -T function.
-T/--transcriptome-only
Only align the reads to the transcriptome and report only those mappings as genomic mappings.
Only align the reads to the transcriptome and report only those mappings as genomic mappings.
tophat2 \
--no-novel-juncs \ # In one run this was included in the other it was not
-G ensembl.gtf \
-T
--solexa1.3-quals \
--library-type fr-unstranded \
-p 8 \
-r 50 \
--mate-std-dev 100 \
-o ./s_1_tophat_out \
s_1_1_sequence.txt \
s_1_2_sequence.txt
It appears like whether or not I included --no-novel-juncs that I got a similar size output (the final bam files were only different by about 10 bytes). Though, interestingly, I did not get any returned junctions.bed file or indel bed files. Here is the log file:
EDIT: I did actually get the junction and indel files back, I just had an error in where I thought they were being output, so my bad on that statement above. I'm still going to look into what the differences might be.
[2013-01-03 12:17:14] Beginning TopHat run (v2.0.6)
-----------------------------------------------
[2013-01-03 12:17:14] Checking for Bowtie
Bowtie version: 2.0.0.7
[2013-01-03 12:17:14] Checking for Samtools
Samtools version: 0.1.18.0
[2013-01-03 12:17:14] Checking for Bowtie index files
[2013-01-03 12:17:14] Checking for reference FASTA file
[2013-01-03 12:17:14] Generating SAM header for /data/share/pulm/Deep_Seq/TopHat2/hg19
format: fastq
quality scale: phred64 (reads generated with GA pipeline version >= 1.3)
[2013-01-03 12:18:20] Reading known junctions from GTF file
[2013-01-03 12:18:39] Preparing reads
left reads: min. length=75, max. length=75, 27052490 kept reads (12433 discarded)
right reads: min. length=75, max. length=75, 27041987 kept reads (22936 discarded)
[2013-01-03 12:38:54] Creating transcriptome data files..
[2013-01-03 12:40:32] Building Bowtie index from Homo_sapiens.GRCh37.64.mod.fa
[2013-01-03 13:08:25] Mapping left_kept_reads to transcriptome Homo_sapiens.GRCh37.64.mod with Bowtie2
[2013-01-03 13:39:57] Mapping right_kept_reads to transcriptome Homo_sapiens.GRCh37.64.mod with Bowtie2
[2013-01-03 14:11:03] Reporting output tracks
-----------------------------------------------
[2013-01-03 14:35:47] Run complete: 02:18:33 elapsed
-----------------------------------------------
[2013-01-03 12:17:14] Checking for Bowtie
Bowtie version: 2.0.0.7
[2013-01-03 12:17:14] Checking for Samtools
Samtools version: 0.1.18.0
[2013-01-03 12:17:14] Checking for Bowtie index files
[2013-01-03 12:17:14] Checking for reference FASTA file
[2013-01-03 12:17:14] Generating SAM header for /data/share/pulm/Deep_Seq/TopHat2/hg19
format: fastq
quality scale: phred64 (reads generated with GA pipeline version >= 1.3)
[2013-01-03 12:18:20] Reading known junctions from GTF file
[2013-01-03 12:18:39] Preparing reads
left reads: min. length=75, max. length=75, 27052490 kept reads (12433 discarded)
right reads: min. length=75, max. length=75, 27041987 kept reads (22936 discarded)
[2013-01-03 12:38:54] Creating transcriptome data files..
[2013-01-03 12:40:32] Building Bowtie index from Homo_sapiens.GRCh37.64.mod.fa
[2013-01-03 13:08:25] Mapping left_kept_reads to transcriptome Homo_sapiens.GRCh37.64.mod with Bowtie2
[2013-01-03 13:39:57] Mapping right_kept_reads to transcriptome Homo_sapiens.GRCh37.64.mod with Bowtie2
[2013-01-03 14:11:03] Reporting output tracks
-----------------------------------------------
[2013-01-03 14:35:47] Run complete: 02:18:33 elapsed
EDIT: Again, my bad, I did actually get those files back, I just had an error in where I thought they were being output. I'm still going to look into what the differences might be between the -T and --no-novel-juncs runs in output.
Comment