Hi, this is my first post, so I hope I don't miss any details.
I'm trying to align RNA sequenced from the subcellular nucleoplasm fraction using TopHat2. The command I'm using is something like this:
Code:
tophat2 --num-threads 5 -g 1 --library-type fr-secondstrand -a 5 -r 3000 -o ./Nucleop_untreated_rep1_topHat /GenoStorage/Genomas/hg19/Genome_indexFiles/Bowtie2/hg19 NucleopRNA_1_1.fastq,NucleopRNA_2_1.fastq NucleopRNA_1_2.fastq,NucleopRNA_2_2.fastq
I used -r 3000 because previously I had to align chromatin fraction RNA and that was the only way to get the reads properly paired.
After aligning, my align_summary.txt file looks like this:
Left reads:
Input: 23788444
Mapped: 22989305 (96.6% of input)
of these: 148191 ( 0.6%) have multiple alignments (148191 have >1)
Right reads:
Input: 23788444
Mapped: 22863972 (96.1% of input)
of these: 151161 ( 0.7%) have multiple alignments (151161 have >1)
96.4% overall read alignment rate.
Aligned pairs: 22400527
of these: 4825 ( 0.0%) have multiple alignments
and: 8473407 (37.8%) are discordant alignments
58.5% concordant pair alignment rate.
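The two pair-level percentages above can be reproduced from the raw counts, which is a quick way to check how they are defined. A minimal sketch in Python, assuming the discordant percentage is relative to aligned pairs and the concordant rate is relative to input pairs (this assumption is mine, but it matches the reported 37.8% and 58.5%):

```python
# Counts copied from align_summary.txt above.
input_pairs = 23788444
aligned_pairs = 22400527
discordant_pairs = 8473407

# Assumption: discordant pairs are reported relative to aligned pairs,
# and the concordant rate is reported relative to input pairs.
discordant_frac = discordant_pairs / aligned_pairs
concordant_pairs = aligned_pairs - discordant_pairs
concordant_rate = concordant_pairs / input_pairs

print(f"discordant: {discordant_frac:.1%} of aligned pairs")
print(f"concordant: {concordant_pairs} pairs, {concordant_rate:.1%} of input pairs")
```

So only about 13.9 million of the 23.8 million input pairs end up as concordant alignments.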
Because of the high percentage of discordant alignments, I ran samtools flagstat on the alignments to check for any issues:
46399134 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
46399134 + 0 mapped (100.00%:-nan%)
46399134 + 0 paired in sequencing
23308173 + 0 read1
23090961 + 0 read2
19935278 + 0 properly paired (42.96%:-nan%)
45487286 + 0 with itself and mate mapped
911848 + 0 singletons (1.97%:-nan%)
11390956 + 0 with mate mapped to a different chr
11390956 + 0 with mate mapped to a different chr (mapQ>=5)
So it seems that many read pairs have their two mates mapped to different chromosomes (11390956 of 46399134 alignments, roughly a quarter).
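One way to see which chromosomes are involved would be to tabulate the reference name (RNAME, column 3) against the mate reference name (RNEXT, column 7) in the SAM records. The following is only a sketch: the function name and the script name `mate_chroms.py` are hypothetical, and it assumes the alignments are in TopHat2's usual accepted_hits.bam and are streamed as text with `samtools view`. Each pair is counted once, through its first read (FLAG bit 0x40).

```python
import sys
from collections import Counter


def count_mate_chrom_pairs(sam_lines):
    """Count (RNAME, RNEXT) combinations for pairs split across references.

    Hypothetical helper: expects an iterable of SAM text lines.
    """
    pairs = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        if not flag & 0x40:  # keep only the first read of each pair
            continue
        # "=" means the mate is on the same reference, "*" means no information
        if rnext in ("=", "*") or rname == "*" or rnext == rname:
            continue
        pairs[(rname, rnext)] += 1
    return pairs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass "-" to read from stdin, or a path to a SAM text file.
    handle = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    with handle:
        for (chrom, mate_chrom), n in count_mate_chrom_pairs(handle).most_common(20):
            print(f"{chrom}\t{mate_chrom}\t{n}")
```

It could be run as `samtools view accepted_hits.bam | python mate_chroms.py -`. If a few combinations dominate the table, that would point to specific regions rather than a general pairing problem.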
I have tried varying some parameters (library type, -r) but had no success.
I should also mention that this happens in 2 different experiments (with 2 different cell types), for a total of 4 datasets (one for the first, three for the second).
I also looked at the few other cases I could find where subcellular fraction RNA was aligned, but they didn't report anything like this.
Thanks in advance to anyone who can help.