I've set up a Galaxy workflow for paired-end, first-strand RNA-seq, and I've gotten some odd summary results from the TopHat2 alignment. At least I think they're odd, as I'm new to this.
Here are the TopHat2 alignment summary and the samtools flagstat output.
Given the number of reads mapped, the concordant pair alignment rate (1.5%) seems extremely low. Did I miss a parameter in TopHat or Bowtie? Notably, I have not set a read group identifier in Bowtie (is that necessary?), and I couldn't figure out how to do so from the Bowtie documentation. I also wonder whether something is awry with my fastq files, since they were concatenated from a split dataset. Here are the first few reads from the forward and reverse files, respectively.
Thanks in advance for any help!
-Jeremy
TopHat2 alignment summary:
Left reads:
Input : 218685181
Mapped : 193500858 (88.5% of input)
of these: 14727362 ( 7.6%) have multiple alignments (40016 have >20)
Right reads:
Input : 218685181
Mapped : 196263585 (89.7% of input)
of these: 14724480 ( 7.5%) have multiple alignments (40380 have >20)
Unpaired reads:
Input : 5950944
Mapped : 5300035 (89.1% of input)
of these: 227937 ( 4.3%) have multiple alignments (142 have >20)
89.1% overall read mapping rate.
Aligned pairs: 173668750
of these: 13863688 ( 8.0%) have multiple alignments
170432898 (98.1%) are discordant alignments
1.5% concordant pair alignment rate.
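For anyone checking the arithmetic, the summary percentages can be reproduced from the raw counts. Below is a minimal Python sketch. The formula used for the concordant rate, (aligned pairs - discordant pairs) / input pairs, is my assumption: I chose it because it reproduces the reported 1.5%, not because I found it in the TopHat documentation.

```python
# Counts copied from the TopHat2 alignment summary above.
input_pairs = 218_685_181
left_mapped = 193_500_858
right_mapped = 196_263_585
unpaired_input = 5_950_944
unpaired_mapped = 5_300_035
aligned_pairs = 173_668_750
discordant_pairs = 170_432_898


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as in the TopHat summary."""
    return round(100 * part / whole, 1)


# Every mapped read (left, right, unpaired) over every input read.
overall = pct(left_mapped + right_mapped + unpaired_mapped,
              2 * input_pairs + unpaired_input)
discordant = pct(discordant_pairs, aligned_pairs)
# Assumed formula: concordant pairs = aligned pairs minus discordant pairs,
# expressed as a share of all input pairs.
concordant = pct(aligned_pairs - discordant_pairs, input_pairs)

print(f"overall read mapping rate: {overall}%")          # 89.1%
print(f"discordant share of aligned pairs: {discordant}%")  # 98.1%
print(f"concordant pair alignment rate: {concordant}%")     # 1.5%
```

In other words, only about 3.2 million of the 218.7 million input pairs (173,668,750 - 170,432,898 = 3,235,852) aligned concordantly, even though roughly 89% of individual reads mapped.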
samtools flagstat output:
490744296 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
490744296 + 0 mapped (100.00%:-nan%)
486148534 + 0 paired in sequencing
241299292 + 0 read1
244849242 + 0 read2
523372 + 0 properly paired (0.11%:-nan%)
443477134 + 0 with itself and mate mapped
42671400 + 0 singletons (8.78%:-nan%)
418612688 + 0 with mate mapped to a different chr
312416516 + 0 with mate mapped to a different chr (mapQ>=5)
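The flagstat counts can be put in proportion the same way. The Python sketch below parses the output shown above and computes a few ratios. The regular expression assumes the "N + M category (percent)" layout of this particular samtools output; other samtools versions may print different or additional lines. The properly-paired and singleton percentages use "paired in sequencing" as the denominator, which reproduces flagstat's own 0.11% and 8.78%; dividing the different-chromosome count by "with itself and mate mapped" is my own choice of denominator.

```python
import re
from typing import Dict

# samtools flagstat output copied from above (QC-passed + QC-failed columns).
FLAGSTAT = """\
490744296 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
490744296 + 0 mapped (100.00%:-nan%)
486148534 + 0 paired in sequencing
241299292 + 0 read1
244849242 + 0 read2
523372 + 0 properly paired (0.11%:-nan%)
443477134 + 0 with itself and mate mapped
42671400 + 0 singletons (8.78%:-nan%)
418612688 + 0 with mate mapped to a different chr
312416516 + 0 with mate mapped to a different chr (mapQ>=5)
"""

# "<passed> + <failed> <category>", optionally followed by a percentage in
# parentheses. "(mapQ>=5)" contains no '%', so it stays part of the category
# and does not collide with the plain "different chr" line.
LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \([^)]*%[^)]*\))?$")


def parse_flagstat(text: str) -> Dict[str, int]:
    """Map each flagstat category to its QC-passed count."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            counts[m.group(3)] = int(m.group(1))
    return counts


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimal places, as in flagstat."""
    return round(100 * part / whole, 2)


c = parse_flagstat(FLAGSTAT)
paired = c["paired in sequencing"]
both_mapped = c["with itself and mate mapped"]

# Internal consistency checks on the counts.
assert c["read1"] + c["read2"] == paired
assert both_mapped + c["singletons"] == paired

print(f"properly paired: {pct(c['properly paired'], paired)}%")  # 0.11%
print(f"singletons: {pct(c['singletons'], paired)}%")            # 8.78%
print("mate on a different chr: "
      f"{pct(c['with mate mapped to a different chr'], both_mapped)}%")  # 94.39%
```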
Forward (read 1) file:
@HW-ST997:217:C3KKGACXX:4:1101:1432:2038 1:N:0:TGACCA
TTCATCTTTAGATAATGAATTATATCCAAGATCAGACTGGCCACCTGTACTAGATCTATCATCAGTAGCATATACTTTGATTAAACCCG
+
FF00B<<FFFFFFBBFFFBFIFBBF0BBFFFFBFFFFIF<FFF<FBFF7BBBB<<B<''<B<BBB<<BBBBBFFFBBF<<B<7B7<BBB
@HW-ST997:217:C3KKGACXX:4:1101:1474:2051 1:N:0:TGACCA
GAGGGAGTATAGGGCTGTGACTAGTATGTTGAGTCCTGTAAGTAGGAGAGTGATATTTGATCAGGAGAACGTGGTTACTAGCACAGAGA
+
FIFIIBFBBFFFIIFFFFFFFFFFFBFFIIIFFFIIIFFFFFFFFFBF<BBBBF0BFFFBFFBFFFFFFFBFBFBFB<BBBBBBBBBFB
@HW-ST997:217:C3KKGACXX:4:1101:1451:2106 1:N:0:TGACCA
ACTGGGAAACGTTCACGCTGGGTCCAGCATTTGCCATGGACAAGATGCCAGGACCCGTATGCTTCAGGATGAAGTTCTTGTCATCAAAT
+
FIIFFBBFFFFFFBB7<7BBFFF77BBFFIFFFIFBFFFIFFIIF<B<0<BB7BBBBB<BBBBBBBB0BBBB0<7<BBBB0'0B<B<BB
Reverse (read 2) file:
@HW-ST997:217:C3KKGACXX:4:1101:1452:2018 2:N:0:TGACCA
TTACCCCCATACTCCTTACACTATTCCTCATCANCCNACTAAAAATATTAAACACAAACTACCACCTACCTCCCTCACCAAAGCCCATA
+
FFFFFFFF7FFFIIIIIFFFFFFFIIFFFFFFB#0B#07<FFFIFFFFIFBFFIFFFFFFFFBFF<BB<BFFFFB<BBBBBFBFFB<BB
@HW-ST997:217:C3KKGACXX:4:1101:1474:2051 2:N:0:TGACCA
AGTCATTCTCATAATCGCCCACGGGCTTACATCNTCNTTACTATTCTGCCTAGCAAACTCAAACTACGAACGCACTCACAGTCGCATCA
+
FFFIIFFFIIFIIFFBFBFFFIIIIFFFIFFFF#0<#07<BBFFFBBFBFFBBFFFFFBFFFFFFFFFFFFFBBBBFFBFFBBBFBBFB
@HW-ST997:217:C3KKGACXX:4:1101:1409:2234 2:N:0:TGACCA
ATCTCAGAAAAGAAGACATGGAATATGCCCTGNNTANACTGGATGACACCAAATTCCGCTCTCATGAGGGTGAAACTTCCTACATCCGA
+
<BFFFIFFIIIBBFFBFBBFFFFF7FFFFFII##07#07BFFBFFBFFFIFFFBF7BBFFBBBBBBB<BB0<B<'7<BBBBBBBBBBB<
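Since the files were concatenated from a split dataset, one quick check is whether the two files are still in the same order, i.e. whether record n in the forward file carries the same read ID as record n in the reverse file. Below is a minimal Python sketch of that check. It assumes plain 4-line FASTQ records (no wrapped sequences) and Illumina 1.8+ style headers like the ones above, where the ID is everything before the first space; the function names are my own, not from any existing tool.

```python
import itertools
from typing import Iterable, Iterator, Optional, Tuple

MISSING = "<no record>"  # placeholder when one file runs out before the other


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read ID of each 4-line FASTQ record.

    The ID is the header up to the first whitespace, minus any old-style
    /1 or /2 suffix, so both mates of a pair should yield the same ID.
    """
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # skip the sequence, '+' and quality lines
        if not line.startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {line!r}")
        fields = line[1:].split()
        rid = fields[0] if fields else ""
        if rid.endswith(("/1", "/2")):
            rid = rid[:-2]
        yield rid


def _pairs(r1: Iterable[str], r2: Iterable[str]) -> Iterator[Tuple[str, str]]:
    # zip_longest pads the shorter file, so a length difference is reported too.
    return itertools.zip_longest(read_ids(r1), read_ids(r2), fillvalue=MISSING)


def first_mismatch(r1: Iterable[str], r2: Iterable[str]) -> Optional[Tuple[int, str, str]]:
    """Return (record number, R1 ID, R2 ID) for the first out-of-sync pair, or None."""
    for n, (a, b) in enumerate(_pairs(r1, r2), start=1):
        if a != b:
            return n, a, b
    return None


def count_mismatches(r1: Iterable[str], r2: Iterable[str]) -> Tuple[int, int]:
    """Return (records compared, records whose IDs differ)."""
    total = bad = 0
    for a, b in _pairs(r1, r2):
        total += 1
        bad += a != b
    return total, bad
```

An open file is an iterable of lines, so for the real data something like `with open("forward.fastq") as f1, open("reverse.fastq") as f2: print(first_mismatch(f1, f2))` should work (the file names are placeholders; `gzip.open(path, "rt")` can be used for gzipped files). Applied to the three records per file shown above, the very first IDs already differ (`...:1101:1432:2038` vs `...:1101:1452:2018`), and only `...:1101:1474:2051` appears in both, so this seems worth checking on the full files.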