Hi, I have a question that may have a very quick answer. I am running TopHat on paired-end (PE) Illumina reads and always get output like this:
bowtie.left_kept_reads.log
52107199 reads; of these:
52107199 (100.00%) were unpaired; of these:
21494360 (41.25%) aligned 0 times
27619125 (53.00%) aligned exactly 1 time
2993714 (5.75%) aligned >1 times
58.75% overall alignment rate
bowtie.right_kept_reads.log
52025239 reads; of these:
52025239 (100.00%) were unpaired; of these:
23520504 (45.21%) aligned 0 times
25623432 (49.25%) aligned exactly 1 time
2881303 (5.54%) aligned >1 times
54.79% overall alignment rate
I am running TopHat with the following command:
tophat --no-convert-bam -p 20 genome_ref sample1_1.fastq sample1_2.fastq
My question: why does TopHat report that 100% of the left and right reads were unpaired? My two read files have exactly the same number of lines and were filtered before running TopHat to keep only the matching PE reads. Here is an example of my reads:
==> sample1_1.fastq <==
@HHHABC:23:CEX1CEE:4:1101:2733:2083/1 1:N:0:CGATGT
GCACATCCAATAACAAATTGTCTTTTATAAATGGTTACTTATTTGAGCAGAATTGAGCAAGACAGCCATGCAAAGTGTTACGTTGAAATACTGTCAATATG
+
@@@DDDDFHGHGGGGIGIJGHIIJJJICICICHHHFHIIIGJJJIGEGGHBFFHHEHIJJJJJJIEIGHFHIHIIHHIEEHFHFEDEDACEAEEEDDCDCD
@HHHABC:23:CEX1CEE:4:1101:2837:2057/1 1:N:0:CGATGT
CTTGTTTAGTTCGGGACTCCGCGGCTCTGGAACGGAACTACAAGAGCCGCAGGTCCGGTTTGAAAAGCTGCAACAGCTGGGTTT
+
:BDFDDFBFFFAEGIADHHGG8?@FHGFDH<FBFHGAEEC?E>EAB7@>BB@B55:=B?BDDA:>>C8>@9>AA?AACDC0<??
==> sample1_2.fastq <==
@HHHABC:23:CEX1CEE:4:1101:2733:2083/2 2:N:0:CGATGT
CTTACATTTTCGCATGGCTTAATATATAGCAGGTATTAAACATTACCTATATTTATAAGTCCATCATTAATGAACAACATATGGGTTATTGTCATATTGAC
+
@CFFFFFHDFHHIJJJIJIIJIEIEEGHGGIGCFGHEHGHIGHHIJJGIIGGIIEIJDEIHIJIJJJJIJIIEIFGIDHHGAEFDFDBCCDEEEDEEECC
@HHHABC:23:CEX1CEE:4:1101:2837:2057/2 2:N:0:CGATGT
GAATGAATTAACCTCGAAATATGCTGCTGGATGCAAAGAAGTCGAATGTATTTATGATCTAGATTTATTATTGCGGTGAAGCAGCTGACATGTTTCTGTCC
+
@<<DDADAFAFHHIJJJJCB>FHGIIIFGJBGHAHGCGECG@FDEEHH*9BFHIGIHE=CFG><FHDGCEE>@AAEEHDCFBCEDCEACCCCDEEDDDDC>
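For reference, one way to double-check that the two files really are in matching order is a small script like the one below. It is only a sketch and not the filtering step that was actually run on these files. The helper names `read_id` and `count_mismatched_pairs` are made up for illustration, and it assumes four lines per FASTQ record and that mates differ only in the `/1` or `/2` suffix and in the comment after the space.

```python
import itertools


def read_id(header):
    """Return the read name without '@', the trailing comment, or a /1 or /2 suffix."""
    tokens = header.split()
    if not tokens:
        return ""
    name = tokens[0]
    if name.startswith("@"):
        name = name[1:]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def headers(lines):
    """Yield the header line of each record, assuming 4 lines per FASTQ record."""
    return itertools.islice(lines, 0, None, 4)


def count_mismatched_pairs(left_lines, right_lines):
    """Count record positions where the left and right read names differ."""
    mismatches = 0
    pairs = itertools.zip_longest(headers(left_lines), headers(right_lines), fillvalue="")
    for left, right in pairs:
        if read_id(left) != read_id(right):
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    # File names taken from the command above.
    with open("sample1_1.fastq") as left, open("sample1_2.fastq") as right:
        print(count_mismatched_pairs(left, right), "mismatched pairs")
```

A result of 0 would mean the read names line up record by record; a record present in only one file counts as a mismatch.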
Has anyone else experienced the same problem? Thanks for any input.