11-02-2012, 08:07 AM   #1
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66
Tophat 2.0.0 number of reads

Apart from the problem when using more than one core, and some random problems there, I've run into a few other issues:

Code:
$ wc -l ../reads/SRR306839.fastq
75400120 ../reads/SRR306839.fastq
Thus I have 18850030 reads (75400120 lines / 4 lines per FASTQ record). Tophat reports:
Code:
18645993 reads; of these:
  18645993 (100.00%) were unpaired; of these:
    7576936 (40.64%) aligned 0 times
    5480546 (29.39%) aligned exactly 1 time
    5588511 (29.97%) aligned >1 times
59.36% overall alignment rate
Ok, maybe some of them were rejected a priori because of some quality issue.
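The read count above comes from dividing the line count by four, which assumes a plain, uncompressed FASTQ file with exactly four lines per record (no wrapped sequence lines). A minimal shell sketch of that arithmetic follows; the function name `count_fastq_reads` is made up for illustration.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file.
# Assumes exactly 4 lines per record (header, sequence, '+', qualities).
count_fastq_reads() {
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Usage with the file from this post:
# count_fastq_reads ../reads/SRR306839.fastq
```

With the numbers in this post, 75400120 / 4 = 18850030, which is 204037 more reads than the 18645993 that Tophat reports.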
Then:
Code:
$ samtools flagstat accepted_hits.bam
41956190 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
41956190 + 0 mapped (100.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
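Note that `samtools flagstat` counts alignment records, not distinct reads, so a read reported at several locations contributes several records to the total. One way to compare like with like is to count distinct read names in the SAM text. The sketch below assumes read names sit in the first tab-separated column, as in the SAM format; `count_distinct_reads` is a hypothetical helper name, and it holds every name in memory, so it may be heavy on very large files.

```shell
# Hypothetical helper: count distinct read names (QNAME, column 1)
# in SAM text read from standard input. Header lines start with '@'
# and are skipped.
count_distinct_reads() {
    awk -F'\t' '!/^@/ { seen[$1] = 1 }
                END   { n = 0; for (name in seen) n++; print n }'
}

# Usage (samtools view prints the alignment records as SAM text):
# samtools view accepted_hits.bam | count_distinct_reads
```

If multi-mapped reads account for the difference, the result should be close to the 11069057 reads (5480546 + 5588511) that Tophat reports as aligned, rather than the 41956190 records in the flagstat total.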
I have a lot of lines in the .sam files without read IDs. In another dataset this caused problems with htseq-count; here it does not, but since I never noticed anything like this in the past, I would like to understand it. Is this OK? Are the lines without IDs reported for reads that map to more than one position on the transcriptome/genome?
The SAM format guide says that missing read names should be marked with a '*'...
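To quantify how many records lack a read name, one can count lines whose first column is empty or `*`. This is only a sketch, assuming the affected lines are otherwise ordinary tab-separated SAM records; `count_unnamed_records` is a hypothetical name.

```shell
# Hypothetical helper: count SAM records (from standard input) whose
# read name (column 1) is empty or the '*' placeholder.
# Header lines starting with '@' are skipped.
count_unnamed_records() {
    awk -F'\t' '!/^@/ && ($1 == "" || $1 == "*") { n++ }
                END { print n + 0 }'
}

# Usage:
# samtools view accepted_hits.bam | count_unnamed_records
```

A result of 0 would suggest that the names are present in the BAM file and that the blank IDs come from a later conversion or viewing step.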