  • kreitinger
    Member
    • May 2011
    • 10

    Tophat reads kept/discarded during initial conversion

    I am using TopHat to analyze Illumina HiSeq 2000 paired-end read data. I have noticed that during the initial execution, TopHat 1 (and 2) "converts the reads" and then sorts the left reads into kept and discarded groups (e.g. 8,000,012 kept, 10,121 discarded) and does the same for the right reads (e.g. 7,804,000 kept, 206,133 discarded). Since the numbers of discarded reads differ, I'm assuming that "lone" mates are treated as single reads.

    My question is: how does TopHat decide which reads to keep and which to discard, and why? Are there some underlying QC filters?
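To put the example counts from this post in proportion, here is a minimal Python sketch that turns kept/discarded counts into a discard percentage per mate file. The function name `discard_stats` is made up for illustration; the numbers are the "e.g." figures quoted above, not new data.

```python
def discard_stats(kept, discarded):
    """Return (total reads, percent discarded) for one mate file."""
    total = kept + discarded
    return total, 100.0 * discarded / total


# Example counts quoted in the post above.
left_total, left_pct = discard_stats(8_000_012, 10_121)
right_total, right_pct = discard_stats(7_804_000, 206_133)

print(f"left:  {left_total:,} reads, {left_pct:.2f}% discarded")
print(f"right: {right_total:,} reads, {right_pct:.2f}% discarded")
print("totals match:", left_total == right_total)
```

With these figures both sides add up to 8,010,133 reads, so the kept/discarded split differs between mates while the input sizes agree.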
  • ROaj
    Junior Member
    • Jan 2012
    • 1

    #2
    I am also VERY interested in this question/answer as I do quite a bit of quality trimming prior to mapping my reads and I've noticed the discarded reads being about 1-2% of my total read library.

    • AsoBioInfo
      Member
      • Dec 2011
      • 37

      #3
      Hey, did any of you get an answer? The same thing happened to me.

      My TopHat version is v2.0.6. I was previously using the older software, and that was working fine.

      • asperjelly
        Member
        • Jan 2013
        • 11

        #4
        I'm also using TopHat v2.0.6 and I had this same question. I'm assuming it is removing reads that don't meet some quality threshold, but I can't seem to find any documentation on this in the manual.

        • kreitinger
          Member
          • May 2011
          • 10

          #5
          I still haven't figured out why these reads are discarded. Since this step happens before alignment to the genome or GTF annotations, it has to be related to discarding low quality reads. I emailed [email protected] with this thread's link, so hopefully they respond.

          • Daehwan Kim
            Junior Member
            • Oct 2012
            • 4

            #6
            TopHat filters out some reads if they are of low complexity or include too many Ns.
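As a rough illustration of the kind of filter described above, here is a minimal Python sketch. It is not TopHat's code: the function name `discard_reason` and both thresholds are assumptions chosen for the example, since the actual cutoffs are not given in this thread.

```python
from collections import Counter

# Hypothetical thresholds for illustration only; TopHat's real cutoffs
# are not stated in this thread.
MAX_N_FRACTION = 0.5            # drop reads that are more than half N
MAX_SINGLE_BASE_FRACTION = 0.9  # drop reads dominated by one base


def discard_reason(seq):
    """Return a reason string if the read would be dropped, else None."""
    seq = seq.upper()
    if not seq:
        return "empty"
    counts = Counter(seq)
    # Too many ambiguous base calls.
    if counts["N"] / len(seq) > MAX_N_FRACTION:
        return "too_many_N"
    # Low complexity: a single base makes up nearly the whole read.
    _, top_count = counts.most_common(1)[0]
    if top_count / len(seq) >= MAX_SINGLE_BASE_FRACTION:
        return "low_complexity"
    return None


print(discard_reason("A" * 100))               # low_complexity
print(discard_reason("N" * 60 + "ACGT" * 10))  # too_many_N
print(discard_reason("ACGT" * 25))             # None
```

A single-base fraction is only one possible definition of low complexity; other measures (for example, sequence entropy) would flag a different set of reads.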

            • carmeyeii
              Senior Member
              • Mar 2011
              • 137

              #7
              About how many might "too many" be?

              • NKAkers
                Member
                • Sep 2011
                • 26

                #8
                Not the answer to your question but...

                I can tell you that the 'discarded' reads end up in unmapped.bam.

                Hopefully future versions of tophat will allow for more user control/better documentation of the quality filtering.

                • harryzs
                  Member
                  • Dec 2010
                  • 30

                  #9
                  I checked unmapped.bam from TopHat 2.0.9

                  samtools view -f 0x200 unmapped.bam | head

                  I got:
                  Code:
                  HWI-7001436:48:C2ET1ACXX:5:1108:2968:28222	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDD@DDDDBBBDDDD	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1203:5292:62817	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCTCGTTACA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBB5&)0((+()+((++	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1312:13946:40878	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1203:5920:62936	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBDDDDDD	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1312:14680:40864	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:2312:9415:35514	581	*	0	255	*	*	0	0	ATTAAAAAAAAAAAACTCCTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHIII<FHCHIIIIIIHDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1312:14593:40904	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCTCTCTTATAAAC	CCCFFFDFHGHHHIJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDBDD<9>&&+((((4(+(((((	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1108:4206:28028	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBBDDD<BDDDDDDDDDDDDDDDDDDDDDDD9	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1203:7475:62973	581	*	0	255	*	*	0	0	AGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBDDDDDDDDBDDDDDDDBB@DDDDDDB@DBDB95&	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1108:4708:28068	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
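The FLAG value 581 in the records above can be broken into its bits, which also shows why `-f 0x200` selects them. A minimal Python sketch follows; the bit values come from the SAM specification, while the short names and the helper `decode_flag` are just labels chosen for this example.

```python
# SAM FLAG bits as defined in the SAM specification; the short names
# are labels chosen for this example.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(581))
# ['paired', 'unmapped', 'first_in_pair', 'qc_fail']
```

So each record shown is an unmapped first mate of a pair that carries the "not passing quality controls" bit (0x200).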
                  I think it makes sense to remove these reads before alignment, right?

                  Another question:
                  what is the meaning of "ZT:A:L"?
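On the tag itself: in SAM optional fields the layout is TAG:TYPE:VALUE, so "ZT:A:L" is a tag named ZT whose type A means a single printable character, here L. Tags starting with X, Y, or Z are reserved for tool-specific use, and what L stands for is not stated in this thread. One way to see which values occur is to tally them. The sketch below is a hedged Python example; `tally_tag` is a made-up helper and the two demo records are shortened toy lines, not real reads.

```python
from collections import Counter


def tally_tag(lines, tag="ZT"):
    """Count the values of one optional tag across SAM text records."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 (index 11): TAG:TYPE:VALUE
        for field in fields[11:]:
            parts = field.split(":", 2)
            if len(parts) == 3 and parts[0] == tag:
                counts[parts[2]] += 1
    return counts


# Toy records (shortened sequences) in the same layout as the output above.
demo = [
    "read1\t581\t*\t0\t255\t*\t*\t0\t0\tAAAA\tCCCF\tZT:A:L",
    "read2\t581\t*\t0\t255\t*\t*\t0\t0\tAAAA\tCCCF\tZT:A:L",
]
print(tally_tag(demo))
```

On real data, the same function could be fed `sys.stdin` with the text output of `samtools view -f 0x200 unmapped.bam` piped into the script.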
                  Last edited by harryzs; 10-05-2013, 12:44 AM.
