  • kamsen
    Junior Member
    • Mar 2012
    • 3

    HTSeq dealing with "*" qualities

    Hi everyone,

    I started using HTSeq a couple of days ago and have now run into a problem. Maybe someone knows a workaround.

    I am iterating over a SAM file and can't find a solution for this error:
    (also described here http://seqanswers.com/forums/showthread.php?t=12091)

    ValueError: 'seq' and 'qualstr' do not have the same length.
    The alignment is from Bowtie2 and lacks the quality string (the file has only a "*" in that field, but the complete read sequence is there).

    Like: blaaaaa ACTACTATCTAC * blaaaaa


    Since I have a lot of files, I can't filter them beforehand, because I do not want to touch those big files twice.

    thanks in advance.

    EDIT:
    I am using the latest release of HTSeq.



    regards
    Last edited by kamsen; 04-04-2012, 07:06 AM.
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Sounds like a bug in HTSeq - as discussed in the linked thread, the SAM/BAM file format explicitly allows the sequencing qualities to be omitted (which in SAM is represented with the * character).

    Have you contacted the HTSeq authors?

    P.S. Saying you use the latest version isn't as helpful as stating the actual version number. People may read this thread later on.


    • Simon Anders
      Senior Member
      • Feb 2010
      • 995

      #3
      Yes, that's a limitation of HTSeq. Fixing this has been on my to-do list for a while; sorry that it's still not done.


      • kamsen
        Junior Member
        • Mar 2012
        • 3

        #4
        Just a few remarks to close this topic:

        1) I was talking about version 0.5.3p3
        2) I made a quick and dirty workaround in the code (__init__ module, line 537) that worked for me. Anyone who runs into this problem could simply take the line from the .sam file and either create zero-valued qualities or read the original ones. After that, the conversion to the Alignment format works again.
        3) Thanks anyway for your nice package Simon!

        regards
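
A minimal sketch of the kind of workaround described in point 2, written as a standalone stream filter rather than as a patch to HTSeq itself. It is not the code that was changed in HTSeq 0.5.3p3; the function name `fill_missing_qualities` and the placeholder character are assumptions for illustration, and the placeholder assumes Phred+33 encoding, where "!" means quality 0.

```python
import io

# Assumption: Phred+33 encoding, where "!" encodes a quality of 0.
PLACEHOLDER_QUAL = "!"


def fill_missing_qualities(lines, placeholder=PLACEHOLDER_QUAL):
    """Yield SAM lines, replacing a '*' QUAL field with placeholder qualities.

    Header lines (starting with '@') pass through unchanged. The input is
    consumed lazily, so a large file is read only once.
    """
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # SEQ is the 10th and QUAL the 11th mandatory SAM field.
        if len(fields) >= 11 and fields[10] == "*" and fields[9] != "*":
            fields[10] = placeholder * len(fields[9])
        yield "\t".join(fields) + "\n"


# Tiny in-memory example: one header line and one read without qualities.
example = "@HD\tVN:1.0\nr1\t0\tchr1\t1\t3\t4M\t*\t0\t0\tACGT\t*\tNM:i:0\n"
for fixed in fill_missing_qualities(io.StringIO(example)):
    print(fixed, end="")
```

Because the function is a generator over any iterable of lines, it can wrap an open file handle or standard input, which avoids writing a second copy of each large SAM file.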


        • NicoBxl
          not just another member
          • Aug 2010
          • 264

          #5
          Originally posted by Simon Anders:
          Yes, that's a limitation of HTSeq. Fixing this has been on my to-do list for a while; sorry that it's still not done.

          Hi Simon,

          Have you fixed this bug? I have the same problem with TopHat 2.0.0 BAM files.

          Code:
          samtools view -h -o out.sam in.bam
          htseq-count out.sam annotation.gtf > htseq_out.txt
          gives me

          Code:
          100000 GFF lines processed.
          200000 GFF lines processed.
          283699 GFF lines processed.
          Error occured in line 36 of file out.sam.
          Error: ("'seq' and 'qualstr' do not have the same length.", 'line 36 of file out.sam')
          [Exception type: ValueError, raised in _HTSeq.pyx:765]


          • Simon Anders
            Senior Member
            • Feb 2010
            • 995

            #6
            I've just fixed this. In HTSeq 0.5.3p4, SAM files with "*" in the quality field are accepted. Sorry that this took a while.


            • NicoBxl
              not just another member
              • Aug 2010
              • 264

              #7
              Thanks Simon, it worked great.


              • fishinabarrel
                Junior Member
                • Apr 2011
                • 6

                #8
                Dear Simon,

                You are my hero.
                Just ran into this problem yesterday. And by this morning a solution was already in place.
                I owe you a beer.


                • fishinabarrel
                  Junior Member
                  • Apr 2011
                  • 6

                  #9
                  I should also add that I ran into the qual problem with HTSeq-0.5.3p3, and after installing HTSeq-0.5.3p4 all was well.


                  • dharan
                    Junior Member
                    • Jan 2012
                    • 7

                    #10
                    Problem is still seen in HTSeq - v0.5.3p5

                    Dear Simon,

                    I installed the latest version of HTSeq (HTSeq-0.5.3p5.tar.gz) to solve the problem, but the error still persists for me.

                    I am still facing this error:
                    Error: ("'seq' and 'qualstr' do not have the same length.", 'line 2671032 of file ..)
                    [Exception type: ValueError, raised in _HTSeq.pyx:765]

                    Can you please help me out?

                    Thanks,
                    Dharanya


                    • maubp
                      Peter (Biopython etc)
                      • Jul 2009
                      • 1544

                      #11
                      It would be nice if the HTSeq error message included the two unmatched lengths - but can you show us what line 2671032 of your input file is? This may not be due to the * for missing qualities at all, but a real error in the data.
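
To illustrate the suggestion above, here is a hedged sketch of a SEQ/QUAL consistency check whose error message reports both lengths. This is not HTSeq's actual implementation; `check_seq_qual` and its `line_no` parameter are hypothetical names. It treats "*" as "qualities omitted", which the SAM format allows.

```python
def check_seq_qual(seq, qual, line_no=None):
    """Raise ValueError naming both lengths unless QUAL is '*' or matches SEQ."""
    if qual == "*":
        # Qualities were omitted; there is nothing to compare.
        return
    if len(seq) != len(qual):
        where = f" (line {line_no})" if line_no is not None else ""
        raise ValueError(
            f"'seq' has length {len(seq)} but 'qualstr' has length "
            f"{len(qual)}{where}"
        )


check_seq_qual("ACGT", "*")      # accepted: qualities omitted
check_seq_qual("ACGT", "IIII")   # accepted: lengths agree
try:
    check_seq_qual("ACGT", "III", line_no=36)
except ValueError as err:
    print(err)
```

With both lengths in the message, a missing quality string (length 1, the lone "*") is easy to tell apart from a genuinely truncated record.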


                      • dharan
                        Junior Member
                        • Jan 2012
                        • 7

                        #12
                        Hi,

                        Here is the line from that file:


                        HWI-ST790:1:1101:1261:140607#ACTTGA 329 contig_126150 342 3 100M * 0 0 GTCCAGGTTGGTGGACCTCTCAATCATGTTGTCACCCTCAAACCCAGAGATGGGGACGAAGGGAACCTTGTTAGGGTTGTAGCCGACCTTCTTCAGGTAG * AS:i:-7 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:3A67C28 YT:Z:UU NH:i:2 CC:Z:contig_223383 CP:i:208 HI:i:0

                        Cheers,
                        Dharanya
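
For readers who want to inspect a record like the one above, the following sketch splits it into the 11 mandatory SAM fields and compares the read length implied by the CIGAR string with the SEQ and QUAL fields. The helper names `parse_record` and `query_length_from_cigar` are assumptions for illustration. The record is split on whitespace because the forum post lost the original tabs; a real SAM file should be split on tabs.

```python
import re

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# The record from the post above, rejoined with single spaces.
RECORD = (
    "HWI-ST790:1:1101:1261:140607#ACTTGA 329 contig_126150 342 3 100M * 0 0 "
    "GTCCAGGTTGGTGGACCTCTCAATCATGTTGTCACCCTCAAACCCAGAGATGGGGACGAAGGGAACCTTG"
    "TTAGGGTTGTAGCCGACCTTCTTCAGGTAG "
    "* AS:i:-7 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:3A67C28 YT:Z:UU "
    "NH:i:2 CC:Z:contig_223383 CP:i:208 HI:i:0"
)


def parse_record(line):
    """Map the 11 mandatory SAM fields to names; keep optional tags as a list."""
    parts = line.split()
    record = dict(zip(SAM_FIELDS, parts[:11]))
    record["TAGS"] = parts[11:]
    return record


def query_length_from_cigar(cigar):
    """Sum the CIGAR operations that consume read bases (M, I, S, =, X)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(count) for count, op in ops if op in "MIS=X")


rec = parse_record(RECORD)
flag = int(rec["FLAG"])
print("QUAL field:", rec["QUAL"])
print("SEQ length:", len(rec["SEQ"]))
print("CIGAR read length:", query_length_from_cigar(rec["CIGAR"]))
print("secondary alignment:", bool(flag & 0x100))
```

If the SEQ length agrees with the CIGAR string and the QUAL field is a lone "*", the record itself is consistent and only the omitted qualities remain as a possible trigger for the error.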


                        • maubp
                          Peter (Biopython etc)
                          • Jul 2009
                          • 1544

                          #13
                          Can you double check which HTSeq you are using? Perhaps an older copy is taking precedence in your PATH, or the update didn't install properly.
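
One way to check which copy is actually picked up, sketched with the standard library only. It assumes Python 3.8 or newer (for `importlib.metadata`) and assumes the package is registered under the distribution name "HTSeq"; `describe_install` is a hypothetical helper, not part of HTSeq.

```python
import importlib.metadata
import importlib.util
import shutil


def describe_install(module_name, script_name=None):
    """Report where a module would be imported from and its installed version.

    Returns None if the module cannot be found on the import path.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    try:
        # Assumption: the distribution name matches the module name.
        version = importlib.metadata.version(module_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "module_path": spec.origin,
        "version": version,
        # First match on PATH, i.e. the script the shell would run.
        "script_path": shutil.which(script_name) if script_name else None,
    }


print(describe_install("HTSeq", "htseq-count"))
```

If the reported module path or script path points at an older installation directory, that copy is shadowing the update.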


                          • dharan
                            Junior Member
                            • Jan 2012
                            • 7

                            #14
                            Maybe there is a problem with the installation. I will go through it again and let you know if there are still any problems.
                            Thanks


                            • chadn737
                              Senior Member
                              • Jan 2009
                              • 392

                              #15
                              As an aside, the latest version of TopHat 2 no longer has the "*" qualities problem.
