  • HTSeq dealing with "*" qualities

    Hi everyone,

    I started using HTSeq a couple of days ago and have now run into a problem. Maybe someone knows a workaround.

    I am iterating over a SAM file and can't find a solution for this error
    (also described here: http://seqanswers.com/forums/showthread.php?t=12091):

    ValueError: 'seq' and 'qualstr' do not have the same length.
    The alignment is from Bowtie2 and lacks the quality string (the file has only a "*" in that field, but the complete read sequence is there).

    Like: blaaaaa ACTACTATCTAC * blaaaaa


    Since I have a lot of files, I can't filter them beforehand, because I do not want to read those big files twice.

    thanks in advance.

    EDIT:
    I am using the latest release of HTSeq.



    regards
    Last edited by kamsen; 04-04-2012, 07:06 AM.

  • #2
    Sounds like a bug in HTSeq - as discussed in the linked thread, the SAM/BAM file format explicitly allows the sequencing qualities to be omitted (which in SAM is represented with the * character).

    Have you contacted the HTSeq authors?

    P.S. Saying you use the latest version isn't as helpful as stating the actual version number, since people may read this thread later on.

    • #3
      Yes, that's a limitation of HTSeq. Fixing this has been on my to-do list for a while; sorry that it's still not done.

      • #4
        Just a few remarks to close this topic:

        1) I was talking about version 0.5.3p3
        2) I made a quick and dirty workaround in the code (__init__ module, l. 537) which worked for me. If somebody encounters this problem, one could simply return the line from the .sam file and create 0 qualities / read the original ones. After that, the conversion to the Alignment format works again.
        3) Thanks anyway for your nice package Simon!

        regards
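
The same idea can also be applied outside HTSeq, without patching the package. The following is only a minimal sketch, not the change made in `__init__`: it assumes a tab-separated SAM stream, and the names `fill_missing_qualities` and `filter_stream` are hypothetical. It replaces a "*" quality field with dummy Phred-0 qualities ("!" under Phred+33 encoding) of the same length as the read, so each file is only streamed once.

```python
def fill_missing_qualities(line, placeholder="!"):
    """Return a SAM line whose '*' QUAL field is replaced by dummy qualities.

    Header lines (starting with '@') and records that already carry
    qualities are returned unchanged.
    """
    if line.startswith("@"):
        return line
    fields = line.rstrip("\n").split("\t")
    # SEQ is column 10 and QUAL is column 11 (indices 9 and 10).
    if len(fields) >= 11 and fields[10] == "*" and fields[9] != "*":
        fields[10] = placeholder * len(fields[9])
    return "\t".join(fields) + "\n"


def filter_stream(infile, outfile):
    """Copy SAM lines from infile to outfile, filling in missing qualities."""
    for line in infile:
        outfile.write(fill_missing_qualities(line))
```

Calling `filter_stream(sys.stdin, sys.stdout)` from a small script would let this sit in a pipe between `samtools view -h` and `htseq-count`, assuming your htseq-count version reads SAM from standard input when given `-` as the file name.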

        • #5
          Originally posted by Simon Anders
          Yes, that's a limitation of HTSeq. Fixing this has been on my to-do list for a while; sorry that it's still not done.
          Hi Simon,

          Have you fixed this bug? I have the same problem with TopHat 2.0.0 BAM files.

          Code:
          samtools view -h -o out.sam in.bam
          htseq-count out.sam annotation.gtf > htseq_out.txt
          gives me

          Code:
          100000 GFF lines processed.
          200000 GFF lines processed.
          283699 GFF lines processed.
          Error occured in line 36 of file out.sam.
          Error: ("'seq' and 'qualstr' do not have the same length.", 'line 36 of file out.sam')
          [Exception type: ValueError, raised in _HTSeq.pyx:765]

          • #6
            I've just fixed this. In HTSeq 0.5.3p4, SAM files with "*" in the quality field are accepted. Sorry that this took a while.

            • #7
              Thanks Simon, it worked great.

              • #8
                Dear Simon,

                You are my hero.
                Just ran into this problem yesterday. And by this morning a solution was already in place.
                I owe you a beer.

                • #9
                  I should also add that I hit the qual problem with HTSeq-0.5.3p3 installed, and after installing HTSeq-0.5.3p4 all was well.

                  • #10
                    Problem is still seen in HTSeq - v0.5.3p5

                    Dear Simon,

                    I installed the latest version of HTSeq (HTSeq-0.5.3p5.tar.gz) to solve the problem, but the error still persists for me.

                    I am still facing this error:
                    Error: ("'seq' and 'qualstr' do not have the same length.", 'line 2671032 of file ..)
                    [Exception type: ValueError, raised in _HTSeq.pyx:765]

                    Can you please help me out?

                    Thanks,
                    Dharanya

                    • #11
                      It would be nice if the HTSeq error message included the two unmatched lengths - but can you show us what line 2671032 of your input file is? This may not be due to the * for missing qualities at all, but a real error in the data.
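
Until the error message reports them, the two lengths can be checked by hand. A minimal sketch, assuming the record is still tab-separated as in the original file (a line pasted into a forum usually has its tabs turned into spaces); `seq_qual_lengths` is a hypothetical helper, not part of HTSeq:

```python
def seq_qual_lengths(sam_line):
    """Return (len(SEQ), len(QUAL), qual_is_missing) for one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(
            "expected at least 11 tab-separated fields, got %d" % len(fields)
        )
    # SEQ is column 10 and QUAL is column 11 (indices 9 and 10).
    seq, qual = fields[9], fields[10]
    return len(seq), len(qual), qual == "*"
```

If the third value is true, the mismatch comes from omitted qualities rather than from a damaged record.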

                      • #12
                        Hi,

                        Here is the line from that file:


                        HWI-ST790:1:1101:1261:140607#ACTTGA 329 contig_126150 342 3 100M * 0 0 GTCCAGGTTGGTGGACCTCTCAATCATGTTGTCACCCTCAAACCCAGAGATGGGGACGAAGGGAACCTTGTTAGGGTTGTAGCCGACCTTCTTCAGGTAG * AS:i:-7 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:3A67C28 YT:Z:UU NH:i:2 CC:Z:contig_223383 CP:i:208 HI:i:0

                        Cheers,
                        Dharanya

                        • #13
                          Can you double check which HTSeq you are using? Perhaps an older copy is taking precedence in your PATH, or the update didn't install properly.
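
One way to check is to ask Python which copy it actually imports. A minimal sketch; `describe_module` is a hypothetical helper, and it assumes the package exposes a `__version__` attribute (it reports "unknown" if it does not):

```python
import importlib


def describe_module(name):
    """Return (version, file location) of the module Python imports for `name`."""
    module = importlib.import_module(name)
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", "unknown")
    return version, location
```

Running `describe_module("HTSeq")` with the same Python interpreter that htseq-count uses should show whether an older installation is being picked up.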

                          • #14
                            Maybe there is a problem with the installation. I will go through it again and let you know if any problems remain.
                            Thanks

                            • #15
                              As an aside, the latest version of Tophat 2 no longer has the "*" qualities problems.
