  • HTSeq dealing with "*" qualities

    Hi everyone,

    I started using HTSeq a couple of days ago and have now run into a problem. Maybe someone knows a workaround.

    I am iterating over a SAM file and can't find a solution for this error:
    (also described here http://seqanswers.com/forums/showthread.php?t=12091)

    ValueError: 'seq' and 'qualstr' do not have the same length.
    The alignment is from Bowtie2 and lacks the quality string (only a "*" is in the file, but the complete read sequence is there).

    Like: blaaaaa ACTACTATCTAC * blaaaaa


    Since I have a lot of files, I can't filter them beforehand, because I do not want to touch those big files twice.

    thanks in advance.

    EDIT:
    I am using the latest release of HTSeq.



    regards
    Last edited by kamsen; 04-04-2012, 07:06 AM.

  • #2
    Sounds like a bug in HTSeq - as discussed in the linked thread, the SAM/BAM file format explicitly allows the sequencing qualities to be omitted (which in SAM is represented with the * character).

    Have you contacted the HTSeq authors?

    P.S. Saying you use the latest version isn't as helpful as stating the actual version number. People may read this thread later on.

    • #3
      Yes, that's a limitation of HTSeq. Fixing this has been on my to-do list for a while; sorry that it's still not done.

      • #4
        Just a few remarks to close this topic:

        1) I was talking about version 0.5.3p3
        2) I made a quick and dirty workaround in the code (__init__ module, line 537) which worked for me. Anyone who encounters this problem can simply return the line from the .sam file and create 0 qualities or read the original ones. After that, the conversion to the Alignment format works again.
        3) Thanks anyway for your nice package Simon!

        regards
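The workaround in #4 edits HTSeq itself and the actual patch is not shown in this thread. A rough sketch of the same idea applied to the SAM text instead is given below: while streaming through the file once, a "*" in the QUAL column is replaced by placeholder qualities of the same length as the read. The function names and the choice of "!" (quality 0, assuming Phred+33 encoding) are illustrative assumptions, not part of HTSeq.

```python
# Sketch only: fill in placeholder qualities where SAM records carry "*".
# Assumes tab-separated SAM text and Phred+33 encoding ("!" == quality 0).

PLACEHOLDER_QUAL = "!"  # hypothetical choice; any valid quality character works


def patch_missing_qualities(sam_line):
    """Return sam_line with a '*' QUAL field replaced by placeholder qualities."""
    if sam_line.startswith("@"):
        # Header lines are passed through unchanged.
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    # Column 10 is SEQ and column 11 is QUAL (0-based indices 9 and 10).
    if len(fields) >= 11 and fields[10] == "*" and fields[9] != "*":
        fields[10] = PLACEHOLDER_QUAL * len(fields[9])
    return "\t".join(fields) + "\n"


def patch_stream(src, dst):
    """Copy SAM lines from src to dst, patching records as they pass through."""
    for line in src:
        dst.write(patch_missing_qualities(line))
```

Calling `patch_stream(sys.stdin, sys.stdout)` in a small script lets the patched records be piped onward in the same pass, so the large files are read only once.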

        • #5
          Originally posted by Simon Anders:
          Yes, that's a limitation of HTSeq. Fixing this has been on my to-do list for a while; sorry that it's still not done.
          Hi Simon,

          Have you fixed this bug? I have the same problem with tophat 2.0.0 bam files.

          Code:
          samtools view -h -o out.sam in.bam
          htseq-count out.sam annotation.gtf > htseq_out.txt
          gives me

          Code:
          100000 GFF lines processed.
          200000 GFF lines processed.
          283699 GFF lines processed.
          Error occured in line 36 of file out.sam.
          Error: ("'seq' and 'qualstr' do not have the same length.", 'line 36 of file out.sam')
          [Exception type: ValueError, raised in _HTSeq.pyx:765]

          • #6
            I've just fixed this. In HTSeq 0.5.3p4, SAM files with "*" in the quality field are accepted. Sorry that this took a while.

            • #7
              Thanks Simon, it worked great.

              • #8
                Dear Simon,

                You are my hero.
                Just ran into this problem yesterday. And by this morning a solution was already in place.
                I owe you a beer.

                • #9
                  I should also add that I ran into the quality problem with HTSeq-0.5.3p3 installed, and after installing HTSeq-0.5.3p4 all was well.

                  • #10
                    Problem is still seen in HTSeq - v0.5.3p5

                    Dear Simon,

                    I installed the latest version of HTSeq (HTSeq-0.5.3p5.tar.gz) to solve the problem, but the error still persists for me.

                    I am still facing this error:
                    Error: ("'seq' and 'qualstr' do not have the same length.", 'line 2671032 of file ..)
                    [Exception type: ValueError, raised in _HTSeq.pyx:765]

                    Can you please help me out?

                    Thanks,
                    Dharanya

                    • #11
                      It would be nice if the HTSeq error message included the two unmatched lengths - but can you show us what line 2671032 of your input file is? This may not be due to the * for missing qualities at all, but a real error in the data.
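One way to get at the two unmatched lengths without changing HTSeq is to scan the SAM file directly. The sketch below is a hypothetical helper (not part of HTSeq) that reports the line number, SEQ length and QUAL length for every record where both fields are present and differ in length; records with "*" in either field are skipped, since the SAM format allows that. It assumes tab-separated SAM text and counts header lines, so the numbers should line up with the "line N of file" in the error message.

```python
# Sketch only: list SAM records whose SEQ and QUAL lengths really differ.

def find_length_mismatches(lines):
    """Yield (line_number, seq_len, qual_len) for records with unequal lengths."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue  # header line, no SEQ/QUAL columns
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        seq, qual = fields[9], fields[10]
        if seq == "*" or qual == "*":
            continue  # omitted sequence or qualities are legal in SAM
        if len(seq) != len(qual):
            yield number, len(seq), len(qual)
```

Used as `for hit in find_length_mismatches(open("out.sam")): print(hit)`, an empty result would suggest the error comes from the "*" handling rather than from damaged data.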

                      • #12
                        Hi,

                        Here is the line from that file:


                        HWI-ST790:1:1101:1261:140607#ACTTGA 329 contig_126150 342 3 100M * 0 0 GTCCAGGTTGGTGGACCTCTCAATCATGTTGTCACCCTCAAACCCAGAGATGGGGACGAAGGGAACCTTGTTAGGGTTGTAGCCGACCTTCTTCAGGTAG * AS:i:-7 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:3A67C28 YT:Z:UU NH:i:2 CC:Z:contig_223383 CP:i:208 HI:i:0

                        Cheers,
                        Dharanya

                        • #13
                          Can you double check which HTSeq you are using? Perhaps an older copy is taking precedence in your PATH, or the update didn't install properly.
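A quick way to run this check from Python is sketched below. It reports which copy of a module the interpreter would import and which script comes first on the PATH. The helper names are made up for illustration, the sketch assumes Python 3, and it falls back to "unknown" in case the installed package does not expose a `__version__` attribute.

```python
# Sketch only: find out which installed copy of a package and script is picked up.
import importlib
import importlib.util
import shutil


def describe_module(name):
    """Return (version, file path) of the importable module, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    module = importlib.import_module(name)
    version = getattr(module, "__version__", "unknown")
    return version, spec.origin


def describe_script(name):
    """Return the full path of the first matching executable on PATH, or None."""
    return shutil.which(name)
```

For this thread one would call `describe_module("HTSeq")` and `describe_script("htseq-count")` and compare the reported locations with the directory the new release was installed into.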

                          • #14
                            Maybe there is a problem with the installation. I will go through it again and let you know if any problems remain.
                            Thanks

                            • #15
                              As an aside, the latest version of Tophat 2 no longer has the "*" qualities problem.
