  • PhD1990
    Junior Member
    • Jan 2014
    • 3

    problem with HTSeq

    Hi everyone,

    I'm starting to use Python/HTSeq to analyse RNA-seq data.
    I'm following "a tour through HTSeq", but I'm having a weird problem.

    I can import HTSeq
    and read in a file with HTSeq.FastqReader.
    I can get the name of a read with read.name,
    but when I type read.qual, Python just automatically restarts and I have to start over.

    Does anyone know why this is and how I can solve this problem?

    thank you
  • Wolfgang Huber
    Senior Member
    • Aug 2009
    • 109

    #2
    Dear PhD1990

    It's good that you report having a problem, but you probably need to be more specific for someone to be able to help you. Can you provide

    - a reproducible example (i.e. a self-contained piece of code and, if needed, data for others to reproduce your problem),
    - a statement of what the problem is that you experience (any error messages, warnings, etc.),
    - an overview of your system (OS, Python version)?

    Kind regards
    Wolfgang
    Wolfgang Huber
    EMBL
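
    For the system overview, a minimal Python sketch along these lines could collect the details. The function name system_overview is made up for illustration, and the HTSeq version lookup assumes the package exposes a __version__ attribute (it falls back to "unknown" if not):

```python
import platform


def system_overview():
    """Collect basic details worth pasting into a problem report."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "architecture": platform.architecture()[0],  # 32-bit vs 64-bit build
    }
    try:
        import HTSeq  # optional; only reported if it imports cleanly
        info["HTSeq"] = getattr(HTSeq, "__version__", "unknown")
    except ImportError:
        info["HTSeq"] = "not importable"
    return info


if __name__ == "__main__":
    for key, value in system_overview().items():
        print(f"{key}: {value}")
```

    Pasting the printed lines into the post, together with the exact commands typed, is usually enough for others to start reproducing the problem.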

    • sindrle
      Senior Member
      • Aug 2013
      • 266

      #3
      HTSeq: Very few counts recognised

      Hi!
      I've seen a lot of threads on this, but I can't figure it out. I have 16-60 million single-end reads in each library. I used TopHat 2 with the UCSC GTF file for hg19.

      This is my code:

      samtools view accepted_hits.bam | \
      htseq-count -m intersection-nonempty -s no -a 10 \
      - UCSC/hg19/genes.gtf \
      > Out.txt

      Here is a typical result; it's proportional to the library size:

      no_feature 7013689
      ambiguous 269370
      too_low_aQual 0
      not_aligned 0
      alignment_not_unique 6645341

      Why do I get, on average, 25-50% of reads reported as "no_feature",
      "ambiguous" or "alignment_not_unique"?

      This is RNA-seq. If I must inspect the alignments visually, how do I proceed?
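
      To put numbers like these in proportion, a small Python sketch such as the following could tally an htseq-count output file. It is only an illustration: the function names and the demo rows are made up, and it assumes the usual two-column layout (name, count), with the special counters either bare as above or prefixed with "__" as in newer HTSeq releases.

```python
# Special counters that htseq-count appends after the per-gene rows.
SPECIAL = {"no_feature", "ambiguous", "too_low_aQual",
           "not_aligned", "alignment_not_unique"}


def summarise_counts(lines):
    """Return (reads assigned to genes, {special counter: count})."""
    assigned = 0
    special = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, count = line.rsplit(None, 1)  # tolerate tabs or spaces
        key = name.lstrip("_")              # "__no_feature" -> "no_feature"
        if key in SPECIAL:
            special[key] = int(count)
        else:
            assigned += int(count)
    return assigned, special


def fractions(assigned, special):
    """Express each category as a fraction of all tallied reads."""
    total = assigned + sum(special.values())
    if total == 0:
        return {}
    result = {"assigned": assigned / total}
    for key, value in special.items():
        result[key] = value / total
    return result


if __name__ == "__main__":
    # Made-up rows for illustration only, not real data.
    demo = ["geneA\t100", "geneB\t300", "no_feature\t50", "__ambiguous\t50"]
    assigned, special = summarise_counts(demo)
    for key, value in fractions(assigned, special).items():
        print(f"{key}: {value:.1%}")
```

      In practice the list would be replaced by the lines of Out.txt, for example with open("Out.txt").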

      • PhD1990
        Junior Member
        • Jan 2014
        • 3

        #4
        thanks + second question

        Hi everyone,

        Thank you so much for helping me.
        I have found the problem, by the way: the tutorial says you should download the vcredist x86 2010 version, but I have now downloaded the 2012 version and it works perfectly.

        I have a second question, though.

        Now that the tutorial is working for me, I still have one really weird problem. To count reads, you need to download exon information from the internet (Ensembl or something)? The tutorial provides a GTF file and that works perfectly, but online I can only find GFF3 files for, for example, E. coli strains. How do you use these, given that their content is different from the GTF file?

        Is there a standard format, or a place where I can find exon information in GTF format?

        thanks
        grtz

        Sara

        • bruce01
          Senior Member
          • Mar 2011
          • 160

          #5
          Hi Sara,

          You can use GFF3 format with htseq-count; you just need to specify the feature type (3rd column) using the -t flag, as it may differ from the default, which is 'exon' ('gene_id' is the default for the ID attribute, which is set with -i). For example, '-t gene'. Otherwise, you can use a conversion script to make a GTF from the GFF3; there are a few around in various scripting languages, or I can PM you one I use if you want.

          Bruce.
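
          As a rough idea of what such a conversion involves (this is not the script mentioned above, just a minimal Python sketch), each GFF3 row of a chosen feature type can be rewritten with GTF-style attributes. The function names are made up, and the choices of "gene" as the feature type and "ID" as the attribute holding the gene identifier are assumptions that depend on the GFF3 source. Each gene is written out as a single "exon" row so that the htseq-count defaults apply, which is only reasonable for genomes without introns.

```python
from urllib.parse import unquote


def parse_gff3_attributes(field):
    """Parse a GFF3 column 9 such as 'ID=gene0;Name=thrL' into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            # GFF3 percent-encodes reserved characters in values.
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def gff3_line_to_gtf(line, feature_type="gene", id_attribute="ID"):
    """Return a GTF-style line, or None if the line is skipped."""
    if line.startswith("#") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != feature_type:
        return None
    attrs = parse_gff3_attributes(cols[8])
    if id_attribute not in attrs:
        return None
    gene_id = attrs[id_attribute]
    cols[2] = "exon"  # assumption: treat each gene as one exon
    cols[8] = f'gene_id "{gene_id}"; transcript_id "{gene_id}";'
    return "\t".join(cols)


if __name__ == "__main__":
    # Made-up GFF3 row for illustration only.
    row = "chrom1\tsource\tgene\t190\t255\t.\t+\t.\tID=gene0;Name=thrL"
    print(gff3_line_to_gtf(row))
```

          Applied line by line to a GFF3 file, the non-None results form a file that can be checked by eye against the tutorial's GTF before counting.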

          • PhD1990
            Junior Member
            • Jan 2014
            • 3

            #6
            Hi Bruce,

            It would be really nice if you could send me such a script.

            Thank you so much.

            Sara

            • Simon Anders
              Senior Member
              • Feb 2010
              • 995

              #7
              Originally posted by sindrle
              Hi!
              I've seen a lot of threads on this, but I can't figure it out. I have 16-60 million single-end reads in each library. I used TopHat 2 with the UCSC GTF file for hg19.

              [...]

              Why do I get, on average, 25-50% of reads reported as "no_feature",
              "ambiguous" or "alignment_not_unique"?

              Is this a GTF file created with UCSC's Table Browser? If so, these do not work. There is a bug in the Table Browser server that causes the gene ID field to contain the transcript ID instead of the gene ID.

              Please use a GTF file from another source.

              Simon
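
              One way to check a GTF file for this symptom is to count the rows where gene_id and transcript_id are identical. The following is a minimal Python sketch under the assumption that column 9 uses the usual GTF key "value"; layout; the function name and the demo rows are made up for illustration.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTR = re.compile(r'(\w+) "([^"]*)"')


def count_identical_ids(lines):
    """Return (rows where gene_id == transcript_id, rows checked)."""
    same = 0
    checked = 0
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(ATTR.findall(cols[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            checked += 1
            if attrs["gene_id"] == attrs["transcript_id"]:
                same += 1
    return same, checked


if __name__ == "__main__":
    # Made-up GTF rows for illustration only.
    demo = [
        'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "tx1"; transcript_id "tx1";',
        'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "tx2";',
    ]
    same, checked = count_identical_ids(demo)
    print(f"{same} of {checked} rows have gene_id equal to transcript_id")
```

              If nearly every row has identical IDs, the file probably shows the problem described above. A few identical rows on their own prove nothing, since some annotations legitimately reuse an ID.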
