Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #31
    Originally posted by tangx_2010 View Post
    1. How does HTSeq count pair-end reads by genes when both singleton and paired reads are presented? If the count is based on fragment, then paired two reads are counted by one, and singleton also counted by one, right?
    Exactly. A paired read is only counted if both ends map to the same gene, and then, it is counted once, not twice, to this gene.

    2. How does HTSeq treats multiple mapped reads? Are they just counted several times by different genes?
    This is a bit an issue, as different aligners have different ways of reporting multiple hits, and the SAM specification is frustratingly unclear on this.

    In general, a multiply aligned read should be discarded. (Imagine, genes A and B have partial sequence identity. If A is differentially expressed and B is not, any read originating from A that matches to both A and B will let B appear as differentially expressed, too, if it is counted for both. Hence, the prudent strategy is to only count reads that map uniquely to a gene.)

    For now, HTSeq looks for the "NH" optional flag. If it indicates that more than one alignment is reported, the read is not counted. If you use the "--minaqual" option, you can also cause all reads with low alignment quality to be skipped, which is another way how some aligners tag multiple alignments. If neither of the two works for you, you should pre-filter the SAM file. It is easy to write such a filtering script with HTSeq.

    Comment


    • #32
      Originally posted by Simon Anders View Post
      Exactly. A paired read is only counted if both ends map to the same gene, and then, it is counted once, not twice, to this gene.



      This is a bit an issue, as different aligners have different ways of reporting multiple hits, and the SAM specification is frustratingly unclear on this.

      In general, a multiply aligned read should be discarded. (Imagine, genes A and B have partial sequence identity. If A is differentially expressed and B is not, any read originating from A that matches to both A and B will let B appear as differentially expressed, too, if it is counted for both. Hence, the prudent strategy is to only count reads that map uniquely to a gene.)

      For now, HTSeq looks for the "NH" optional flag. If it indicates that more than one alignment is reported, the read is not counted. If you use the "--minaqual" option, you can also cause all reads with low alignment quality to be skipped, which is another way how some aligners tag multiple alignments. If neither of the two works for you, you should pre-filter the SAM file. It is easy to write such a filtering script with HTSeq.
      I have a question:
      The samtools flagstat result of the bam:

      8236963 + 0 in total (QC-passed reads + QC-failed reads)
      0 + 0 duplicates
      8236963 + 0 mapped (100.00%:-nan%)
      8236963 + 0 paired in sequencing
      4144407 + 0 read1
      4092556 + 0 read2
      6061528 + 0 properly paired (73.59%:-nan%)
      7724492 + 0 with itself and mate mapped
      512471 + 0 singletons (6.22%:-nan%)
      0 + 0 with mate mapped to a different chr
      0 + 0 with mate mapped to a different chr (mapQ>=5)

      the line : 7724492 + 0 with itself and mate mapped (with itself : that both the forward and reverse are mapped, http://i.seqanswers.com/questions/80...lagstat-output)

      dose HTSeq handle such paired reads?

      Comment


      • #33
        pair-end strand specific RNA-seq

        Originally posted by Simon Anders View Post
        Exactly. A paired read is only counted if both ends map to the same gene, and then, it is counted once, not twice, to this gene.
        Thank you for your reply. You are such a good man with patience. And there is another problem. I am runing HTSeq with pair-end and strand-specific RNA-seq data. It's very strange that most of the reads (> 90%) are counted as ambiguous. However, when I take the data as non strand-specific, HTSeq works very well. Waiting for your reply.
        sincerely tangx

        Comment


        • #34
          Hi quinlana,

          I try the following command:
          Code:
          time samtools -f 0x2 /tophat_out/accepted_hits.bam | coverageBed -abam  /tophat_out/accepted_hits.bam -b transcripts.gtf > transcripts.proper.cov &
          Unfortunately, it gives the following error message:
          Code:
          BgzfStream ERROR: read block failed - invalid block header
          BamHeader ERROR: could not read magic number
          Do you mind to explain a little bit more about the correct command?
          Kindly point out the error that I did in my example.
          Thanks.

          Comment


          • #35
            Hi Auction,

            Is it this should be the correct command in order to identify the raw count of each transcript ?
            Code:
            samtools [B]view[/B] -f 0x2 /tophat_out/accepted_hits.bam | coverageBed -abam  /tophat_out/accepted_hits.bam -b transcripts.gtf > transcripts.proper.cov

            Comment


            • #36
              Originally posted by tangx_2010 View Post
              Thank you for your reply. You are such a good man with patience. And there is another problem. I am runing HTSeq with pair-end and strand-specific RNA-seq data. It's very strange that most of the reads (> 90%) are counted as ambiguous. However, when I take the data as non strand-specific, HTSeq works very well. Waiting for your reply.
              sincerely tangx
              This is weird. Are you sure, you mean 'ambiguous', and not 'no_feature'?

              Comment


              • #37
                multiple mapped miRNA with HTseq/DEseq

                In general, a multiply aligned read should be discarded. (Imagine, genes A and B have partial sequence identity. If A is differentially expressed and B is not, any read originating from A that matches to both A and B will let B appear as differentially expressed, too, if it is counted for both. Hence, the prudent strategy is to only count reads that map uniquely to a gene.)
                Dear Simon,
                I hope this thread will still find you well.
                I am mapping reads from small RNAseq and would like to use HTseq/ DEseq. As miRNA map very often to more than one genomic location I would like to know your suggestions to deal with these multiple mapped reads with HTSeq. For what I saw on your previous feedback, you suggest to remove the multiple mapped reads but in my case that would mean that 80% of the reads are discarded... Could you please advise some methodology to use HTseq with miRNA data?

                Olivier

                Comment


                • #38
                  Sorry, I've never worked myself with miRNA-Seq data. From what I've heard, a common approach is to first compile a list of all miRNA sequences in the genome, then find for each read a miRNA in the list. In other words: skip the alignment against the genome completely, instead just compare reads with miRNAs. Of course, htseq-count is not suitable for this, but with some Python knowledge, you can write your own script with HTSeq.

                  Comment


                  • #39
                    HT_SEQ installation

                    Hi, Simon

                    I was trying to install HTseq on my local ubuntu machine by following your instruction below:


                    However, there are some error messages there. could you help me to get around this?

                    Thanks for your help!

                    bq@bq-VirtualBox:~/Desktop/apps/HTSeq-0.5.4p3$ ls
                    build_it doc HTSeq.egg-info MANIFEST.in README setup.cfg src
                    clean HTSeq LICENSE PKG-INFO scripts setup.py VERSION
                    bq@bq-VirtualBox:~/Desktop/apps/HTSeq-0.5.4p3$ python setup.py install --user
                    Could not import 'setuptools', falling back to 'distutils'.

                    running install
                    running build
                    running build_py
                    creating build
                    creating build/lib.linux-x86_64-2.7
                    creating build/lib.linux-x86_64-2.7/HTSeq
                    copying HTSeq/__init__.py -> build/lib.linux-x86_64-2.7/HTSeq
                    copying HTSeq/_HTSeq_internal.py -> build/lib.linux-x86_64-2.7/HTSeq
                    copying HTSeq/StepVector.py -> build/lib.linux-x86_64-2.7/HTSeq
                    copying HTSeq/_version.py -> build/lib.linux-x86_64-2.7/HTSeq
                    creating build/lib.linux-x86_64-2.7/HTSeq/scripts
                    copying HTSeq/scripts/__init__.py -> build/lib.linux-x86_64-2.7/HTSeq/scripts
                    copying HTSeq/scripts/qa.py -> build/lib.linux-x86_64-2.7/HTSeq/scripts
                    copying HTSeq/scripts/count.py -> build/lib.linux-x86_64-2.7/HTSeq/scripts
                    running build_ext
                    building 'HTSeq._HTSeq' extension
                    creating build/temp.linux-x86_64-2.7
                    creating build/temp.linux-x86_64-2.7/src
                    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c src/_HTSeq.c -o build/temp.linux-x86_64-2.7/src/_HTSeq.o -w
                    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro build/temp.linux-x86_64-2.7/src/_HTSeq.o -o build/lib.linux-x86_64-2.7/HTSeq/_HTSeq.so
                    building 'HTSeq._StepVector' extension
                    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/StepVector_wrap.cxx -o build/temp.linux-x86_64-2.7/src/StepVector_wrap.o -w
                    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
                    g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro build/temp.linux-x86_64-2.7/src/StepVector_wrap.o -o build/lib.linux-x86_64-2.7/HTSeq/_StepVector.so
                    running build_scripts
                    creating build/scripts-2.7
                    copying and adjusting scripts/htseq-qa -> build/scripts-2.7
                    copying and adjusting scripts/htseq-count -> build/scripts-2.7
                    changing mode of build/scripts-2.7/htseq-qa from 664 to 775
                    changing mode of build/scripts-2.7/htseq-count from 664 to 775
                    running install_lib
                    creating /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    copying build/lib.linux-x86_64-2.7/HTSeq/_StepVector.so -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    copying build/lib.linux-x86_64-2.7/HTSeq/_HTSeq.so -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    copying build/lib.linux-x86_64-2.7/HTSeq/StepVector.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    copying build/lib.linux-x86_64-2.7/HTSeq/_version.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    copying build/lib.linux-x86_64-2.7/HTSeq/_HTSeq_internal.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    creating /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts
                    copying build/lib.linux-x86_64-2.7/HTSeq/scripts/qa.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts
                    copying build/lib.linux-x86_64-2.7/HTSeq/scripts/count.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts
                    copying build/lib.linux-x86_64-2.7/HTSeq/scripts/__init__.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts
                    copying build/lib.linux-x86_64-2.7/HTSeq/__init__.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/StepVector.py to StepVector.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/_version.py to _version.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/_HTSeq_internal.py to _HTSeq_internal.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts/qa.py to qa.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts/count.py to count.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts/__init__.py to __init__.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/__init__.py to __init__.pyc
                    running install_scripts
                    creating /home/bq/.local/bin
                    copying build/scripts-2.7/htseq-qa -> /home/bq/.local/bin
                    copying build/scripts-2.7/htseq-count -> /home/bq/.local/bin
                    changing mode of /home/bq/.local/bin/htseq-qa to 775
                    changing mode of /home/bq/.local/bin/htseq-count to 775
                    running install_egg_info
                    Writing /home/bq/.local/lib/python2.7/site-packages/HTSeq-0.5.4p3.egg-info
                    bq@bq-VirtualBox:~/Desktop/apps$ python
                    Python 2.7.3 (default, Aug 1 2012, 05:14:39)
                    [GCC 4.6.3] on linux2
                    Type "help", "copyright", "credits" or "license" for more information.
                    >>> import HTSeq
                    >>>
                    Last edited by Baoqing; 06-09-2013, 12:48 PM.

                    Comment


                    • #40
                      I don't see any error messages.

                      "Could not import 'setuptools', falling back to 'distutils'." -- Well, that's what a fall-back is for: If you don't have setuptools, it uses distutils which works as well.

                      Comment


                      • #41
                        sorry, just start to pick up the python, any reminder message freak me out.

                        Just set up the path right for htseq-count

                        then run it with my bam files:
                        samtools view accepted_hits.bam | htseq-count - gene.gtf > counts.txt

                        While the program was running, there was a warning "claimed to have an aligned mate which could be found. (Is the sam file properly sorted?)

                        Does this warning "complaining that my file is not pair mated, or do i have to add other arguments for this line to run?

                        Thanks a lot for your help!

                        Best,
                        Last edited by Baoqing; 06-09-2013, 02:56 PM.

                        Comment


                        • #42
                          Your bam/sam file needs to be name sorted for HTSeq-count, see "samtools sort -n".

                          Regards

                          Comment


                          • #43
                            Thank you, I did sort my bam files though. I have the program to finished running, and it turned out a dataset was produced, does that mean everything is good? Should I just ignore the warning message?

                            Best,

                            Comment


                            • #44
                              Originally posted by Baoqing View Post
                              Thank you, I did sort my bam files though. I have the program to finished running, and it turned out a dataset was produced, does that mean everything is good? Should I just ignore the warning message?

                              Best,
                              See this thread, that's likely what's going on.

                              Comment


                              • #45
                                awesome. The name sorted bam file worked perfect! Really appreciate it!

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM
                                • seqadmin
                                  Techniques and Challenges in Conservation Genomics
                                  by seqadmin



                                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                  Avian Conservation
                                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                  03-08-2024, 10:41 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 06:37 PM
                                0 responses
                                8 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, Yesterday, 06:07 PM
                                0 responses
                                8 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-22-2024, 10:03 AM
                                0 responses
                                49 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-21-2024, 07:32 AM
                                0 responses
                                66 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X