  • #76
    Hi

    Originally posted by agali
    2_512_865_F3 16 Esi0595_0002 conserved unknown protein [1335] f:2354-3688 613 255 3H47M * 0 0 * AS:i:347
    Your SAM file is incorrect. According to the specs, a SAM file has the following fields:

    Code:
    <QNAME> <FLAG> <RNAME> <POS> <MAPQ> <CIGAR> <MRNM> <MPOS> <ISIZE> <SEQ> <QUAL> [<TAG>:<VTYPE>:<VALUE> [...]]
    Let me try to align your fields with the field names:

    QNAME: 2_512_865_F3
    FLAG: 16
    RNAME: Esi0595_0002 conserved unknown protein [1335] f:2354-3688 (assuming these are all spaces and no tags in here)
    POS: 613
    MAPQ: 255
    CIGAR: 3H47M
    MRNM: *
    MPOS: 0
    ISIZE: 0
    SEQ: *
    QUAL: AS:i:347
    TAG:VTYPE:VALUE: (empty)

    Obviously, "AS:i:347" is a tag and should hence be in the 12th column. It is, however, in the 11th column, and is hence read as the quality string.
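
For anyone who wants to run the same check on their own file, here is a minimal Python sketch of the alignment done by hand above. It is not part of HTSeq; the names `SAM_FIELDS`, `TAG_PATTERN` and `check_sam_line` are made up for illustration, and it assumes the columns are tab-separated. The test for "looks like a tag" is only a heuristic, since a real quality string could in principle have the same shape.

```python
import re

# The 11 mandatory SAM columns, named as in the field list quoted above.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "MRNM", "MPOS", "ISIZE", "SEQ", "QUAL"]

# Heuristic shape of an optional field: two-character tag, type letter, value.
TAG_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9]:[AifZHB]:.+$")


def check_sam_line(line):
    """Map the tab-separated columns to field names and list suspicious ones."""
    columns = line.rstrip("\n").split("\t")
    fields = dict(zip(SAM_FIELDS, columns))
    problems = []
    if len(columns) < len(SAM_FIELDS):
        problems.append("only %d of 11 mandatory columns present" % len(columns))
    # An optional tag sitting in SEQ or QUAL means a mandatory column is missing.
    for name in ("SEQ", "QUAL"):
        value = fields.get(name, "")
        if TAG_PATTERN.match(value):
            problems.append("%s looks like an optional tag: %s" % (name, value))
    return fields, problems


if __name__ == "__main__":
    example = "\t".join([
        "2_512_865_F3", "16",
        "Esi0595_0002 conserved unknown protein [1335] f:2354-3688",
        "613", "255", "3H47M", "*", "0", "0", "*", "AS:i:347"])
    fields, problems = check_sam_line(example)
    for problem in problems:
        print(problem)
```

Run on the line from the post, this reports that the QUAL column holds something shaped like a tag, which is the same conclusion as above.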

    Where did you get this SAM file from?

    Simon


    • #77
      Hi Simon,

      The SAM file is from SHRiMP. I looked up the file format specification, and I think there should be a '*' in the QUAL field when there is a '*' in the SEQ field.
      I will try to put an extra column in my SAM file and then run it on HTSeq.
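
The extra-column fix could be sketched in Python roughly as follows. This is only an illustration, not something SHRiMP or HTSeq provides: the function name `add_missing_qual` is hypothetical, the columns are assumed to be tab-separated, and the placeholder is only inserted when SEQ is '*' and the 11th column is shaped like an optional tag.

```python
import re

# Heuristic: an optional field starts with a two-character tag and a type letter.
TAG_START = re.compile(r"^[A-Za-z][A-Za-z0-9]:[AifZHB]:")


def add_missing_qual(line):
    """Insert '*' as QUAL when SEQ is '*' and a tag sits in the QUAL column."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) >= 11 and columns[9] == "*" and TAG_START.match(columns[10]):
        columns.insert(10, "*")  # the tag moves to column 12
    return "\t".join(columns)


if __name__ == "__main__":
    broken = "\t".join([
        "2_512_865_F3", "16", "Esi0595_0002", "613", "255",
        "3H47M", "*", "0", "0", "*", "AS:i:347"])
    print(add_missing_qual(broken))
```

Lines that already have a QUAL column are returned unchanged, so it should be safe to run over a whole file line by line, skipping header lines that start with '@'.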

      Thanks!
      Aga


      • #78
        Hi,
        I am trying to install HTSeq 0.4.5p5 on Windows. I get this error when I run setup.py in a shell:
        Traceback (most recent call last):
          File "C:\Python26\Lib\site-packages\HTSeq-0.4.5p5\setup.py", line 62, in <module>
            'scripts/htseq-count',
          File "C:\Python26\lib\distutils\core.py", line 140, in setup
            raise SystemExit, gen_usage(dist.script_name) + "\nerror: %s" % msg
        SystemExit: usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
           or: setup.py --help [cmd1 cmd2 ...]
           or: setup.py --help-commands
           or: setup.py cmd --help

        error: no commands supplied
        Please help, all help is very much appreciated.
        Thanks
        Manoj


        • #79
          No Feature

          Hi!
          I wonder if it is possible to retrieve the IDs of the reads that are counted as "no feature" in htseq-count.
          I'm interested in those reads that do not overlap with any annotated gene.
          I would really appreciate any suggestion.
          Thanks
          Best regards.


          Alvaro Pena


          • #80
            Hi

            Originally posted by mmpillai
            Hi,
            I am trying to install HTSeq 0.4.5p5 on Windows. I get this error when I run setup.py in a shell:
            Traceback (most recent call last):
              File "C:\Python26\Lib\site-packages\HTSeq-0.4.5p5\setup.py", line 62, in <module>
                'scripts/htseq-count',
              File "C:\Python26\lib\distutils\core.py", line 140, in setup
                raise SystemExit, gen_usage(dist.script_name) + "\nerror: %s" % msg
            SystemExit: usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
               or: setup.py --help [cmd1 cmd2 ...]
               or: setup.py --help-commands
               or: setup.py cmd --help

            error: no commands supplied
            Please help, all help is very much appreciated.
            Thanks
            Manoj
            Please read the installation instructions:



            I haven't made a Windows binary package for a while, though.

            (I still have trouble understanding why anybody would want to do HTS bioinformatics on Windows. Nearly all bioinformatics developers work on Unix-like systems (Linux or Mac OS). Ensuring that a tool developed on Linux works on a Mac, or vice versa, is trivial, but supporting Windows is always extra work and hence has low priority for us developers, which makes Windows a bad choice for users, too.)

            Simon


            • #81
              Hi Alvaro

              Originally posted by alvin
              I wonder if it is possible to retrieve the IDs of the reads that are counted as "no feature" in htseq-count.
              I'm interested in those reads that do not overlap with any annotated gene.
              I would really appreciate any suggestion.
              As you are by now the fourth person requesting this feature, I thought I'd no longer only make promises to add it in the future but rather get it done. :-)

              The new version, 0.4.7, offers a new option, "-o", for htseq-count. If you add '-o' followed by a filename, a SAM file of this name will be written that contains the same lines as the input SAM file, but with an optional field appended to each line, with tag 'XF', that indicates how the read was counted, i.e., it is either a gene name or a special counter name like "no_feature". With grep and cut, you can then get what you want.
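
As an alternative to grep and cut, the same filtering could be sketched in Python. This is not an HTSeq function: `reads_with_assignment` is a made-up name, and the sketch assumes the appended field has the usual optional-field shape `XF:Z:<value>` and that the counter is spelled "no_feature" as in the post; the exact spelling may differ between versions, so check a few lines of your own output first.

```python
def reads_with_assignment(sam_lines, wanted="no_feature"):
    """Return the read IDs whose XF field equals `wanted`."""
    read_ids = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        columns = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 (index 11).
        for field in columns[11:]:
            parts = field.split(":", 2)
            if len(parts) == 3 and parts[0] == "XF":
                if parts[2] == wanted:
                    read_ids.append(columns[0])
                break
    return read_ids


if __name__ == "__main__":
    example = [
        "@SQ\tSN:chr1\tLN:1000\n",
        "read1\t0\tchr1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII\tXF:Z:geneA\n",
        "read2\t0\tchr1\t500\t255\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tXF:Z:no_feature\n",
    ]
    for read_id in reads_with_assignment(example):
        print(read_id)
```

To use it on a real file, pass an open file object for the SAM file written via '-o' in place of the example list.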

              Simon


              • #82
                Originally posted by Simon Anders
                Hi Alvaro

                As you are by now the fourth person requesting this feature, I thought I'd no longer only make promises to add it in the future but rather get it done. :-)

                The new version, 0.4.7, offers a new option, "-o", for htseq-count. If you add '-o' followed by a filename, a SAM file of this name will be written that contains the same lines as the input SAM file, but with an optional field appended to each line, with tag 'XF', that indicates how the read was counted, i.e., it is either a gene name or a special counter name like "no_feature". With grep and cut, you can then get what you want.

                Simon
                Great! I found the -o option very useful.
                Thank you very much for your help.
                Best Regards


                Álvaro Pena


                • #83
                  Originally posted by Simon Anders
                  Hi Keith
                  At the moment, HTSeq can natively only work with SAM files. Adding BAM support is on my to-do list, and of course, I would do it by simply wrapping samtools.

                  Cheers
                  Simon
                  Hi Simon,

                  Is BAM support in HTSeq coming soon?

                  Keep up the good work!


                  • #84
                    htseq-count for miRNA

                    I am using "htseq-count" to count miRNAs by their genomic coordinates, and it works very well. However, I am also interested in a more detailed output: every aligned read together with its count. The reason is that there are a lot of miRNA length variants, as well as mature, star and precursor sequences, and it would be nice to see the proportions of the different reads. Right now, I can only see the counts of the precursor miRNAs.

                    I would like to know if there is any way to get that information. Any hints would be highly appreciated.

                    Thank you in advance.


                    • #85
                      Hi Simon,
                      I heeded your advice re: the OS and have successfully installed HTSeq on my Linux system. I wanted to install it from a binary on my Mac, but no binary package is available for download on PyPI. (I don't want to download XCode; it seems to be >3.5 GB in size.)
                      Thanks again. With bioinformatics clearly being the bottleneck for high-throughput applications, packages such as yours are very helpful.
                      Manoj


                      • #86
                        Hi Manoj,

                        I don't provide binary packages for MacOS -- it's too complicated as I don't have a Mac myself. Please install XCode and use it. (Actually, you only need GCC, but installing all of XCode is easiest.)

                        XCode comes with MacOS and can be found on the second of the two MacOS installation CDs. So, there is no need to download it.

                        (If a Mac user reading this wants to help out Manoj: Run the command 'python setup.py bdist' in the unpacked tarball, and a binary package will be built automatically and packed into a single file.)

                        Simon


                        • #87
                          Originally posted by Simon Anders
                          Hi Manoj,

                          I don't provide binary packages for MacOS -- it's too complicated as I don't have a Mac myself. Please install XCode and use it. (Actually, you only need GCC, but installing all of XCode is easiest.)

                          XCode comes with MacOS and can be found on the second of the two MacOS installation CDs. So, there is no need to download it.

                          (If a Mac user reading this wants to help out Manoj: Run the command 'python setup.py bdist' in the unpacked tarball, and a binary package will be built automatically and packed into a single file.)

                          Simon
                          Here we go!
                          Attached Files


                          • #88
                            Simon and Marcora: thanks very much!


                            • #89
                              Hi Simon,


                              In one of my datasets, I'm getting a lot of these warnings:

                              Read ILLUMINA-GA_0000:8:36:18294:7129#0 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

                              If I grep for these reads in the SAM file I do find the two mates:

                              Code:
                              ILLUMINA-GA_0000:8:36:18294:7129#0    163     chrY    59342791        255     38M     =       59342801        0 CAGAGGGCAGCAGGAGCAGCAGCAGCAGCAGCAGCAGC hdhhehhhhhhgghhghghgahhff[fhacfdaahhgh  NM:i:0  NH:i:1  XS:A:+
                              ILLUMINA-GA_0000:8:36:18294:7129#0    83      chrY    59342801        255     38M     =       59342791        0 CAGGAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAACA abaQWdffRbWWffWfd]aa_ggfggcgfgfgggfggg  NM:i:1  NH:i:1  XS:A:+
                              Questions:

                              1) Why is this warning coming up?
                              2) When this warning appears, is the read discarded? I'm getting results that are not making a lot of sense to me:
                              Code:
                              The command:
                              htseq-count -s yes -i gene_id -m intersection-nonempty accepted_hits.sam /scratch/fdgarcia/data/gtfs/Homo_sapiens.GRCh37.60.gtf > counts.txt
                              
                              
                              Results for ~210000000 reads:
                              
                              no_feature      130841007
                              ambiguous       51826
                              too_low_aQual   0
                              not_aligned     0
                              alignment_not_unique    66886614
                              Thanks!


                              • #90
                                Hi Fennan

                                Originally posted by fennan
                                In one of my datasets, I'm getting a lot of these warnings:

                                Read ILLUMINA-GA_0000:8:36:18294:7129#0 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

                                If I grep for these reads in the SAM file I do find the two mates:
                                ...
                                Well, is the SAM file properly sorted?

                                If you use htseq-count on paired-end data, you need to make sure that all SAM lines referring to the same read pair are in adjacent lines. To this end, you need to sort the SAM file by read name. (Just run it through the standard Unix 'sort' command.)
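
If the Unix 'sort' command is not at hand, the idea can be sketched in Python as below. This is an illustration rather than anything shipped with HTSeq: the function names are made up, and the whole file is held in memory, which will not be practical for a file with hundreds of millions of reads. The first function checks whether all lines of a read name are already adjacent; the second sorts by read name while keeping header lines on top.

```python
def is_grouped_by_read_name(lines):
    """True if all SAM lines with the same read name are adjacent."""
    seen = set()
    previous = None
    for line in lines:
        if line.startswith("@"):
            continue  # header lines carry no read name
        name = line.split("\t", 1)[0]
        if name != previous:
            if name in seen:
                return False  # this name appeared earlier, separated by other reads
            seen.add(name)
            previous = name
    return True


def sort_sam_by_read_name(lines):
    """Return header lines first, then alignment lines sorted by read name."""
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if not line.startswith("@")]
    # list.sort is stable, so mates keep their original relative order.
    records.sort(key=lambda line: line.split("\t", 1)[0])
    return header + records


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0\n",
        "readB\t163\tchrY\t100\n",
        "readA\t0\tchrY\t105\n",
        "readB\t83\tchrY\t110\n",
    ]
    print(is_grouped_by_read_name(example))
    print(is_grouped_by_read_name(sort_sam_by_read_name(example)))
```

In a coordinate-sorted file, two mates that map ten bases apart, as in the example in the question, can easily be separated by other reads, which is the situation the first function detects.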

                                Simon
