Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • HTseq-count not giving results

    Hello,
    I am trying to use htseq-count to obtain counts per gene for my aligned files using Tophat (mm10 genome). I ran htseq on one of my samples and it seems to have worked although the counts seem to be very low overall for a 80 million read sample. I ran it a second time on another sample from the same experiment and the output file is empty. Any idea what might be wrong? my command for htseq-count is:
    htseq-count -s reverse -q 21allsort.sam ./../igenome.mm10.mouse.genes.gtf > 21all.counts

    I then tried a different sample in my study and still got a file of size 0. Has anyone seen this? I am not getting any errors just basically a null output.

    Thanks.

  • #2
    Are you sure that htseq-count has not given you some error message besides writing an empty file?

    Comment


    • #3
      NO, I am not positive of this fact. I don't see any other files generated and I used the option -q because I get 1000's of the warning signals of no chrM alignment. Perhaps the error message could be suppressed because of the -q option? Or is there an argument i can use to print out the errors into a file?
      Thanks.

      Comment


      • #4
        The newest version now doesn't show these warnings any more, so maybe try again with that one.

        Also, in general, if you want to catch the error messages printed by a shell command into a file, use the '2>' operator:

        htseq-count -s reverse -q 21allsort.sam ./../igenome.mm10.mouse.genes.gtf > 21all.counts 2> error.log

        Comment


        • #5
          The newest version we have installed is htseq/0.5.4p1.
          I am going to try to rerun it with this version and catching errors.

          thanks,
          Jen

          Comment


          • #6
            So even with catching the errors, which there are none as the error.log file is empty and using the newer version of HTseq, I still get no result with htseq-count for certain samples in my study.

            Comment


            • #7
              Please post the exact command you were using, the first couple of lines of the involved files and all output that is written onto your screen or into any files.

              Comment


              • #8
                Here is the command I used:
                htseq-count -s reverse -q 22all_sort.sam ./../igenome.mm10.mouse.genes.gtf > 22all.counts 2> error.log
                ------------------------------------------------------------------------------
                the igenomes file first 10 lines:
                chr1 unknown exon 3214482 3216968 . - . gene_id "Xkr4"; transcript_id "NM_001011874"; gene_name "Xkr4"; p_id "P2638"; tss_id "TSS25343";
                chr1 unknown stop_codon 3216022 3216024 . - . gene_id "Xkr4"; transcript_id "NM_001011874"; gene_name "Xkr4"; p_id "P2638"; tss_id "TSS25343";
                chr1 unknown CDS 3216025 3216968 . - 2 gene_id "Xkr4"; transcript_id "NM_001011874"; gene_name "Xkr4"; p_id "P2638"; tss_id "TSS25343";
                chr1 unknown CDS 3421702 3421901 . - 1 gene_id "Xkr4"; transcript_id "NM_001011874"; gene_name "Xkr4"; p_id "P2638"; tss_id "TSS25343";
                chr1 unknown exon 3421702 3421901 . - . gene_id "Xkr4"; transcript_id "NM_001011874"; gene_name "Xkr4"; p_id "P2638"; tss_id "TSS25343";
                chr1 unknown CDS 3670552 3671348 . - 0 gene_id "Xkr4"; transcript_id "NM_001011874"; gene_name "Xkr4"; p_id "P2638"; tss_id "TSS25343";
                chr1 unknown exon 3670552 3671498 . - . gene_id "Xkr4"; transcript_id "NM_001011874"; gene_name "Xkr4"; p_id "P2638"; tss_id "TSS25343";
                chr1 unknown start_codon 3671346 3671348 . - . gene_id "Xkr4"; transcript_id "NM_001011874"; gene_name "Xkr4"; p_id "P2638"; tss_id "TSS25343";
                chr1 unknown exon 4290846 4293012 . - . gene_id "Rp1"; transcript_id "NM_001195662"; gene_name "Rp1"; p_id "P10825"; tss_id "TSS5726";
                chr1 unknown stop_codon 4292981 4292983 . - . gene_id "Rp1"; transcript_id "NM_001195662"; gene_name "Rp1"; p_id "P10825"; tss_id "TSS5726";
                ------------------------------------------------------------------------------

                the first ten lines of my sorted sam file:
                D192UACXX:6:1101:01077:12646 99 chr15 64254176 50 101M = 64254179 104 CCTNCTCCCATGCTCCTATCATCCCTTCTATCATCCTCTTCCCTACTCGACATTCTTCATAGCACCTTTCACACTACTGCCTGAGTGCTTATTTGCTTGAN C@C#4ADDHFFHFIIJIIJJJJIJJJJIHHIBGIGJJIJJDHJIIIFIJIGGIIGIJJGIJJIJIFGIJIGIHEE>ED@DEFDAECACEDCACD5;CC### AS:i:-2 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:3A96T0 YT:Z:UU NH:i:1 XS:A:-
                D192UACXX:6:1101:01077:12646 147 chr15 64254179 50 101M = 64254176 -104 NCTCCCATGCTCCTATCATCCCTTCTATCATCCTCTTCCCTACTCGACATTCTTCATAGCACCTTTCACACTACTGCCTGAGTGCTTATTTGCTTGATTTC ###A93CA99>CC;C@?C=A?7DFDCHCE?9HEDC;@IGCB8:?IHDFB?99*@AGHEIJIIIHDAEGHF<EFIIIGA;JIIIIIFIIHDHD8FFFFF@@@ AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0A100 YT:Z:UU NH:i:1 XS:A:-
                D192UACXX:6:1101:01077:13595 163 chr9 64889674 50 58M3554N43M = 64889728 3709 GCACAGTTCAAAAGAGCTTTGAAGAAACATCCACACTTACCACAGACAGCGCTCTCAGGTGGCCGGTCTGATCTTGGATATAATTCATTATCTAAGGACAN BC@FFDDFHGHHHJHIJJJIJIGHIIJIJJHIJJJIIJJJJJJJJJIJJGGHHJJJGJJFGGIIHHDDDECDEDDDDCDECDDEEDDEEEEDDDCCCDDC# AS:i:-1 XM:i:1 XO:i:0 XG:i:0 MD:Z:100A0 NM:i:1 XS:A:+ NH:i:1
                D192UACXX:6:1101:01077:13595 83 chr9 64889728 50 4M3554N97M = 64889674 -3709 NCAGGTGGCCGGTCTGATCTTGGATATAATTCATTATCTAAGGACAATGTTAGAAGAGAGGGTCCATCTACTGAAGACATTCAAGGGGAAAAGGAGANAAG #CBBDDDB?<DFFDFFHHHHHFEJIGJJJJJJIJIJJJJJJJJJJJJJJJIJIJJJJJJHIIGJJJJIJJIIJJJJIJJJJJJJJJJJHHHHHDDA4#BB@ AS:i:-2 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0T96A3 YT:Z:UU XS:A:+ NH:i:1
                D192UACXX:6:1101:01077:19692 163 chr8 124880350 50 3M3003N98M = 124883444 3195 CTGATAACAAACTTCCTGAGGAAGTGGTCCAGTCCAGCATCCTCCTGCACTCTAACCTGGCCAGCCTTGTCAAAGACCAGGTGGTTCTGAAAATGAACTCN BBBFFFFFHHHHHJIJJJHIJIIJFHIHJJJJJJJJJJJJJJJIJJJIJJHIJHIJJJJJJJJJJJJJJJHHHHHHFFFFDACD@BDDDECDDDDDDDDC# AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:3 MD:Z:100T0 YT:Z:UU XS:A:+ NH:i:1
                D192UACXX:6:1101:01077:19692 83 chr8 124883444 50 101M = 124880350 -3195 NATGAACTCTGGAAGCTCACAAGTGGTCAACGGGCTTGTGCCCGAACATATCGCCCTCCTCATGTGCTCTGCCTACAGGAACCAGCTGCTCAACATTNTTG #AEDDADDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDCDFFHHHHJJIHFIIIJJIJJJJJJJJJJIJJJJJJJJJJJJJJJJJIJJHHHHHFDB4#CCC AS:i:-2 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0A96T3 YT:Z:UU NH:i:1 XS:A:+
                D192UACXX:6:1101:01077:35330 177 chr1 24612504 50 101M chrM 10022 0 NAAACTAAGATGGTGATGGGGATTGGTATGGAGCTTATGGAGTTGGAGTTTAGGGAAGTTACTGAAGTTATAATAAATAAGGATAATACTATGCCTTCCAG #EDDEEEEEEEFFFFFHHHHJJJJJJJJJJJJIIIGJJJJJJJIJJJJJIHIJJJJJJJJJJJJJJJJJJIJJJJJHHJJIJJHIHIHHFHHHFFFFFCCC AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0A100 YT:Z:UU NH:i:1 XS:A:-
                D192UACXX:6:1101:01077:35330 113 chrM 10022 50 101M chr1 24612504 0 NAAACTCCAACTCCATAAGCTCCATACCAATCCCCATCACCATCTTAGTTTTCGCAGCCTGCGAAGCAGCTGTAGGACTAGCCCTACTAGTAAAAGTNTCA #CC@C@DDCCDCDCDCDDCB>DDDCCDDDBADDEEC??;FFHEHHIJJJJIEJJJJGJJJJJJJJIJJJJJJIJJIIJIJHJIHJHIGHHHHHFDA4#CCC AS:i:-2 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0T96T3 YT:Z:UU NH:i:1 XS:A:+
                D192UACXX:6:1101:01077:50524 99 chr3 84160839 50 101M = 84160927 189 GTCNGGAATGGTGAGAAAGAACGTTAGCAGAGGGTACTGTGTTGATAGCACAACGTCCCACGTTATAAAGCAAAAGCATGAGGTTAAGGCTGTGTTGCTTN @CC#4AABDHHCFIHIIIJGJIJIIJIGJJJCHJ?DHHIIJIFHIIE@BGEFHIE@FHIJEECHHFEFFFEEDEEEBDDDDDD?AC>CCCDA?CCABD>@# AS:i:-2 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:3A96T0 YT:Z:UU NH:i:1 XS:A:-
                D192UACXX:6:1101:01077:50524 147 chr3 84160927 50 101M = 84160839 -189 NCTGTGTTGCTTTCAAAAGTTAATGGAAGTGCTGTACATTCATTGGAACAACATAGCTGATTATTAATCTCTTCTAGAATACTAATGAATCGCATCCATAC #BDDEADEEFCDBFGHHEEECJJIIGJIJGGIIHDFFHGGJJJIEHAGHGJJIIHEGDAGCIHD>DHIJJHIGHIIGJIHFFIIIHJIHHHHHFFEDD@B@ AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0G100 YT:Z:UU NH:i:1 XS:A:-
                ------------------------------------------------------------------------------
                the program has been running for about 2 hours and the output directory files real time snapshot:
                -rw-r--r-- 1 barbj MSCL 60566222168 Mar 26 12:37 22all_sort.sam
                -rw-r--r-- 1 barbj MSCL 0 Mar 26 12:45 22all.counts
                -rw-r--r-- 1 barbj MSCL 0 Mar 26 12:45 error.log

                Comment


                • #9
                  Well, maybe it is counting. The counts are written out only at the very end, and given that your SAM file is quite large, this might take quite a while. As you have switched off progress reports ("-q") you cannot say whether it is getting somewhere.

                  Comment


                  • #10
                    OK. I will rerun with the -q parameter off. I thought that I had let it run before over an entire weekend and when I came in on Monday the job had completed but I still had a file of size 0.
                    I will start the run now.
                    Thanks. I really hope that is it!!

                    Comment


                    • #11
                      I just wanted to check back in that the program worked once i turned off the -

                      Comment


                      • #12
                        Just wanted to say that htseq-count worked once I turned off the -q parameter. I am now running it on all of my samples.

                        Is there a way to obtain exon level counts? I suppose I would have to modify my gtf file for this. Is that the best way?

                        Comment


                        • #13
                          htseq-count: all the counts are 0

                          Originally posted by Simon Anders View Post
                          Are you sure that htseq-count has not given you some error message besides writing an empty file?
                          Hello Simon




                          Command used is:
                          I have passed the sorted bam converted into sam using

                          samtools sort -n <accepted_hits_chr22.bam> <out_sorted>
                          samtools view <sorted.bam> <sorted.sam>

                          Then countng No. of reads per fearture using .gff reference prepared using python script
                          As the dexseq_count.py script is not working i.e. giving all counts 0 then I decided to use HTSEQ-count but its also not working:::::::::::

                          htseq-count -f sam -s no -r name -m union -t exonic_part /home/svashisht/Cariplo_data_analysis_29-05-14/TopHat2_Alignments_CariploData/DNA13058-001-L1/accepted_hits_chr22_sorted.sam /home/svashisht/Cariplo_data_analysis_29-05-14/DEXSeqAnalysis/new.gff >/home/svashisht/Cariplo_data_analysis_29-05-14/DEXSeqAnalysis/DNA13058-008-L1.txt

                          Warning: 2474344 reads with missing mate encountered.

                          Results:: pasting head of the results file:

                          ENSG00000000003 0
                          ENSG00000000005 0
                          ENSG00000000419 0
                          ENSG00000000457 0
                          ENSG00000000460 0
                          ENSG00000000938 0
                          ENSG00000000971 0
                          ENSG00000001036 0
                          ENSG00000001084+ENSG00000231683 0
                          ENSG00000001167 0
                          ENSG00000001460 0
                          ENSG00000001461 0
                          ENSG00000001497 0
                          ENSG00000001561 0
                          ENSG00000001617 0
                          ENSG00000001626 0
                          ENSG00000001629 0
                          ENSG00000001630+ENSG00000240720 0
                          ENSG00000001631+ENSG00000243107 0
                          ENSG00000002079 0
                          ENSG00000002330 0
                          ENSG00000002549 0
                          ENSG00000002586 0
                          ENSG00000002587 0
                          ENSG00000002726 0
                          ENSG00000002745 0
                          ENSG00000002746 0
                          ENSG00000002822 0
                          ENSG00000002834 0

                          In the end of the results file

                          __no_feature 2410288
                          __ambiguous 23795
                          __too_low_aQual 0
                          __not_aligned 0
                          __alignment_not_unique 2782635

                          If you have any idea regarding this please help

                          Thanks in advance

                          Regards

                          Shikha Vashisht
                          Last edited by shikha5; 11-21-2014, 06:03 AM.

                          Comment


                          • #14
                            You have VERY few reads. You might want to view the alignments in IGV or a similar program and see if you actually have any alignments to genes or not. Also, you can use the -o option to see how individual alignments are being treated. This is quite useful to debug things.

                            Comment


                            • #15
                              Thanks Devon for you reply

                              Actually I have tried same thing on full alignments file and the results were same so in order to understand the error i am doing this for only chr22 aligned reads.
                              yes, I did check on IGV and in fact there are reads mapping.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              66 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X