  • Problem with HTSeq-count

    Hi All,
    I have a problem counting reads with htseq-count. The error is:

    csg@csg-W650EH:~/Downloads$ python -m HTSeq.scripts.count Nipponbare_ref_assembly.sam chr1.gff3
    Error occured in line 3 of file chr1.gff3.
    Error: Feature LOC_Os01g01010.1:exon_1 does not contain a 'gene_id' attribute
    [Exception type: SystemExit, raised in count.py:55]
    csg@csg-W650EH:~/Downloads$

    The following is a portion of the GFF3 file:
    ##gff-version 3
    Chr1 MSU_osa1r7 mRNA 2903 10817 . + . ID=LOC_Os01g01010.1;Name=LOC_Os01g01010.1;Parent=LOC_Os01g01010
    Chr1 MSU_osa1r7 exon 2903 3268 . + . ID=LOC_Os01g01010.1:exon_1;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 exon 3354 3616 . + . ID=LOC_Os01g01010.1:exon_2;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 exon 4357 4455 . + . ID=LOC_Os01g01010.1:exon_3;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 exon 5457 5560 . + . ID=LOC_Os01g01010.1:exon_4;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 exon 7136 7944 . + . ID=LOC_Os01g01010.1:exon_5;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 exon 8028 8150 . + . ID=LOC_Os01g01010.1:exon_6;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 exon 8232 8320 . + . ID=LOC_Os01g01010.1:exon_7;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 exon 8408 8608 . + . ID=LOC_Os01g01010.1:exon_8;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 exon 9210 9617 . + . ID=LOC_Os01g01010.1:exon_9;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 exon 10104 10187 . + . ID=LOC_Os01g01010.1:exon_10;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 exon 10274 10430 . + . ID=LOC_Os01g01010.1:exon_11;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 exon 10504 10817 . + . ID=LOC_Os01g01010.1:exon_12;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 five_prime_UTR 2903 3268 . + . ID=LOC_Os01g01010.1:utr_1;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 five_prime_UTR 3354 3448 . + . ID=LOC_Os01g01010.1:utr_2;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 CDS 3449 3616 . + . ID=LOC_Os01g01010.1:cds_1;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 CDS 4357 4455 . + . ID=LOC_Os01g01010.1:cds_2;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 CDS 5457 5560 . + . ID=LOC_Os01g01010.1:cds_3;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 CDS 7136 7944 . + . ID=LOC_Os01g01010.1:cds_4;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 CDS 8028 8150 . + . ID=LOC_Os01g01010.1:cds_5;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 CDS 8232 8320 . + . ID=LOC_Os01g01010.1:cds_6;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 CDS 8408 8608 . + . ID=LOC_Os01g01010.1:cds_7;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 CDS 9210 9617 . + . ID=LOC_Os01g01010.1:cds_8;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 CDS 10104 10187 . + . ID=LOC_Os01g01010.1:cds_9;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 CDS 10274 10297 . + . ID=LOC_Os01g01010.1:cds_10;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 three_prime_UTR 10298 10430 . + . ID=LOC_Os01g01010.1:utr_3;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 three_prime_UTR 10504 10817 . + . ID=LOC_Os01g01010.1:utr_4;Parent=LOC_Os01g01010.1
    Chr1 MSU_osa1r7 mRNA 2984 10562 . + . ID=LOC_Os01g01010.2;Name=LOC_Os01g01010.2;Parent=LOC_Os01g01010
    Chr1 MSU_osa1r7 exon 2984 3255 . + . ID=LOC_Os01g01010.2:exon_1;Parent=LOC_Os01g01010.2
    Chr1 MSU_osa1r7 exon 3354 3616 . + . ID=LOC_Os01g01010.2:exon_2;Parent=LOC_Os01g01010.2
    Chr1 MSU_osa1r7 exon 4357 4455 . + . ID=LOC_Os01g01010.2:exon_3;Parent=LOC_Os01g01010.2








    Please keep in mind that I am new to RNA-seq data analysis.

    Thank you,
    anikng

  • #2
    By default, htseq-count uses the gene_id attribute from the GFF file as the feature identifier. Since your GFF file does not have a gene_id attribute, it has nothing to use.

    If you set the -i flag to ID (which is how your GFF is formatted), it should be happier.
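
Before choosing a value for -i, it can help to list which attribute keys each feature type actually carries. The following is a minimal sketch, assuming a tab-separated GFF3 file with key=value pairs in column 9; the function name `attribute_keys_by_type` and the sample lines (copied from the post above, with tabs restored) are illustrative only.

```python
from collections import defaultdict


def attribute_keys_by_type(gff_lines):
    """Map each feature type (column 3) to the set of attribute keys in column 9."""
    keys = defaultdict(set)
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip lines that are not well-formed GFF3 records
        feature_type, attributes = fields[2], fields[8]
        for pair in attributes.split(";"):
            if "=" in pair:
                key, _ = pair.split("=", 1)
                keys[feature_type].add(key.strip())
    return dict(keys)


# Two records from the question, with the columns separated by tabs (assumed).
sample = [
    "##gff-version 3",
    "Chr1\tMSU_osa1r7\tmRNA\t2903\t10817\t.\t+\t.\t"
    "ID=LOC_Os01g01010.1;Name=LOC_Os01g01010.1;Parent=LOC_Os01g01010",
    "Chr1\tMSU_osa1r7\texon\t2903\t3268\t.\t+\t.\t"
    "ID=LOC_Os01g01010.1:exon_1;Parent=LOC_Os01g01010.1",
]
result = attribute_keys_by_type(sample)
print(sorted(result["exon"]))  # ['ID', 'Parent']
```

The exon records carry only ID and Parent, which is why the default gene_id lookup fails. With this file the command would then look like `python -m HTSeq.scripts.count -i ID Nipponbare_ref_assembly.sam chr1.gff3`. Note that ID is unique per exon here, so counts would be reported per exon; `-i Parent` would group the exons by transcript instead.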


    • #3
      Thank you jparsons...
      As you suggested, I changed the attribute name from 'ID' to 'gene_id' in the GFF file, and now it is working.

      anikng
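
Renaming ID to gene_id makes htseq-count run, but each exon then keeps its own identifier (for example LOC_Os01g01010.1:exon_1), so reads are tallied per exon rather than per gene. A different approach, not the one used in this thread, is to leave ID alone and append a gene_id resolved through the Parent chain (exon to mRNA to gene). The sketch below assumes tab-separated GFF3, a single Parent per record, and mRNA records whose Parent is the gene; `parse_attributes` and `add_gene_id` are hypothetical helper names.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def add_gene_id(gff_lines, feature_type="exon"):
    """Return the lines with gene_id=<gene> appended to records of feature_type."""
    lines = [line.rstrip("\n") for line in gff_lines]

    # First pass: map each transcript ID to its parent gene via the mRNA records.
    transcript_to_gene = {}
    for line in lines:
        fields = line.split("\t")
        if line.startswith("#") or len(fields) != 9 or fields[2] != "mRNA":
            continue
        attrs = parse_attributes(fields[8])
        if "ID" in attrs and "Parent" in attrs:
            transcript_to_gene[attrs["ID"]] = attrs["Parent"]

    # Second pass: append gene_id to every matching record whose parent is known.
    updated_lines = []
    for line in lines:
        fields = line.split("\t")
        if not line.startswith("#") and len(fields) == 9 and fields[2] == feature_type:
            attrs = parse_attributes(fields[8])
            gene = transcript_to_gene.get(attrs.get("Parent"))
            if gene is not None and "gene_id" not in attrs:
                fields[8] = fields[8].rstrip(";") + ";gene_id=" + gene
                line = "\t".join(fields)
        updated_lines.append(line)
    return updated_lines


sample_gff = [
    "##gff-version 3",
    "Chr1\tMSU_osa1r7\tmRNA\t2903\t10817\t.\t+\t.\t"
    "ID=LOC_Os01g01010.1;Name=LOC_Os01g01010.1;Parent=LOC_Os01g01010",
    "Chr1\tMSU_osa1r7\texon\t2903\t3268\t.\t+\t.\t"
    "ID=LOC_Os01g01010.1:exon_1;Parent=LOC_Os01g01010.1",
]
updated = add_gene_id(sample_gff)
print(updated[2].split("\t")[8])
```

With a file rewritten this way, htseq-count's default -i gene_id should group all exons of LOC_Os01g01010 under one gene-level count.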


      • #4
        Hi All,

        I get an error while running the HTSeq count tool: "'seq' and 'qualstr' do not have the same length." I have seen someone else report this error, and the discussion there pointed to a bug. But in my case I think the cause is different.

        csg@csg-W650EH:~/Downloads/bowtie-0.12.8$ python -m HTSeq.scripts.count SEEDLING_ROOT_ORIGINA.sam all.gff3 >> SEDLINGROOT_output.txt

        This command produces SEDLINGROOT_output.txt with zero reads for every transcript. As somebody mentioned in this forum, I noticed that chromosomes are named 'Chr*' in the GFF but something like 'gi...' in the SAM file. When I changed each 'gi...' ID to the corresponding Chr ID (i.e., as in the GFF file), the "'seq' and 'qualstr' do not have the same length" error appeared.
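
The naming mismatch described here can be confirmed without editing either file by comparing the reference names in the SAM header (the SN: tags on @SQ lines) with the sequence names in column 1 of the GFF. This is a minimal sketch, assuming tab-separated files; both function names are hypothetical.

```python
def sam_reference_names(sam_lines):
    """Collect the SN: values from the @SQ lines of a SAM header."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gff_sequence_names(gff_lines):
    """Collect the sequence names (column 1) from GFF records."""
    names = set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


sam_header = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:gi|297598437|ref|NC_008394.4|\tLN:45064769",
]
gff_records = [
    "##gff-version 3",
    "Chr1\tMSU_osa1r7\texon\t2903\t3268\t.\t+\t.\t"
    "ID=LOC_Os01g01010.1:exon_1;Parent=LOC_Os01g01010.1",
]
shared = sam_reference_names(sam_header) & gff_sequence_names(gff_records)
print(shared)  # set(): no name in common, so no read can overlap a feature
```

An empty intersection is consistent with every transcript receiving zero reads.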

        The error is reported at line 19. Below are the first few lines of the SAM file that produces the error:
        @HD VN:1.0 SO:unsorted
        @SQ SN:gi|297598437|ref|NC_008394.4| LN:45064769
        @SQ SN:gi|297600179|ref|NC_008395.2| LN:36823111
        @SQ SN:gi|297602023|ref|NC_008396.2| LN:37257345
        @SQ SN:gi|297603645|ref|NC_008397.2| LN:35863200
        @SQ SN:gi|297605017|ref|NC_008398.2| LN:30039014
        @SQ SN:gi|297606578|ref|NC_008399.2| LN:32124789
        @SQ SN:gi|297607852|ref|NC_008400.2| LN:30357780
        @SQ SN:gi|297609017|ref|NC_008401.2| LN:28530027
        @SQ SN:gi|297610002|ref|NC_008402.2| LN:23843360
        @SQ SN:gi|297611005|ref|NC_008403.2| LN:23661561
        @SQ SN:gi|297612483|ref|NC_008404.2| LN:30828668
        @SQ SN:gi|297613623|ref|NC_008405.2| LN:27757321
        @PG ID:Bowtie VN:0.12.7 CL:"bowtie -S /home/csg/Downloads/bowtie-0.12.8/indexes/rice /home/csg/Downloads/bowtie-0.12.8/SEDLING_ROOT.fastq"
        SEDLING_ROOT.1 0 Chr1 33035177 255 35M * 0 0 GGTTGCTTTTAGAGAAACTTGGACACTTTGTTTAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII.I XA:i:1 MD:Z:25T9 NM:i:1
        SEDLING_ROOT.2 0 Chr1 29608234 255 35M * 0 0 GGCAACGGATATCTCGGCTCTCGCATCGATGAAGA IIIIIIIIIIIIIIIIIIIIII/I&FI1I5II8'3 XA:i:0 MD:Z:35 NM:i:0
        SEDLING_ROOT.3 0 Chr1 29608235 255 35M * 0 0 GCAACGGATATCTCGGCTCTCGCATCGATGAAGAA IIIIIIIIIIIIIIIIIIIIIII;III*.I?I51G XA:i:0 MD:Z:35 NM:i:0
        SEDLING_ROOT.4 0 Chr1 29606240 255 35M * 0 0 GTCATATGCTTGTCTCAAAGATTAAGCCATGCATG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDIII XA:i:0 MD:Z:35 NM:i:0
        SEDLING_ROOT.5 0 Chr1 29608234 255 35M * 0 0 GGCAACGGATATCTCGGCTCTCGCACCGATGAAGA IIIIIIIIIIIIIIIIIIIIHI2F4'0+=,($6@( XA:i:1 MD:Z:25T9 NM:i:1
        SEDLING_ROOT.6 16 Chr1 29607225 255 35M * 0 0 ATACCGTCCTAGTCTCAACCATAAACGATGCCGAC I87I<?IIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:35 NM:i:0
        SEDLING_ROOT.7 16 Chr1 29603552 255 35M * 0 0 AAGCTACCGTGTGCCGGATTATGACTGAACGCCTC =6+II8HII2IIIIIICIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:35 NM:i:0
        SEDLING_ROOT.8 16 Chr1 29607831 255 35M * 0 0 GCGGTGACTACGTCCCTGCCCTTTGTACACACCGC **.$I@(IIII:IFIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:3T31 NM:i:1
        SEDLING_ROOT.9 0 Chr1 29606233 255 35M * 0 0 GCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCC IIIIIIIIIIIIIIIIII4IIIIIIIIIIIIII8I XA:i:0 MD:Z:35 NM:i:0
        SEDLING_ROOT.10 HWI-EAS80_2_FC20AWLAAXX:8:1:882:460 length=35 4 * 0 0 * * 0 0 GTGAACTATGCCTGAGCGGGGCGAAGCCAGAGGAA IIIIIIIIIIIIIIIIIIIIIIIIIIII'7A@I)8 XM:i:0
        SEDLING_ROOT.11 HWI-EAS80_2_FC20AWLAAXX:8:1:891:382 length=35 4 * 0 0 * * 0 0 GTGTTGGTCGATTAAGACAGCAGGACGGTGGTCCT IIIIIIIIIIIIIIIIIIIIIIIIFFII;IE%$-> XM:i:0
        SEDLING_ROOT.12 0 Chr1 29606966 255 35M * 0 0 GTTACTTTGAAGAAATTAGAGTGCTCAAAGCAAGC IIIIIIIIIIIIIIIIIIIIIIIIIIIII4II+,C XA:i:0 MD:Z:35 NM:i:0
        SEDLING_ROOT.13 16 Chr1 29608348 255 35M * 0 0 GCGTGCGGGCCGGGGGCACGCCTGCCTGGGCGTCA &(+&+.*1(1$AIII:IIIIIIIIIIIIIIIIIII XA:i:1 MD:Z:2C0A0T1C5A22 NM:i:5
        SEDLING_ROOT.14 0 Chr1 29608240 255 35M * 0 0 GGATATCTCGGCTCTCGCATCGATGAAGAAAGTAG IIIIIIIIIIIIIIIIIIIIIIII?IIIII+-)=& XA:i:0 MD:Z:30C4 NM:i:1
        SEDLING_ROOT.15 0 Chr1 29608232 255 35M * 0 0 TCGGCAACGGATATCTCGGCTCTCGCATCGATGAA IIIIIIIIIIIIIIIIIIIIII:34I+I2B$II$I XA:i:0 MD:Z:35 NM:i:0
        SEDLING_ROOT.16 HWI-EAS80_2_FC20AWLAAXX:8:1:930:891 length=35 4 * 0 0 * * 0 0 GAACTATGCCTGAGCGGGGCGAAGCCAGAGGAAAC IIIIIIIIIIIIIIIIIII8I1II7CII&II*2?1 XM:i:0
        SEDLING_ROOT.17 16 Chr1 29606373 255 35M * 0 0 TTCTAGAGCTAATACGTGCAACAAACCCCGACTCC :57BIIII=IIIIIIIIIIIIIIIIIIIIIIIIII XA:i:2 MD:Z:7A25T1 NM:i:2
        SEDLING_ROOT.18 16 Chr1 29606730 255 35M * 0 0 GGAATGAGTACAATCTAAATCCCTTAACGAGGATC >;IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:35 NM:i:0
        SEDLING_ROOT.19 16 Chr1 28694491 255 35M * 0 0 CAGCATGTGTAAACTATTTTGCTTATTCACTGATC 2I7GII4IIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:35 NM:i:0


        Is this caused by a mistake I made while editing the SAM file? Please suggest a solution.


        Thank you,

        anikng
        Seoul
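
The thread does not record a diagnosis, but two things stand out in the excerpt: the @SQ header lines still carry the gi|... names while the alignment records say Chr1, and the unaligned reads have extra text (an instrument ID and length=35) after the read name. One way to narrow the problem down is to check every alignment record for the column layout a SAM parser expects. The sketch below assumes only the SAM conventions of tab-separated columns with SEQ in column 10 and QUAL in column 11; `check_sam_records` is a hypothetical helper, not part of HTSeq.

```python
def check_sam_records(sam_lines):
    """Return (line_number, message) pairs for records that look malformed."""
    problems = []
    for number, line in enumerate(sam_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # header lines are not alignment records
        fields = line.split("\t")
        if len(fields) < 11:
            problems.append(
                (number, "expected at least 11 tab-separated fields, found %d" % len(fields))
            )
            continue
        seq, qual = fields[9], fields[10]
        # "*" means the value is absent, so a length comparison does not apply.
        if seq != "*" and qual != "*" and len(seq) != len(qual):
            problems.append(
                (number, "SEQ length %d != QUAL length %d" % (len(seq), len(qual)))
            )
    return problems


# Synthetic records for illustration; they are not taken from the poster's file.
columns = ["read1", "0", "Chr1", "100", "255", "8M", "*", "0", "0", "ACGTACGT", "IIIIIIII"]
sam_sample = [
    "@HD\tVN:1.0\tSO:unsorted",
    "\t".join(columns),                     # well-formed record
    "\t".join(columns[:10] + ["IIIIIII"]),  # QUAL one character short
    " ".join(columns),                      # tabs replaced by spaces
]
for number, message in check_sam_records(sam_sample):
    print(number, message)
```

Running the same function over the edited file (for example with `check_sam_records(open("SEEDLING_ROOT_ORIGINA.sam"))`) would show whether the edit shifted or merged columns. If it did, renaming the references in the FASTA before building the bowtie index, or renaming the sequence column of the GFF instead, avoids hand-editing the SAM at all.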
