  • missing libraries with featureCounts?

    Hello!

    I’m working with 33 RNA-seq libraries, and I’m having a problem with featureCounts. I start with sorted BAM files (named sorted_6346.bam, sorted_6347.bam, and so on through sorted_6378.bam), which I then pass to featureCounts with this command:

    featureCounts -a ~/genomes/Mouse/ensembl_genome/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf —t exon -g gene_id -s 2 -p -R -M sorted_63* -o output

    The individual output files look fine, but there seems to be something wrong with the combined output table. Here, counts from the first two libraries appear to be missing. For example, if I take one particular gene, ENSMUSG00000029614:

    >grep "ENSMUSG00000029614" output
    ENSMUSG00000029614 5;5;5;5;5;5 121204481;121205406;121206638;121207077;121208196;121208782 121204552;121206445;121206810;121207384;121208575;121209241 +;+;+;+;+;+ 2433 36508 47652 50431 11667 15455 75749 15577 27682 67064 14802 12306 26099 55411 17297 52910 22243 29685 18242 36564 21280 31884 10634 75043 22386 31312 17584 5298 27524 13846 14408 21197


    As you can see, the first 6 fields are the usual ones from featureCounts:
    Geneid Chr Start End Strand Length
    ENSMUSG00000029614 5;5;5;5;5;5 121204481;121205406;121206638;121207077;121208196;121208782 121204552;121206445;121206810;121207384;121208575;121209241 +;+;+;+;+;+ 2433

    After this, there should be the counts from each of the 33 libraries (6346-6378), but there are only 31 (starting with 36508).
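A quick way to confirm a column shortfall like this is to count the sample columns in the header row of the combined table. The sketch below assumes the table is tab-separated, that any leading comment lines start with `#`, and that the first six columns are the annotation fields listed above; `count_sample_columns` is a hypothetical helper name, not part of featureCounts.

```shell
# Hypothetical helper: print the number of per-library count columns
# in a featureCounts-style combined table.
count_sample_columns() {
  # Skip comment lines (starting with '#'), take the first remaining
  # line (the header row), and subtract the 6 annotation columns:
  # Geneid, Chr, Start, End, Strand, Length.
  awk -F'\t' '!/^#/ { print NF - 6; exit }' "$1"
}
```

Running `count_sample_columns output` on the table above would be expected to print the number of libraries that made it into the table, which can then be compared against the number of BAM files passed on the command line.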

    To investigate further, I looked at the individual outputs:

    [ls299@themonster ensembl_genome]grep -c "ENSMUSG00000029614" sorted_63*.bam.featureCounts
    sorted_6346.bam.featureCounts:32761
    sorted_6347.bam.featureCounts:31802
    sorted_6348.bam.featureCounts:36508
    sorted_6349.bam.featureCounts:47652
    sorted_6350.bam.featureCounts:50431
    sorted_6351.bam.featureCounts:11667
    sorted_6352.bam.featureCounts:15455
    sorted_6353.bam.featureCounts:75749
    sorted_6354.bam.featureCounts:15577
    sorted_6355.bam.featureCounts:27682
    sorted_6356.bam.featureCounts:67064
    sorted_6357.bam.featureCounts:14802
    sorted_6358.bam.featureCounts:12306
    sorted_6359.bam.featureCounts:26099
    sorted_6360.bam.featureCounts:55411
    sorted_6361.bam.featureCounts:17297
    sorted_6362.bam.featureCounts:52910
    sorted_6363.bam.featureCounts:22243
    sorted_6364.bam.featureCounts:29685
    sorted_6365.bam.featureCounts:18242
    sorted_6366.bam.featureCounts:36564
    sorted_6367.bam.featureCounts:21280
    sorted_6368.bam.featureCounts:31884
    sorted_6369.bam.featureCounts:10634
    sorted_6370.bam.featureCounts:75043
    sorted_6371.bam.featureCounts:22386
    sorted_6372.bam.featureCounts:31312
    sorted_6373.bam.featureCounts:17584
    sorted_6374.bam.featureCounts:5298
    sorted_6375.bam.featureCounts:27524
    sorted_6376.bam.featureCounts:13846
    sorted_6377.bam.featureCounts:14408
    sorted_6378.bam.featureCounts:21197

    As you can see, there are in fact counts for the first two libraries; they just appear to be missing from the combined table.

    Any ideas as to what’s going on?

    Thanks a lot!

  • #2
    Hi, are you using the latest version (1.5.0-p1)?

    • #3
      Hi, thanks for the reply.

      Yes, I am:
      featureCounts -v
      featureCounts v1.5.0-p1

      • #4
        I noticed that the '—t' option in your command uses a long dash rather than a hyphen, which makes it invalid. Could you replace it with a hyphen and rerun your command? An invalid option like this can disrupt how featureCounts parses the parameters that follow it.
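Long dashes usually creep in when a command is copied from a word processor or email client that auto-replaces hyphens. One way to catch this before running anything is to scan the command text for bytes outside printable ASCII. This is only a sketch: `check_ascii` is a hypothetical helper name, and the check flags any non-ASCII character (and tabs), not just dashes.

```shell
# Hypothetical helper: report whether a command string contains any
# byte outside the printable ASCII range (space through tilde).
check_ascii() {
  # LC_ALL=C makes grep treat the input byte by byte, so a multi-byte
  # UTF-8 character such as a long dash is matched by the negated range.
  if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
    echo "non-ASCII character found"
  else
    echo "ok"
  fi
}
```

Passing the command as a single-quoted string, for example `check_ascii 'featureCounts -a genes.gtf -t exon ...'`, should print `ok` for a clean command and `non-ASCII character found` if a long dash or similar character has slipped in.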

        • #5
          Yes, that seems to have worked! Thank you so much.
