Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem generating counts with featureCount

    Hello, I am learning how to do RNA-seq analysis following this tutorial:

    Last week I ran a one-day workshop on RNA-seq data analysis in the UVA Health Sciences Library. I set up an AWS public EC2 image with all the necessary software installed. Participants logged into AWS, launched the image, and we kicked off the morning ...


    I'm having a problem generating a table of gene counts using the program featureCounts on my annotation file (GTF) and aligned sequences (BAM).

    Here's what I'm typing into the unix shell:

    Code:
    $ featureCounts -a Homo_sapiens.GRCh38.82.gtf -o counts.txt -t exon -g gene_name -T 4 */accepted_hits.bam
    I don't get any errors and the program seems to handle everything ok, here is an example output for one of the files:

    Code:
     Process BAM file trimmed_uvb3.fastq_tophat/accepted_hits.bam...           
    ||    Single-end reads are included.                                          
    ||    Assign reads to features...                                             
    ||    Total reads : 129858                                                    
    ||    Successfully assigned reads : 98419 (75.8%)                             
    ||    Running time : 0.00 minutes
    BUT when I open my counts.txt file, I don't have any hits. All I get back is a table full of zeros for every gene.

    Does anyone know why this is happening?
    Last edited by Joshua.Urrutia; 11-23-2015, 11:29 AM.

  • #2
    Also, here is what the summary file looks like:

    Code:
    $ less counts.txt.summary 
    
    Status 
    trimmed_ctl1.fastq_tophat/accepted_hits.bam    
    trimmed_ctl2.fastq_tophat/accepted_hits.bam    
    trimmed_ctl3.fastq_tophat/accepted_hits.bam     
    trimmed_uvb1.fastq_tophat/accepted_hits.bam     
    trimmed_uvb2.fastq_tophat/accepted_hits.bam     
    trimmed_uvb3.fastq_tophat/accepted_hits.bam
    
    Assigned        44995   50830   46270   100230  85736   98419
    Unassigned_Ambiguity    2702    2694    2944    5802    4149    4907
    Unassigned_MultiMapping 5171    4789    6165    10614   8150    8599
    Unassigned_NoFeatures   7766    8067    8877    16690   16698   17933
    Unassigned_Unmapped     0       0       0       0       0       0
    Unassigned_MappingQuality       0       0       0       0       0       0
    Unassigned_FragmentLength       0       0       0       0       0       0
    Unassigned_Chimera      0       0       0       0       0       0
    Unassigned_Secondary    0       0       0       0       0       0
    Unassigned_Nonjunction  0       0       0       0       0       0
    Unassigned_Duplicate    0       0       0       0       0       0

    Comment


    • #3
      Hi Joshua,

      Replace "-g gene_name" by "-g gene_id" in your command.

      Comment


      • #4
        Thank you for your suggestion!

        Unfortunately, it did not work. It did change the gene names to gene ids in the output table, but there are still zero values for every gene id.

        Comment


        • #5
          This sounds strange. Apparently featureCounts have successfully produced counts for genes. Could you please show the first few rows of your counting result?

          Comment


          • #6
            Certainly, here's the head of the counts.txt file:

            Code:
            # Program:featureCounts v1.5.0; Command:"featureCounts" "-a" "Homo_sapiens.GRCh38.82.gtf" "-o" "counts3.txt" "-t" "exon" "-g" "gene_name" "-T" "4" "trimmed_ctl1.fastq_tophat/accepted_hits.bam" "trimmed_ctl2.fastq_tophat/accepted_hits.bam" "trimmed_ctl3.fastq_tophat/accepted_hits.bam" "trimmed_uvb1.fastq_tophat/accepted_hits.bam" "trimmed_uvb2.fastq_tophat/accepted_hits.bam" "trimmed_uvb3.fastq_tophat/accepted_hits.bam" 
            Geneid	Chr	Start	End	Strand	Length	trimmed_ctl1.fastq_tophat/accepted_hits.bam	trimmed_ctl2.fastq_tophat/accepted_hits.bam	trimmed_ctl3.fastq_tophat/accepted_hits.bam	trimmed_uvb1.fastq_tophat/accepted_hits.bam	trimmed_uvb2.fastq_tophat/accepted_hits.bam	trimmed_uvb3.fastq_tophat/accepted_hits.bam
            DDX11L1	1;1;1;1	11869;12613;12975;13221	12227;12721;13052;14409	+;+;+;+	1735	0
            WASH7P	1;1;1;1;1;1;1;1;1;1;1;12;12;12;12;12;12;12;12;12;12;12	14404;15005;15796;16607;16858;17233;17606;17915;18268;24738;29534;14522;15085;15913;16722;16969;17348;17723;18037;18373;26801;31878	14501;15038;15947;16765;17055;17368;17742;18061;18366;24891;29570;14944;15153;16065;16880;17170;17483;17859;18183;18471;26954;32015	-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-	3168	0	0
            MIR6859-1	1	17369	17436	-	68	0	0	0	0
            RP11-34P13.3	1;1;1	29554;30267;30976	30039;30667;31109	+;+;+	1021	0	0	0	0	0	0
            MIR1302-2	1	30366	30503	+	138	0	0	0	0
            FAM138A	1;1;1	34554;35245;35721	35174;35481;36081	-;-;-	1219	0
            OR4G4P	1	52473	53312	+	840	0	0	0	0	0
            OR4G11P	1	62948	63887	+	940	0	0	0
            Its more clear after I load the table into R, the right most columns have only zeros:
            Last edited by Joshua.Urrutia; 11-23-2015, 07:15 PM. Reason: clairity

            Comment


            • #7
              Thanks Joshua.Urrutia. But we still couldn't figure out what may cause the problem.

              Could you send me your counts.txt file and also one of the bam files? You may send them offline.

              Comment


              • #8
                Most definitely, thank you for the help. I emailed you a tar file with the .bam file and counts.txt file inside.

                Comment


                • #9
                  Thanks for sending the files. I took a look at your counts.txt file and found that not all the genes got zero count. Below are the number of genes that had at least 1 count in each library:

                  trimmed_ctl1.fastq_tophat.accepted_hits.bam
                  394
                  trimmed_ctl2.fastq_tophat.accepted_hits.bam
                  390
                  trimmed_ctl3.fastq_tophat.accepted_hits.bam
                  402
                  trimmed_uvb1.fastq_tophat.accepted_hits.bam
                  451
                  trimmed_uvb2.fastq_tophat.accepted_hits.bam
                  400
                  trimmed_uvb3.fastq_tophat.accepted_hits.bam
                  489

                  The total number of counts in each library is the same as that reported in featureCounts summary file.

                  Comment


                  • #10
                    Thanks for your help and sorry for my confusion.

                    The problem must be with the way I am using R to find the non-zero values, and not with featureCounts.

                    Comment


                    • #11
                      I figured it out, I was improperly using the subset function in R.

                      Sorry again for my confusion. I can't believe I spent days trying to figure that out.

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Essential Discoveries and Tools in Epitranscriptomics
                        by seqadmin




                        The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                        04-22-2024, 07:01 AM
                      • seqadmin
                        Current Approaches to Protein Sequencing
                        by seqadmin


                        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                        04-04-2024, 04:25 PM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, Yesterday, 11:49 AM
                      0 responses
                      13 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-24-2024, 08:47 AM
                      0 responses
                      16 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-11-2024, 12:08 PM
                      0 responses
                      61 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 10:19 PM
                      0 responses
                      60 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X