  • Missing output for multiple input bam files in featureCounts

    Dear Wei Shi,

    I use featureCounts v1.5.0-p1 for 3 input sorted bam files:

    //========================== featureCounts setting ===========================\\
    || ||
    || Input files : 3 BAM files 1 unknown file ||
    || ? CORE ||
    || P sorted1.bam ||
    || P sorted2.bam ||
    || P sorted3.bam ||
    || ||
    || Output file : /out/ALL.featureCounts.txt ||
    || Annotations : /ref/All_assembled.merged.gtf ||
    || Assignment details : <input_file>.featureCounts ||
    || ||
    || Threads : 6 ||
    || Level : meta-feature level ||
    || Paired-end : yes ||
    || Strand specific : inversed ||
    || Multimapping reads : not counted ||
    || Multi-overlapping reads : not counted ||
    || Read orientations : fr ||
    || ||
    || Chimeric reads : counted ||
    || Both ends mapped : not required ||
    || ||


    Although the output summary file shows the statistics for all 3 files as expected, the output featureCounts.txt file shows the counts for only the first 2 files, and the names of the last 2 files are not separated by tab in the top row:


    Geneid Chr Start End Strand Length sorted1.bam sorted2.bamsorted3.bam
    ENSG00000278066 KI270731.1 26533 27138 - 606 0 0
    ENSG00000277374 KI270750.1 148668 148843 + 176 0 0
    ENSG00000273532 KI270721.1 51722 51792 + 71 0 0
    ENSG00000276351 KI270721.1 52666 52734 + 69 0 0
    ENSG00000275661 KI270721.1 52895 53010 + 116 0 0
    ENSG00000277856 KI270726.1 26241 26534 + 294 0 0
    ENSG00000275063 KI270726.1;KI270726.1 41444;41572 41489;41876 +;+ 351 0 0
    ENSG00000275987 KI270713.1 30437 30580 - 144 0 0
    ENSG00000277475 KI270713.1 31698 32528 - 831 1 0
    ENSG00000268674 KI270713.1 35407 35916 + 510 0 0



    Could you please let me know how I can get the featureCounts.txt summarizing all 3 input files?

    Many thanks!
    Best wishes,
    Yuqia
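The symptom described above (a fused header cell and one count column too few) can be detected mechanically. The following is a minimal sketch, assuming the usual featureCounts table layout of six annotation columns followed by one tab-separated count column per input file; `check_count_table` is a hypothetical helper name, not part of Subread.

```python
import csv
import io

def check_count_table(handle, n_inputs):
    """Return a list of problems found in a featureCounts-style count table.

    Assumes six annotation columns (Geneid, Chr, Start, End, Strand, Length)
    followed by one tab-separated count column per input file.
    """
    expected = 6 + n_inputs
    problems = []
    # Skip blank lines and '#' comment lines (e.g. the program/command line).
    rows = (r for r in csv.reader(handle, delimiter="\t")
            if r and not r[0].startswith("#"))
    for i, row in enumerate(rows):
        if len(row) != expected:
            kind = "header" if i == 0 else f"row {i}"
            problems.append(f"{kind}: {len(row)} fields, expected {expected}")
    return problems

# The header and first row from the post, with tabs restored between fields.
broken = (
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsorted1.bam\tsorted2.bamsorted3.bam\n"
    "ENSG00000278066\tKI270731.1\t26533\t27138\t-\t606\t0\t0\n"
)
for problem in check_count_table(io.StringIO(broken), n_inputs=3):
    print(problem)
```

For a real file, pass `open("/out/ALL.featureCounts.txt")` as the handle and the number of BAM files as `n_inputs`.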



    Originally posted by shi
    Dear All,

    I would like to formally introduce to you our featureCounts program, a software program we developed for summarizing the next-gen sequencing reads to genomic features such as genes, exons and promoters.

    featureCounts is a light-weight read counting program written entirely using the C programming language. It can be used to count both gDNA-seq and RNA-seq reads for genomic features. It has the following features:
    (1) It carries out precise and accurate read assignments by taking care of indels, junctions and fusions in the reads.
    (2) It takes less than 4 minutes to summarize 20 million pairs of reads to 26k RefSeq genes using one thread, and only uses 40MB of memory (you can run it on a Mac laptop).
    (3) It supports multi-threaded running, making it extremely fast for summarizing large datasets.
    (4) It supports GTF format annotation and SAM/BAM read data.
    (5) It supports strand-specific read summarization.
    (6) It can perform read summarization at both the feature level (e.g. exons) and the meta-feature level (e.g. genes).
    (7) It allows users to specify whether reads overlapping more than one feature should be counted.
    (8) It gives users full control over the summarization of paired-end reads, including checking whether both ends are mapped and/or whether the paired-end distance satisfies the distance criteria.
    (9) It distinguishes features overlapped by both ends of the same fragment from those overlapped by only one end, so that more fragments can be counted.
    (10) It allows users to specify whether chimeric fragments should be counted.

    For a quick start, have a look at our short tutorial - http://bioinf.wehi.edu.au/featureCounts/ . For more details, please refer to the users guide - http://bioinf.wehi.edu.au/featureCounts/usersguide.pdf (see Chapter 6).

    We also compared featureCounts with other methods. The comparison results can be found in our manuscript - http://arxiv.org/abs/1305.3347.

    The featureCounts program is part of the Subread package (http://subread.sourceforge.net), which includes a suite of programs for processing next-gen sequencing data such as read mapping and exon-exon junction detection. featureCounts can also be accessed from the development version of the Bioconductor R package Rsubread (http://bioconductor.org/packages/2.1.../Rsubread.html)

    Please do not hesitate to contact me if you have any questions ([email protected]).

    Best regards,
    -------------------
    Wei Shi, Ph.D
    Bioinformatics Division
    The Walter and Eliza Hall Institute of Medical Research
    1G Royal Parade, Parkville, Victoria 3052
    Australia

    • Missing output for multiple input bam files in featureCounts

      Dear Wei Shi,

      I'm using featureCounts v1.5.0-p1 to get a count summary of multiple input sorted bam files. First I tried with 3 files. I'm having trouble getting the correct output.

      Here's my code:

      featureCounts -T 6 -p -s 2 -t exon -g gene_id \
      -a /ref/All_assembled.merged.gtf \
      -o /out/ALL.featureCounts.txt \
      -R CORE \
      sorted1.bam sorted2.bam sorted3.bam


      The setting is in the attached file "featureCounts_setting".

      The summary (attached file "summary_correct") shows the statistics for all 3 files.
      But ALL.featureCounts.txt (attached file "Count_error") shows the output for only 2 files, and the names of the last 2 input files (sorted2.bam and sorted3.bam) in this count file are not tab-delimited, unlike those in "summary_correct".

      Could this be a bug in this version? If not, could you please let me know how I can get the outputs for all input files?

      Many thanks!
      Best regards,
      Yuqia


      • Hi,

        I am new to RNA sequencing analysis and just finished assigning my reads using featureCounts. I am trying to save the output of featureCounts into a txt file and am having trouble.

        The command I used for fc is as follows

        counts <- featureCounts("my.bam",annot.ext ="my.gtf",isGTFAnnotationFile=T,GTF.featureType="exon",GTF.attrType="gene_id",nthreads=4,isPairedEnd=T,countMultiMappingReads=T)

        And I used this command to tabulate the results

        write.table(cbind(counts$annotation[,2:4], counts$counts),"sample_featureCounts.txt",quote=F,sep=" ",row.names=F)

        My output file has only 4 columns

        Chromosome(Ensembl ID) Start End Counts

        I would like to add gene name to the output to identify counts better.

        How can I do that? What command should I use?



        • Originally posted by rookie_genomics
          Code:
          write.table(cbind(counts$annotation[,2:4], counts$counts),"sample_featureCounts.txt",quote=F,sep=" ",row.names=F)
          You saved the 2nd to the 4th columns in the annotation data frame into the file. These three columns are the chromosome names, start and end locations, as you had in the output.

          The gene identity is in the first column of annotation, so you can use
          Code:
          write.table(cbind(counts$annotation[,1:4], counts$counts),"sample_featureCounts.txt",quote=F,sep=" ",row.names=F)
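If a human-readable gene name is wanted in addition to the gene ID, one option is to look it up in the GTF attributes. The following is a minimal sketch in Python rather than R, assuming the GTF carries a `gene_name` attribute alongside `gene_id` (common in Ensembl-style files, but not guaranteed). The helper names and the example IDs are hypothetical.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def gene_names_from_gtf(lines):
    """Map gene_id -> gene_name from GTF lines (attributes are column 9)."""
    names = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            # Keep the first name seen for each gene ID.
            names.setdefault(attrs["gene_id"], attrs["gene_name"])
    return names

def add_gene_names(count_rows, names):
    """Insert a gene-name column after the gene ID in each row.

    count_rows: iterable of lists whose first element is the gene ID.
    Genes without a known name get "NA".
    """
    return [[row[0], names.get(row[0], "NA"), *row[1:]] for row in count_rows]

# Hypothetical GTF line and count rows, for illustration only.
gtf = ['chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "ABC1";\n']
rows = [["G1", "chr1", "100", "200", "7"], ["G2", "chr1", "300", "400", "0"]]
for row in add_gene_names(rows, gene_names_from_gtf(gtf)):
    print("\t".join(row))
```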



          • Count inconsistency

            Hello

            I have problems getting correct counts for a stranded RNA-seq study using hisat2 version 2.1.0 and featureCounts v1.6.4, and I would appreciate any help.

            Below is an image of the results of mapping a small test subset of my data with HISAT2 (--rna-strandness RF).

            For example, the second last gene (AFBG1_15566.1) is only covered by transcripts of the same orientation (blue reads).

            However, when I process the same BAM file with featureCounts, it finds
            with the flag -s 1:

            AFBG1_15566 scf7180000002748;scf7180000002748 13827;16161 16084;16780 +;+ 2878 2383

            with the flag -s 2:
            AFBG1_15566 scf7180000002748;scf7180000002748 13827;16161 16084;16780 +;+ 2878 2257


            I can't understand why featureCounts finds about the same counts (2383 and 2257) in both orientations.


            In case it helps to find the solution, I have put the script and data that reproduce these results at this address: https://nuage.osupytheas.fr/s/D5S4stT9aDLYEsD

            Thank you !

            Last edited by Guillaume; 04-09-2019, 06:37 AM.



            • Hi Guillaume,

              I found that you did not use the "-p" option in your script to run featureCounts. This means that each single read, not each read pair, was assigned to the genes. featureCounts only flips the strand of the second read when the "-p" option is specified; otherwise it simply checks the 0x10 FLAG in the alignment to match the strand of the alignment against the strand of the gene. Half of your single reads (the R2s) were from the positive strand, while the other half (the R1s) were from the negative strand, hence the AFBG1_15566 gene always has counts regardless of whether you used "-s 1" or "-s 2".

              I changed your script by adding the "-p" option to featureCounts, which flips all your R2s to the negative strand. Now "-s 1" gives a zero count for AFBG1_15566, while "-s 2" assigns all the counts to AFBG1_15566.

              Cheers,
              Yang
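The strand logic described in this reply can be illustrated with the SAM FLAG bits. The following is a simplified model of the behaviour described, not featureCounts's actual code; `read_strand` and its `paired_mode` argument are hypothetical names, and the FLAG values 83 and 163 are a typical properly paired R1/R2 combination chosen for illustration.

```python
FLAG_REVERSE = 0x10         # read is aligned to the reverse strand
FLAG_SECOND_IN_PAIR = 0x80  # read is the second mate (R2) of a pair

def read_strand(flag, paired_mode):
    """Return '+' or '-' for an alignment FLAG.

    paired_mode=False: each read is treated on its own, only 0x10 is consulted.
    paired_mode=True: the strand of R2 is additionally flipped, so both mates
    of a fragment report the same strand as R1.
    """
    reverse = bool(flag & FLAG_REVERSE)
    if paired_mode and flag & FLAG_SECOND_IN_PAIR:
        reverse = not reverse
    return "-" if reverse else "+"

r1_flag = 83   # paired, proper pair, reverse strand, first in pair
r2_flag = 163  # paired, proper pair, mate reverse, second in pair

for mode in (False, True):
    print(mode, read_strand(r1_flag, mode), read_strand(r2_flag, mode))
```

Without the flip, R1 and R2 land on opposite strands, so roughly half of the reads match the gene under either "-s 1" or "-s 2". With the flip, both mates agree and only one of the two settings yields counts.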



              • Thank you so much Yang
                I should have read the manual more carefully...



                • We are using featureCounts quite a bit. Great tool. I have noticed that running it with the options "-t exon -g gene_id" gives slightly different results from "-t gene -f -g gene_id", and different again from "-t gene -g gene_id", though all three should produce counts at the gene level. How does the execution behind these options differ?
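One factor that can contribute to such differences: a "gene" record spans introns, so a read lying entirely within an intron overlaps the gene record but none of its exons. The following is a minimal illustrative sketch of that interval logic only, not featureCounts's implementation and not a full explanation of the discrepancy. The exon coordinates are those reported for AFBG1_15566 earlier in this thread; the intronic read is hypothetical.

```python
def overlaps(read, feature):
    """True if two closed, 1-based intervals (start, end) share at least one base."""
    return read[0] <= feature[1] and feature[0] <= read[1]

# Exons of AFBG1_15566 as listed in the thread; the gene span is their envelope.
exons = [(13827, 16084), (16161, 16780)]
gene_span = (min(s for s, _ in exons), max(e for _, e in exons))

def counted_with_exons(read):
    """Would the read overlap any exon feature (as with -t exon)?"""
    return any(overlaps(read, exon) for exon in exons)

def counted_with_gene(read):
    """Would the read overlap the whole-gene feature (as with -t gene)?"""
    return overlaps(read, gene_span)

intronic_read = (16100, 16150)  # hypothetical read inside the intron
print(counted_with_exons(intronic_read), counted_with_gene(intronic_read))
```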



                  • Error: the feature on the 21091-th line has zero coordinate or zero lengths

                    Hi all,
                    Can you please help me get past this error when running featureCounts for genes from my GTF file?
                    The error is:

                    Error: the feature on the 21091-th line has zero coordinate or zero lengths

                    No counts were generated.


                    I am running it as follows:

                    featureCounts(files= "myBAM.BAM",
                    isPairedEnd=TRUE,requireBothEndsMapped=TRUE,
                    annot.ext="GCF_000009705.1_ASM970v1_genomic.gtf.gz",
                    isGTFAnnotationFile=TRUE,GTF.featureType="gene",GTF.attrType="gene_id", nthreads = 20,
                    genome = "GCF_000009705.1_ASM970v1_genomic.fna" )


                    Please help.
                    With regards,
                    Thank you
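To locate the offending records, the annotation file can be scanned for features whose start or end is zero, or whose end precedes the start. The following is a minimal sketch under the assumption that the file follows the standard 9-column GTF layout (start in column 4, end in column 5). The helper names are hypothetical, and the line numbers here count every line including comments, which may not match the numbering used in the error message.

```python
import gzip

def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)

def suspicious_features(lines):
    """Yield (line_number, reason) for GTF rows with zero or inverted coordinates."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            yield number, "fewer than 5 columns"
            continue
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            yield number, "non-numeric coordinate"
            continue
        if start == 0 or end == 0:
            yield number, "zero coordinate"
        elif end < start:
            yield number, "end before start"

# Usage on the file from the post (commented out so the sketch runs standalone):
# with open_text("GCF_000009705.1_ASM970v1_genomic.gtf.gz") as handle:
#     for number, reason in suspicious_features(handle):
#         print(number, reason)
```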
