Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #76
    Dear Sindrle,

    Sorry for the crash of R -- did it crash within a function of Rsubread or somewhere else? The ".featureCount" files are indeed very big -- they contain the assignment result of each single read (or fragment) in the input SAM or BAM files, and is uncompressed. That is why they are usually larger than the input BAM files.

    The format of the GTF file seemed perfectly correct. I tried to put the first couple of lines you provided into a GTF file, and ran featureCounts on a paired-end BAM file. The result seemed correct -- featureCounts loaded 3 features and 2 meta-features from the GTF file, and assigned 286 reads to the 2 meta-features. The R object returned from featureCounts has the correct numbers on each meta-feature.

    The only problem is that your command line has an additional parameter "GTF.featureType=featureGENE". What was the value of variable featureGENE? The value should be "exon" for the GTF file you're using.

    Cheers,

    Yang


    Code:
            ==========     _____ _    _ ____  _____  ______          _____
            =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
              =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
                ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
                  ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
            ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
           Rsubread 1.12.6
    
    //========================== featureCounts setting ===========================\\
    ||                                                                            ||
    ||             Input files : 1 BAM file                                       ||
    ||                           o A.bam                                          ||
    ||                                                                            ||
    ||             Output file : ./.Rsubread_featureCounts_pid24234               ||
    ||             Annotations : in.gtf (GTF)                                     ||
    ||                                                                            ||
    ||                 Threads : 1                                                ||
    ||                   Level : meta-feature level                               ||
    ||              Paired-end : no                                               ||
    ||         Strand specific : no                                               ||
    ||      Multimapping reads : not counted                                      ||
    || Multi-overlapping reads : not counted                                      ||
    ||                                                                            ||
    \\===================== [url]http://subread.sourceforge.net/[/url] ======================//
    
    //================================= Running ==================================\\
    ||                                                                            ||
    || Load annotation file in.gtf ...                                            ||
    ||    Number of features is 3                                                 ||
    ||    Number of meta-features is 2                                            ||
    ||    Number of chromosomes is 1                                              ||
    ||                                                                            ||
    || Process BAM file A.bam...                                                  ||
    ||    Assign reads to features...                                             ||
    ||    Total number of reads is : 1124980                                      ||
    ||    Number of successfully assigned reads is : 286 (0.0%)                   ||
    ||    Running time : 0.05 minutes                                             ||
    ||                                                                            ||
    ||                         Read assignment finished.                          ||
    ||                                                                            ||
    \\===================== [url]http://subread.sourceforge.net/[/url] ======================//




    Originally posted by sindrle View Post
    Im a bit disappointed, after finishing read summary after 5.5 hours, R just crashed.. R version 3.0.2 Rsubread version 1.12.6.

    Maybe I can read the "reported output", the .featureCount files thats 2x the size of the BAMs (!)

    Heres the GTF:

    chr1 unknown CDS 69091 70005 . + 0 gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1 unknown exon 69091 70008 . + . gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1 unknown start_codon 69091 69093 . + . gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1 unknown stop_codon 70006 70008 . + . gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1 unknown exon 134773 139696 . - . gene_id "LOC729737"; gene_name "LOC729737"; transcript_id "NR_039983"; tss_id "TSS18541";
    chr1 unknown exon 139790 139847 . - . gene_id "LOC729737"; gene_name "LOC729737"; transcript_id "NR_039983"; tss_id "TSS18541";
    Last edited by yangliao; 02-01-2014, 05:21 AM.

    Comment


    • #77
      Originally posted by yangliao View Post
      The only problem is that your command line has an additional parameter "GTF.featureType=featureGENE". What was the value of variable featureGENE? The value should be "exon" for the GTF file you're using.
      I see!

      R crashed making the "$counts" table.

      featureGENE = "gene_id"

      My goal was to count reads "at gene level", thus adding all exon reads for one gene together.

      Is it, be the way, possible to count more than one feature in each run?

      Comment


      • #78
        It is good that we have found the problem.

        Every meta-feature in featureCounts can have one or multiple features. Each row in the GTF file is a feature if the third column equals the feature type you wanted (e.g., "exon", or any value defined in variable "GTF.featureType"). FeatureCounts then groups the features that have the same meta-feature-id into meta-features.

        For example, if a row in the GTF file contains gene_id "LOC729737", then the meta-feature-id for this feature is LOC729737.

        If the attribute name is not "gene_id" in the GTF file, you need to specify the correct attribute name. For example, if a row in the GTF file contains feature-name "LOC729737", you need to give a parameter telling featureCounts the new attribute name: GTF.attrType="feature-name", so that featureCounts can find the correct meta-feature-id by searching the attribute name (feature-name) in every row in the GTF file.

        Thereby, if you want to assign reads on meta-feature level, please set GTF.attrType="gene_id", GTF.featureType="exon" and useMetaFeatures=TRUE.

        If you want to assign reads on feature level, please set all variables the same values as above, but only useMetaFeatures=FALSE. FeatureCounts will do what it should do.

        FeatureCounts counts reads for every gene in one single run. The returned R object contains a count table where the rows are meta-features (or features, depending on what you need) , and the columns are the input SAM/BAM file names. Every number in the table is the number of reads (or fragments if you specified isPairedEnd=TRUE) in that input file assigned to that gene.

        Cheers,

        Yang

        Originally posted by sindrle View Post
        I see!

        R crashed making the "$counts" table.

        featureGENE = "gene_id"

        My goal was to count reads "at gene level", thus adding all exon reads for one gene together.

        Is it, be the way, possible to count more than one feature in each run?

        Comment


        • #79
          Thank you for the clarification! Then I know what I did wrong, and maybe thats why it crashed..

          Comment


          • #80
            Quick question, Im trying to run featureCounts with DEXSeq.

            > head(adiExons$counts)
            D104.bam D121.bam D153.bam D155.bam D161.bam D162.bam D167.bam D173.bam
            WASH7P 0 0 0 0 0 0 0 0
            WASH7P 9 4 2 6 4 5 5 7
            WASH7P 8 1 2 5 0 1 0 2

            How do I know which exon these are? Or more importantly, how does DEXSeq get the information it needs?

            Comment


            • #81
              Hi Sindrle,

              If you look at the annotation component in the returned R object, you will find another table containing the details of the annotations.

              Each row in adiExons$counts corresponds to the same row in adiExons$annotation, so you can tell which count number is for which exon. The information in the annotation table (e.g., chromosome name, start, stop, gene name, strand symbol and exon length) can be used in down stream analysis.

              You may also notice that the order of the rows in adiExons$annotation is exactly the same as the order of the exons in the input GTF file.

              Cheers,

              Yang

              Originally posted by sindrle View Post
              Quick question, Im trying to run featureCounts with DEXSeq.

              > head(adiExons$counts)
              D104.bam D121.bam D153.bam D155.bam D161.bam D162.bam D167.bam D173.bam
              WASH7P 0 0 0 0 0 0 0 0
              WASH7P 9 4 2 6 4 5 5 7
              WASH7P 8 1 2 5 0 1 0 2

              How do I know which exon these are? Or more importantly, how does DEXSeq get the information it needs?

              Comment


              • #82
                Ah! Now I get it, very nice.

                One more question, where is the information about "gene length" stored?

                Comment


                • #83
                  The values in the "Gene Length" column are the length of each feature (on feature mode) or the total length of all the features belonging to each gene (on meta-feature mode).

                  It is straightforward on the feature mode: gene length = stop - start + 1. However, on the meta-feature mode, the total length is the uniquely covered length of the gene; namely, if two exons in this gene have an overlapping part, the overlapping part is measured only once in the Gene Length column.

                  For example, on the meta-feature mode, if a gene has three exons: [100, 200], [100, 300] and [200, 250], the Gene Length value for this gene is only 201 because the three exons overlap with each other, and they can be merged into one interval on the chromosome: [100, 300], which has 201 bases.

                  Originally posted by sindrle View Post
                  Ah! Now I get it, very nice.

                  One more question, where is the information about "gene length" stored?

                  Comment


                  • #84
                    Thank you, you have been very helpful.

                    Im currently struggling with how to implement featureCounts to DEXseq, its much easier to use HTSeq, since its described in the manual.

                    I really want to be able to use featureCounts, since with HTSeq I had to "hack" the python script to work with my annotation file, whereas I don't need that with featureCounts.

                    Also, I want to be able, in the end, to present expression values of transcripts as counts/library size/gene length, which also is easier with featureCounts, since it outputs gene length!

                    Comment


                    • #85
                      Using pre-sorted SAM in featureCounts?

                      hello featureCounts users.
                      I have a bunch of "name-sorted" SAM files which I used for HTSeq. I'm trying featureCounts now but I cant find an option to turn off sorting.

                      Though much faster than HTSeq, the program spends major portion of its run time in sorting. If I could turn off sorting I could get results like superfast..

                      Is there a way?

                      thanks

                      Comment


                      • #86
                        Dear Amitm,

                        Thank you for using featureCounts. If you wish to count the reads on the paired-end mode, and if the the input SAM or BAM files are not sorted by the read names, featureCounts has to sort the reads according to their names, so that it can count both reads in a pair at the same time. If, however, the reads are already sorted by names (like SAM or BAM files from subread, bowtie, bwa and etc), subread can count the reads without re-sorting them.

                        You can bypass the sorting step by turning off paired-end counting, or using subtools (provided in the /bin/utilities directory in subread-1.4.3 and latter versions) to sort the SAM file by read names before running featureCounts.

                        Cheers,

                        Yang

                        Originally posted by amitm View Post
                        hello featureCounts users.
                        I have a bunch of "name-sorted" SAM files which I used for HTSeq. I'm trying featureCounts now but I cant find an option to turn off sorting.

                        Though much faster than HTSeq, the program spends major portion of its run time in sorting. If I could turn off sorting I could get results like superfast..

                        Is there a way?

                        thanks
                        Last edited by yangliao; 02-07-2014, 12:01 PM. Reason: typo

                        Comment


                        • #87
                          hi yangliao,
                          Thanks for the reply. Ok, so I understand that in non-PE mode, sorting won't be performed. But in that case neither -B and -C flags would have any meaning.
                          Both the flags are really good for stringency/ QC. Moreover in non-PE mode, reads where R1 is in one meta-feature and R2 in another, would both get counted. That is to say the multi-overlap filter would become less effective.

                          I guess I would bear with the extra time. Thank you for a really good tool!

                          Comment


                          • #88
                            Both reads in a pair are counted on the Single-end mode even if they are assigned to different meta-features.

                            Thank you for using our tools

                            Originally posted by amitm View Post
                            hi yangliao,
                            Thanks for the reply. Ok, so I understand that in non-PE mode, sorting won't be performed. But in that case neither -B and -C flags would have any meaning.
                            Both the flags are really good for stringency/ QC. Moreover in non-PE mode, reads where R1 is in one meta-feature and R2 in another, would both get counted. That is to say the multi-overlap filter would become less effective.

                            I guess I would bear with the extra time. Thank you for a really good tool!

                            Comment


                            • #89
                              Originally posted by amitm View Post
                              hello featureCounts users.
                              I have a bunch of "name-sorted" SAM files which I used for HTSeq. I'm trying featureCounts now but I cant find an option to turn off sorting.

                              Though much faster than HTSeq, the program spends major portion of its run time in sorting. If I could turn off sorting I could get results like superfast..

                              Is there a way?

                              thanks
                              featureCounts only performs read reordering when it is necessary, ie. when reads are not properly paired and/or there are missing mates. Note that sorting a SAM file by name using a tool like samtools does not guarantee that reads are properly paired after sorting. For example, reads that were reported multiple times in the SAM file may still not be properly paired after being sorted by read names.

                              If all the reads in a SAM/BAM file are properly paired, featureCounts will not perform read reordering at all although it will spend a very small amount of time to check if reads are properly paired or not. It is very important to check this, otherwise the read summarization results might be incorrect and the users may not be aware of it.

                              Comment


                              • #90
                                Originally posted by amitm View Post
                                hi yangliao,
                                Thanks for the reply. Ok, so I understand that in non-PE mode, sorting won't be performed. But in that case neither -B and -C flags would have any meaning.
                                Both the flags are really good for stringency/ QC. Moreover in non-PE mode, reads where R1 is in one meta-feature and R2 in another, would both get counted. That is to say the multi-overlap filter would become less effective.

                                I guess I would bear with the extra time. Thank you for a really good tool!
                                I agree it is better to count read pairs (PE mode) rather than individual reads for paired end read data.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Essential Discoveries and Tools in Epitranscriptomics
                                  by seqadmin




                                  The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                                  04-22-2024, 07:01 AM
                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Today, 08:47 AM
                                0 responses
                                12 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                60 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                59 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 09:21 AM
                                0 responses
                                54 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X