
  • #61
    What Bruce suggested is currently the way to get assignment stats from featureCounts, but apparently it is not user-friendly. Outputting assignment stats once read summarization is done (with no need to turn on the -R option) is on our to-do list, but it will take a week or two to implement. We will make a new release in a day or so, but this feature will not be included in that release.

    Best,
    Wei



    • #62
      Thank you, Wei. I switched to using featureCounts.
      It's especially great that it now takes multiple .bam files.

      Whatever statistics can be provided in future versions will be helpful.



      • #63
        We have just released Subread 1.4.2. The featureCounts program included in this release outputs an assignment summary file (*.summary) along with the read count file.

        BTW, our featureCounts paper was just published in Bioinformatics. Here is the link to it:

        http://bioinformatics.oxfordjournals...6F&keytype=ref

        Wei
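For readers who want to turn the assignment summary file into a per-sample assignment rate, here is a minimal sketch in Python. It assumes the file is tab-delimited, with a header row whose first cell is `Status` followed by one column per input file, and a row labelled `Assigned`. That layout and the function name `assigned_fraction` are assumptions for illustration; the post only says that a *.summary file is written.

```python
import csv
import io


def assigned_fraction(summary_text):
    """Return {sample: assigned / total} from tab-delimited summary text.

    Assumed layout (not confirmed by the post): first row is a header
    ("Status", sample1, sample2, ...), each later row is a category name
    followed by one integer count per sample.
    """
    rows = [r for r in csv.reader(io.StringIO(summary_text), delimiter="\t") if r]
    samples = rows[0][1:]
    totals = [0] * len(samples)
    assigned = [0] * len(samples)
    for row in rows[1:]:
        counts = [int(x) for x in row[1:]]
        for i, n in enumerate(counts):
            totals[i] += n
            if row[0] == "Assigned":
                assigned[i] = n
    # Avoid dividing by zero for samples with no reads at all.
    return {s: (a / t if t else 0.0) for s, a, t in zip(samples, assigned, totals)}
```

To use it, read the summary file into a string and pass it in, for example `assigned_fraction(open(path).read())` with `path` pointing at your own *.summary file.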



        • #64
          SAM/BAM parse error

          Hi Wei,
          I may have found a bug. It seems that the last line of my BAM file header is being parsed as the first read; since it is quite long, this causes the program to exit (line 697 in core.c). When I remove the last @PG line from the header and reheader the BAM file, your code runs to completion. Please see the attached header.
          cheers,
          Aaron
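The workaround described in this post (dropping the last @PG line before reheadering) can be sketched in Python as below. It operates on the header as plain text; the function name `drop_last_pg_line` is hypothetical and not part of Subread or samtools. Removing an @PG line discards provenance information, so treat this as a stopgap only.

```python
def drop_last_pg_line(header_text):
    """Return the SAM header text with its last @PG line removed.

    Headers without any @PG line are returned unchanged (apart from
    normalising the final newline).
    """
    lines = header_text.splitlines()
    # Walk backwards so that only the final @PG record is dropped.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].startswith("@PG"):
            del lines[i]
            break
    return "".join(line + "\n" for line in lines)
```

The header text can be obtained with `samtools view -H in.bam`, and the edited header applied with `samtools reheader new_header.sam in.bam > out.bam`; the file names here are placeholders.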


          • #65
            Hello,
            I think I may have a problem with my GTF file, which looks like this:
            ##gff-version 3
            ##source-version geneious 6.1.6
            seq_ref Geneious gene 1 109 . . . Name=exon1;created by=User

            I received this error message:
            Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
            The specified gene identifier attribute is 'exon'
            The attributes included in your GTF annotation are 'Name=exon1;created by=User;modified by=User'

            I hope you can help me.



            • #66
              Originally posted by AntonioMaceo
              Hi Wei,
              I may have found a bug. It seems that the last line of my BAM file header is being parsed as the first read; since it is quite long, this causes the program to exit (line 697 in core.c). When I remove the last @PG line from the header and reheader the BAM file, your code runs to completion. Please see the attached header.
              cheers,
              Aaron
              Dear Aaron,

              Thanks for reporting this. You are correct that there was a bug in featureCounts in dealing with long header lines. We have fixed this and released a patched version of the Subread package (1.4.3-p1).

              Best wishes,
              Wei



              • #67
                Originally posted by chibouki
                Hello,
                I think I may have a problem with my GTF file, which looks like this:
                ##gff-version 3
                ##source-version geneious 6.1.6
                seq_ref Geneious gene 1 109 . . . Name=exon1;created by=User

                I received this error message:
                Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
                The specified gene identifier attribute is 'exon'
                The attributes included in your GTF annotation are 'Name=exon1;created by=User;modified by=User'

                I hope you can help me.

                Your GTF file has an incorrect format. Have a look at this page for the GTF format:

                http://mblab.wustl.edu/GTF2.html



                • #68
                  In the end, we solved the problem by cutting the last column of the SAM file.

                  But we are disappointed: it seems that the -d and -D options are only for paired-end reads?



                  • #69
                    I don't understand how you solved the problem by changing the SAM file. The problem is with your GTF annotation file, not your SAM file.



                    • #70
                      I know. I still received an error message about column nine of the GTF (I may have changed it since I first posted here...), but it worked online after cutting the SAM file.



                      • #71
                        Do not change your SAM file, otherwise you may get unexpected results. Just change the 9th column of your GTF from, for example,

                        Name=exon1;created by=User;modified by=User

                        to

                        gene_id "exon1" (space delimited)
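The edit to the 9th column described above can be scripted rather than done by hand. Here is a minimal sketch in Python, assuming the annotation is tab-delimited with nine columns and that the `Name=` attribute holds the identifier you want to use as `gene_id`. The function names `to_gtf_attributes` and `convert_line` are hypothetical and not part of Subread. The trailing semicolon follows the GTF2 description linked earlier in the thread.

```python
def to_gtf_attributes(attr_field, id_key="Name"):
    """Convert 'Name=exon1;created by=User' into 'gene_id "exon1";'."""
    for pair in attr_field.strip().split(";"):
        key, sep, value = pair.partition("=")
        if sep and key.strip() == id_key:
            # GTF attributes are 'key "value";' with a space between them.
            return 'gene_id "{}";'.format(value.strip())
    raise ValueError("attribute {!r} not found in {!r}".format(id_key, attr_field))


def convert_line(line, id_key="Name"):
    """Rewrite the 9th column of one tab-delimited annotation line."""
    if line.startswith("#"):
        return line  # comment and header lines are passed through unchanged
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got {}".format(len(cols)))
    cols[8] = to_gtf_attributes(cols[8], id_key)
    return "\t".join(cols) + "\n"
```

Applied line by line to the annotation file, this leaves the first eight columns alone and only rewrites the attribute column; the SAM file is not touched.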



                        • #72
                          Hi!
                          I'm trying your package, using UCSC hg19 genes.GTF.

                          I'm getting this error:

                          || Load annotation file genes.gtf ... ||
                          || Number of features is 0 ||
                          || WARNING no features were loaded in format GTF. ||
                          || annotation format can be specified using '-F'.

                          This is my code:

                          featureCounts(files=BAMs,file., annot.ext=gtf,isGTFAnnotationFile=TRUE,useMetaFeatures=TRUE,GTF.featureType=featureGENE,GTF.attrType=attributeGENE,nthreads=8, reportReads=TRUE)

                          "BAMs" is a character vector with the file names
                          "gtf" is the gene.gtf
                          "attributeGENE" is "gene_id"

                          EDIT:
                          Had to delete this: GTF.attrType=attributeGENE
                          guess it was unnecessary
                          Last edited by sindrle; 01-31-2014, 05:53 PM.



                          • #73
                            Here is my comparison so far:

                            HS 68.2 56.9 57.2 64.7 59.7 63.8 60.9
                            FC 62.5 56.8 57.1 64.6 59.5 63.6 60.7

                            HTSeq was run with "intersection strict", so maybe that's why it counts slightly more reads.

                            I like that featureCounts is implemented in R; it's easy to run.
                            On my MacBook the running time is about the same as for HTSeq, but maybe that's because I had the option "reportReads = TRUE".

                            I also like that it reports gene length and can handle multiple input files; it also names the output according to the input automatically.

                            What will make or break it is how easily it fits into the DESeq2 workflow. HTSeq is very easy to use.
                            Last edited by sindrle; 01-31-2014, 06:55 PM.



                            • #74
                              Dear Sindrle,

                              Could you please provide a couple of lines from "UCSC hg19 genes.GTF"? Some GTF or GFF files are not in the format currently supported by featureCounts.

                              Yang



                              • #75
                                I'm a bit disappointed: after the read summary finished, 5.5 hours in, R just crashed. R version 3.0.2, Rsubread version 1.12.6.

                                Maybe I can read the "reported output", the .featureCount files that are 2x the size of the BAMs (!)

                                Here's the GTF:

                                chr1 unknown CDS 69091 70005 . + 0 gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
                                chr1 unknown exon 69091 70008 . + . gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
                                chr1 unknown start_codon 69091 69093 . + . gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
                                chr1 unknown stop_codon 70006 70008 . + . gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
                                chr1 unknown exon 134773 139696 . - . gene_id "LOC729737"; gene_name "LOC729737"; transcript_id "NR_039983"; tss_id "TSS18541";
                                chr1 unknown exon 139790 139847 . - . gene_id "LOC729737"; gene_name "LOC729737"; transcript_id "NR_039983"; tss_id "TSS18541";
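When an annotation loads with "Number of features is 0", a quick sanity check of the file can help narrow things down. Here is a minimal sketch in Python that counts feature types and checks how many rows of a chosen type carry a `gene_id` attribute. It assumes the file on disk is tab-delimited (the lines pasted above show spaces, which may just be forum rendering). The function name `summarize_gtf` and its defaults are hypothetical, and it does not reproduce how featureCounts itself parses GTF.

```python
import re
from collections import Counter

# Matches the GTF-style attribute: gene_id "SOME_ID"
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def summarize_gtf(lines, feature_type="exon"):
    """Return (feature type counts, rows of feature_type with gene_id, malformed rows)."""
    types = Counter()
    with_id = 0
    bad_columns = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            bad_columns += 1  # e.g. space-delimited instead of tab-delimited
            continue
        types[cols[2]] += 1
        if cols[2] == feature_type and GENE_ID.search(cols[8]):
            with_id += 1
    return types, with_id, bad_columns
```

Passing an open file handle works, since iterating a file yields lines. A large `bad_columns` count or a zero `with_id` count would point at the annotation rather than the BAM files.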
