Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • RSubread: featureCounts problem with input .bam files

    Hello, I have been having an issue with this function featureCounts() for reads summarize.

    I want to make an analysis from exons. Everything worked fine until now, but seems like I cant give the input .bam files in a right way. Im getting a error:

    Script:
    Code:
    setwd(where/bam/files/are/located)
    fls <- dir(full.names = TRUE )
    fc1 <- featureCounts(files = fls, annot.ext = "/home/.../tomics/data2", isGTFAnnotationFile = TRUE, GTF.featureType = "exon", GTF.attrType = "gene_id")
    Output:
    ERROR: invalid parameter: './16_GCCAAT_L002_R1_001.bam'

    Error in file(file, "rt") : cannot open the connection
    In addition: Warning message:
    In file(file, "rt") :
    cannot open file './.Rsubread_featureCounts_pid2638': No such file or directory
    I also tried to make the character vector manually, no success.
    Tried getting the exact path from bash, no success.
    Wrote the file one by one manually on featureCounts line, no success.
    Checked the files were correctly BAM already.

    To be honest Im getting out of ideas and I dont want to loose another evening with this issue, it seems like a detail.

    Do you see the problem?

    Thanks in advance.

    UPDATE: I have tried with
    Code:
    featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt 4_GCCAAT_L001_R1_001.bam 8_AGTTCC_L001_R1_001.bam 16_GCCAAT_L002_R1_001.bam 20_AGTTCC_L003_R1_001.bam 28_GCCAAT_L004_R1_001.bam 32_AGTTCC_L004_R1_001.bam 40_GCCAAT_L005_R1_001.bam 44_AGTTCC_L006_R1_001.bam
    , again without success :/
    Last edited by gcR; 04-11-2017, 04:06 AM. Reason: Updated with real code in 2nd codequote.
    Beginner @ RNA-Seq, R programming, Linux, Python.-

    Please be patients!

  • #2
    All you should need is
    Code:
    featureCounts -a annotation.gtf -t exon -g gene_id -o counts.txt results1.bam results2.bam results3.bam
    Not sure why you added word `mapping` before the file names. Needs an "_" in `gene_id`.

    Comment


    • #3
      Originally posted by GenoMax View Post
      All you should need is
      Code:
      featureCounts -a annotation.gtf -t exon -g gene_id -o counts.txt results1.bam results2.bam results3.bam
      Not sure why you added word `mapping` before the file names. Needs an "_" in `gene_id`.
      Thanks for your fast response.

      In fact I didn't, I copy-pasted the code format from Wei Shi examples, but my filenames are different.

      I think the problem is in the path, but although I give the full path (/home/.../containing folder/filename.bam) I still get the error.

      I will keep trying, I think the character vector with filenames is messing things up, but doesn't matter if I put it manually, still gives the quoted error.

      Thanks again.-
      Beginner @ RNA-Seq, R programming, Linux, Python.-

      Please be patients!

      Comment


      • #4
        Can you post the error you are getting when running featureCounts on command line directly (not in R)?

        Comment


        • #5
          Code:
          [sysadm@sysadm bin]$ ./featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt 4_GCCAAT_L001_R1_001.bam 8_AGTTCC_L001_R1_001.bam 16_GCCAAT_L002_R1_001.bam 20_AGTTCC_L003_R1_001.bam 28_GCCAAT_L004_R1_001.bam 32_AGTTCC_L004_R1_001.bam 40_GCCAAT_L005_R1_001.bam 44_AGTTCC_L006_R1_001.bam
          Code:
          ERROR: invalid parameter: '4_GCCAAT_L001_R1_001.bam'
          Do I need to give exact path? Maybe change the folder locations? Im kind of confused now, sry.
          Beginner @ RNA-Seq, R programming, Linux, Python.-

          Please be patients!

          Comment


          • #6
            Are those files in the directory where you are running featureCounts from? If not (which I suspect is the case), you will need to provide full/relative path for each of the BAM files.

            Code:
            [sysadm@sysadm bin]$ ./featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt /path_to/4_GCCAAT_L001_R1_001.bam /path_to/8_AGTTCC_L001_R1_001.bam /path_to/16_GCCAAT_L002_R1_001.bam /path_to/20_AGTTCC_L003_R1_001.bam /path_to/28_GCCAAT_L004_R1_001.bam /path_to/32_AGTTCC_L004_R1_001.bam /path_to/40_GCCAAT_L005_R1_001.bam /path_to/44_AGTTCC_L006_R1_001.bam

            Comment


            • #7
              Again, tried with:

              Code:
              ./featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt ./4_GCCAAT_L001_R1_001.bam
              Code:
              ./featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt /home/sysadm/tomics/ADI/4_GCCAAT_L001_R1_001.bam
              and,
              Code:
              ./featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt /usr/bin/ls/home/sysadm/tomics/ADI/4_GCCAAT_L001_R1_001.bam
              RESULT:
              Code:
              ERROR: invalid parameter: '/path/to/files.bam'
              Maybe there is a problem with data2.gtf and is being masked with that path error? To be honest I tried many ways of giving the input files and still no clue about the problem.

              I kept some attributes on annotations.gtf that could be cut off, could that be a problem?

              Thanks for your tracing, Max.
              Beginner @ RNA-Seq, R programming, Linux, Python.-

              Please be patients!

              Comment


              • #8
                What does your data2.gtf file look like? Can you post output of "head -4 data2.gtf"?

                Are you getting that exact error (as posted above)? That does not seem to match any of the command variations you have provided.

                Comment


                • #9
                  Sorry for my late response, I had been checking other things about it.

                  Apparently the annotation.gtf has no problems, since I changed the annotation to an inbuilt one and had the same error.

                  Also, I tried Rsubread v1.14.2 with the same filenames.BAM on another computer and worked.

                  Seems like the problem is on the path, on the OS Fedora, or on the RSubread version.

                  Btw, data2 and data3 are the same content
                  Code:
                  $head -4 data3.gtf
                  
                  1	havana	exon	11869	12227	.	+	.	gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00002234944"; exon_version "1"; tag "basic"; transcript_support_level "1";
                  1	havana	exon	12613	12721	.	+	.	gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00003582793"; exon_version "1"; tag "basic"; transcript_support_level "1";
                  1	havana	exon	13221	14409	.	+	.	gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "3"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00002312635"; exon_version "1"; tag "basic"; transcript_support_level "1";
                  1	havana	exon	12010	12057	.	+	.	gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; exon_id "ENSE00001948541"; exon_version "1"; tag "basic"; transcript_support_level "NA";
                  Beginner @ RNA-Seq, R programming, Linux, Python.-

                  Please be patients!

                  Comment


                  • #10
                    Finally, I changed my RSubread version to 1.14.0 and worked indeed.

                    Still having some problems, but at least Im progressing and understanding.

                    Now it says my annotationfile.gtf is SAF, so I think I should process it to keep only those things are intended to be found on GTF Format.

                    Code:
                    //========================== featureCounts setting ===========================\\
                    ||                                                                            ||
                    ||             Input files : 1 unknown file                                   ||
                    ||                           ? /home/sysadm/tomics/4_GCCAAT_L001_R1_001.f ... ||
                    ||                                                                            ||
                    ||             Output file : ./.Rsubread_featureCounts_pid4402                ||
                    ||             Annotations : /home/sysadm/tomics/data3 (SAF)                  ||
                    ||                                                                            ||
                    ||                 Threads : 1                                                ||
                    ||                   Level : meta-feature level                               ||
                    ||              Paired-end : no                                               ||
                    ||         Strand specific : no                                               ||
                    ||      Multimapping reads : not counted                                      ||
                    || Multi-overlapping reads : not counted                                      ||
                    ||                                                                            ||
                    \\===================== http://subread.sourceforge.net/ ======================//
                    
                    //================================= Running ==================================\\
                    ||                                                                            ||
                    || Load annotation file /home/sysadm/tomics/data3 ...                         ||
                    ||    Features : 2062043                                                      ||
                    ||    Meta-features : 25                                                      ||
                    ||    Chromosomes : 4                                                         ||
                    ||                                                                            ||
                    || Process Unknown file /home/sysadm/tomics/4_GCCAAT_L001_R1_001.fastq.gz ... ||
                    ||    Single-end reads are included.                                          ||
                    || Failed to open file /home/sysadm/tomics/4_GCCAAT_L001_R1_001.fastq.gz. ... ||
                    || No counts were generated for this file.                                    ||
                    ||                                                                            ||
                    ||                         Read assignment finished.                          ||
                    ||                                                                            ||
                    \\===================== http://subread.sourceforge.net/ ======================//
                    
                    Error in featureCounts(files = "/home/sysadm/tomics/4_GCCAAT_L001_R1_001.fastq.gz.subread/data.bam",  : 
                      No count data were generated.

                    Thanks for your time, Max.
                    Beginner @ RNA-Seq, R programming, Linux, Python.-

                    Please be patients!

                    Comment


                    • #11
                      Is this a good properly formatted BAM file? There are no spaces in your chromosome names correct?

                      Comment


                      • #12
                        I think so, I have already worked with some files from these people and had no problems, this would be a 2nd experiment.

                        I tried to install Rsamtools and Rbamtools without success, tried from bash and got a problem with RCurl and XML packages update. Since it was giving me novel problems I quitted trying and focused on my featureCounts issue.
                        I had to install devtools and other things, and even after that I had non zero status.

                        Do you think that would be the problem? Im now formating the GTF file.

                        Thanks Max.
                        Beginner @ RNA-Seq, R programming, Linux, Python.-

                        Please be patients!

                        Comment


                        • #13
                          This is all still a bit puzzling. Hopefully you can get it sorted out. Never had any issues with featureCounts (except one time where there were spaces in the chromosome names, but the error was different).

                          Comment


                          • #14
                            Yes, it is hahah. Thanks for your help.

                            annotation.gtf already formated:
                            Code:
                            $ head -4 data3filtrada3
                            1 havana exon 11869 12227 "ENSE00002234944"; "DDX11L1"; "processed_transcript"; "ENSG00000223972";
                            1 havana exon 12613 12721 "ENSE00003582793"; "DDX11L1"; "processed_transcript"; "ENSG00000223972";
                            1 havana exon 13221 14409 "ENSE00002312635"; "DDX11L1"; "processed_transcript"; "ENSG00000223972";
                            1 havana exon 12010 12057 "ENSE00001948541"; "DDX11L1"; "transcribed_unprocessed_pseudogene"; "ENSG00000223972";
                            featureCounts command from R:
                            Code:
                            fc1 <- featureCounts(files = "/home/sysadm/tomics/4_GCCAAT_L001_R1_001.fastq.gz.subread/data.bam", annot.ext = "/home/sysadm/tomics/data3filtrada3", isGTFAnnotationFile = TRUE, useMetaFeatures = FALSE, isPairedEnd = TRUE)
                            still error:
                            Code:
                            //========================== featureCounts setting ===========================\\
                            ||                                                                            ||
                            ||             Input files : 1 unknown file                                   ||
                            ||                           ? /home/sysadm/tomics/4_GCCAAT_L001_R1_001.f ... ||
                            ||                                                                            ||
                            ||             Output file : ./.Rsubread_featureCounts_pid4402                ||
                            ||             Annotations : /home/sysadm/tomics/data3filtrada3 (GTF)         ||
                            ||                                                                            ||
                            ||                 Threads : 1                                                ||
                            ||                   Level : feature level                                    ||
                            ||              Paired-end : yes                                              ||
                            ||         Strand specific : no                                               ||
                            ||      Multimapping reads : not counted                                      ||
                            || Multi-overlapping reads : not counted                                      ||
                            ||                                                                            ||
                            ||          Chimeric reads : counted                                          ||
                            ||        Both ends mapped : not required                                     ||
                            ||                                                                            ||
                            \\===================== http://subread.sourceforge.net/ ======================//
                            
                            //================================= Running ==================================\\
                            ||                                                                            ||
                            || Load annotation file /home/sysadm/tomics/data3filtrada3 ...                ||
                            ||    Features : 0                                                            ||
                            || WARNING no features were loaded in format GTF.                             ||
                            ||         annotation format can be specified using '-F'.                     ||
                            Failed to open the annotation file /home/sysadm/tomics/data3filtrada3, or its format is incorrect, or it contains no 'exon' features.
                            Error in file(file, "rt") : cannot open the connection
                            In addition: Warning message:
                            In file(file, "rt") :
                              cannot open file './.Rsubread_featureCounts_pid4402': No such file or directory
                            BTW, im using Fedora from TopHat.
                            Last edited by gcR; 04-11-2017, 11:34 AM.
                            Beginner @ RNA-Seq, R programming, Linux, Python.-

                            Please be patients!

                            Comment


                            • #15
                              That GTF file is not in the right format.

                              What is the output of "samtools view -H data.bam | head -10"?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              7 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              7 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              66 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X