#1
Member
Location: Montevideo Join Date: Mar 2017
Posts: 15
Hello, I have been having an issue with the featureCounts() function for read summarization.
I want to run an exon-level analysis. Everything worked fine until now, but it seems I can't pass the input .bam files correctly. I'm getting an error. Script:
Code:
library(Rsubread)
setwd("where/bam/files/are/located")
# dir() lists every entry in the folder, not only the .bam files
fls <- dir(full.names = TRUE)
fc1 <- featureCounts(files = fls,
                     annot.ext = "/home/.../tomics/data2",
                     isGTFAnnotationFile = TRUE,
                     GTF.featureType = "exon",
                     GTF.attrType = "gene_id")
I tried getting the exact path from bash, with no success. I wrote the file names one by one into the featureCounts call, with no success. I have already checked that the files really are BAM. To be honest, I'm running out of ideas and I don't want to lose another evening on this; it seems like a small detail. Do you see the problem? Thanks in advance.
UPDATE: I have also tried the command line:
Code:
featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt \
  4_GCCAAT_L001_R1_001.bam 8_AGTTCC_L001_R1_001.bam \
  16_GCCAAT_L002_R1_001.bam 20_AGTTCC_L003_R1_001.bam \
  28_GCCAAT_L004_R1_001.bam 32_AGTTCC_L004_R1_001.bam \
  40_GCCAAT_L005_R1_001.bam 44_AGTTCC_L006_R1_001.bam
__________________
Beginner @ RNA-Seq, R programming, Linux, Python. Please be patient!
Last edited by gcR; 04-11-2017 at 05:06 AM. Reason: Updated with real code in 2nd code quote.
#2
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,089
All you should need is
Code:
featureCounts -a annotation.gtf -t exon -g gene_id -o counts.txt results1.bam results2.bam results3.bam
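If typing every file name is the stumbling block, the shell can build the list. Below is a minimal sketch, assuming all BAMs sit in one directory; `list_bams` is a hypothetical helper, not part of featureCounts. It prints only `*.bam` entries and fails when there are none, which mirrors the filtering an R call such as `list.files(pattern = "\\.bam$", full.names = TRUE)` performs (a bare `dir()`, as in the first post, returns every entry in the folder).

```shell
# list_bams DIR: print every *.bam file in DIR, one per line (hypothetical helper).
# Returns non-zero when DIR holds no .bam file, so callers can stop early.
list_bams() {
  local dir="${1:-.}" found=0 f
  for f in "$dir"/*.bam; do
    # With no match the shell leaves the literal pattern in place; skip it.
    [ -e "$f" ] || continue
    printf '%s\n' "$f"
    found=1
  done
  [ "$found" -eq 1 ]
}
```

For a one-off run, `featureCounts -a annotation.gtf -t exon -g gene_id -o counts.txt /path_to/*.bam` relies on the same glob expansion directly; the helper only makes the "no BAM found" case explicit instead of passing the unexpanded pattern through.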
#3
Member
Location: Montevideo Join Date: Mar 2017
Posts: 15
In fact I didn't; I copy-pasted the code format from Wei Shi's examples, but my file names are different. I think the problem is the path, but even when I give the full path (/home/.../containing folder/filename.bam) I still get the error. I will keep trying. I suspect the character vector of file names is messing things up, but even when I type the names in manually I still get the same error. Thanks again.
#4
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,089
Can you post the error you get when running featureCounts directly on the command line (not in R)?
#5
Member
Location: Montevideo Join Date: Mar 2017
Posts: 15
Code:
[sysadm@sysadm bin]$ ./featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt 4_GCCAAT_L001_R1_001.bam 8_AGTTCC_L001_R1_001.bam 16_GCCAAT_L002_R1_001.bam 20_AGTTCC_L003_R1_001.bam 28_GCCAAT_L004_R1_001.bam 32_AGTTCC_L004_R1_001.bam 40_GCCAAT_L005_R1_001.bam 44_AGTTCC_L006_R1_001.bam
Code:
ERROR: invalid parameter: '4_GCCAAT_L001_R1_001.bam'
#6
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,089
Are those files in the directory you are running featureCounts from? If not (which I suspect is the case), you will need to provide the full or relative path for each of the BAM files.
Code:
[sysadm@sysadm bin]$ ./featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt \
  /path_to/4_GCCAAT_L001_R1_001.bam /path_to/8_AGTTCC_L001_R1_001.bam \
  /path_to/16_GCCAAT_L002_R1_001.bam /path_to/20_AGTTCC_L003_R1_001.bam \
  /path_to/28_GCCAAT_L004_R1_001.bam /path_to/32_AGTTCC_L004_R1_001.bam \
  /path_to/40_GCCAAT_L005_R1_001.bam /path_to/44_AGTTCC_L006_R1_001.bam
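Relative names are resolved against the current working directory, and the prompt shows the command being run from a directory named `bin`. A small pre-flight check, sketched under that assumption (`check_inputs` is a hypothetical helper name), reports which inputs cannot be opened from where the command is run:

```shell
# check_inputs FILE...: report every argument that is missing or unreadable
# from the current working directory (hypothetical helper).
check_inputs() {
  local f missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      printf 'missing: %s (cwd: %s)\n' "$f" "$PWD" >&2
      missing=$((missing + 1))
    elif [ ! -r "$f" ]; then
      printf 'unreadable: %s\n' "$f" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeeds only when every file is present and readable.
  [ "$missing" -eq 0 ]
}
```

For example, `check_inputs /path_to/*.bam data2.gtf && ./featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt /path_to/*.bam` only launches featureCounts once every input, including the relative `data2.gtf`, resolves.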
#7
Member
Location: Montevideo Join Date: Mar 2017
Posts: 15
Again, I tried:
Code:
./featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt ./4_GCCAAT_L001_R1_001.bam
Code:
./featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt /home/sysadm/tomics/ADI/4_GCCAAT_L001_R1_001.bam
Code:
./featureCounts -t exon -g gene_id -a data2.gtf -o counts.txt /usr/bin/ls/home/sysadm/tomics/ADI/4_GCCAAT_L001_R1_001.bam
Code:
ERROR: invalid parameter: '/path/to/files.bam'
I kept some attributes in annotations.gtf that could be cut out; could that be a problem? Thanks for tracing this, Max.
#8
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,089
What does your data2.gtf file look like? Can you post the output of "head -4 data2.gtf"?
Are you getting that exact error (as posted above)? It does not seem to match any of the command variations you have provided.
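Because the forum flattens tabs into spaces, `head` output alone cannot show whether the columns are really tab-separated. A GTF line has nine TAB-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes), and featureCounts looks for the feature type given with `-t` (here `exon`) in column 3. A minimal awk check, assuming a plain uncompressed GTF (`gtf_check` is a hypothetical helper name):

```shell
# gtf_check FILE: count non-comment lines that do not have 9 TAB-separated
# columns, and count exon features (hypothetical helper).
# Fails if any line is malformed or if no exon feature is present.
gtf_check() {
  awk -F'\t' '
    /^#/ { next }              # header and comment lines
    NF == 0 { next }           # blank lines
    NF != 9 {
      bad++
      if (bad <= 3) printf "line %d has %d columns\n", NR, NF
      next
    }
    $3 == "exon" { exons++ }
    END {
      printf "malformed lines: %d, exon features: %d\n", bad + 0, exons + 0
      exit (bad > 0 || exons == 0)
    }' "$1"
}
```

Running `gtf_check data2.gtf` prints the two counts; a non-zero exit status means at least one line lacks the nine columns or no `exon` line was found.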
#9
Member
Location: Montevideo Join Date: Mar 2017
Posts: 15
Sorry for the late response; I had been checking other things.
Apparently annotation.gtf has no problems, since I switched to a built-in annotation and got the same error. Also, I tried Rsubread v1.14.2 with the same BAM files on another computer and it worked. So the problem seems to be the path, the OS (Fedora), or the Rsubread version. By the way, data2 and data3 have the same content:
Code:
$ head -4 data3.gtf
1 havana exon 11869 12227 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00002234944"; exon_version "1"; tag "basic"; transcript_support_level "1";
1 havana exon 12613 12721 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00003582793"; exon_version "1"; tag "basic"; transcript_support_level "1";
1 havana exon 13221 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "3"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00002312635"; exon_version "1"; tag "basic"; transcript_support_level "1";
1 havana exon 12010 12057 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; exon_id "ENSE00001948541"; exon_version "1"; tag "basic"; transcript_support_level "NA";
#10
Member
Location: Montevideo Join Date: Mar 2017
Posts: 15
Finally, I changed my Rsubread version to 1.14.0 and it worked.
I still have some problems, but at least I'm making progress and understanding more. Now it says my annotation .gtf file is SAF, so I think I should process it to keep only the fields that are supposed to be in GTF format.
Code:
//========================== featureCounts setting ===========================\\
|| Input files : 1 unknown file ||
|| ? /home/sysadm/tomics/4_GCCAAT_L001_R1_001.f ... ||
|| Output file : ./.Rsubread_featureCounts_pid4402 ||
|| Annotations : /home/sysadm/tomics/data3 (SAF) ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Paired-end : no ||
|| Strand specific : no ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
\\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\\
|| Load annotation file /home/sysadm/tomics/data3 ... ||
|| Features : 2062043 ||
|| Meta-features : 25 ||
|| Chromosomes : 4 ||
|| Process Unknown file /home/sysadm/tomics/4_GCCAAT_L001_R1_001.fastq.gz ... ||
|| Single-end reads are included. ||
|| Failed to open file /home/sysadm/tomics/4_GCCAAT_L001_R1_001.fastq.gz. ... ||
|| No counts were generated for this file. ||
|| Read assignment finished. ||
\\===================== http://subread.sourceforge.net/ ======================//
Error in featureCounts(files = "/home/sysadm/tomics/4_GCCAAT_L001_R1_001.fastq.gz.subread/data.bam", : No count data were generated.
Thanks for your time, Max.
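Two hedged notes on this step. First, the `(SAF)` label suggests the annotation was read as SAF: in Rsubread, `isGTFAnnotationFile` defaults to `FALSE`, so a GTF passed through `annot.ext` needs `isGTFAnnotationFile = TRUE` (on the command line, `-F GTF` is already the default). Second, a GTF does not need trimming, but if a slimmer file is wanted, the first eight columns must stay intact and only column 9 should shrink. The sketch below, with a hypothetical helper name, keeps `exon` lines and only their `gene_id` attribute:

```shell
# slim_gtf IN OUT: keep only 'exon' lines and only the gene_id attribute,
# leaving the 8 leading TAB-separated GTF columns untouched (hypothetical helper).
slim_gtf() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/ { next }              # drop comment lines
    $3 != "exon" { next }      # keep exon features only
    match($9, /gene_id "[^"]*"/) {
      # Rewrite column 9 as just: gene_id "...";
      $9 = substr($9, RSTART, RLENGTH) ";"
      print
    }' "$1" > "$2"
}
```

For instance, `slim_gtf data3.gtf data3.exons.gtf` produces a file that should still be read as GTF with `isGTFAnnotationFile = TRUE` and `GTF.attrType = "gene_id"`.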
#11
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,089
Is this a good, properly formatted BAM file? There are no spaces in your chromosome names, correct?
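The second question can be answered from the BAM header text (for example the output of `samtools view -H file.bam`): a small awk filter prints any `@SQ` reference name (the `SN:` tag) that contains a space. The helper name is hypothetical; it reads the header from standard input.

```shell
# sq_names_with_spaces: read SAM header text on stdin and print every @SQ
# reference name (SN: tag) containing a space (hypothetical helper).
# Exit status is 1 if such a name exists, 0 otherwise.
sq_names_with_spaces() {
  awk -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if (substr($i, 1, 3) == "SN:" && index($i, " ") > 0) {
          print substr($i, 4)
          found = 1
        }
    }
    END { exit found }'
}
```

Usage: `samtools view -H 4_GCCAAT_L001_R1_001.bam | sq_names_with_spaces` prints nothing and succeeds when all names are clean.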
#12
Member
Location: Montevideo Join Date: Mar 2017
Posts: 15
I think so. I have already worked with files from these people without problems; this would be a second experiment.
I tried to install Rsamtools and Rbamtools without success; installing from bash, I ran into problems updating the RCurl and XML packages. Since it kept raising new problems, I stopped trying and focused on my featureCounts issue. I had to install devtools and other things, and even after that the installation ended with a non-zero exit status. Do you think that could be the problem? I'm now formatting the GTF file. Thanks, Max.
#13
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,089
This is all still a bit puzzling. Hopefully you can get it sorted out. I have never had any issues with featureCounts (except once, when there were spaces in the chromosome names, but the error was different).
#14
Member
Location: Montevideo Join Date: Mar 2017
Posts: 15
Yes, it is, haha. Thanks for your help.
Here is annotation.gtf after formatting:
Code:
$ head -4 data3filtrada3
1 havana exon 11869 12227 "ENSE00002234944"; "DDX11L1"; "processed_transcript"; "ENSG00000223972";
1 havana exon 12613 12721 "ENSE00003582793"; "DDX11L1"; "processed_transcript"; "ENSG00000223972";
1 havana exon 13221 14409 "ENSE00002312635"; "DDX11L1"; "processed_transcript"; "ENSG00000223972";
1 havana exon 12010 12057 "ENSE00001948541"; "DDX11L1"; "transcribed_unprocessed_pseudogene"; "ENSG00000223972";
Code:
fc1 <- featureCounts(files = "/home/sysadm/tomics/4_GCCAAT_L001_R1_001.fastq.gz.subread/data.bam",
                     annot.ext = "/home/sysadm/tomics/data3filtrada3",
                     isGTFAnnotationFile = TRUE,
                     useMetaFeatures = FALSE,
                     isPairedEnd = TRUE)
Code:
//========================== featureCounts setting ===========================\\
|| Input files : 1 unknown file ||
|| ? /home/sysadm/tomics/4_GCCAAT_L001_R1_001.f ... ||
|| Output file : ./.Rsubread_featureCounts_pid4402 ||
|| Annotations : /home/sysadm/tomics/data3filtrada3 (GTF) ||
|| Threads : 1 ||
|| Level : feature level ||
|| Paired-end : yes ||
|| Strand specific : no ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| Chimeric reads : counted ||
|| Both ends mapped : not required ||
\\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\\
|| Load annotation file /home/sysadm/tomics/data3filtrada3 ... ||
|| Features : 0 ||
|| WARNING no features were loaded in format GTF. ||
|| annotation format can be specified using '-F'. ||
Failed to open the annotation file /home/sysadm/tomics/data3filtrada3, or its format is incorrect, or it contains no 'exon' features.
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file './.Rsubread_featureCounts_pid4402': No such file or directory
Last edited by gcR; 04-11-2017 at 12:34 PM.
#15
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,089
That GTF file is not in the right format.
What is the output of "samtools view -H data.bam | head -10"?
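One plausible reason to look at the header is to compare the reference names in the BAM with column 1 of the GTF: the Ensembl-style annotation posted earlier uses names such as `1`, and if the reads were aligned against names such as `chr1`, no read can overlap any feature. A minimal sketch of that comparison, assuming the header has first been saved to a text file (`shared_chroms` is a hypothetical helper):

```shell
# shared_chroms HEADER GTF: compare @SQ names from a SAM header text file
# with the sequence names in column 1 of a GTF (hypothetical helper).
# Fails when the two sets share no name at all.
shared_chroms() {
  local header="$1" gtf="$2" tmp n nb ng
  tmp=$(mktemp -d)
  # Reference names from the BAM header (SN: tag of each @SQ line).
  awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' \
    "$header" | sort -u > "$tmp/bam"
  # Sequence names used by the annotation, ignoring comment lines.
  awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' "$gtf" | sort -u > "$tmp/gtf"
  nb=$(( $(wc -l < "$tmp/bam") ))
  ng=$(( $(wc -l < "$tmp/gtf") ))
  n=$(( $(comm -12 "$tmp/bam" "$tmp/gtf" | wc -l) ))
  echo "BAM names: $nb, GTF names: $ng, shared: $n"
  rm -rf "$tmp"
  [ "$n" -gt 0 ]
}
```

For example: `samtools view -H data.bam > header.txt && shared_chroms header.txt data3.gtf`.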
#16
Member
Location: Montevideo Join Date: Mar 2017
Posts: 15
You were right; I had been working on that and have already fixed it. Thanks.
Now the featureCounts function says my files are SAM, although checking the file type in bash shows they are BAM. It seems it doesn't open the files, and it gives 0 counts. RESULT:
Code:
//========================== featureCounts setting ===========================\\
|| Input files : 8 SAM files ||
|| S /home/sysadm/tomics/4_GCCAAT_L001_R1_001.f ... ||
|| S /home/sysadm/tomics/8_AGTTCC_L001_R1_001.f ... ||
|| S /home/sysadm/tomics/16_GCCAAT_L002_R1_001. ... ||
|| S /home/sysadm/tomics/20_AGTTCC_L003_R1_001. ... ||
|| S /home/sysadm/tomics/28_GCCAAT_L004_R1_001. ... ||
|| S /home/sysadm/tomics/32_AGTTCC_L004_R1_001. ... ||
|| S /home/sysadm/tomics/40_GCCAAT_L005_R1_001. ... ||
|| S /home/sysadm/tomics/44_AGTTCC_L006_R1_001. ... ||
|| Output file : ./.Rsubread_featureCounts_pid3679 ||
|| Annotations : /home/sysadm/tomics/data3 (GTF) ||
|| Threads : 1 ||
|| Level : feature level ||
|| Paired-end : no ||
|| Strand specific : no ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
\\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\\
|| Load annotation file /home/sysadm/tomics/data3 ... ||
|| Features : 1193694 ||
|| Meta-features : 58174 ||
|| Chromosomes : 25 ||
|| ||
|| Process SAM file /home/sysadm/tomics/4_GCCAAT_L001_R1_001.fastq.gz.sub ... ||
|| Single-end reads are included. ||
|| WARNING format issue in file '/home/sysadm/tomics/4_GCCAAT_L001_R1_001 ... ||
|| The required format is : SAM ||
|| The file format is unknown. ||
|| A wrong format may result in wrong results or crash the program. ||
|| Please refer to the manual for file format options. ||
|| If the file is in the correct format, please ignore this message. ||
|| Assign reads to features... ||
|| Total reads : 2380011 ||
|| Successfully assigned reads : 0 (0.0%) ||
|| Running time : 0.02 minutes ||
|| ||
|| Process SAM file /home/sysadm/tomics/8_AGTTCC_L001_R1_001.fastq.gz.sub ... ||
|| Single-end reads are included. ||
|| WARNING format issue in file '/home/sysadm/tomics/8_AGTTCC_L001_R1_001 ... ||
|| The required format is : SAM ||
|| The file format is unknown. ||
|| A wrong format may result in wrong results or crash the program. ||
|| Please refer to the manual for file format options. ||
|| If the file is in the correct format, please ignore this message. ||
|| Assign reads to features... ||
|| Total reads : 1956680 ||
|| Successfully assigned reads : 0 (0.0%) ||
|| Running time : 0.16 minutes ||
|| ||
|| Process SAM file /home/sysadm/tomics/16_GCCAAT_L002_R1_001.fastq.gz.su ... ||
|| Single-end reads are included. ||
|| WARNING format issue in file '/home/sysadm/tomics/16_GCCAAT_L002_R1_00 ... ||
|| The required format is : SAM ||
|| The file format is unknown. ||
|| A wrong format may result in wrong results or crash the program. ||
|| Please refer to the manual for file format options. ||
|| If the file is in the correct format, please ignore this message. ||
|| Assign reads to features... ||
|| Total reads : 2287401 ||
|| Successfully assigned reads : 0 (0.0%) ||
|| Running time : 0.17 minutes ||
|| ||
|| Process SAM file /home/sysadm/tomics/20_AGTTCC_L003_R1_001.fastq.gz.su ... ||
|| Single-end reads are included. ||
|| WARNING format issue in file '/home/sysadm/tomics/20_AGTTCC_L003_R1_00 ... ||
|| The required format is : SAM ||
|| The file format is unknown. ||
|| A wrong format may result in wrong results or crash the program. ||
|| Please refer to the manual for file format options. ||
|| If the file is in the correct format, please ignore this message. ||
|| Assign reads to features... ||
|| Total reads : 2362945 ||
|| Successfully assigned reads : 0 (0.0%) ||
|| Running time : 0.17 minutes ||
|| ||
|| Process SAM file /home/sysadm/tomics/28_GCCAAT_L004_R1_001.fastq.gz.su ... ||
|| Single-end reads are included. ||
|| WARNING format issue in file '/home/sysadm/tomics/28_GCCAAT_L004_R1_00 ... ||
|| The required format is : SAM ||
|| The file format is unknown. ||
|| A wrong format may result in wrong results or crash the program. ||
|| Please refer to the manual for file format options. ||
|| If the file is in the correct format, please ignore this message. ||
|| Assign reads to features... ||
|| Total reads : 1884255 ||
|| Successfully assigned reads : 0 (0.0%) ||
|| Running time : 0.15 minutes ||
|| ||
|| Process SAM file /home/sysadm/tomics/32_AGTTCC_L004_R1_001.fastq.gz.su ... ||
|| Single-end reads are included. ||
|| WARNING format issue in file '/home/sysadm/tomics/32_AGTTCC_L004_R1_00 ... ||
|| The required format is : SAM ||
|| The file format is unknown. ||
|| A wrong format may result in wrong results or crash the program. ||
|| Please refer to the manual for file format options. ||
|| If the file is in the correct format, please ignore this message. ||
|| Assign reads to features... ||
|| Total reads : 4205162 ||
|| Successfully assigned reads : 0 (0.0%) ||
|| Running time : 0.29 minutes ||
|| ||
|| Process SAM file /home/sysadm/tomics/40_GCCAAT_L005_R1_001.fastq.gz.su ... ||
|| Single-end reads are included. ||
|| WARNING format issue in file '/home/sysadm/tomics/40_GCCAAT_L005_R1_00 ... ||
|| The required format is : SAM ||
|| The file format is unknown. ||
|| A wrong format may result in wrong results or crash the program. ||
|| Please refer to the manual for file format options. ||
|| If the file is in the correct format, please ignore this message. ||
|| Assign reads to features... ||
|| Total reads : 1953045 ||
|| Successfully assigned reads : 0 (0.0%) ||
|| Running time : 0.14 minutes ||
|| ||
|| Process SAM file /home/sysadm/tomics/44_AGTTCC_L006_R1_001.fastq.gz.su ... ||
|| Single-end reads are included. ||
|| WARNING format issue in file '/home/sysadm/tomics/44_AGTTCC_L006_R1_00 ... ||
|| The required format is : SAM ||
|| The file format is unknown. ||
|| A wrong format may result in wrong results or crash the program. ||
|| Please refer to the manual for file format options. ||
|| If the file is in the correct format, please ignore this message. ||
|| Assign reads to features... ||
|| Total reads : 2908457 ||
|| Successfully assigned reads : 0 (0.0%) ||
|| Running time : 0.22 minutes ||
|| ||
|| Read assignment finished. ||
\\===================== http://subread.sourceforge.net/ ======================//
Thanks all for reading. G.
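The `SAM` label and the "file format is unknown" warning would be consistent with inputs that do not start like a BAM. A BAM file is BGZF-compressed, so its first two bytes are the gzip magic `1f 8b`, and its decompressed stream begins with `BAM` followed by the byte 1. A cheap check of both, sketched with a hypothetical helper name and assuming `gzip`, `head -c` and `od` are available; passing it does not prove that the rest of the file is intact.

```shell
# looks_like_bam FILE: check the gzip/BGZF magic and the "BAM\1" magic at the
# start of the decompressed stream (hypothetical helper).
looks_like_bam() {
  local f="$1" magic
  if [ ! -f "$f" ]; then
    echo "$f: no such file"
    return 1
  fi
  # First two raw bytes: every gzip/BGZF file starts with 1f 8b.
  magic=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
  if [ "$magic" != "1f8b" ]; then
    echo "$f: not gzip/BGZF compressed (first bytes: ${magic:-none})"
    return 1
  fi
  # BGZF is valid multi-member gzip, so gzip -dc can read it; only the first
  # four decompressed bytes are needed ("BAM" followed by 0x01).
  magic=$(gzip -dc "$f" 2>/dev/null | head -c 4 | od -An -tx1 | tr -d ' \n') || true
  if [ "$magic" != "42414d01" ]; then
    echo "$f: gzip data, but not a BAM stream"
    return 1
  fi
  echo "$f: BAM magic bytes OK"
}
```

Usage: `for f in /path_to/*.bam; do looks_like_bam "$f"; done`.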
#17
Member
Location: Montevideo Join Date: Mar 2017
Posts: 15
Found the culprit: the BAM files were corrupt. I had to download the dataset again, and that solved the problem.
Silly beginner problems, indeed. Thanks for your time.
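Corruption like this often comes from an interrupted or incomplete transfer. The usual tool is `samtools quickcheck`, which checks, among other things, that a BAM file has a valid header and ends with the end-of-file marker; comparing checksums (e.g. with `md5sum`) against any provided by the data source is another option. Where samtools is not at hand, the EOF part can be sketched in plain shell: the SAM/BAM specification defines a fixed 28-byte empty BGZF block that a complete BAM file is expected to end with, so a truncated file will normally lack it. The helper name below is hypothetical.

```shell
# The 28-byte empty BGZF block used as the BAM end-of-file marker, as hex.
BGZF_EOF_HEX="1f8b0804""00000000""00ff""0600""4243""0200""1b00""0300""00000000""00000000"

# has_bgzf_eof FILE: succeed only if FILE ends with the BGZF EOF marker
# (hypothetical helper).
has_bgzf_eof() {
  local tail_hex
  [ -f "$1" ] || return 1
  # -v stops od from collapsing repeated lines into "*".
  tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
  [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}
```

Usage: `for f in /path_to/*.bam; do has_bgzf_eof "$f" || echo "possibly truncated: $f"; done`.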