Hello!
I’m working with 33 RNA-seq libraries and I’m having a problem with featureCounts. I start with sorted BAM files (named sorted_6346.bam, sorted_6347.bam, and so on through sorted_6378.bam), which I pass to featureCounts with this command:
featureCounts -a ~/genomes/Mouse/ensembl_genome/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf -t exon -g gene_id -s 2 -p -R -M sorted_63* -o output
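To rule out the input list itself, here is a minimal sketch (the function name `check_inputs` is just illustrative) that receives exactly the arguments the shell expands `sorted_63*` into and flags anything that isn't a `.bam` file. The pattern matches every file in the current directory whose name starts with `sorted_63`, so index files or the `.bam.featureCounts` files written by `-R` would be picked up too if they sit next to the BAMs.

```shell
# Illustrative helper (hypothetical name): report what a glob expanded to.
# Call it with the same unquoted pattern given to featureCounts, e.g.
#   check_inputs sorted_63*
check_inputs() {
  local f total=0 non_bam=0
  for f in "$@"; do
    total=$((total + 1))
    case "$f" in
      *.bam) ;;  # expected input type, nothing to report
      *)
        non_bam=$((non_bam + 1))
        echo "not a .bam file: $f"
        ;;
    esac
  done
  # Summary line: how many arguments, and how many were not .bam files.
  echo "total inputs: $total, non-.bam: $non_bam"
}
```

If only the 33 BAM files match, the last line of `check_inputs sorted_63*` should read `total inputs: 33, non-.bam: 0`.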
The per-read assignment files written by -R look fine, but something seems to be wrong with the combined count table: the counts from the first two libraries appear to be missing. For example, take the gene ENSMUSG00000029614:
>grep "ENSMUSG00000029614" output
ENSMUSG00000029614 5;5;5;5;5;5 121204481;121205406;121206638;121207077;121208196;121208782 121204552;121206445;121206810;121207384;121208575;121209241 +;+;+;+;+;+ 2433 36508 47652 50431 11667 15455 75749 15577 27682 67064 14802 12306 26099 55411 17297 52910 22243 29685 18242 36564 21280 31884 10634 75043 22386 31312 17584 5298 27524 13846 14408 21197
As you can see, the first 6 fields are the usual ones from featureCounts:
Geneid Chr Start End Strand Length
ENSMUSG00000029614 5;5;5;5;5;5 121204481;121205406;121206638;121207077;121208196;121208782 121204552;121206445;121206810;121207384;121208575;121209241 +;+;+;+;+;+ 2433
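After these, there should be one count for each of the 33 libraries (6346-6378), but there are only 31, starting with 36508. Judging by the per-read files below, 36508 is the count for sorted_6348, so the two missing columns are the ones for 6346 and 6347.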
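To check which file each count column is attributed to, one option is to read the header line of `output` (the tab-separated line starting with `Geneid`), where featureCounts names each count column after its input file. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: list the count columns of a featureCounts table,
# numbered from 1, skipping the 6 annotation columns
# (Geneid, Chr, Start, End, Strand, Length).
list_count_columns() {
  awk -F'\t' '
    $1 == "Geneid" {
      for (i = 7; i <= NF; i++) printf "%d\t%s\n", i - 6, $i
      exit  # only the header line is needed
    }
  ' "$1"
}
```

Running `list_count_columns output` should print 33 numbered lines if every BAM file got its own column; comparing that with the number of values on the gene's row shows whether the header and the data rows agree.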
To investigate further, I looked at the individual outputs:
[ls299@themonster ensembl_genome]grep -c "ENSMUSG00000029614" sorted_63*.bam.featureCounts
sorted_6346.bam.featureCounts:32761
sorted_6347.bam.featureCounts:31802
sorted_6348.bam.featureCounts:36508
sorted_6349.bam.featureCounts:47652
sorted_6350.bam.featureCounts:50431
sorted_6351.bam.featureCounts:11667
sorted_6352.bam.featureCounts:15455
sorted_6353.bam.featureCounts:75749
sorted_6354.bam.featureCounts:15577
sorted_6355.bam.featureCounts:27682
sorted_6356.bam.featureCounts:67064
sorted_6357.bam.featureCounts:14802
sorted_6358.bam.featureCounts:12306
sorted_6359.bam.featureCounts:26099
sorted_6360.bam.featureCounts:55411
sorted_6361.bam.featureCounts:17297
sorted_6362.bam.featureCounts:52910
sorted_6363.bam.featureCounts:22243
sorted_6364.bam.featureCounts:29685
sorted_6365.bam.featureCounts:18242
sorted_6366.bam.featureCounts:36564
sorted_6367.bam.featureCounts:21280
sorted_6368.bam.featureCounts:31884
sorted_6369.bam.featureCounts:10634
sorted_6370.bam.featureCounts:75043
sorted_6371.bam.featureCounts:22386
sorted_6372.bam.featureCounts:31312
sorted_6373.bam.featureCounts:17584
sorted_6374.bam.featureCounts:5298
sorted_6375.bam.featureCounts:27524
sorted_6376.bam.featureCounts:13846
sorted_6377.bam.featureCounts:14408
sorted_6378.bam.featureCounts:21197
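As you can see, the first two libraries do have reads assigned to this gene (32761 and 31802), and the remaining 31 values match the combined table exactly, in order, for sorted_6348 through sorted_6378. So the counts for sorted_6346 and sorted_6347 are simply missing from the combined table.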
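To line the two sources up directly, a sketch like the following pairs each count on a gene's row with its column name from the header, and with the same `grep -c` count from the matching per-read file. It assumes the per-read files are named `<column name>.featureCounts` and sit in the current directory, as in the listing above; the function name is hypothetical.

```shell
# Hypothetical helper: for one gene, print
#   <column name>  <count in combined table>  <lines mentioning gene in -R file>
# Assumes the -R files are named "<column name>.featureCounts" in the
# current directory; prints "missing" when that file is not found.
compare_gene_counts() {
  local table="$1" gene="$2" tab
  tab=$(printf '\t')
  # Pair each count on the gene's row with the header name of its column.
  awk -F'\t' -v g="$gene" '
    $1 == "Geneid" { for (i = 7; i <= NF; i++) name[i] = $i; ncol = NF; next }
    $1 == g { for (i = 7; i <= ncol; i++) printf "%s\t%s\n", name[i], $i; exit }
  ' "$table" |
  while IFS="$tab" read -r name count; do
    if [ -f "$name.featureCounts" ]; then
      # Same check as the grep -c above: lines containing the gene ID.
      reads=$(grep -c -F -- "$gene" "$name.featureCounts")
    else
      reads="missing"
    fi
    printf '%s\t%s\t%s\n' "$name" "$count" "$reads"
  done
}
```

For example, `compare_gene_counts output ENSMUSG00000029614`. A row whose second and third values differ, or whose third value is `missing`, points at the column where the table and the per-read files stop agreeing.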
Any ideas as to what’s going on?
Thanks a lot!