SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
featureCounts vs bedtools multicov mslider Bioinformatics 0 10-01-2015 12:07 PM
procedure for extracting raw read counts NitaC Bioinformatics 5 04-30-2013 02:55 AM
Raw read counts for RNAseq biofreak Introductions 13 01-16-2013 05:28 AM
Raw read counts per Exon biofreak Bioinformatics 3 05-03-2012 10:29 AM
Raw read counts for RNAseq biofreak RNA Sequencing 2 06-15-2011 05:56 AM

Reply
 
Thread Tools
Old 11-05-2016, 03:41 AM   #1
yaseen.ladak
Junior Member
 
Location: Cambridge

Join Date: Nov 2013
Posts: 7
Default Raw counts - FeatureCounts (Rsubread), HT Seq and BEdtools

Dear All,
I would really appreciate some help.
I have a aligned BAM file which is sorted. This file was obtained from paired end sequencing. I am trying to find the raw counts which fall in the regions mentioned in the GTF.

The GTF is below:
*********************************
1dh10b ena exon 12163 14079 . + . gene_id "ECDH10B_0014"; transcript_id "ACB01219"; exon_number "1"; gene_name "dnaK"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "dnaK-1"; transcript_source "ena"; transcript_biotype "protein_coding"; exon_id "ACB01219-1";
1dh10b ena exon 4040647 4043550 . + . gene_id "ECDH10B_3947"; transcript_id "ECDH10B_3947-1"; exon_number "1"; gene_name "rrlC"; gene_source "ena"; gene_biotype "rRNA"; transcript_name "rrlC"; transcript_source "ena"; transcript_biotype "rRNA"; exon_id "ECDH10B_3947-1";
***************************************

When I ran Rsubreads in the paired end command and results mentioned below:

*************
The code for feature count:


fc1<-featureCounts(files=c("R1.bam"),
annot.ext="/data/tellis/analysis/genome/combined/dh10b_gfp_psb1c3_h3/test_part.gtf",
isGTFAnnotationFile=TRUE,GTF.featureType="exon",
isPairedEnd=TRUE)
dge1 <- DGEList(counts=fc1$counts, genes=fc1$annotation)
write.table(dge1$counts,file="/data/tellis/analysis/genome/combined/dh10b_gfp_psb1c3_h3/test_part_gtf_pe.txt",sep="\t")

Result obtained:
"R1.bam"
"ECDH10B_0014" 6528
"ECDH10B_3947" 255754

When I run the above with single end mode i.e. isPairedEnd=FALSE, I get the following results:
"R1.bam"
"ECDH10B_0014" 12049
"ECDH10B_3947" 440677

I wanted to double check this with bedtools:
bedtools coverage -a test_part.gtf -b R1.bam > cov_prt_gtf_bam.txt

The counts I got were:
ECDH10B_0014 12049
ECDH10B_3947 4040647

However, because bedtools results dont account for paired end the results were similar to feature counts in single end mode.

But when I ran HTseq command as below
htseq-count -f bam -s no R1.bam test_part.gtf > htseq_test_part_gtf.txt
Results
ECDH10B_3947 0
ECDH10B_0014 6509

When I checked on the IGV browser for the location on where the gene ECDH10B_3947 is I could hardly find any reads but bedtools in single mode picked up counts similar to feature counts single mode. There were also almost half the counts in feature counts in paired end mode. But the HT seq counts were zero and on IGV also there were no reads. ITs strange that for ECDH10B_0014 gene the counts of HT seq was similar to feature count in paired end mode.

Can someone please help? I wold be grateful.

Thanks,
Yaseen



***************
yaseen.ladak is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:27 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO