![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
How to count the reads mapping to different regions from tophat output | vivienne_lovely | RNA Sequencing | 5 | 04-19-2016 11:39 AM |
Count the Number of Paired-End Reads Mapped by Tophat | Fernas | Bioinformatics | 5 | 10-23-2015 03:26 AM |
Count intervals of multiply mapped reads overlapping the genome | sunkissbean | Bioinformatics | 2 | 01-08-2015 07:58 AM |
HTSeq-count large proportion of mapped reads with no_feature | edm1 | Bioinformatics | 4 | 05-26-2014 09:28 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: czech Join Date: May 2015
Posts: 65
|
![]()
Hi all,
I just mapped the RNA-Seqs by Tophat and got the output file accepted_hit.bam. I also have gtf file. I want to count the reads mapping to rRNA regions and also other. I hope featurecounts can be used. But could anyone please tell me how to give the options and different parameters. Any reply will be appreciated. Thank you very much. |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]()
For reference cross-posted (and mostly answered): https://www.biostars.org/p/187338
@bvk: Please check the links for featureCounts page posted and read featureCounts manual starting at page #27. |
![]() |
![]() |
![]() |
#3 |
Member
Location: czech Join Date: May 2015
Posts: 65
|
![]()
Hi,
As you said I had a look in the manual. featureCounts -p -t exon -g gene_id -a hg19_RefSeq.gtf -o counts.txt accepted_hits.bam This Summarize paired-end reads and count fragments (instead of reads). This is the output I got. //========================== featureCounts setting ===========================\\ || || || Input files : 1 BAM file || || P MCF10A_BaP_pool1_R1_postTrimming/accepted_ ... || || || || Output file : counts.txt || || Annotations : hg19_RefSeq.gtf (GTF) || || || || Threads : 1 || || Level : meta-feature level || || Paired-end : yes || || Strand specific : no || || Multimapping reads : not counted || || Multi-overlapping reads : not counted || || Read orientations : fr || || || || Chimeric reads : counted || || Both ends mapped : not required || || || \\===================== http://subread.sourceforge.net/ ======================// //================================= Running ==================================\\ || || || Load annotation file hg19_RefSeq.gtf ... || || Features : 432390 || || Meta-features : 24019 || || Chromosomes/contigs : 24 || || || || Process BAM file MCF10A_BaP_pool1_R1_postTrimming/accepted_hits.bam... || || Paired-end reads are included. || || Assign fragments (read pairs) to features... || || || || WARNING: reads from the same pair were found not adjacent to each || || other in the input (due to read sorting by location or || || reporting of multi-mapping read pairs). || || || || Read re-ordering is performed. || || || || Total fragments : 56783755 || || Successfully assigned fragments : 31088269 (54.7%) || || Running time : 4.13 minutes || || || || Read assignment finished. || || || \\===================== http://subread.sourceforge.net/ ======================// Is this right one to count reads mapped to rRNA regions. Could you please help me. Thank you Last edited by bvk; 04-20-2016 at 03:16 AM. |
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: Vienna Join Date: Oct 2011
Posts: 123
|
![]()
There are a couple of annotations for rRNA regions (e.g. the repeat masker) which you can use to count reads in these regions. Nevertheless, rRNA is located in clusters which itself have repeats. Most reads originating from rRNA would map too many times and would be discarded then by Tophat.
The easiest way of counting reads mapping to the rRNA is to generate a bowtie2 index from the rRNA sequences (28S, 5.8S, 18S, 5S, the 45S pre-ribosomal cluster, and the mt-rRNAs) and map your reads with bowtie2. |
![]() |
![]() |
![]() |
#5 |
Member
Location: czech Join Date: May 2015
Posts: 65
|
![]()
Actually the samples which I have are rRNA depleted but from the person who did sequencing told me there may be some present still. As I already have gtf and bam files I'm trying to use featureCounts to count the reads mapped to rRNA regions.
|
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]()
If you use a GTF file that only has genes (and things that you are interested in) then the resulting counts file only has counts for those features. Unless you are actually interested in rRNA biology you can safely ignore the reads that mapped to rRNA past this point.
BTW: You should look at the contents of counts.txt.summary file or post it here. I suggest that you stop worrying about rRNA (assuming you are not interested in rRNA) and move on with your DE analysis. Last edited by GenoMax; 04-20-2016 at 04:53 AM. |
![]() |
![]() |
![]() |
#7 |
Member
Location: czech Join Date: May 2015
Posts: 65
|
![]()
Here it is... the contents of counts.txt.summary file
Status MCF10A_BaP_pool1_R1_postTrimming/accepted_hits.bam Assigned 31088269 Unassigned_Ambiguity 613410 Unassigned_MultiMapping 15375382 Unassigned_NoFeatures 9706694 Unassigned_Unmapped 0 Unassigned_MappingQuality 0 Unassigned_FragmentLength 0 Unassigned_Chimera 0 Unassigned_Secondary 0 Unassigned_Nonjunction 0 Unassigned_Duplicate 0 Yes actually Im interested in rRNA biology. I guess the gtf file doesn't have that info. So, how can I get a gtf file with only rRNA regions? |
![]() |
![]() |
![]() |
#8 |
Member
Location: czech Join Date: May 2015
Posts: 65
|
![]()
Ok. Just now I searched I can get it from UCSC table browser. So now how can I give the arguments and options for featureCounts. As I said before the command which I gave gives fragments.
I need read pairs mapped to rRNA regions. |
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]()
What does the following command produce when you run it on the GTF file you used?
Code:
$ grep rRNA your_GTF_file At this point you have two choices. You can either map your reads against just the rDNA repeat (that I had linked in Biostars thread) or use a different genome build (e.g. Ensembl verify that it has rRNA annotations) and re-do your Tophat alignments. If you are interested in rRNA biology then why were these samples ribo-depleted? You have likely lost most of the interesting data you would have wanted in this process. |
![]() |
![]() |
![]() |
#10 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]()
Fragments are represented by read pairs (reads sample the two ends of the fragment).
|
![]() |
![]() |
![]() |
#11 |
Member
Location: czech Join Date: May 2015
Posts: 65
|
![]()
But I found a link which is telling how to get rRNA gtf file from UCSC table browser. check this once.
http://onetipperday.sterding.com/201...csc-table.html Yes, as my samples are ribo depleted the person who did sequencing said me that there me be some rRNA still present. So, they also want the reads which are mapped to rRNA. Can you give me the right information whether it can be done or not. Thank you |
![]() |
![]() |
![]() |
#12 | |
Member
Location: czech Join Date: May 2015
Posts: 65
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]()
IF the rRNA GTF file you downloaded from UCSC is for the exact genome build of the reference you used (and assuming that reference was also from UCSC) for doing TopHat alignments then you can use that GTF file to count the reads that map to the features defined in the rRNA GTF file.
|
![]() |
![]() |
![]() |
#14 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]() |
![]() |
![]() |
![]() |
#15 | |
Member
Location: czech Join Date: May 2015
Posts: 65
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#16 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]() |
![]() |
![]() |
![]() |
#17 | |
Member
Location: czech Join Date: May 2015
Posts: 65
|
![]() Quote:
featureCounts -p -t exon -g gene_id -a hg19_rRNA -o MCF10A_BaP_pool1_counts.txt MCF10A_BaP_pool1_R1_postTrimming/accepted_hits.bam //========================== featureCounts setting ===========================\\ || || || Input files : 1 BAM file || || P MCF10A_BaP_pool1_R1_postTrimming/accepted_ ... || || || || Output file : MCF10A_BaP_pool1_counts.txt || || Annotations : hg19_rRNA (GTF) || || || || Threads : 1 || || Level : meta-feature level || || Paired-end : yes || || Strand specific : no || || Multimapping reads : not counted || || Multi-overlapping reads : not counted || || Read orientations : fr || || || || Chimeric reads : counted || || Both ends mapped : not required || || || \\===================== http://subread.sourceforge.net/ ======================// //================================= Running ==================================\\ || || || Load annotation file hg19_rRNA ... || || Features : 1769 || || Meta-features : 3 || || Chromosomes/contigs : 38 || || || || Process BAM file MCF10A_BaP_pool1_R1_postTrimming/accepted_hits.bam... || || Paired-end reads are included. || || Assign fragments (read pairs) to features... || || || || WARNING: reads from the same pair were found not adjacent to each || || other in the input (due to read sorting by location or || || reporting of multi-mapping read pairs). || || || || Read re-ordering is performed. || || || || Total fragments : 56783755 || || Successfully assigned fragments : 246578 (0.4%) || || Running time : 3.98 minutes || || || || Read assignment finished. || || || \\===================== http://subread.sourceforge.net/ ======================// Thank you |
|
![]() |
![]() |
![]() |
#18 | |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]()
Provided the input files were correct this is the relevant bit
Quote:
There is also this warning to keep in mind || WARNING: reads from the same pair were found not adjacent to each || || other in the input (due to read sorting by location or || || reporting of multi-mapping read pairs). || Last edited by GenoMax; 04-20-2016 at 05:58 AM. |
|
![]() |
![]() |
![]() |
#19 | |
Member
Location: czech Join Date: May 2015
Posts: 65
|
![]() Quote:
what do you mean here (Does not help you if you actually were interested in rRNA.) I didn't get this. There are 246578 read pairs aligned to rRNA regions. Could you please tell clearly about the warning. And finally is this right or not? If not how can I do now? Last edited by bvk; 04-20-2016 at 06:24 AM. |
|
![]() |
![]() |
![]() |
#20 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]()
Ribodepletion has worked well (only 0.4% reads align to rRNA).
Fragments that go into a library have a size distribution and generally it is possible to infer the size of the fragment based on the alignments to a reference. The warning indicates that in some cases the reads were found to be not at the expected distance from each other. This is to be expected since you are looking at a repeat region and the annotation is not perfect. You have said you (or the person who did the experiment, if you are only analyzing the data) are interested in rRNA biology yet the samples are rRNA depleted. So those two observations are contradictory. Aside from this we don't have any information about what this experiment was for (e.g. were you only interested in finding expression levels of genes but not differential expression or are there groups of samples that need to be analyzed for differential expression) so only you have an idea as to what is to be done next. |
![]() |
![]() |
![]() |
Tags |
feature counts, reads counts, rnaseq analysis |
Thread Tools | |
|
|