
**GenoMax** · 01-15-2015, 03:20 PM

Since you mapped against the genome, you need to summarize the alignments with a program such as featureCounts or htseq-count, together with an annotation file, to translate the alignments you have into counts per gene or exon (or whichever features the annotation file contains).

You could have also provided that annotation file to TopHat (when you ran it) if you only wanted to look at the transcriptome (instead of the whole genome).

**dpryan** · 01-15-2015, 03:21 PM

That's what should happen. Your next step is to get counts of aligned fragments per gene, for which you can use featureCounts or htseq-count. Both of those expect exactly what you have as input.

Edit: GenoMax beat me by a minute. I should note that mapping against the transcriptome with TopHat still produces alignments in genomic coordinates.
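
For readers new to the counting step described above, here is a minimal Python sketch of the idea: each alignment in genomic coordinates is assigned to the gene whose interval it overlaps. The gene names, coordinates, and the `count_per_gene` helper are all made up for illustration. This is not how featureCounts or htseq-count are implemented; the real tools also handle exons, strand, spliced reads, and multi-mapping reads.

```python
from collections import Counter

# Hypothetical toy annotation: gene_id -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}

# Hypothetical alignments in genomic coordinates: (chromosome, start, end).
ALIGNMENTS = [
    ("chr1", 120, 170),
    ("chr1", 450, 520),
    ("chr1", 600, 650),
    ("chr1", 900, 950),
    ("chr2", 100, 150),
]


def count_per_gene(alignments, genes):
    """Count alignments overlapping exactly one gene; the rest are unassigned."""
    counts = Counter({gene_id: 0 for gene_id in genes})
    unassigned = 0
    for chrom, start, end in alignments:
        # Two closed intervals overlap when each starts before the other ends.
        hits = [
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start <= g_end and end >= g_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        else:
            unassigned += 1
    return dict(counts), unassigned


counts, unassigned = count_per_gene(ALIGNMENTS, GENES)
print(counts, unassigned)
```

The point is only that the BAM file stays in genomic coordinates and the annotation file supplies the mapping to genes.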

**analog900** · 01-15-2015, 03:28 PM

Thanks guys!! Appreciate it! I'll give it a try.
Thanks again!

**analog900** · 01-16-2015, 09:47 AM

Using featureCounts gives me nice summary and counts text files. However, my SAM and BAM files still contain the original, genomic, annotations (obviously). Ideally, I would like to convert the annotations in the BAM/SAM files so that I can further process them.

This leads me to a broader question: what reference (for mouse RNA-seq) do people use when they want gene IDs instead of genomic targets? I noticed that reference files such as mRNA.fa or refMrna.fa contain only accession numbers, not gene IDs.
Thanks in advance

**Brian Bushnell** · 01-16-2015, 10:08 AM

Gene IDs, names, and numbers vary depending on the database in question. You can either get a translation table or try to find a FASTA file already named with the identifiers you want to use.
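
As a rough illustration of the translation-table approach, the Python sketch below tags FASTA headers with gene IDs looked up from a two-column table. The accessions, gene IDs, column names, and helper functions are placeholders invented for this example; a real table would come from whichever database your reference was built from.

```python
import csv
import io

# Hypothetical tab-separated translation table (accession -> gene ID).
TABLE_TSV = (
    "accession\tgene_id\n"
    "ACC_0001\tgeneA\n"
    "ACC_0002\tgeneA\n"
    "ACC_0003\tgeneB\n"
)

# Hypothetical FASTA records; ACC_9999 is deliberately missing from the table.
FASTA_LINES = [">ACC_0001 some description", "ACGT", ">ACC_9999", "TTGA"]


def load_translation_table(handle):
    """Read the table into a dict keyed by accession."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["accession"]: row["gene_id"] for row in reader}


def tag_fasta_headers(fasta_lines, table):
    """Append '|gene_id' to each header whose accession is in the table."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # The accession is the first whitespace-delimited token of the header.
            accession = line[1:].split()[0]
            gene_id = table.get(accession)
            yield f">{accession}|{gene_id}" if gene_id else f">{accession}"
        else:
            yield line


table = load_translation_table(io.StringIO(TABLE_TSV))
tagged = list(tag_fasta_headers(FASTA_LINES, table))
print("\n".join(tagged))
```

The accession is kept in the header because several transcripts can share one gene ID, so replacing it outright would produce duplicate sequence names.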

**GenoMax** · 01-16-2015, 10:33 AM

> Originally posted by analog900:
>
> Using featureCounts gives me nice summary and counts text files. However, my SAM and BAM files still contain the original, genomic, annotations (obviously). Ideally, I would like to convert the annotations in the BAM/SAM files so that I can further process them.

What is "further processing" referring to here? Most downstream analysis is going to use the counts files (unless you are going to call SNPs from this data) and will always refer to the gene names contained in that file.

**analog900** · 01-16-2015, 10:51 AM

> Originally posted by GenoMax:
>
> What is "further processing" referring to here? Most downstream analysis is going to use the counts files (unless you are going to call SNPs from this data) and will always refer to the gene names contained in that file.

I've been loosely following the "Simple Fool's Guide" for RNA-seq by Stephen Palumbi's group (http://sfg.stanford.edu/guide.html). They parse their SAM output files with a series of Python scripts to obtain summary statistics similar to the ones I can now get with featureCounts. Then they use DESeq for functional enrichment (which I would really like to do in order to compare my different samples).

**dpryan** · 01-16-2015, 11:58 AM

I would recommend ignoring that guide. If you want to use DESeq (use DESeq2), just directly use the counts from featureCounts. This would be the standard and accepted pipeline and there's no reason to use any kludgy scripts.

**analog900** · 01-16-2015, 01:12 PM

> Originally posted by dpryan:
>
> I would recommend ignoring that guide. If you want to use DESeq (use DESeq2), just directly use the counts from featureCounts. This would be the standard and accepted pipeline and there's no reason to use any kludgy scripts.

Thank you. Really appreciate it! Can you recommend any other standard/accepted pipelines downstream of featureCounts?

**shi** · 01-18-2015, 02:13 PM

We use limma/voom and edgeR in downstream analyses to discover differentially expressed genes. The link below is a short tutorial for using our pipeline for analyzing RNA-seq data which you might find helpful:

http://bioinf.wehi.edu.au/RNAseqCaseStudy/

**Michael Love** · 01-19-2015, 07:23 AM

For DESeq2 you would use the DESeqDataSetFromMatrix function to start the analysis, using the count matrix returned by featureCounts. An example of starting from a count matrix is in the DESeq2 vignette.
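
If you run featureCounts from the command line rather than from R, its text output has a leading comment line and six annotation columns (Geneid, Chr, Start, End, Strand, Length) before the per-sample counts. Below is a minimal Python sketch of trimming that file down to a gene-by-sample count matrix. The toy data, sample names, and the `read_count_matrix` helper are assumptions made for illustration; in R, the equivalent trimmed matrix is what you would pass as `countData` to DESeqDataSetFromMatrix.

```python
import csv
import io

# Toy stand-in for a featureCounts output file (hypothetical genes and samples).
FEATURECOUNTS_TEXT = (
    "# comment line standing in for the program/command header\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\tsample2.bam\n"
    "geneA\tchr1\t100\t500\t+\t401\t25\t40\n"
    "geneB\tchr1\t800\t1200\t-\t401\t0\t7\n"
)

# Geneid, Chr, Start, End, Strand, Length precede the count columns.
ANNOTATION_COLUMNS = 6


def read_count_matrix(handle):
    """Return (sample names, {gene_id: [counts per sample]})."""
    # Skip comment lines, then parse the remainder as tab-separated text.
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    samples = header[ANNOTATION_COLUMNS:]
    matrix = {
        row[0]: [int(value) for value in row[ANNOTATION_COLUMNS:]]
        for row in rows
        if row
    }
    return samples, matrix


samples, matrix = read_count_matrix(io.StringIO(FEATURECOUNTS_TEXT))
print(samples)
print(matrix)
```

The counts stay as raw integers here, since DESeq2 expects unnormalized counts as input.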

**analog900** · 01-20-2015, 09:56 AM

Thanks so much guys!
Working through the DESeq2 vignette now and learning new stuff... really excited!
Thanks again!


Newbie question regarding mapping of RNA-seq data
