SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
Question about --genemodel=complete option of Augustus evolver Bioinformatics 1 10-19-2015 02:36 AM
featureCounts SylvainL Bioinformatics 4 03-16-2015 03:35 AM
Bug in featureCounts? brinda Bioinformatics 1 01-23-2014 06:35 PM
question about VariantRecalibrator's mode option caswater Bioinformatics 0 04-16-2012 12:34 AM
A question about tophat -g/--max-multihits option keycard Bioinformatics 2 12-29-2010 04:36 AM

Reply
 
Thread Tools
Old 08-09-2016, 09:34 AM   #1
gene_x
Senior Member
 
Location: MO

Join Date: May 2010
Posts: 108
Default featureCounts option question

New to use featureCounts on RNA-seq analysis, my data is polyA enriched, stranded, single end Illumina reads.

My goal is to do differential expression analysis between control and case groups. I plan to use DEseq2 to do the DE analysis after featureCounts.

I have a few questions:

1. I'm wondering if it's best to use -M −−fraction options or −−primary option or neither? I understand in ChIP-seq, people often only keep uniquely mapped reads, not sure about RNA-seq and also whether to only keep primary alignments. My feeling is that it's best to use --primary option.

Quote:
-M
If specified, multi-mapping reads/fragments will be counted. A multi-mapping read will be counted up to N times if it has N reported mapping locations. The program uses the ‘NH’ tag to find multi-mapping reads.

−−fraction
If specified, a fractional count 1/n will be generated for each multi-mapping read, where n is the number of alignments (in- dicated by ‘NH’ tag) reported for the read. This option must be used together with the ‘-M’ option.
Quote:
−−primary
If specified, only primary alignments will be counted. Primary and secondary alignments are identified using bit 0x100 in the Flag field of SAM/BAM files. All primary alignments in a dataset will be counted no matter they are from multi- mapping reads or not (ie. ‘-M’ is ignored).
2. I read from many sources saying that it's normal to observe high level of duplicated reads for RNA-seq. So is it best not to use −−ignoreDup option?

3. My current command line looks like this:

Code:
featureCounts -t exon -g gene_id -a genes.gtf -F GTF -o outfile.txt -s 1 −−primary input.bam
Please let me know if there is some other options that I better use.

Thanks!

Last edited by gene_x; 08-09-2016 at 09:44 AM.
gene_x is offline   Reply With Quote
Old 08-09-2016, 09:52 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,033
Default

How did you handle the multimappers in your alignment program? Did you use one of these options (for example this is what BBMap allows)

Code:
best    (use the first best site)
toss    (consider unmapped)
random  (select one top-scoring site randomly)
all     (retain all top-scoring sites)
GenoMax is offline   Reply With Quote
Old 08-09-2016, 10:01 AM   #3
gene_x
Senior Member
 
Location: MO

Join Date: May 2010
Posts: 108
Default

Good point.

I used hisat2 to do alignment and I think the default setting is -k option at

Quote:
-k <int>
It searches for at most <int> distinct, primary alignments for each read. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments.

Default: 5 (HFM)
Then I guess I don't really need --primary option here because all the reported alignments are primary.

But still not sure if I should keep these multi-mapping reads at all. I read in a best practice paper saying tools including featureCounts often discard these multi-mapping reads whereas these newer ones (Sailfish/Salmon, Kallisto, RSEM) keep them.


Quote:
Originally Posted by GenoMax View Post
How did you handle the multimappers in your alignment program? Did you use one of these options (for example this is what BBMap allows)

Code:
best    (use the first best site)
toss    (consider unmapped)
random  (select one top-scoring site randomly)
all     (retain all top-scoring sites)
gene_x is offline   Reply With Quote
Old 08-09-2016, 10:28 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,033
Default

Having k set to 5 means you only count that many positions (even if there are more). Using "random" option with BBMap does not throw information away but does not overcount at the same time.

If "mapping" (not precise) the reads is ok instead of alignment then the newer tools you mention are fast option.
GenoMax is offline   Reply With Quote
Old 08-09-2016, 10:47 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

One clarification, in (classical) RNAseq multimappers are excluded (I'm counting Salmon/Kallisto/et al. as non-classical). In ChIPseq, primary alignments from multimappers are typically included.
dpryan is offline   Reply With Quote
Old 08-09-2016, 11:02 AM   #6
gene_x
Senior Member
 
Location: MO

Join Date: May 2010
Posts: 108
Default

really? Could you provide a reference for the treatment of multimappers in ChIP-seq? To the contrary, I believe they are discarded and only uniquely mapped reads are kept.

Quote:
Originally Posted by dpryan View Post
One clarification, in (classical) RNAseq multimappers are excluded (I'm counting Salmon/Kallisto/et al. as non-classical). In ChIPseq, primary alignments from multimappers are typically included.
gene_x is offline   Reply With Quote
Old 08-09-2016, 11:23 AM   #7
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I'll see if I can find a reference when I'm in the office tomorrow. Using only "unique alignments" prevents finding peaks in genes with upstream repeats (there are a number of them) and expressed repeats (we have a large group working on them).
dpryan is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 02:32 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO