  • featureCounts option question

    I'm new to using featureCounts for RNA-seq analysis. My data are polyA-enriched, stranded, single-end Illumina reads.

    My goal is to do differential expression analysis between control and case groups. I plan to use DESeq2 for the DE analysis after featureCounts.

    I have a few questions:

    1. I'm wondering whether it's best to use the -M --fraction options, the --primary option, or neither. I understand that in ChIP-seq people often keep only uniquely mapped reads, but I'm not sure about RNA-seq, or whether to keep only primary alignments. My feeling is that it's best to use the --primary option.

    -M
    If specified, multi-mapping reads/fragments will be counted. A multi-mapping read will be counted up to N times if it has N reported mapping locations. The program uses the ‘NH’ tag to find multi-mapping reads.

    --fraction
    If specified, a fractional count 1/n will be generated for each multi-mapping read, where n is the number of alignments (indicated by ‘NH’ tag) reported for the read. This option must be used together with the ‘-M’ option.

    --primary
    If specified, only primary alignments will be counted. Primary and secondary alignments are identified using bit 0x100 in the Flag field of SAM/BAM files. All primary alignments in a dataset will be counted no matter they are from multi-mapping reads or not (ie. ‘-M’ is ignored).

    2. I have read in many sources that it's normal to observe a high level of duplicated reads in RNA-seq. So is it best not to use the --ignoreDup option?

    3. My current command line looks like this:

    Code:
    featureCounts -t exon -g gene_id -a genes.gtf -F GTF -o outfile.txt -s 1 --primary input.bam
    Please let me know if there are other options that I should use.

    Thanks!
    Last edited by gene_x; 08-09-2016, 09:44 AM.
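
To make the three option descriptions quoted above concrete, here is a toy Python sketch of the counting rules as they are worded in that documentation. It is not featureCounts itself; the `Alignment` record, the `count_reads` function, and the two example reads are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

SECONDARY = 0x100  # SAM flag bit that marks a secondary alignment


class Alignment(NamedTuple):
    read: str   # read name
    gene: str   # gene the alignment overlaps (hypothetical, pre-assigned)
    flag: int   # SAM flag
    nh: int     # value of the NH tag (number of reported alignments)


def count_reads(alignments, multi=False, fraction=False, primary=False):
    """Toy model of the quoted -M / --fraction / --primary descriptions."""
    if fraction and not multi:
        raise ValueError("--fraction must be used together with -M")
    counts = defaultdict(float)
    for aln in alignments:
        if primary:
            # Only primary alignments count, multi-mapping or not (-M is ignored).
            if not aln.flag & SECONDARY:
                counts[aln.gene] += 1.0
        elif aln.nh > 1:
            # Multi-mapping read: skipped unless -M is given.
            if multi:
                counts[aln.gene] += 1.0 / aln.nh if fraction else 1.0
        else:
            counts[aln.gene] += 1.0
    return dict(counts)


# Hypothetical data: r1 maps uniquely; r2 has a primary and a secondary hit.
toy = [
    Alignment("r1", "geneA", 0, 1),
    Alignment("r2", "geneA", 0, 2),
    Alignment("r2", "geneB", 256, 2),
]

print("neither:      ", count_reads(toy))
print("-M:           ", count_reads(toy, multi=True))
print("-M --fraction:", count_reads(toy, multi=True, fraction=True))
print("--primary:    ", count_reads(toy, primary=True))
```

In this toy data set, read r2 has a primary hit in geneA and a secondary hit in geneB, so `-M` counts it twice, `-M --fraction` splits it in half, and `--primary` counts only the geneA hit.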

  • #2
    How did you handle the multimappers in your alignment program? Did you use one of these options? (For example, this is what BBMap allows.)

    Code:
    best    (use the first best site)
    toss    (consider unmapped)
    random  (select one top-scoring site randomly)
    all     (retain all top-scoring sites)


    • #3
      Good point.

      I used hisat2 for the alignment, and I think the relevant setting is the -k option, left at its default:

      -k <int>
      It searches for at most <int> distinct, primary alignments for each read. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments.

      Default: 5 (HFM)
      Then I guess I don't really need the --primary option here, because all the reported alignments are primary.

      But I'm still not sure whether I should keep these multi-mapping reads at all. I read in a best-practice paper that tools including featureCounts often discard multi-mapping reads, whereas newer ones (Sailfish/Salmon, Kallisto, RSEM) keep them.
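
One way to check whether every reported alignment really is primary is to tally the 0x100 flag bit and the NH tag directly from the SAM records. The sketch below does that in plain Python on a few made-up SAM lines; the function name `summarize_sam` and the sample records are assumptions for illustration, and on real data the lines would come from something like `samtools view` output.

```python
SECONDARY = 0x100  # SAM flag bit: secondary alignment
UNMAPPED = 0x4     # SAM flag bit: read is unmapped


def summarize_sam(lines):
    """Tally primary, secondary and multi-mapped (NH > 1) alignment records."""
    summary = {"primary": 0, "secondary": 0, "multi_mapped": 0}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & UNMAPPED:
            continue
        nh = 1  # assume a single hit when the aligner wrote no NH tag
        for tag in fields[11:]:  # optional tags follow the 11 mandatory fields
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
        summary["secondary" if flag & SECONDARY else "primary"] += 1
        if nh > 1:
            summary["multi_mapped"] += 1
    return summary


# Made-up records: r1 is unique; r2 has one primary and one secondary hit.
sample = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t500\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r2\t256\tchr2\t900\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2",
]
print(summarize_sam(sample))
```

If the `secondary` tally is above zero, then not every reported alignment is primary, and --primary would change the counts.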


      Originally posted by GenoMax:
      How did you handle the multimappers in your alignment program? Did you use one of these options (for example this is what BBMap allows)

      Code:
      best    (use the first best site)
      toss    (consider unmapped)
      random  (select one top-scoring site randomly)
      all     (retain all top-scoring sites)


      • #4
        Having -k set to 5 means that at most five positions are reported and counted per read, even if there are more. Using the "random" option with BBMap does not throw information away, and it does not overcount either.

        If approximate "mapping" of the reads, rather than precise alignment, is acceptable, then the newer tools you mention are a fast option.
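
The four choices listed for BBMap, and the point about not overcounting, can be illustrated with a small Python model. This is a sketch of the descriptions as quoted in this thread, not BBMap's actual implementation; `resolve_ambiguous`, the site tuples, and the treatment of a read with a single top-scoring site under `toss` are assumptions.

```python
import random


def resolve_ambiguous(sites, mode, rng=None):
    """Return the sites kept for one read; `sites` is a list of (location, score)."""
    if not sites:
        return []
    best_score = max(score for _, score in sites)
    top = [site for site in sites if site[1] == best_score]
    if mode == "best":
        return top[:1]                       # use the first best site
    if mode == "toss":
        # Assumption: a read with one top-scoring site is not ambiguous and is kept.
        return top if len(top) == 1 else []  # otherwise consider it unmapped
    if mode == "random":
        rng = rng or random.Random()
        return [rng.choice(top)]             # select one top-scoring site randomly
    if mode == "all":
        return top                           # retain all top-scoring sites
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical read with three equally good sites and one weaker site.
sites = [("chr1:100", 50), ("chr2:200", 50), ("chr3:300", 50), ("chr4:400", 40)]
for mode in ("best", "toss", "random", "all"):
    kept = resolve_ambiguous(sites, mode, rng=random.Random(0))
    print(mode, len(kept))
```

With `random`, each ambiguous read contributes exactly one alignment, so a downstream counter sees it once; with `all`, it contributes one record per top-scoring site.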


        • #5
          One clarification: in (classical) RNA-seq, multimappers are excluded (I'm counting Salmon/Kallisto/et al. as non-classical). In ChIP-seq, primary alignments from multimappers are typically included.


          • #6
            Really? Could you provide a reference for the treatment of multimappers in ChIP-seq? On the contrary, I believe they are discarded and only uniquely mapped reads are kept.

            Originally posted by dpryan:
            One clarification, in (classical) RNAseq multimappers are excluded (I'm counting Salmon/Kallisto/et al. as non-classical). In ChIPseq, primary alignments from multimappers are typically included.


            • #7
              I'll see if I can find a reference when I'm in the office tomorrow. Using only "unique alignments" prevents finding peaks in genes with upstream repeats (there are a number of them) and expressed repeats (we have a large group working on them).
