
  • Counts on a merged BAM file of single-end and paired-end reads

    Dear all,
    We have sequenced the same sample both paired-end AND single-end. The two corresponding BAM files were then merged, which causes problems for htseq-count ('pair_alignments' needs a sequence of paired-end alignments).
    A workaround is to run htseq-count separately on the two UNmerged BAM files and then merge the result files (simply summing the paired-end and single-end counts).
    Is that approach sound and recommended?
    Best regards.
    PS: HTSeq rocks!

  • #2
    You might also consider just keeping the single-end and paired-end separate and then using that as a blocking factor in your experimental design. Having said that, if the library-type effect is minimal (as indicated by PCA, clustering, etc.), then you might as well go ahead and sum things...but I'd check that the results are similar enough first.
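A quick way to check whether the two libraries are "similar enough" before summing is to compare their per-gene counts on a normalized log scale. The sketch below is a rough stand-in for the PCA or clustering suggested above, not a replacement for it. The function names (`log_cpm`, `pearson`, `library_similarity`) and the toy counts are hypothetical and only illustrate the idea.

```python
import math

def log_cpm(counts):
    """Convert raw counts to log2(counts-per-million + 1)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no counted reads")
    return {gene: math.log2(c * 1e6 / total + 1.0) for gene, c in counts.items()}

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / math.sqrt(var_x * var_y)

def library_similarity(paired, single):
    """Correlate log-CPM values over the genes present in both libraries."""
    genes = sorted(set(paired) & set(single))
    lp, ls = log_cpm(paired), log_cpm(single)
    return pearson([lp[g] for g in genes], [ls[g] for g in genes])

# Toy counts (made up): the single-end library is a 10x shallower copy.
paired_counts = {"geneA": 100, "geneB": 50, "geneC": 10}
single_counts = {"geneA": 10, "geneB": 5, "geneC": 1}
similarity = library_similarity(paired_counts, single_counts)
print(f"log-CPM correlation: {similarity:.3f}")
```

A correlation close to 1 across many genes would be consistent with a small library-type effect. With real data, a PCA or clustering across all samples remains the more informative check.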



    • #3
      Right.
      I need to update my question: the experiment was actually conducted with paired-end sequencing only, BUT the downstream read quality filtering and trimming steps removed one mate from some pairs (approx. 10% of pairs became single/orphan reads).
      So the mixture of single and paired reads doesn't come from the biochemistry, but rather from QC filtering.
      We can say that it's sound to merge the paired and single reads back together for htseq-count, can't we? Qualitatively speaking, at least.
      About the counting: on the one hand, one mapped single read yields one count; on the other hand, a read pair is also counted only once. Don't we overweight the single-end reads by simply summing the single-end and paired-end counts?



      • #4
        A single read and a pair both describe the position of the fragment that was sequenced. In both cases, you can consider that it's actually the fragment that's getting counted, so then nothing is being given undue weight. The only real objection to that is that single-end reads don't give you the full bounds, so there are cases where they'll lead to slightly inflated counts (e.g., when the other end of the fragment actually overlaps a different feature, but you have no way of knowing this), but the effect of that is likely quite small (again, you could judge this by clustering things).
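The inflated-count case described above can be illustrated with a toy model. This is a simplified sketch loosely modeled on a union-style overlap rule, not htseq-count's actual implementation. The feature coordinates, the `assign` function, and the read positions are all made up for illustration.

```python
# Hypothetical features as half-open intervals [start, end) on one chromosome.
FEATURES = {"geneA": (100, 200), "geneB": (250, 400)}

def overlapping_features(start, end):
    """Return the names of all features overlapping the interval [start, end)."""
    return {name for name, (fs, fe) in FEATURES.items() if start < fe and fs < end}

def assign(read_intervals):
    """Pool the features touched by any read of a fragment, then decide."""
    hits = set()
    for start, end in read_intervals:
        hits |= overlapping_features(start, end)
    if not hits:
        return "no_feature"
    if len(hits) == 1:
        return next(iter(hits))
    return "ambiguous"

mate1 = (150, 190)  # falls inside geneA
mate2 = (260, 300)  # falls inside geneB

as_pair = assign([mate1, mate2])  # both mates known
as_orphan = assign([mate1])       # mate2 lost during QC

print(as_pair)    # ambiguous
print(as_orphan)  # geneA
```

With both mates, the fragment touches two features and is set aside as ambiguous. With only the surviving mate, the same fragment is counted for geneA, which is the slight inflation mentioned above.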



        • #5
          Great answer!
          I agree with that and I'll go on with the proposed strategy, which is the following:
          1. QC of fastq files, trimming ..
          2. Alignment of single reads, alignment of paired reads
          3. htseq-count of single reads, htseq-count of paired reads
          4. Sum of counts for each gene of single reads + paired reads
          5. Happy edgeR or DESeq or whatever...

          Thank you so much for your help!!
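Steps 3 and 4 above could be sketched as follows, assuming each htseq-count run produced a two-column, tab-separated table (feature ID, count). The sketch also assumes that the summary rows at the end of the table start with `__` (for example `__no_feature`), as in recent HTSeq versions; older versions may format these rows differently. The function names and the toy tables are hypothetical.

```python
import csv
import io

def read_htseq_counts(handle):
    """Parse a feature<TAB>count table into (gene counts, summary counters)."""
    genes, special = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        feature_id, count = row[0], int(row[1])
        # Assumption: summary rows are prefixed with "__".
        target = special if feature_id.startswith("__") else genes
        target[feature_id] = count
    return genes, special

def sum_counts(a, b):
    """Sum two count dicts; a gene missing from one table counts as 0."""
    return {gene: a.get(gene, 0) + b.get(gene, 0) for gene in sorted(set(a) | set(b))}

# Toy tables standing in for the two htseq-count output files.
PAIRED = "geneA\t10\ngeneB\t0\n__no_feature\t3\n"
SINGLE = "geneA\t2\ngeneB\t1\n__no_feature\t1\n"

paired_genes, paired_special = read_htseq_counts(io.StringIO(PAIRED))
single_genes, single_special = read_htseq_counts(io.StringIO(SINGLE))

merged = sum_counts(paired_genes, single_genes)
for gene, count in merged.items():
    print(f"{gene}\t{count}")
```

For real files, the `io.StringIO(...)` objects would be replaced by file handles opened with `open(path, newline="")`. Keeping the summary counters separate from the gene counts avoids passing them to edgeR or DESeq as if they were genes.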
