  • Htseq counting

    Hi all,
    I have discussed the questions below with some members here but have not gotten a clear answer yet. Can anyone help?

    For paired-end reads, if read1 maps uniquely but its mate, read2, maps to multiple places, how does htseq deal with this situation? Does it count the pair once or discard the pair?

    After sorting my SAM file, it seems that it contains a mix of single-end and paired-end reads. To be clearer, here is an example (read names only):

    HWI-ST1106:1870NMUACXX:6:1101:10001:124778
    HWI-ST1106:1870NMUACXX:6:1101:10001:124778
    HWI-ST1106:1870NMUACXX:6:1101:10001:137376
    HWI-ST1106:1870NMUACXX:6:1101:10001:138830
    HWI-ST1106:1870NMUACXX:6:1101:10001:138830

    My fastq files are properly paired. Looking at my SAM file, it seems that for many pairs one mate failed to align to the genome, so only a single read of the pair remains in the SAM file.

    Will the third, uniquely mapped read "...137376" be counted by htseq? Should these scattered "single-end reads" be filtered out before using htseq?
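
A rough way to see how many such lone mates a name-sorted file contains is to group consecutive identical read names. The sketch below is only an illustration: the function name `split_pairs_and_singletons` is hypothetical, it looks at read names alone, and it assumes each read name appears once per aligned mate. Multi-mapped reads would break that assumption; in a real file the SAM FLAG bit 0x8 ("mate unmapped") is a more reliable signal.

```python
from itertools import groupby

def split_pairs_and_singletons(read_names):
    """Split name-sorted read names into those seen once and those seen more than once.

    Assumption (for illustration only): one record per aligned mate, so a name
    that appears exactly once is a read whose mate is missing from the file.
    """
    pairs, singletons = [], []
    for name, group in groupby(read_names):
        n_records = sum(1 for _ in group)
        if n_records == 1:
            singletons.append(name)
        else:
            pairs.append(name)
    return pairs, singletons

# The five read names from the example above
names = [
    "HWI-ST1106:1870NMUACXX:6:1101:10001:124778",
    "HWI-ST1106:1870NMUACXX:6:1101:10001:124778",
    "HWI-ST1106:1870NMUACXX:6:1101:10001:137376",
    "HWI-ST1106:1870NMUACXX:6:1101:10001:138830",
    "HWI-ST1106:1870NMUACXX:6:1101:10001:138830",
]

pairs, singletons = split_pairs_and_singletons(names)
print(singletons)  # ['HWI-ST1106:1870NMUACXX:6:1101:10001:137376']
```

With the example names, only "...137376" ends up in the singleton list; the other two names each appear twice.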

    To keep this type of "single-end read" out of the SAM file, the tophat manual suggests that the "--no-mixed" parameter does the job (only report alignments where both reads of a pair are mapped?). Is my understanding right?

    thanks!

    maize

  • #2
    htseq

    Hi all,
    Another interesting thing I did was to treat the paired-end data as "single-end" data and run it through tophat and htseq. In this case, I do not need to struggle with the issue above, since the data are handled as single-end. Below is an example table from htseq. The end of the table is a summary of the counting, to which I added "on_feature", "total", and "on_feature(%)". Using paired-end reads increases the on-feature ratio by 8 percentage points (0.59 vs. 0.51). The difference between the paired-end and single-end counts for each gene is interesting.

    gene pair-end single-end
    GRMZM2G061626 7 8
    GRMZM2G061629 13 24
    GRMZM2G061655 2 2
    GRMZM2G061662 61 111
    GRMZM2G061663 55 93
    GRMZM2G061672 128 185
    GRMZM2G061681 202 375
    GRMZM2G061684 20 38
    GRMZM2G061695 12 19
    GRMZM2G061700 19 20
    GRMZM2G061702 74 113
    no_feature 225430 352813
    ambiguous 95104 161366
    too_low_aQual 0 0
    not_aligned 0 0
    alignment_not_unique 3325479 8191346
    total 8939003 17842485
    on_feature 5292990 9136960
    on_feature(%) 0.592123081 0.512090104
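
The summary rows added to the table can be reproduced from the htseq counters. This sketch assumes, as the numbers above suggest, that "total" is the sum of all gene counts plus the special counters, so on_feature is total minus the special counters. The variable and function names are made up for illustration.

```python
# Special counters and totals copied from the table above
special = {
    "pair-end": {
        "no_feature": 225430,
        "ambiguous": 95104,
        "too_low_aQual": 0,
        "not_aligned": 0,
        "alignment_not_unique": 3325479,
    },
    "single-end": {
        "no_feature": 352813,
        "ambiguous": 161366,
        "too_low_aQual": 0,
        "not_aligned": 0,
        "alignment_not_unique": 8191346,
    },
}
total = {"pair-end": 8939003, "single-end": 17842485}

def on_feature_summary(special_counts, total_count):
    """Return (on_feature, on_feature / total), assuming total includes the special counters."""
    on_feature = total_count - sum(special_counts.values())
    return on_feature, on_feature / total_count

summary = {run: on_feature_summary(special[run], total[run]) for run in total}
for run, (n, frac) in summary.items():
    print(f"{run}: on_feature={n} ratio={frac:.3f}")
# pair-end: on_feature=5292990 ratio=0.592
# single-end: on_feature=9136960 ratio=0.512
```

Note that the "on_feature(%)" row holds fractions rather than percentages, so the gap between the two runs is about 0.08, i.e. 8 percentage points.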


    • #3
      an interesting observation by the developers of RSEM (http://www.biomedcentral.com/1471-2105/12/323) is that for gene expression you are actually going to get better results from single-end, short reads (think 50bp reads) than from long paired-end reads. naturally, paired-end reads provide much better evidence of isoform structure than single-end reads. in my past tests i've observed that if i aligned only the left-side reads from paired data versus aligning the pairs, there was negligible difference between the expression estimates. in reality, if you're aligning just the left side of each pair it's not much different from what's going on with single-end reads anyway. in both cases you're aligning a read from one side of a fragment.

      to address your original question, i'm not sure if HTSeq does anything about those random unmated alignments. i know that it needs the two mates of a pair next to each other in the file in order to count properly (since each pair should count as 1 and not 2). if you have sorted the alignments by read name, then in the case that both sides of an unmated pair actually aligned, their names would appear next to each other in the SAM file. that isn't the case in what you posted, so my guess is HTSeq is going to count that 3rd alignment towards whatever feature it aligned to. if both ends aligned but didn't pair, i'm not sure what it would do. it might detect that the same read name aligned to different features and then throw it out.
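
the reasoning above can be put into a toy model. this is not HTSeq's actual code, just a sketch of the idea that records sharing a read name and sitting next to each other in a name-sorted file count once. the function `count_fragments`, the (read_name, feature) record layout, and the choice to label mates that hit different features as "ambiguous" are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby

def count_fragments(records):
    """Count one hit per read name from name-sorted (read_name, feature) records.

    feature is None when the alignment overlaps no feature.
    Toy model only; the real tool may behave differently.
    """
    counts = Counter()
    for _name, group in groupby(records, key=lambda rec: rec[0]):
        features = {feat for _, feat in group if feat is not None}
        if not features:
            counts["no_feature"] += 1
        elif len(features) == 1:
            counts[features.pop()] += 1   # a pair, or a lone mate, counts as 1
        else:
            counts["ambiguous"] += 1      # mates landed on different features
    return counts

records = [
    ("r1", "geneA"), ("r1", "geneA"),  # both mates on the same gene
    ("r2", "geneB"),                   # lone mate, like the 3rd read in the example
    ("r3", "geneA"), ("r3", "geneC"),  # mates on different genes
]
counts = count_fragments(records)
print(dict(counts))  # {'geneA': 1, 'geneB': 1, 'ambiguous': 1}
```

in this model the lone mate r2 still adds 1 to its gene, which matches the guess above about the 3rd alignment.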

      i think the test in your second post demonstrates the improved alignment confidence of paired-end reads. it's interesting, however, that if you compare differential expression between the paired-end alignments and single-end alignments from the paired data, the results are similar if not identical. the improved alignment accuracy is desirable for other types of analysis, though, such as splicing or mutations.
      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
      Salk Institute for Biological Studies, La Jolla, CA, USA */
