  • HISAT Discordant Alignment Rate of RNAseq data was so high

    To get a quick test result, I split one paired-end RNA-seq FASTQ into eight pairs of files, then aligned the data to the unmasked reference genome with HISAT and Tophat2 separately. Strangely, the HISAT results for the full dataset and for the smaller split files were very different.
    ####### Below are the HISAT results for one of the split paired-end FASTQ files (the other seven results were similar):
    begin [Thu Aug 27 09:25:07 CST 2015]
    6743045 reads; of these:
    6743045 (100.00%) were paired; of these:
    572503 (8.49%) aligned concordantly 0 times
    5478092 (81.24%) aligned concordantly exactly 1 time
    692450 (10.27%) aligned concordantly >1 times
    ----
    572503 pairs aligned concordantly 0 times; of these:
    35260 (6.16%) aligned discordantly 1 time
    ----
    537243 pairs aligned 0 times concordantly or discordantly; of these:
    1074486 mates make up the pairs; of these:
    937868 (87.29%) aligned 0 times
    98655 (9.18%) aligned exactly 1 time
    37963 (3.53%) aligned >1 times
    93.05% overall alignment rate
    finish [Thu Aug 27 09:26:18 CST 2015]
    ######## Below are the HISAT results for the raw (unsplit) paired-end FASTQ files:
    begin [Thu Aug 27 15:32:23 CST 2015]
    53944360 reads; of these:
    53944360 (100.00%) were paired; of these:
    53859198 (99.84%) aligned concordantly 0 times
    173 (0.00%) aligned concordantly exactly 1 time
    84989 (0.16%) aligned concordantly >1 times
    ----
    53859198 pairs aligned concordantly 0 times; of these:
    42847397 (79.55%) aligned discordantly 1 time
    ----
    11011801 pairs aligned 0 times concordantly or discordantly; of these:
    22023602 mates make up the pairs; of these:
    7838984 (35.59%) aligned 0 times
    14547 (0.07%) aligned exactly 1 time
    14170071 (64.34%) aligned >1 times
    92.73% overall alignment rate
    finish [Thu Aug 27 15:57:01 CST 2015]
    ###############
    The HISAT results for the split paired-end FASTQ looked fine, but I was shocked by the HISAT results for the raw paired-end FASTQ files. Why was the discordant alignment rate so high? The input data were the same; the only difference was that one run used the raw FASTQ files and the other used the split FASTQ files.
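
A quick way to see what differs between the two HISAT summaries is to recompute the headline percentages from the raw counts. The sketch below assumes the summary follows the usual Bowtie2-style convention, where the overall alignment rate is the fraction of individual mates that aligned in any way; the function names are made up for illustration. Under that assumption, both runs align about 93% of mates, but roughly 79% of all pairs in the raw run are discordant versus about 0.5% in the split run, which points at the pairing of the inputs rather than at the reads themselves.

```python
def overall_alignment_rate(total_pairs, unaligned_mates):
    """Percent of individual mates that aligned at least once (assumed convention)."""
    total_mates = 2 * total_pairs
    return 100.0 * (total_mates - unaligned_mates) / total_mates


def discordant_pair_rate(total_pairs, discordant_pairs):
    """Percent of all input pairs that aligned discordantly."""
    return 100.0 * discordant_pairs / total_pairs


# Counts copied from the two HISAT summaries in this post.
split_overall = overall_alignment_rate(6743045, 937868)
raw_overall = overall_alignment_rate(53944360, 7838984)
split_discordant = discordant_pair_rate(6743045, 35260)
raw_discordant = discordant_pair_rate(53944360, 42847397)

print(f"split: {split_overall:.2f}% overall, {split_discordant:.2f}% discordant pairs")
print(f"raw:   {raw_overall:.2f}% overall, {raw_discordant:.2f}% discordant pairs")
```

The recomputed overall rates match the 93.05% and 92.73% reported in the logs, so the reads map equally well in both runs; only the way mates are paired changes.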

    PS:
    ########## The Tophat2 results for one of the split paired-end FASTQ files:
    Left reads:
    Input : 6743045
    Mapped : 6257213 (92.8% of input)
    of these: 537537 ( 8.6%) have multiple alignments (4996 have >20)
    Right reads:
    Input : 6743045
    Mapped : 6234936 (92.5% of input)
    of these: 535149 ( 8.6%) have multiple alignments (4982 have >20)
    92.6% overall read mapping rate.

    Aligned pairs: 6153025
    of these: 526916 ( 8.6%) have multiple alignments
    60596 ( 1.0%) are discordant alignments
    90.4% concordant pair alignment rate.

    ############ The Tophat2 results for the raw paired-end FASTQ files:
    Left reads:
    Input : 53944360
    Mapped : 49867482 (92.4% of input)
    of these: 3252071 ( 6.5%) have multiple alignments (50305 have >20)
    Right reads:
    Input : 53944360
    Mapped : 49672436 (92.1% of input)
    of these: 3237335 ( 6.5%) have multiple alignments (50373 have >20)
    92.3% overall read mapping rate.

    Aligned pairs: 48962272
    of these: 3172613 ( 6.5%) have multiple alignments
    427349 ( 0.9%) are discordant alignments
    90.0% concordant pair alignment rate.
    Last edited by skly; 08-27-2015, 10:19 PM.

  • #2
    I assume you just gave the original fastq files in the wrong order to HISAT, since otherwise the subset datasets would have had similar metrics.
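
One way to catch this kind of mix-up before aligning is to confirm that the read names in the two mate files line up record by record. The following is only a sketch: the thread does not say exactly how the files were misordered, the helper names (`read_names`, `first_mismatch`, `open_fastq`) are hypothetical, and it assumes plain 4-line FASTQ records with an optional `/1` or `/2` suffix on the read name. It would flag mate files that come from different chunks or are out of sync, but it cannot tell whether the R1 and R2 files were swapped wholesale, since the names would still match.

```python
import gzip
from itertools import islice, zip_longest


def read_names(lines):
    """Yield the read name from each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue
        fields = line.split()
        if not fields:  # tolerate a trailing blank line
            continue
        name = fields[0]  # drop any comment after whitespace
        if name.startswith("@"):
            name = name[1:]
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]  # strip the mate suffix
        yield name


def first_mismatch(lines1, lines2, limit=100000):
    """Return (record_index, name1, name2) for the first mismatched pair, else None."""
    pairs = zip_longest(read_names(lines1), read_names(lines2))
    for idx, (name1, name2) in enumerate(islice(pairs, limit)):
        if name1 != name2:
            return idx, name1, name2
    return None


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Tiny in-memory demonstration with made-up records.
r1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "TTTT", "+", "IIII"]
r2_ok = ["@read1/2", "TGCA", "+", "IIII", "@read2/2", "AAAA", "+", "IIII"]
r2_bad = ["@read9/2", "TGCA", "+", "IIII", "@read2/2", "AAAA", "+", "IIII"]

print(first_mismatch(r1, r2_ok))
print(first_mismatch(r1, r2_bad))
```

With real files, each pair passed to the aligner could be checked with `first_mismatch(open_fastq(path1), open_fastq(path2))`, where `path1` and `path2` are the two mate files; a non-`None` result means the two files do not describe the same reads in the same order.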

    • #3
      Yes, I checked the input files and found that I had made a mistake.
      Thank you, dpryan.
      Originally posted by dpryan
      I assume you just gave the original fastq files in the wrong order to HISAT, since otherwise the subset datasets would have had similar metrics.
