  • #16
    Originally posted by BADE
    I performed the quality trimming also but the number of surviving reads was only 66% (described in earlier post). I am not sure how many will align to the genome.
    From the quality plots, I would guess the machine had some kind of problem during cycles 2 and 3 of the reverse read. This could be caused by environmental problems (e.g. external vibration) or by problems within the machine (e.g. bubbles in the flow cell). If you have access to the per-tile quality information, there may be a clear pattern to the low-quality reads.

    I would suggest you perform a 'HEADCROP' of 3 bp on the data, immediately after the ILLUMINACLIP step, and before the other quality filtering steps. This will simply drop the problem bases entirely from all reads.

    Alternatively, you could widen the window/lower the threshold of the SLIDINGWINDOW, which would help bridge these dodgy patches. However, even after getting the dodgy data past the trimming stage, you still have the issue of aligning it so the HEADCROP might be better.
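To make the trade-off between HEADCROP and a relaxed SLIDINGWINDOW concrete, here is a minimal Python sketch. It is a simplified model of a sliding-window trim, not Trimmomatic's actual implementation, and the quality values are a made-up read with two bad bases at cycles 2 and 3 (an assumption for illustration only).

```python
def sliding_window_keep(quals, window=4, min_avg=15):
    """Return how many leading bases survive a simplified sliding-window trim.

    Simplified model (assumption): scan from the 5' end and cut at the start
    of the first window whose average quality is below min_avg.
    """
    if len(quals) < window:
        return len(quals)
    for start in range(len(quals) - window + 1):
        # Compare sums instead of averages to stay in integer arithmetic.
        if sum(quals[start:start + window]) < min_avg * window:
            return start
    return len(quals)


# Hypothetical 50 bp reverse read: good quality except cycles 2 and 3.
read = [30, 2, 2, 20] + [36] * 46

strict = sliding_window_keep(read, window=4, min_avg=15)       # window 4, average 15
relaxed = sliding_window_keep(read, window=4, min_avg=10)      # lower threshold
cropped = sliding_window_keep(read[3:], window=4, min_avg=15)  # after dropping 3 bases

print(strict, relaxed, cropped)  # 0 50 47
```

In this toy case the strict window cuts the read at its very first position, so a later length filter would discard it. Lowering the threshold keeps all 50 bases, including the two bad ones, while cropping 3 bases first keeps 47 clean bases.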

    You might also want to consider more liberal alignment settings in TopHat, since many of the reads are probably failing due to these poor quality bases.



    • #17
      Hi tonybolger,

      I would suggest you perform a 'HEADCROP' of 3 bp on the data, immediately after the ILLUMINACLIP step, and before the other quality filtering steps. This will simply drop the problem bases entirely from all reads.
      I ran Trimmomatic with the suggested settings. Here is my command and output:
      Code:
      TrimmomaticPE: Started with arguments: -threads 28 -phred33 _WT_CTTGTA_L001_R1_001.fastq _WT_CTTGTA_L001_R2_001.fastq Out_paired_WT_CTTGTA_L001_R1_001.fastq.gz Out_unpaired_WT_CTTGTA_L001_R1_001.fastq.gz Out_paired_WT_CTTGTA_L001_R2_001.fastq.gz Out_unpaired_WT_CTTGTA_L001_R2_001.fastq.gz ILLUMINACLIP:/home/kakrana/tools/Trimmomatic-0.32/TruSeq3-PE-2.fa:2:30:10:8:true HEADCROP:3 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
      output:
      Input Read Pairs: 22060013 Both Surviving: 14588404 (66.13%) Forward Only Surviving: 6416736 (29.09%) Reverse Only Surviving: 295779 (1.34%) Dropped: 759094 (3.44%)
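As a rough aid for reading this summary, the following Python sketch parses the line above and works out how often each mate was lost. The regular expression and the dictionary keys are illustrative assumptions; only the numbers come from the output itself.

```python
import re

SUMMARY = (
    "Input Read Pairs: 22060013 Both Surviving: 14588404 (66.13%) "
    "Forward Only Surviving: 6416736 (29.09%) "
    "Reverse Only Surviving: 295779 (1.34%) Dropped: 759094 (3.44%)"
)


def parse_summary(line):
    """Extract the five read-pair counts from a TrimmomaticPE summary line."""
    pattern = (
        r"Input Read Pairs: (\d+) Both Surviving: (\d+) .*?"
        r"Forward Only Surviving: (\d+) .*?"
        r"Reverse Only Surviving: (\d+) .*?Dropped: (\d+)"
    )
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("not a TrimmomaticPE summary line")
    total, both, fwd_only, rev_only, dropped = map(int, match.groups())
    return {
        "total": total,
        "both": both,
        "forward_only": fwd_only,
        "reverse_only": rev_only,
        "dropped": dropped,
    }


counts = parse_summary(SUMMARY)

# Pairs in which the reverse (or forward) mate failed the filters.
reverse_lost = counts["forward_only"] + counts["dropped"]
forward_lost = counts["reverse_only"] + counts["dropped"]

print(f"reverse mate lost: {100 * reverse_lost / counts['total']:.2f}%")
print(f"forward mate lost: {100 * forward_lost / counts['total']:.2f}%")
```

The four categories add up to the input count, and the reverse mate is lost in roughly a third of the pairs while the forward mate is lost in under 5%, which fits a problem confined to the reverse read.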

      As you can see, HEADCROP did not increase the number of surviving reads. What do you suggest? Should I use these paired reads to run TopHat with changed settings?

      You might also want to consider more liberal alignment settings in TopHat, since many of the reads are probably failing due to these poor quality bases.
      For the previous TopHat run I used the standard analysis options below:
      * FASTQ Quality Scale: Sanger (PHRED33)
      * Anchor length: 8
      * Maximum number of mismatches that can appear in the anchor region of spliced alignment: 0
      * The minimum intron length: 70
      * The maximum intron length: 50000
      * Minimum isoform fraction: 0.15
      * Maximum number of alignments to be allowed: 20
      * Minimum intron length that may be found during split-segment (default) search: 50
      * Maximum intron length that may be found during split-segment (default) search: 500000
      * Number of mismatches allowed in each segment alignment for reads mapped independently: 2
      * Minimum length of read segments: 20
      * Mate-Pair Inner Distance: 50
      * Bowtie 2 speed and sensitivity: Sensitive (slower)

      Could you please elaborate on which settings I should use?

      Thanks for your suggestions.



      • #18
        Originally posted by BADE
        Input Read Pairs: 22060013 Both Surviving: 14588404 (66.13%) Forward Only Surviving: 6416736 (29.09%) Reverse Only Surviving: 295779 (1.34%) Dropped: 759094 (3.44%)

        As you can see, HEADCROP did not increase the number of surviving reads. What do you suggest?
        OK, not as much improvement as I had hoped. I guess you will also need to be a bit more liberal with the SLIDINGWINDOW: maybe a required average quality of 10 or 12, rather than 15. You would probably see a major improvement with 5, but that is very liberal.

        You can also remove the MINLEN step to see precisely how short the reads are getting after filtering.

        Another alternative is the MAXINFO quality filter mode (rather than sliding window) - it adaptively gets stricter during the read, so almost all reads will get close to the target length.
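The three variations above (a relaxed SLIDINGWINDOW, no MINLEN, or MAXINFO instead of the sliding window) can be sketched as a small Python helper that assembles the trimming steps in order. The helper name and its defaults are hypothetical, and the MAXINFO values used in the example (target length 40, strictness 0.5) are placeholders rather than recommendations. The ILLUMINACLIP string is copied from the command posted earlier in the thread.

```python
# Adapter-clipping step copied verbatim from the command posted in the thread.
ILLUMINACLIP = (
    "ILLUMINACLIP:/home/kakrana/tools/Trimmomatic-0.32/"
    "TruSeq3-PE-2.fa:2:30:10:8:true"
)


def build_steps(headcrop=3, window=4, min_avg=12, maxinfo=None, minlen=None):
    """Assemble Trimmomatic steps in the order they should be applied.

    maxinfo: optional (target_length, strictness) tuple; when given, it
             replaces the SLIDINGWINDOW step.
    minlen:  optional minimum length; None leaves the MINLEN step out so
             that short reads remain visible in the output.
    """
    steps = [ILLUMINACLIP]
    if headcrop:
        # HEADCROP goes right after adapter clipping, before quality filters.
        steps.append(f"HEADCROP:{headcrop}")
    steps += ["LEADING:3", "TRAILING:3"]
    if maxinfo is not None:
        target_length, strictness = maxinfo
        steps.append(f"MAXINFO:{target_length}:{strictness}")
    else:
        steps.append(f"SLIDINGWINDOW:{window}:{min_avg}")
    if minlen is not None:
        steps.append(f"MINLEN:{minlen}")
    return steps


relaxed_window = build_steps(min_avg=12)           # no MINLEN, to inspect lengths
with_maxinfo = build_steps(maxinfo=(40, 0.5), minlen=36)

print(" ".join(relaxed_window))
print(" ".join(with_maxinfo))
```

Each returned list can be appended to the existing TrimmomaticPE invocation in place of the original step arguments.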

        Originally posted by BADE
        Should I use these paired reads to do the TopHat with changed settings?
        I think you need to gain a few more reads, and then try to gain alignment rate with tophat. I would try alternative settings of --initial-read-mismatches and --segment-mismatches.
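One way to organize such trials is a small grid of mismatch settings, sketched below in Python. The flag names are taken from the sentence above. Whether a given TopHat version accepts `--initial-read-mismatches` is an assumption to check against `tophat --help`, since the TopHat 2 manual lists `-N/--read-mismatches` instead. The value ranges are arbitrary examples that start from the setting of 2 used in the previous run.

```python
from itertools import product


def mismatch_grid(read_mismatches=(2, 3, 4),
                  segment_mismatches=(2, 3),
                  read_flag="--initial-read-mismatches"):
    """Return one list of option fragments per combination to try.

    read_flag is an assumption: swap in "--read-mismatches" if that is
    the name your TopHat version reports in its help output.
    """
    grid = []
    for n_read, n_segment in product(read_mismatches, segment_mismatches):
        grid.append([
            read_flag, str(n_read),
            "--segment-mismatches", str(n_segment),
        ])
    return grid


trials = mismatch_grid()
for options in trials:
    # Each fragment would be added to an otherwise unchanged TopHat command.
    print(" ".join(options))
```

Running each combination on a subsample of the reads and comparing alignment rates would show fairly quickly whether looser mismatch settings recover a useful number of reads.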

        You could also consider aligning against the reference transcriptome - it won't get you the alternative splicing, but it will indicate if the reads are mostly ok with a few errors, or completely random, since most standard aligners are a bit more liberal than tophat.

        In any case, given the quality plots and the mapping rates, it is a question of how much effort you want to spend on such low-quality data.
