  • #16
    Originally posted by BADE View Post
    I performed the quality trimming also but the number of surviving reads was only 66% (described in earlier post). I am not sure how many will align to the genome.
    From the quality plots, I would guess that the machine had some kind of problem during cycles 2 and 3 of the reverse read. This could have been caused by environmental problems (e.g. external vibration) or by something within the machine (e.g. bubbles in the flow cell). If you have access to the per-tile quality information, there may be a clear pattern to the low-quality reads.

    I would suggest you perform a 'HEADCROP' of 3 bp on the data, immediately after the ILLUMINACLIP step, and before the other quality filtering steps. This will simply drop the problem bases entirely from all reads.

    Alternatively, you could widen the window or lower the threshold of the SLIDINGWINDOW step, which would help bridge these dodgy patches. However, even after getting the dodgy data past the trimming stage, you still have the issue of aligning it, so the HEADCROP might be better.

    You might also want to consider more liberal alignment settings in Tophat, since many of the reads are probably failing due to these poor quality bases.
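
To illustrate why HEADCROP and a lower SLIDINGWINDOW threshold can both rescue a read with a few bad early cycles, here is a minimal Python sketch. It is a simplified model, not Trimmomatic's actual implementation: it only cuts the read at the start of the first window whose mean quality falls below the threshold. The toy quality values are invented for illustration.

```python
def headcrop(quals, n):
    """Drop the first n bases unconditionally (models HEADCROP:n)."""
    return quals[n:]


def sliding_window_keep(quals, window, threshold):
    """Return how many leading bases survive a simplified sliding-window trim.

    The read is cut at the start of the first window whose mean quality is
    below `threshold`. Reads shorter than the window are kept whole here;
    this is a simplification, not Trimmomatic's exact behaviour.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) < threshold * window:
            return start
    return len(quals)


# Hypothetical 40 bp read: two very poor bases at cycles 2 and 3.
toy_read = [30, 2, 2, 20] + [30] * 36

strict = sliding_window_keep(toy_read, window=4, threshold=15)
relaxed = sliding_window_keep(toy_read, window=4, threshold=12)
cropped = sliding_window_keep(headcrop(toy_read, 3), window=4, threshold=15)

print("SLIDINGWINDOW:4:15 keeps", strict, "bases")
print("SLIDINGWINDOW:4:12 keeps", relaxed, "bases")
print("HEADCROP:3 then SLIDINGWINDOW:4:15 keeps", cropped, "bases")
```

In this toy case the strict window discards the whole read, while either the relaxed threshold or the head crop keeps it long enough to pass MINLEN:36. Real reads will of course be messier.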



    • #17
      Hi tonybolger,

      I would suggest you perform a 'HEADCROP' of 3 bp on the data, immediately after the ILLUMINACLIP step, and before the other quality filtering steps. This will simply drop the problem bases entirely from all reads.
      I ran Trimmomatic with the suggested settings. Here are my command and its output:
      Code:
      TrimmomaticPE: Started with arguments: -threads 28 -phred33 _WT_CTTGTA_L001_R1_001.fastq _WT_CTTGTA_L001_R2_001.fastq Out_paired_WT_CTTGTA_L001_R1_001.fastq.gz Out_unpaired_WT_CTTGTA_L001_R1_001.fastq.gz Out_paired_WT_CTTGTA_L001_R2_001.fastq.gz Out_unpaired_WT_CTTGTA_L001_R2_001.fastq.gz ILLUMINACLIP:/home/kakrana/tools/Trimmomatic-0.32/TruSeq3-PE-2.fa:2:30:10:8:true HEADCROP:3 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
      output:
      Input Read Pairs: 22060013 Both Surviving: 14588404 (66.13%) Forward Only Surviving: 6416736 (29.09%) Reverse Only Surviving: 295779 (1.34%) Dropped: 759094 (3.44%)
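
As a quick check on the summary line above, the following Python sketch parses the counts and derives per-mate survival rates. The function name and regular expression are illustrative assumptions; only the field labels and numbers come from the Trimmomatic output shown.

```python
import re

SUMMARY = (
    "Input Read Pairs: 22060013 Both Surviving: 14588404 (66.13%) "
    "Forward Only Surviving: 6416736 (29.09%) "
    "Reverse Only Surviving: 295779 (1.34%) Dropped: 759094 (3.44%)"
)

FIELD = re.compile(
    r"(Input Read Pairs|Both Surviving|Forward Only Surviving|"
    r"Reverse Only Surviving|Dropped): (\d+)"
)


def parse_trimmomatic_summary(line):
    """Return a dict mapping each summary label to its integer count."""
    return {name: int(count) for name, count in FIELD.findall(line)}


counts = parse_trimmomatic_summary(SUMMARY)
total = counts["Input Read Pairs"]

# A mate survives if the pair survives or if only that mate survives.
forward_rate = (counts["Both Surviving"] + counts["Forward Only Surviving"]) / total
reverse_rate = (counts["Both Surviving"] + counts["Reverse Only Surviving"]) / total

print(f"Forward reads surviving: {forward_rate:.2%}")
print(f"Reverse reads surviving: {reverse_rate:.2%}")
```

Forward reads survive at roughly 95% and reverse reads at roughly 67%, so nearly all of the loss comes from the reverse read, which fits the problem seen in the reverse-read quality plots.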

      As you can see, HEADCROP did not help more reads survive. What do you suggest? Should I use these paired reads to run TopHat with changed settings?

      You might also want to consider more liberal alignment settings in Tophat, since many of the reads are probably failing due to these poor quality bases.
      For the previous TopHat run, I used the following standard analysis options:
      * FASTQ Quality Scale: Sanger (PHRED33)
      * Anchor length: 8
      * Maximum number of mismatches that can appear in the anchor region of spliced alignment: 0
      * The minimum intron length: 70
      * The maximum intron length: 50000
      * Minimum isoform fraction: 0.15
      * Maximum number of alignments to be allowed: 20
      * Minimum intron length that may be found during split-segment (default) search: 50
      * Maximum intron length that may be found during split-segment (default) search: 500000
      * Number of mismatches allowed in each segment alignment for reads mapped independently: 2
      * Minimum length of read segments: 20
      * Mate-Pair Inner Distance: 50
      * Bowtie 2 speed and sensitivity: Sensitive (slower)

      Could you please elaborate on which settings I should use?

      Thanks for your suggestions.



      • #18
        Originally posted by BADE View Post
        Input Read Pairs: 22060013 Both Surviving: 14588404 (66.13%) Forward Only Surviving: 6416736 (29.09%) Reverse Only Surviving: 295779 (1.34%) Dropped: 759094 (3.44%)

        As you can see, HEADCROP did not help more reads survive. What do you suggest?
        OK, not as much improvement as I had hoped. I guess you will also need to be a bit more liberal with the SLIDINGWINDOW, maybe an average of 10 or 12 rather than 15. You could already get a major improvement with 5, but that is very liberal.

        You can also remove the MINLEN step to see precisely how short the reads are getting after filtering.
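
One way to look at the resulting read lengths is a small Python sketch like the one below. It assumes plain four-line FASTQ records (as Trimmomatic writes them) and gzip-compressed output; the function names are illustrative, not part of any tool.

```python
import gzip
from collections import Counter
from itertools import islice


def length_histogram(fastq_lines):
    """Count read lengths from an iterable of FASTQ lines.

    Assumes exactly four lines per record, so the sequence is every
    fourth line starting from the second one.
    """
    sequences = islice(fastq_lines, 1, None, 4)
    return Counter(len(seq.rstrip("\r\n")) for seq in sequences)


def length_histogram_gz(path):
    """Same as length_histogram, reading from a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return length_histogram(handle)


def print_histogram(histogram):
    """Print one 'length<TAB>count' line per observed read length."""
    for length in sorted(histogram):
        print(f"{length}\t{histogram[length]}")
```

For example, calling `print_histogram(length_histogram_gz("Out_paired_WT_CTTGTA_L001_R2_001.fastq.gz"))` on output produced without MINLEN would show whether reverse reads are being trimmed to just under 36 bp or almost to nothing.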

        Another alternative is the MAXINFO quality filter mode (rather than sliding window) - it adaptively gets stricter during the read, so almost all reads will get close to the target length.

        Originally posted by BADE View Post
        Should I use these paired reads to do the TopHat with changed settings?
        I think you need to gain a few more reads, and then try to gain alignment rate with tophat. I would try alternative settings of --initial-read-mismatches and --segment-mismatches.

        You could also consider aligning against the reference transcriptome - it won't get you the alternative splicing, but it will indicate if the reads are mostly ok with a few errors, or completely random, since most standard aligners are a bit more liberal than tophat.

        In any case, given the quality plots and the mapping rates, it is a question of how much effort you want to spend on such low-quality data.
