  • Eric Fournier
    Member
    • Jul 2011
    • 21

    Suggested aligner for local alignment of RNA-seq data

    Greetings,

    I'm trying to analyze the results of Illumina RNA sequencing (~5x150M 100bp PE reads). One problem that we are facing is that for a very large number of our reads, only the first ~50bp are of actual biological material, with the rest consisting of Illumina primers. Would anyone who has faced a similar problem care to suggest an alignment program/parameters to analyze this kind of data? I've tried using bowtie2, but I either get terrible alignment rates using --end-to-end, or I am unable to get any splice junctions using --local.

    Thank you very much,
    -Eric Fournier
  • pbluescript
    Senior Member
    • Nov 2009
    • 224

    #2
    You can use something like Trimmomatic to trim off the adapter sequences first.


    • Eric Fournier
      Member
      • Jul 2011
      • 21

      #3
      Trouble with Trimmomatic

      I've now spent a fair amount of time trying to clean up my sequences with Trimmomatic, but I've been unable to find a set of parameters that gets rid of most of the Illumina adapter sequences.

      I've attached an example set of 5 sequences that contain Illumina adapters, as well as the adapter file I've been using (sequences obtained from UniVec).

      When running the R1 sequences through VecScreen, it is quite obvious that the adapters are present:
      Code:
      Query  53  AGATCGGAAGAGCGGCTCAGCAGGAATGTCGTGACCGATCTCGT  96
                 ||||||||||||||| |||||||||||| || ||||||||||||
      Sbjct  61  AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGT  18
      
      Query  52  AGATCGGAAGAGCGGTTCAGCA  73
                 ||||||||||||||||||||||
      Sbjct  61  AGATCGGAAGAGCGGTTCAGCA  40
      
      Query  48  AGATCGGAAGAGCGGTTCAGCAGGAATGACGAGACCGATCTCGTATGCC  96
                 |||||||||||||||||||||||||||| ||||||||||||||||||||
      Sbjct  61  AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCC  13
      
      Query  52  AGATCGGAAGAGCGGCTCAGCAGGTATGTCGAGACCGATCTCG  94
                 ||||||||||||||| |||||||| ||| ||||||||||||||
      Sbjct  61  AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCG  19
      
      Query  45  AGATCGGAAGAGCGGCTCAGCAGGTATGCCGAGAGCGATCTCGTATG  91
                 ||||||||||||||| |||||||| ||||||||| ||||||||||||
      Sbjct  61  AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATG  15
      And yet unless I use absurd thresholds (9:3:3), most are left untouched. Am I doing something wrong?
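A minimal sketch of why mismatch tolerance matters for reads like these: the snippet scans a read for the adapter and accepts a hit when the fraction of mismatches is small enough. The adapter and the 44-base read fragment are copied from the VecScreen alignments above; the function name `find_adapter`, its `max_error_rate` and `min_overlap` parameters, and the placeholder insert are assumptions made for illustration. It is a plain Hamming-distance toy, not Trimmomatic's scoring scheme.

```python
# Adapter as reported by VecScreen (Sbjct 61..13 of the third alignment).
ADAPTER = "AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCC"
# Query 53..96 of the first alignment: adapter with three sequencing errors.
READ_3P = "AGATCGGAAGAGCGGCTCAGCAGGAATGTCGTGACCGATCTCGT"
# Hypothetical 52-base insert; the real biological sequence is not shown above.
PLACEHOLDER_INSERT = "ACGT" * 13


def find_adapter(read, adapter, max_error_rate=0.1, min_overlap=5):
    """Return the leftmost index where the adapter (or a prefix of it that
    runs off the 3' end of the read) starts, or None if nothing matches."""
    for start in range(len(read) - min_overlap + 1):
        window = read[start:start + len(adapter)]
        allowed = int(max_error_rate * len(window))
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= allowed:
            return start
    return None


read = PLACEHOLDER_INSERT + READ_3P
tolerant_hit = find_adapter(read, ADAPTER)                    # 10% mismatches allowed
strict_hit = find_adapter(read, ADAPTER, max_error_rate=0.0)  # exact matches only
trimmed = read if tolerant_hit is None else read[:tolerant_hit]
```

With exact matching the three errors in the adapter portion are enough to hide it, while a 10% tolerance finds it at the position VecScreen reports.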
      Last edited by Eric Fournier; 01-14-2013, 12:40 PM. Reason: Fixed in-line alignment spacing


      • MeganS
        Member
        • Sep 2010
        • 14

        #4
        I have been using trimmomatic for exactly the same situation and I have been happy with the results. Here is the command I am using:

        Code:
        java -classpath <path_trimmomatic> org.usadellab.trimmomatic.TrimmomaticPE -phred64 file1.fq file2.fq p1.fastq u1.fastq p2.fastq u2.fastq ILLUMINACLIP:./adapter.fasta:2:30:12 SLIDINGWINDOW:4:20 LEADING:10 TRAILING:10
        I think the palindrome clipping is really good, but it took a while to figure out how to format the FASTA file properly. Here are my entries for palindrome clipping (note that they must "start with 'Prefix', and end in '/1' for the forward adapter and '/2' for the reverse adapter"):

        Code:
        >Prefix/1
        AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
        >Prefix/2
        CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
        Also, check your quality score encoding. Additionally, I modified ILLUMINACLIP so that it does not require a minimum prefix for palindrome clipping (public final static int MIN_PREFIX = 1; in IlluminaClippingTrimmer.java).
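For the quality score encoding check, a rough heuristic is to look at the range of characters in the FASTQ quality lines. The sketch below assumes the usual conventions (Phred+33 qualities start at '!', Phred+64 qualities never go below ';'); the function name `guess_phred_offset` and the example quality strings are made up for illustration, and a small or unusually clean sample can be ambiguous.

```python
def guess_phred_offset(quality_lines):
    """Guess the FASTQ quality offset (33 or 64) from the characters seen.

    Returns None when the observed range fits both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        return None
    if min(codes) < 59:   # below ';' is impossible in any +64 encoding
        return 33
    if max(codes) > 74:   # above 'J' is beyond the usual +33 range
        return 64
    return None


# Hypothetical quality strings, not taken from the attached reads.
sanger_like = guess_phred_offset(["IIIIHHHGG#"])
illumina15_like = guess_phred_offset(["hhhhgggfffB"])
unclear = guess_phred_offset(["@@@@AAAA"])
```

If the guess is 33, the -phred64 flag in the command above would need to be replaced with -phred33.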


        • chadn737
          Senior Member
          • Jan 2009
          • 392

          #5
          An alternative to trimmomatic would be cutadapt.


          • Eric Fournier
            Member
            • Jul 2011
            • 21

            #6
            I've moved on from using Trimmomatic to cutadapt, and I've been able to clean up my sequences pretty well. However, I'm running into a new snag: I just can't seem to reliably align spliced reads.

            I've built a test subset of spliced reads (attached), which all align to my reference genome (Bos taurus, UMD3.1) when I use BLAST or BLAT. I believe that bowtie2 cannot align spliced reads, so I've moved on to TopHat2. However, I've been unable to find a set of parameters that aligns even a majority of the spliced reads. My best result so far aligns only 6 of the 18. I am using the following command line:

            Code:
            ./tophat-2.0.6.Linux_x86_64/tophat2 -N 6 --read-gap-length 5 --read-edit-dist 10 --splice-mismatches 2 --library-type fr-unstranded --num-threads 6 --b2-very-sensitive ~/bowtie2-indices/Bos_Taurus/Bos_Taurus Splice_R1_cut.fastq Splice_R2_cut.fastq
            Could anyone suggest additional/different parameters to increase the number of successful alignments?

            Thanks for any help!

            • pbluescript
              Senior Member
              • Nov 2009
              • 224

              #7
              Since you have shorter reads, you might need to set --segment-length and --min-anchor-length lower.

              You could also try STAR. It finds spliced alignments by looking for the largest portion of a read that aligns, soft-clipping the bases that don't. That would spare you from even having to use something like cutadapt.
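To picture what soft-clipping means here, the toy below finds the longest prefix of a read that occurs exactly in a reference and reports the rest as soft-clipped in CIGAR notation. It is only an illustration of the idea: STAR's real search uses an index, tolerates mismatches, and handles splicing, none of which is modelled. The function names and the short sequences are hypothetical.

```python
def longest_mappable_prefix(read, reference):
    """Return (position, length) of the longest read prefix found exactly in
    the reference, or (-1, 0) if not even the first base occurs."""
    for length in range(len(read), 0, -1):
        pos = reference.find(read[:length])
        if pos != -1:
            return pos, length
    return -1, 0


def soft_clip_cigar(read, reference):
    """Describe the read as matched bases (M) followed by soft-clipped bases (S)."""
    pos, length = longest_mappable_prefix(read, reference)
    clipped = len(read) - length
    cigar = (f"{length}M" if length else "") + (f"{clipped}S" if clipped else "")
    return pos, cigar


# Hypothetical reference and read: 8 genomic bases followed by adapter bases.
reference = "TTGACCATGGCATTAGC"
read = "CCATGGCA" + "AGATCGG"
position, cigar = soft_clip_cigar(read, reference)
```

Here the eight genomic bases align and the seven adapter bases are clipped, which is why an aligner of this kind can cope with untrimmed adapter at the 3' end.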


              • Eric Fournier
                Member
                • Jul 2011
                • 21

                #8
                Alright, I'll try those.

                I've tried using STAR, but unfortunately I don't have enough RAM to run it, even in sparse mode. I've started looking into using Amazon Web Services or getting some time on a supercomputer in case I can't get TopHat2 to a satisfactory point.


                • Eric Fournier
                  Member
                  • Jul 2011
                  • 21

                  #9
                  I've finally managed to get reasonable results using TopHat2 by quality trimming my reads. Even though the low-quality read ends were accurate enough for BLAT/BLAST to align them properly, they contained too many errors for TopHat2, even with parameters allowing for high flexibility.


                  • chadn737
                    Senior Member
                    • Jan 2009
                    • 392

                    #10
                    In the past, when I had similar issues, I found that quality trimming followed by adapter trimming gave the best results. Just as poor-quality bases at the end of a read affect alignment, they can also affect adapter trimming. I also tried the reverse order, as well as doing both simultaneously (cutadapt can both quality trim and adapter trim), but quality trimming first gave the best overall alignment and the most reads aligned.
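The ordering effect can be sketched with two toy functions. `quality_trim` follows the running-sum rule described in the cutadapt documentation (subtract the cutoff from each quality, sum from the 3' end, cut where the sum is lowest); `adapter_trim` is a deliberately strict exact-match trimmer, so the effect is exaggerated compared with cutadapt, which tolerates some errors. The function names, the short adapter prefix, and the example read are assumptions for illustration.

```python
ADAPTER = "AGATCGGAAGAGC"  # first 13 bases of the adapter seen earlier in the thread


def quality_trim(seq, quals, cutoff=20):
    """Trim low-quality bases from the 3' end using a running sum of (q - cutoff)."""
    running, lowest, cut = 0, 0, len(seq)
    for i in range(len(seq) - 1, -1, -1):
        running += quals[i] - cutoff
        if running > 0:          # good bases now outweigh the bad tail
            break
        if running < lowest:     # new minimum: best cut point so far
            lowest, cut = running, i
    return seq[:cut], quals[:cut]


def adapter_trim(seq, adapter, min_overlap=3):
    """Remove an exact adapter match, including a partial one at the 3' end."""
    for start in range(len(seq) - min_overlap + 1):
        if adapter.startswith(seq[start:start + len(adapter)]):
            return seq[:start]
    return seq


# Hypothetical read: 10-base insert, 8 correct adapter bases, 5 miscalled bases.
seq = "ACGTACGTAC" + "AGATCGGA" + "TTTTT"
quals = [40] * 18 + [2] * 5

adapter_only = adapter_trim(seq, ADAPTER)          # miscalls hide the adapter
q_seq, q_quals = quality_trim(seq, quals)          # drops the 5 miscalled bases
quality_then_adapter = adapter_trim(q_seq, ADAPTER)
```

In this example the adapter is only recognised once the miscalled tail has been removed, which matches the experience described above.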
