#1 | Member | Location: Quebec, Canada | Join Date: Jul 2011 | Posts: 21
Greetings,
I'm trying to analyze the results of Illumina RNA sequencing (~5 × 150M 100 bp paired-end reads). One problem we are facing is that, for a very large number of our reads, only the first ~50 bp are actual biological sequence; the rest is Illumina primer/adapter sequence. Would anyone who has faced a similar problem care to suggest an alignment program and parameters for this kind of data? I've tried bowtie2, but I either get terrible alignment rates with --end-to-end, or I can't get any splice junctions with --local.
Thank you very much,
-Eric Fournier
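To gauge how widespread the adapter read-through is before choosing a tool, one quick check is to count how many reads contain the adapter's common start and where it begins. The sketch below is a minimal Python illustration, assuming a plain four-line FASTQ layout and using `AGATCGGAAGAGC` (the 13-mer that begins every VecScreen hit posted later in this thread) as the probe; the function names are hypothetical. Exact matching misses adapter copies that contain sequencing errors, so the fraction it reports is a lower bound.

```python
import statistics

# Assumption: the 13-mer shared by every VecScreen hit in this thread.
ADAPTER_CORE = "AGATCGGAAGAGC"


def fastq_sequences(path):
    """Yield read sequences from a plain 4-line-per-record FASTQ file (assumed layout)."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # line 2 of each record holds the sequence
                yield line.strip()


def adapter_positions(reads, core=ADAPTER_CORE):
    """0-based start of the first exact `core` hit in each read, or None if absent."""
    positions = []
    for read in reads:
        index = read.find(core)
        positions.append(index if index != -1 else None)
    return positions


def summarize(reads, core=ADAPTER_CORE):
    """Fraction of reads containing `core`, and the low median of the hit positions."""
    positions = [p for p in adapter_positions(reads, core) if p is not None]
    fraction = len(positions) / len(reads) if reads else 0.0
    median = statistics.median_low(positions) if positions else None
    return fraction, median
```

For example, `summarize(list(fastq_sequences("reads_R1.fastq")))` (hypothetical file name) would give the share of R1 reads with an exact adapter hit and a typical start position; a median near 50 would match the ~50 bp of usable sequence described above.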
#2 | Senior Member | Location: Boston | Join Date: Nov 2009 | Posts: 224
You can use something like Trimmomatic to trim off the adapter sequences first.
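As a rough illustration of what 3' adapter trimming does, here is a minimal Python sketch that cuts a read where the adapter begins, including the case where only the first few adapter bases fit before the read ends. It uses exact matching only and a hypothetical `min_overlap` of 3; Trimmomatic itself scores mismatches and has its own clipping rules, so treat this as a simplified model rather than its algorithm.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Cut `read` at the leftmost position where the read continues with the adapter.

    The match may be the full adapter, or a prefix of the adapter that runs off
    the read's 3' end (a partial adapter). Overlaps shorter than `min_overlap`
    are ignored to limit chance matches. Matching is exact (no mismatches).
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # remaining tail is too short to call an adapter
        if read[start:start + overlap] == adapter[:overlap]:
            return read[:start]
    return read
```

With exact matching, a single sequencing error inside the adapter prevents trimming, which is one reason real trimmers tolerate some mismatches.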
#3 | Member | Location: Quebec, Canada | Join Date: Jul 2011 | Posts: 21
I've now spent a fair amount of time trying to clean up my sequences with Trimmomatic, but I haven't been able to find a set of parameters that removes most of the Illumina adapter sequences.
I've attached an example set of 5 sequences that contain Illumina adapters, as well as the adapter file I've been using (sequences obtained from UniVec). When I run the R1 sequences through VecScreen, the adapters are clearly present:
Code:
Query 53 AGATCGGAAGAGCGGCTCAGCAGGAATGTCGTGACCGATCTCGT 96
         ||||||||||||||| |||||||||||| || ||||||||||||
Sbjct 61 AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGT 18

Query 52 AGATCGGAAGAGCGGTTCAGCA 73
         ||||||||||||||||||||||
Sbjct 61 AGATCGGAAGAGCGGTTCAGCA 40

Query 48 AGATCGGAAGAGCGGTTCAGCAGGAATGACGAGACCGATCTCGTATGCC 96
         |||||||||||||||||||||||||||| ||||||||||||||||||||
Sbjct 61 AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCC 13

Query 52 AGATCGGAAGAGCGGCTCAGCAGGTATGTCGAGACCGATCTCG 94
         ||||||||||||||| |||||||| ||| ||||||||||||||
Sbjct 61 AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCG 19

Query 45 AGATCGGAAGAGCGGCTCAGCAGGTATGCCGAGAGCGATCTCGTATG 91
         ||||||||||||||| |||||||| ||||||||| ||||||||||||
Sbjct 61 AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATG 15

Last edited by Eric Fournier; 01-14-2013 at 12:40 PM. Reason: Fixed in-line alignment spacing
#4 | Member | Location: US | Join Date: Sep 2010 | Posts: 14
I have been using Trimmomatic for exactly the same situation, and I have been happy with the results. Here is the command I am using:
Code:
java -classpath <path_trimmomatic> org.usadellab.trimmomatic.TrimmomaticPE -phred64 file1.fq file2.fq p1.fastq u1.fastq p2.fastq u2.fastq ILLUMINACLIP:./adapter.fasta:2:30:12 SLIDINGWINDOW:4:20 LEADING:10 TRAILING:10
Adapter sequences:
Code:
>Prefix/1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Prefix/2
CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
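The quality-based steps in that command can be modeled roughly as follows. This Python sketch assumes the steps run in the order given (SLIDINGWINDOW, then LEADING, then TRAILING) and that qualities use the Phred+64 encoding selected by `-phred64`; FASTQ files from Illumina 1.8+ pipelines use an offset of 33 instead, so it is worth checking which one your data use. ILLUMINACLIP is not modeled, and the window rule here (cut at the first window whose mean quality falls below the threshold) is a simplification of Trimmomatic's actual trimmer. The function names are hypothetical.

```python
def decode_phred(qual, offset=64):
    """Convert a FASTQ quality string to Phred scores (offset 64 matches -phred64)."""
    return [ord(c) - offset for c in qual]


def trimmomatic_like(seq, qual, offset=64, window=4, window_q=20,
                     leading_q=10, trailing_q=10):
    """Simplified SLIDINGWINDOW:4:20, then LEADING:10, then TRAILING:10."""
    q = decode_phred(qual, offset)
    end = len(q)
    # SLIDINGWINDOW: cut at the start of the first window whose mean is below window_q
    # (compare sums to avoid floating point).
    for start in range(len(q) - window + 1):
        if sum(q[start:start + window]) < window_q * window:
            end = start
            break
    # LEADING: drop 5' bases while they are below leading_q.
    begin = 0
    while begin < end and q[begin] < leading_q:
        begin += 1
    # TRAILING: drop 3' bases while they are below trailing_q.
    while end > begin and q[end - 1] < trailing_q:
        end -= 1
    return seq[begin:end], qual[begin:end]
```

In Phred+64, `h` encodes quality 40 and `B` encodes quality 2, so a read with quality string `Bhhhhhhh` followed by `BB` loses one base at the 5' end (LEADING) and two at the 3' end (TRAILING).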
#5 | Senior Member | Location: US | Join Date: Jan 2009 | Posts: 392
An alternative to Trimmomatic would be cutadapt.
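cutadapt tolerates errors in proportion to the length of the adapter match: the number of allowed errors is the error rate times the matched length, rounded down. The sketch below applies that rule with mismatches only (cutadapt also allows insertions and deletions by default) and simply takes the leftmost acceptable match; the 0.1 error rate and minimum overlap of 3 mirror cutadapt's defaults, while the function names are hypothetical.

```python
import math


def allowed_errors(overlap, error_rate=0.1):
    """Errors tolerated for a match of length `overlap`: floor(error_rate * overlap)."""
    return math.floor(overlap * error_rate)


def find_adapter_start(read, adapter, error_rate=0.1, min_overlap=3):
    """Leftmost start where the adapter (or its prefix at the 3' end) matches
    with at most floor(error_rate * overlap) mismatches; None if no match.
    Mismatch-only simplification: indels are not considered here."""
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break
        mismatches = sum(a != b for a, b in zip(read[start:start + overlap], adapter))
        if mismatches <= allowed_errors(overlap, error_rate):
            return start
    return None


def trim(read, adapter, **kwargs):
    """Remove everything from the adapter start onward, if an adapter is found."""
    start = find_adapter_start(read, adapter, **kwargs)
    return read if start is None else read[:start]
```

Under this rule, at a 0.1 error rate any match shorter than 10 bases must be exact, so short adapter remnants at the 3' end are only removed when they are error-free.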
#6 | Member | Location: Quebec, Canada | Join Date: Jul 2011 | Posts: 21
I've switched from Trimmomatic to cutadapt, and I've been able to clean up my sequences pretty well. However, I'm running into a new snag: I can't seem to align spliced reads reliably.
I've built a test subset of spliced reads (attached), all of which align to my reference genome (Bos taurus, UMD3.1) with BLAST or BLAT. As I understand it, bowtie2 cannot align spliced reads, so I've moved on to TopHat2. However, I haven't been able to find a set of parameters that aligns even a majority of the spliced reads; my best result so far aligns only 6 of the 18. I am using the following command line:
Code:
./tophat-2.0.6.Linux_x86_64/tophat2 -N 6 --read-gap-length 5 --read-edit-dist 10 --splice-mismatches 2 --library-type fr-unstranded --num-threads 6 --b2-very-sensitive ~/bowtie2-indices/Bos_Taurus/Bos_Taurus Splice_R1_cut.fastq Splice_R2_cut.fastq
Thanks for any help!
#7 | Senior Member | Location: Boston | Join Date: Nov 2009 | Posts: 224
Since you have shorter reads, you might need to set --segment-length and --min-anchor-length lower.
You could also try STAR. It finds spliced alignments by looking for the longest portion of a read that aligns, soft-clipping bases that don't align. That would even spare you from having to use something like cutadapt.
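A toy version of the seed search described above: take the longest prefix of the read that occurs exactly in the reference (STAR's "maximal mappable prefix"), then repeat from the first unmatched base. In this simplified Python sketch, a plain substring search stands in for STAR's suffix-array lookup, and seeding stops at the first prefix shorter than a hypothetical `min_seed`, with the rest of the read soft-clipped. STAR's real seed extension, stitching and scoring are far more involved.

```python
def maximal_mappable_prefix(read, reference):
    """(length, reference_position) of the longest prefix of `read` found exactly
    in `reference`; (0, -1) if not even one base matches. Naive search."""
    for length in range(len(read), 0, -1):
        position = reference.find(read[:length])
        if position != -1:
            return length, position
    return 0, -1


def seed_and_soft_clip(read, reference, min_seed=5):
    """Greedy MMP seeding (toy model, not STAR's implementation).

    Returns (seeds, soft_clipped_bases), where each seed is
    (read_start, reference_start, length). Seeding stops at the first MMP
    shorter than `min_seed` (hypothetical threshold); the remaining bases
    are reported as soft-clipped.
    """
    seeds = []
    i = 0
    while i < len(read):
        length, position = maximal_mappable_prefix(read[i:], reference)
        if length < min_seed:
            break
        seeds.append((i, position, length))
        i += length
    return seeds, len(read) - i
```

For example, a read made of the last 8 bases of one exon, the first 8 bases of the next exon and 10 adapter bases yields two 8-base seeds on either side of the intron and a 10-base soft clip, without any prior adapter trimming.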
#8 | Member | Location: Quebec, Canada | Join Date: Jul 2011 | Posts: 21
Alright, I'll try those.
I've tried using STAR, but unfortunately I don't have enough RAM to run it, even in sparse mode. I've started looking into Amazon Web Services or getting some time on a supercomputer in case I can't get TopHat2 to work satisfactorily.
#9 | Member | Location: Quebec, Canada | Join Date: Jul 2011 | Posts: 21
I've finally managed to get reasonable results with TopHat2 by quality trimming my reads. Even though the low-quality read ends were accurate enough for BLAT/BLAST to align them properly, they contained too many errors for TopHat2, even with very permissive parameters.
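For reference, the BWA-style 3' quality-trimming algorithm, which as far as I know is what cutadapt's `-q` option implements, can be sketched as follows: walk from the 3' end adding `cutoff - quality`, stop when the running sum turns negative, and cut where the sum was largest. This Python version assumes Phred+33 qualities and a cutoff of 20; both are illustrative choices, and the function names are hypothetical.

```python
def quality_trim_index(quals, cutoff=20):
    """BWA-style 3' trimming: return how many leading bases to keep."""
    running = 0
    best = 0
    cut = len(quals)
    for i in reversed(range(len(quals))):
        running += cutoff - quals[i]  # low-quality bases push the sum up
        if running < 0:
            break  # enough good quality seen; stop scanning
        if running > best:
            best = running
            cut = i
    return cut


def quality_trim(seq, qual, cutoff=20, offset=33):
    """Trim the 3' end of one read given its FASTQ quality string."""
    quals = [ord(c) - offset for c in qual]
    cut = quality_trim_index(quals, cutoff)
    return seq[:cut], qual[:cut]
```

Unlike a simple "drop trailing bases below the cutoff" rule, this tolerates an occasional good base inside a poor 3' tail, so a short run of noisy bases is removed as a block.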
#10 | Senior Member | Location: US | Join Date: Jan 2009 | Posts: 392
In the past, when I had similar issues, I found that quality trimming followed by adapter trimming gave the best results. Just as poor-quality bases at the end of a read affect alignment, they can also affect adapter trimming. I also tried the reverse order, as well as doing both simultaneously (cutadapt can both quality-trim and adapter-trim), but found that quality trimming first gave the best overall alignment and the most reads aligned.
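To see why the order can matter, here is a toy Python sketch combining the two steps, with exact-match adapter trimming and BWA-style quality trimming (both simplified, with hypothetical function names). When the last adapter bases are low quality and miscalled, the exact adapter match fails unless those bases are quality-trimmed first. This is a constructed illustration of the mechanism, not a benchmark.

```python
def quality_trim_index(quals, cutoff=20):
    """BWA-style 3' quality trimming: number of leading bases to keep."""
    running, best, cut = 0, 0, len(quals)
    for i in reversed(range(len(quals))):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut


def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Exact-match 3' adapter trimming, allowing a partial adapter at the read end."""
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break
        if read[start:start + overlap] == adapter[:overlap]:
            return read[:start]
    return read


def clean(seq, quals, adapter, cutoff=20, quality_first=True):
    """Apply both trimming steps in the chosen order; `quals` are integer Phred scores."""
    if quality_first:
        cut = quality_trim_index(quals, cutoff)
        return trim_3prime_adapter(seq[:cut], adapter)
    trimmed = trim_3prime_adapter(seq, adapter)
    cut = quality_trim_index(quals[:len(trimmed)], cutoff)
    return trimmed[:cut]
```

For a read consisting of 10 insert bases, 7 correct adapter bases and 3 miscalled low-quality bases, quality trimming first removes the whole adapter, while the reverse order leaves the 7 adapter bases behind.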
|
![]() |
![]() |
![]() |
Tags: illumina, low quality, rna-seq