#1
Junior Member
Location: California Join Date: Jan 2013
Posts: 8
Hello all,

I am attempting to process a published Solexa RNA-seq dataset, but I am running into problems with the base quality encoding. This is the command I am using to trim adapter sequences:

```bash
$HOME/adapterremoval/bin/AdapterRemoval --qualitybase solexa \
    --file1 $raw_files_path/$input_filename_1 \
    --file2 $raw_files_path/$input_filename_2 \
    --basename $input_filename --trimns --trimqualities --gzip \
    --adapter-list $HOME/RNAseq/adapters_set1.txt
```

and this is the error I keep getting:

```
Read 2 adapters / adapter pairs from '/mnt/home/username/RNAseq/adapters-set1.txt'...
Trimming paired end reads ...
Error reading FASTQ record at line 1; aborting:
Phred+64 encoded quality score is less than 0 (ASCII < '@');
Are these FASTQ reads actually in Phred+33 format? If so, use the command-line option "--qualitybase 33"
See README for more information.
```

I am not sure what to do. The software is detecting quality scores that are less than zero (which I took to indicate Solexa encoding), but it refuses to process the data even though I have specified "--qualitybase solexa" (as recommended in the user manual). Normally I use Trimmomatic for adapter trimming, but I have successfully used AdapterRemoval (https://github.com/MikkelSchubert/ad...terRemoval.pod) in the past on Illumina HiSeq data. Please help! Thank you!
#2
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,142
"Solexa" quality encoding (Q+64) has not been used in several years (eons in next-generation sequencing time). Hell, nobody even calls it "Solexa" anymore; it is Illumina. Do exactly what the error message suggests and use "--qualitybase 33".
#3
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 6,694
@mike123: If that is truly "solexa" format data of a ripe vintage, then you may want to recode it to the current Illumina encoding (Phred+33) before doing adapter removal.
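For reference, recoding genuine Solexa data is more than an offset shift, because the two scales define the score differently: Solexa uses Q = -10 log10(p / (1 - p)) while Phred uses Q = -10 log10(p). Eliminating p gives Q_phred = 10 log10(10^(Q_solexa / 10) + 1). The minimal sketch below applies that formula per character, assuming the input really is Solexa+64 (the pre-Illumina-1.3 format) and rounding to the nearest integer; the function names are hypothetical. It should not be applied to data that is already Phred+33, as this thread shows.

```python
from __future__ import annotations

import math


def solexa_to_phred(q_solexa: int) -> int:
    """Convert one Solexa score to the nearest integer Phred score.

    Solexa: Q = -10*log10(p / (1 - p));  Phred: Q = -10*log10(p).
    Eliminating p gives Q_phred = 10*log10(10**(Q_solexa / 10) + 1).
    """
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def solexa64_to_phred33(qual: str) -> str:
    """Re-encode a Solexa+64 quality line as a Phred+33 quality line."""
    # ord(c) - 64 recovers the Solexa score; + 33 writes the Phred score back out.
    return "".join(chr(solexa_to_phred(ord(c) - 64) + 33) for c in qual)
```

For example, `solexa64_to_phred33(";@Jh")` (Solexa -5, 0, 10, 40) returns `'"$+I'` (Phred 1, 3, 10, 40). The two scales differ noticeably only at low scores; at high scores the conversion is effectively just the offset change.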
#4
Junior Member
Location: California Join Date: Jan 2013
Posts: 8
Thank you both for your suggestions. After evaluating the raw data with FastQC and actually looking at the `*.fastq` file entries (which I should have done in the first place...), it appears that the encoding is in fact Phred+33, not Solexa (https://en.wikipedia.org/wiki/FASTQ_format#Encoding).
Lesson learned, yet again: never take summary info from public datasets at face value...
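The by-eye check described above can be automated by looking at the lowest quality character in the first few thousand records. The sketch below is a heuristic under stated assumptions: plain or gzip-compressed FASTQ with unwrapped 4-line records, hypothetical function names and labels, and a rule that is not taken from FastQC or AdapterRemoval. Anything below ';' can only be Phred+33, ';' to '?' means negative scores (Solexa+64 only), and '@' or above is ambiguous.

```python
from __future__ import annotations

import gzip
from typing import Iterable, Iterator


def read_quality_lines(path: str, max_records: int = 10_000) -> Iterator[str]:
    """Yield the quality line (every 4th line) of the first max_records records.

    Assumes plain 4-line FASTQ records with no line wrapping.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i >= 4 * max_records:
                break
            if i % 4 == 3:
                yield line.rstrip("\r\n")


def guess_encoding(quality_lines: Iterable[str]) -> str:
    """Guess the quality encoding from the lowest character seen (a heuristic)."""
    lowest = min((ord(c) for line in quality_lines for c in line), default=None)
    if lowest is None:
        raise ValueError("no quality characters to inspect")
    if lowest < 59:  # below ';': only Phred+33 can produce this character
        return "phred33"
    if lowest < 64:  # ';' to '?': negative scores, legal only in Solexa+64
        return "solexa64"
    # '@' or above is legal in Phred+64, but also in Phred+33 data with no base below Q31
    return "phred64 or high-quality phred33 (ambiguous)"
```

Usage would look like `guess_encoding(read_quality_lines("sample_1.fastq.gz"))`, where the path is a placeholder. Scanning only the first records keeps the check fast, but it can return the ambiguous answer if none of them contain a low-quality base; raising `max_records` is the simple remedy.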