
**kmcarr** · 01-16-2017, 11:30 AM

Originally posted by mike123

Hello all,

I am attempting to process a published Solexa RNAseq dataset, but I am running into some issues due to the base quality encoding.

***This is the code I am using to attempt to trim adapter sequences:

$HOME/adapterremoval/bin/AdapterRemoval --qualitybase solexa --file1 $raw_files_path/$input_filename_1 \
--file2 $raw_files_path/$input_filename_2 --basename $input_filename --trimns --trimqualities --gzip \
--adapter-list $HOME/RNAseq/adapters_set1.txt

***and this is the error I keep getting.

Read 2 adapters / adapter pairs from '/mnt/home/username/RNAseq/adapters-set1.txt'...
Trimming paired end reads ...
Error reading FASTQ record at line 1; aborting:
Phred+64 encoded quality score is less than 0 (ASCII < '@');
Are these FASTQ reads actually in Phred+33 format? If so,
use the command-line option "--qualitybase 33"

See README for more information.

I am not sure what to do, as the software is detecting quality scores that are less than zero (indicating Solexa encoding), but refusing to process the data even though I have specified "--qualitybase solexa" (as recommended in the user manual).

Normally I use Trimmomatic for adapter trimming, but I have successfully used AdapterRemoval (https://github.com/MikkelSchubert/ad...terRemoval.pod) in the past on Illumina Hiseq data.

Please help!!!

Thank You!!!

Mike,

"Solexa" quality encoding of Q+64 has not been used in several years (eons in Next Generation Sequencing time). Hell, nobody even calls it "Solexa" anymore; it is Illumina. Do exactly what the error message suggests and use "--qualitybase 33".

**GenoMax** · 01-16-2017, 11:34 AM

@mike123: If that is truly "solexa" format data of a ripe vintage, then you may want to recode it to the current Illumina encoding (Phred+33) before doing adapter removal.
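For anyone who does end up with genuinely Solexa-encoded reads, the recoding suggested above could look roughly like the sketch below. It assumes Solexa+64 input and uses the standard Solexa-to-Phred relation Q_phred = 10 * log10(10^(Q_solexa/10) + 1). The function names are hypothetical and not part of AdapterRemoval or any other tool.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert one Solexa quality score to the nearest integer Phred score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def recode_solexa64_to_phred33(quality):
    """Recode a Solexa+64 quality string as a Phred+33 quality string.

    Assumption: every character in `quality` really is Solexa+64
    (ASCII 59 or higher); this is not checked here.
    """
    return "".join(chr(solexa_to_phred(ord(ch) - 64) + 33) for ch in quality)
```

Only the fourth line of each FASTQ record (the quality string) would be passed through `recode_solexa64_to_phred33`; the header, sequence, and `+` lines stay unchanged.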

**mike123** · 01-17-2017, 09:10 AM

issue resolved - thanks

Thank you both for your suggestions. After evaluating the raw data with FASTQC and actually looking at the *.fastq file entries (which I should have done in the first place...), it appears that the actual encoding is in fact Phred+33, and not Solexa (https://en.wikipedia.org/wiki/FASTQ_format#Encoding).
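The manual check described above (looking at the quality lines themselves) can be approximated with a small heuristic based on the ASCII ranges listed on the linked FASTQ encoding page: Phred+33 starts at ASCII 33, Solexa+64 at 59, and Phred+64 at 64. This is only a sketch: it assumes unwrapped four-line FASTQ records, and the helper names are made up for illustration.

```python
import gzip
from itertools import islice


def quality_lines(path, max_records=10000):
    """Yield the quality line of the first `max_records` FASTQ records.

    Assumption: each record is exactly four lines (no wrapped sequences).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(islice(handle, 4 * max_records)):
            if i % 4 == 3:
                yield line.rstrip("\r\n")


def guess_encoding(quality_strings):
    """Guess the quality offset from the lowest ASCII code observed."""
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        raise ValueError("no quality characters given")
    lowest = min(codes)
    if lowest < 59:
        # Characters below ';' cannot occur in either +64 encoding.
        return "Phred+33"
    if lowest < 64:
        # ';' through '?' only occur in Solexa+64 (scores -5 to -1).
        return "Solexa+64"
    # Nothing below '@' was seen: consistent with Phred+64, but a small
    # sample of high-quality reads in another encoding could look the same.
    return "Phred+64?"
```

Something like `guess_encoding(quality_lines("reads_1.fastq.gz"))` (hypothetical file name) would then give a first hint before choosing a `--qualitybase` value; a result of "Phred+64?" should be treated as inconclusive.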

Lesson learned, yet again, never take summary info from public datasets at face value...


Odd error while trimming Solexa data with AdapterRemoval???
