#1
Junior Member
Location: Olomouc | Join Date: Apr 2014 | Posts: 6
Hi,
Do you use any rule for trimming, removing, or keeping the lowest-quality reads for DNA and RNA assembly/alignment? How do you decide whether to trim, remove, or keep them: according to the type of analysis, coverage, base quality...? Thanks a lot.
#2
Senior Member
Location: East Coast USA | Join Date: Feb 2008 | Posts: 7,080
In general you can do without explicit quality filtering, unless you know that your data contains a significant portion of low-quality reads.
If you are aligning to a known reference, you can get away with using data as low as Q10. De novo work is likely the only case where quality-score-based filtering is warranted; for that application you would want to filter out data at Q25 or lower.
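A minimal sketch of what such a filter could look like, using BBDuk as one example tool (the filenames are placeholders; maq discards reads whose average quality falls below the given threshold):

Code:
# lenient cutoff, e.g. when aligning to a known reference
bbduk.sh in=reads.fastq.gz out=filtered_q10.fastq.gz maq=10
# stricter cutoff, e.g. before de novo assembly
bbduk.sh in=reads.fastq.gz out=filtered_q25.fastq.gz maq=25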
#3
Senior Member
Location: Sydney, Australia | Join Date: Jun 2011 | Posts: 166
It's always best to do adapter trimming, even if the aligner you use later (such as STAR) can soft-clip. The aligner runs moderately faster if it doesn't have to attempt to map the adapter sequences to the genome and then soft-clip them when no alignment is found. You can look at the overall quality profile of the reads in a sample with a tool like FastQC. It's also worth applying quality filtering, even if very few reads in the dataset are removed by it; there can be the rare circumstance where it makes an important difference.
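As a rough sketch of that check-then-trim workflow (cutadapt is used here as one possible trimmer; the filenames are placeholders and AGATCGGAAGAGC is the common Illumina TruSeq adapter prefix):

Code:
# inspect per-base quality and adapter content first
mkdir -p fastqc_out
fastqc sample_R1.fastq.gz -o fastqc_out/
# then trim the adapter and low-quality 3' ends
cutadapt -a AGATCGGAAGAGC -q 20 -o trimmed_R1.fastq.gz sample_R1.fastq.gz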
#4
Member
Location: Irvine | Join Date: Dec 2013 | Posts: 10
It's always good to trim the adapters and do quality trimming before running alignment. Quality trimming is usually done at the Q20 level. Programs like trim_galore can auto-detect the adapter sequence that needs to be trimmed based on the input reads.
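A minimal sketch of that invocation, assuming paired-end gzipped input (the filenames are placeholders); without an explicit --adapter option, trim_galore attempts to auto-detect the adapter:

Code:
trim_galore --quality 20 --paired sample_R1.fastq.gz sample_R2.fastq.gz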
#5
Super Moderator
Location: Walnut Creek, CA | Join Date: Jan 2014 | Posts: 2,707
I generally suggest the range Q5-Q12 for quality-trimming of HiSeq/MiSeq Illumina data that has the full range of quality scores. Illumina is moving toward binned and inaccurate Q-scores on its latest platforms, though, so the utility of quality-trimming is going to be reduced.
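A sketch of quality-trimming in that range, using BBDuk as one example tool (the filenames are placeholders; qtrim=rl trims both read ends and trimq sets the quality cutoff):

Code:
bbduk.sh in=reads.fastq.gz out=qtrimmed.fastq.gz qtrim=rl trimq=10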
#7
Super Moderator
Location: Walnut Creek, CA | Join Date: Jan 2014 | Posts: 2,707
The problem is not so much the fraction of data that is discarded, but rather the bias: Illumina read quality is affected by sequence content, so a high quality-trimming or quality-filtering threshold can disparately impact certain genomic regions. This is particularly important for quantitative analyses. And regardless of bias, longer reads give more accurate mapping. The confidence of a 250bp alignment, and the ability to place it correctly despite inexact repeats in the genome, is much higher than for a 150bp read, even if the last 100bp of the 250bp read are only Q17 and thus would be expected to contain about 2 mismatches. For variant calling, quality-trimming can be done AFTER alignment to allow the most accurate mapping possible.
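For reference, that expectation follows directly from the Phred scale: the per-base error probability is 10^(-Q/10), so Q17 corresponds to roughly 10^(-1.7) ≈ 0.02 errors per base, or about 100 × 0.02 = 2 expected mismatches over the last 100bp.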