I'm just curious as to how much read reduction others observe during pre-processing of NGS data.
I've got 454 RNA-seq data of about 300,000 reads.
Based on the FastQC report, and using the FASTX-Toolkit, I applied the following steps:
- removing reads with any base below a quality score of 20
- removing reads containing Ns
- removing ribosomal RNA sequences (167, identified with BWA)
- removing reads shorter than 100 bp

After this, the dataset was reduced to 185,735 reads. That feels too small to me. Is such a reduction common?
Alternatively, I could retain the reads shorter than 100 bp, align them separately, and then align the entire dataset again.
I appreciate any advice and/or insight. Thank you.
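For clarity, here is a rough sketch of the per-read filtering criteria I mean (every base at quality ≥ 20, no Ns, length ≥ 100 bp), written as a standalone Python script. This is only an illustration of the logic: the actual filtering was done with the FASTX-Toolkit, the rRNA removal with BWA (not shown here), and the file names and Phred+33 offset below are assumptions on my part.

```python
# Sketch of the filtering criteria described above (illustration only; the real
# filtering used FASTX-Toolkit, and rRNA removal via BWA is not included here).
# Assumes Phred+33 quality encoding and hypothetical file names.

MIN_QUAL = 20      # minimum per-base quality score
MIN_LEN = 100      # minimum read length in bp
PHRED_OFFSET = 33  # assumed quality encoding

def passes_filters(seq, qual):
    """Return True if a read satisfies all three filtering criteria."""
    if len(seq) < MIN_LEN:
        return False                          # too short
    if "N" in seq.upper():
        return False                          # contains ambiguous bases
    if any(ord(q) - PHRED_OFFSET < MIN_QUAL for q in qual):
        return False                          # has at least one low-quality base
    return True

def filter_fastq(in_path, out_path):
    """Stream a FASTQ file four lines at a time and keep passing reads."""
    kept = total = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            header = fin.readline()
            if not header:
                break
            seq = fin.readline().rstrip("\n")
            plus = fin.readline()
            qual = fin.readline().rstrip("\n")
            total += 1
            if passes_filters(seq, qual):
                kept += 1
                fout.write(f"{header}{seq}\n{plus}{qual}\n")
    print(f"kept {kept} of {total} reads")

if __name__ == "__main__":
    # hypothetical input/output names
    filter_fastq("reads.fastq", "reads.filtered.fastq")
```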