
denisDDS 04-23-2018 03:28 AM

Best strategy/tools to align poor quality reads on distant/degenerate short reference
Hello everybody,

Here is my problem:
I have about 50 samples that I sequenced to examine SNPs at key positions. I know my key positions.
I have 4 reference sequences (about 2000 bp each). Two of them are encoded with IUPAC nucleotide codes and two with plain "ATGC" code. These references could be relatively distant from the obtained reads.
I have paired-end read data from large (1000 bp) amplicons. As the amplicon is far longer than my reads, I will work on my read files separately.
My read quality is really poor: the base quality drops rapidly below 20 at around 90 bp (on 250 bp reads).
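
For reference, the kind of check behind the "below 20 around 90 bp" observation can be sketched as follows. This is only a minimal illustration, assuming Phred+33 encoded quality strings already extracted from the FASTQ file; the function names and the toy quality strings are hypothetical, not taken from my real data.

```python
from statistics import mean

def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position (Phred+33 assumed)."""
    longest = max(len(q) for q in quality_strings)
    profile = []
    for pos in range(longest):
        # Only reads long enough to cover this position contribute to its mean.
        scores = [ord(q[pos]) - offset for q in quality_strings if len(q) > pos]
        profile.append(mean(scores))
    return profile

def first_position_below(profile, threshold=20):
    """Return the 1-based position where the mean first drops below threshold, or None."""
    for pos, score in enumerate(profile, start=1):
        if score < threshold:
            return pos
    return None

# Toy quality strings (hypothetical): 'I' = Q40, '5' = Q20, '+' = Q10
toy = ["IIII5+", "III5++", "IIII++"]
profile = mean_quality_per_position(toy)
print(first_position_below(profile))  # 5
```

On real data the quality strings would be every fourth line of the FASTQ file, and tools such as FastQC report the same kind of per-position profile.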

What I did:
I started by aligning my reads directly to my references with bwa mem, changing the seed settings and mismatch scores...
Then I wanted to perform SNP calling with freebayes. Unfortunately, I don't think this is a good idea (I don't know the ploidy of my samples).
Then I discovered that my reads had really poor quality, so I decided to add a first step of read cleaning with trimmomatic (SE -threads 8 -phred33 -trimlog trim_${readsname}.log ${file} ${OUTDATADIRECTORY}/${readsname}_001.trimmed.fastq.gz ILLUMINACLIP:./adapters/NexteraPE-PE.fa:2:30:10 LEADING:10 TRAILING:10 SLIDINGWINDOW:4:15 AVGQUAL:30 MINLEN:36).
After talking with some friends, and seeing the read length dropping, I decided to use bwa aln and perform the SNP calling with samtools/bcftools. But the alignment parameters are harder to set correctly.
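
To show why reads get so much shorter after cleaning, here is a rough sketch of the idea behind a SLIDINGWINDOW:4:15 style trim: scan from the 5' end and cut the read where the mean quality of a 4-base window first falls below 15. This is a simplified illustration under my own assumptions, not Trimmomatic's actual implementation (which may treat edge cases differently); the function name and the quality values are hypothetical.

```python
def sliding_window_trim(quals, window=4, min_mean=15):
    """Return how many leading bases to keep under a simplified sliding-window rule.

    The read is cut at the start of the first window whose mean quality is
    below min_mean. Reads shorter than the window are returned untrimmed
    (a simplifying assumption of this sketch).
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return start
    return len(quals)

# Hypothetical Phred scores for one read whose quality decays toward the 3' end.
read_quals = [38, 38, 36, 30, 22, 14, 10, 8, 6, 2]
keep = sliding_window_trim(read_quals)
print(keep)  # 4
```

With quality decaying as fast as in my data, many reads would be cut well before 250 bp, and anything left shorter than MINLEN:36 is discarded entirely.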

Then, my questions:
- Are my pipeline steps good? Must I clean my reads before aligning them, or will the aligner take quality into account, making this step useless?
- Which programs should I use for these different steps? I saw some people use Mosaik, ssaha, bfast, novoalign, etc. Which one is best for my particular problem?
- Which SNP calling method/program should I use? As I am focusing on specific known positions, is a Bayesian haplotype-based caller useful, or is mpileup enough?
- Do you have any advice for me? (This is my first experiment with this kind of data; before, I worked on high-quality human data...)
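
To make the "is mpileup enough" question concrete: since the key positions are known and the ploidy is not, one option would be to simply tally the bases seen at each key position from the read-bases column of samtools mpileup output, without making any genotype call. The sketch below is a minimal, hedged illustration of that idea. The function names and the example column are hypothetical, and it assumes the reference base at the position is a plain A/C/G/T (how IUPAC-coded reference positions are reported is not covered here).

```python
import re
from collections import Counter

def count_pileup_bases(ref_base, bases):
    """Tally base calls from the read-bases column of one samtools mpileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read-start marker, followed by one mapping-quality character.
            i += 2
        elif c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:  # malformed column: skip the sign only
                i += 1
            else:
                i += 1 + len(m.group()) + int(m.group())
        elif c in ".,":
            # Match to the reference on the forward (.) or reverse (,) strand.
            counts[ref_base.upper()] += 1
            i += 1
        elif c.upper() in "ACGTN":
            # Mismatch; lower case means reverse strand.
            counts[c.upper()] += 1
            i += 1
        else:
            # '$' (read end), '*' (deleted base), '<' or '>' (reference skip).
            i += 1
    return counts

def allele_frequencies(counts):
    """Convert base counts to fractions of the total depth."""
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}

# Hypothetical read-bases column at one key position whose reference base is A.
counts = count_pileup_bases("A", "..,,^]T$.+2AGt*")
print(counts["A"], counts["T"])  # 5 2
print(allele_frequencies(counts))
```

Per-position frequencies like these do not depend on a ploidy assumption, but they also do not model base or mapping quality the way a real caller does, so they would be a sanity check rather than a replacement.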

As a famous person once said: "Help me, Obi-Wan Kenobi. You're my only hope."

Thanks in advance.
