Hello everybody,
Here is my problem:
I have about 50 samples that I sequenced to examine SNPs at key positions. I know these key positions.
I have 4 reference sequences (about 2000 bp each). Two of them are encoded with the IUPAC nucleotide code and two of them with plain "ATGC" code. These references could be relatively far from the obtained reads.
I have paired-end read data from large (1000 bp) amplicons. As the amplicon size is far larger than my reads, I will work on my read files separately.
My read quality is really poor: the base quality drops rapidly below 20 at around 90 bp (on 250 bp reads).
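For reference, this is roughly how such a quality drop can be measured. It is only a minimal sketch, assuming Phred+33 encoding and unwrapped four-line FASTQ records; the function names and the file name in the comment are hypothetical, not part of any existing tool.

```python
import gzip


def mean_quality_per_position(fastq_lines, offset=33):
    """Return the mean Phred quality at each read position.

    Assumes four-line FASTQ records, so every 4th line is a quality string.
    """
    sums, counts = [], []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # skip header, sequence and '+' lines
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(sums):  # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(ch) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


def first_position_below(means, threshold=20):
    """Return the 1-based position where mean quality first falls below threshold."""
    for pos, q in enumerate(means, start=1):
        if q < threshold:
            return pos
    return None


# Example on a real file (hypothetical file name):
# with gzip.open("sample_R1_001.fastq.gz", "rt") as fh:
#     print(first_position_below(mean_quality_per_position(fh)))
```

Averaging per position hides read-to-read variation, so this only gives a rough idea of where trimming will start to bite.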
What I did:
I started by aligning my reads directly with bwa mem against my references, changing the seed and mismatch score parameters...
Then I wanted to perform SNP calling with freebayes. Unfortunately, I don't think this is a good idea (I don't know the ploidy of my samples).
Then I discovered that my reads had really poor quality, so I decided to add a first step of read cleaning with trimmomatic. (SE -threads 8 -phred33 -trimlog trim_${readsname}.log ${file} ${OUTDATADIRECTORY}/${readsname}_001.trimmed.fastq.gz ILLUMINACLIP:./adapters/NexteraPE-PE.fa:2:30:10 LEADING:10 TRAILING:10 SLIDINGWINDOW:4:15 AVGQUAL:30 MINLEN:36)
After talking with some friends, and seeing the read size dropping, I decided to use bwa aln and perform the SNP calling with samtools/bcftools. But the alignment parameters are harder to set correctly.
So, my questions:
- Are my pipeline steps sound? Must I clean my reads before aligning them, or will the aligner take quality into account, making this step useless?
- Which programs should I use for these different steps? I saw some people use Mosaik, ssaha, bfast, novoalign, etc. Which one is best for my particular problem?
- Which SNP calling method/program should I use? As I am focusing on specific known positions, is a Bayesian haplotype-based caller useful, or is mpileup enough?
- Do you have any advice for me? (This is my first experiment with this kind of data; before, I worked on high-quality human data...)
As a famous person once said: "Help me, Obi-Wan Kenobi. You're my only hope."
Thx in advance.