
  • Best strategy/tools to align poor-quality reads to a distant/degenerate short reference

    Hello everybody,

    Here is my problem:
    I have about 50 samples that I sequenced to examine SNPs at key positions, which I already know.
    I have 4 reference sequences (about 2000 bp each). Two of them are encoded with IUPAC nucleotide codes and two with plain "ATGC" code. These references could be relatively distant from the obtained reads.
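    Because two of the references use IUPAC codes, it helps to be explicit about what counts as a match at an ambiguous position. The sketch below is a minimal illustration, assuming the standard IUPAC table; the function names are hypothetical. It compares an IUPAC-aware mismatch count with a strict letter-by-letter one: a tool that only understands A/C/G/T may treat ambiguous positions as mismatches or unknowns, so the two counts can differ.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def compatible(ref_code: str, base: str) -> bool:
    """True if the observed base is one of the bases allowed by the IUPAC reference code."""
    return base.upper() in IUPAC.get(ref_code.upper(), set())


def mismatches(ref: str, read: str, iupac_aware: bool = True) -> int:
    """Count mismatches between two equal-length, already aligned sequences."""
    if iupac_aware:
        return sum(not compatible(r, b) for r, b in zip(ref, read))
    # Strict comparison: R, N, etc. never equal a concrete base.
    return sum(r.upper() != b.upper() for r, b in zip(ref, read))


if __name__ == "__main__":
    print(mismatches("ACRTGN", "ACGTGA"))                     # IUPAC-aware
    print(mismatches("ACRTGN", "ACGTGA", iupac_aware=False))  # strict
```

    Here the IUPAC-aware count is 0, while the strict count is 2 (R vs G and N vs A).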
    My data are paired-end reads from long (1000 bp) amplicons. As the amplicon is far longer than my reads, I will work on each read file separately.
    My read quality is really poor: base quality drops below 20 at around 90 bp (on 250 bp reads).
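    To put a number on "drops below 20 at around 90 bp", you can compute the mean quality at each position directly from the FASTQ quality strings. The following is a minimal sketch, assuming Phred+33 encoding (the -phred33 option used with Trimmomatic) and quality strings already read from the file; the function names are hypothetical. For reference, Q20 corresponds to a 1% per-base error probability.

```python
from statistics import mean


def phred33_scores(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer Phred scores."""
    return [ord(c) - 33 for c in qual]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def first_position_below(quals: list[str], threshold: float = 20.0):
    """1-based position where the mean score across reads first drops below
    `threshold`, or None if it never does."""
    scores = [phred33_scores(q) for q in quals]
    if not scores:
        return None
    for pos in range(max(len(s) for s in scores)):
        # Only reads long enough to cover this position contribute.
        column = [s[pos] for s in scores if pos < len(s)]
        if mean(column) < threshold:
            return pos + 1
    return None


if __name__ == "__main__":
    # 'I' = Q40, '+' = Q10, '#' = Q2 in Phred+33.
    print(first_position_below(["IIII+++", "IIII###"]))
```

    A per-position mean hides read-to-read variability, so it can be worth looking at a few individual reads as well.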

    What I did:
    I started by aligning my reads directly to my references with bwa mem, tweaking the seed settings and mismatch scores...
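    A toy local-alignment score shows why the mismatch penalty matters so much when reads are distant from the reference. The sketch below is a plain Smith-Waterman with a linear gap penalty. This is a deliberate simplification: bwa mem uses seeding and affine gaps, so this is not its algorithm. The default match and mismatch values of 1 and 4 mirror bwa mem's -A and -B defaults. Scoring IUPAC-compatible bases as matches is an assumption made for illustration, not bwa behaviour. The gap value and function name are hypothetical.

```python
# IUPAC codes -> allowed bases (illustrative assumption: compatible = match).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def sw_score(ref: str, read: str, match: int = 1, mismatch: int = 4,
             gap: int = 5) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(read) + 1)  # previous DP row
    best = 0
    for r in ref.upper():
        cur = [0] * (len(read) + 1)
        for j, b in enumerate(read.upper(), start=1):
            s = match if b in IUPAC.get(r, "") else -mismatch
            cur[j] = max(0,                   # start a new local alignment
                         prev[j - 1] + s,     # align r with b
                         prev[j] - gap,       # gap in the read
                         cur[j - 1] - gap)    # gap in the reference
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    ref, read = "ACGTACGTAC", "ACGTTCGTAC"   # one mismatch in the middle
    print(sw_score(ref, read))               # strict mismatch penalty
    print(sw_score(ref, read, mismatch=1))   # lenient mismatch penalty
```

    With a penalty of 4, a single mismatch halves the score of this 10 bp alignment (10 becomes 5); with a penalty of 1 it only drops to 8. Across a long, divergent read, those losses add up. bwa mem also discards alignments scoring below its -T threshold (default 30), so lowering -B can keep distant reads that would otherwise be dropped.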
    Then I wanted to perform SNP calling with freebayes. Unfortunately, I don't think this is a good idea (I don't know the ploidy of my samples).
    Then I discovered that my reads had really poor quality, so I decided to add a first read-cleaning step with Trimmomatic (SE -threads 8 -phred33 -trimlog trim_${readsname}.log ${file} ${OUTDATADIRECTORY}/${readsname}_001.trimmed.fastq.gz ILLUMINACLIP:./adapters/NexteraPE-PE.fa:2:30:10 LEADING:10 TRAILING:10 SLIDINGWINDOW:4:15 AVGQUAL:30 MINLEN:36).
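    Emulating these trimming steps on a vector of quality scores makes it easier to see which option removes what. This is a simplified sketch and does not reproduce Trimmomatic's exact code. It applies LEADING, TRAILING, SLIDINGWINDOW, AVGQUAL and MINLEN in the order they appear in the command, using the same values; ILLUMINACLIP is not modelled. The window step cuts at the start of the first failing window, and Trimmomatic's own rule for where exactly to cut inside that window may differ. The function name is hypothetical.

```python
def sliding_window_trim(quals, leading=10, trailing=10, window=4, window_q=15,
                        avgqual=30, minlen=36):
    """Return (start, end) of the kept slice of `quals`, or None if the read is dropped."""
    start, end = 0, len(quals)
    # LEADING: remove 5' bases below the threshold.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: remove 3' bases below the threshold.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW (simplified): scan 5' -> 3' and cut at the start of the
    # first window whose mean quality falls below window_q.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < window_q:
            end = i
            break
    kept = quals[start:end]
    # AVGQUAL: drop the whole read if the mean of the kept bases is too low.
    if not kept or sum(kept) / len(kept) < avgqual:
        return None
    # MINLEN: drop reads that became too short.
    if len(kept) < minlen:
        return None
    return start, end


if __name__ == "__main__":
    print(sliding_window_trim([35] * 90 + [12] * 160))  # good 5' end, poor tail
    print(sliding_window_trim([25] * 100))               # uniformly mediocre read
```

    The second call shows one consequence of these settings. A read whose bases all sit at Q25 passes SLIDINGWINDOW:4:15 but is then dropped entirely by AVGQUAL:30, so with data this noisy that step alone may discard a large share of reads.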
    After talking with some friends, and seeing the read length drop, I decided to use bwa aln and perform the SNP calling with samtools/bcftools. But the alignment parameters are harder to set correctly.
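    One reason the bwa aln parameters feel harder to set is that a fractional -n (default 0.04) is not a fixed mismatch limit. As far as I understand the bwa source, it is converted into a maximum number of differences that grows with read length, assuming a 2% uniform base error rate. The sketch below reproduces that calculation as I understand it; the function name is hypothetical. According to the bwa aln usage text, passing an integer to -n sets a fixed limit instead.

```python
import math


def bwa_aln_max_diff(read_len: int, frac: float = 0.04, err: float = 0.02) -> int:
    """Smallest k >= 1 such that P(X > k) < frac, with X ~ Poisson(read_len * err)."""
    lam = read_len * err
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for k in range(1, 1000):
        term *= lam / k        # P(X = k) computed from P(X = k - 1)
        cdf += term
        if 1.0 - cdf < frac:
            return k
    raise ValueError("no k found; read length too large for this approximation")


if __name__ == "__main__":
    for length in (36, 90, 100):
        print(length, bwa_aln_max_diff(length))
```

    Under these assumptions, trimmed reads of 36, 90 and 100 bp would be allowed 2, 4 and 5 differences respectively. After trimming, reads of different lengths therefore get different tolerances, which matters when the references are divergent.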

    So, my questions:
    - Are my pipeline steps sound? Must I clean my reads before aligning them, or does the aligner take base quality into account, making this step useless?
    - Which programs should I use for these different steps? I saw some people use Mosaik, SSAHA, BFAST, Novoalign, etc. Which one is best for my particular problem?
    - Which SNP-calling method/program should I use? As I am focusing on specific known positions, is a Bayesian haplotype-based caller useful, or is mpileup enough?
    - Do you have any advice for me? (This is my first experiment with this kind of data; before, I worked on high-quality human data...)
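    Since the key positions are known in advance, the core of what a pileup-based approach reports at each site is a quality-filtered base count. The sketch below is a minimal version of that idea. It assumes ungapped alignments given as hypothetical (0-based start, sequence, Phred+33 quality) tuples rather than a real BAM file, so it does not replace samtools/bcftools. The minimum base quality of 20 is an assumed value. The allele fractions it returns might also give a rough hint about ploidy or mixed samples, but that interpretation is an assumption.

```python
from collections import Counter


def pileup_at(reads, positions, min_bq=20):
    """Count bases covering each 1-based reference position, skipping bases
    with Phred+33 quality below min_bq.

    reads: iterable of (start0, seq, qual) for ungapped alignments
    (a hypothetical simplified input, not a BAM record).
    """
    counts = {p: Counter() for p in positions}
    for start0, seq, qual in reads:
        for p in positions:
            i = p - 1 - start0  # offset of this position within the read
            if 0 <= i < len(seq) and ord(qual[i]) - 33 >= min_bq:
                counts[p][seq[i]] += 1
    return counts


def fractions(counter):
    """Allele fractions from a base Counter (empty dict if no coverage)."""
    total = sum(counter.values())
    return {b: n / total for b, n in counter.items()} if total else {}


if __name__ == "__main__":
    reads = [(0, "ACGTA", "IIIII"), (2, "GTTAC", "II#II"), (1, "CGAAC", "IIIII")]
    counts = pileup_at(reads, positions=[4, 5])
    for pos, c in counts.items():
        print(pos, dict(c), fractions(c))
```

    At position 5 the second read's base is skipped because its quality character '#' is Q2. A real caller also uses mapping quality, strand and error models, which is where the choice between a plain pileup and a haplotype-based caller comes in.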

    As a famous person once said: "Help me, Obi-Wan Kenobi. You're my only hope."

    Thx in advance.
    Last edited by denisDDS; 04-23-2018, 03:31 AM.
