|02-07-2017, 06:48 AM||#1|
Location: Montpellier (France)
Join Date: May 2008
How to interpret partial Stacks results?
I'm posting here, but I'm not sure this is the correct section of the forum. If it isn't, I apologize in advance.
So, we are performing RADseq experiments.
The DNA comes from different organisms and is extracted (by the users of our facility) with various methods and, of course, with variable results. Most of the time, the integrity of the DNA is not checked and the purity is "so-so" (pigments or precipitates are not rare).
We build the libraries according to the Baird et al. protocol with minor modifications (AMPure XP purification, Qubit quantification after the first ligation, Pippin HT sizing).
We sequence in SR100nt mode (usually a rapid run on a HiSeq 2500).
We don't perform the analysis in-house; we only use Stacks for demultiplexing.
Usually, 75 to 89% of the sequences include both the index and the enzyme cut site, which seems OK to us and our users.
But from time to time, results are not as good as that.
For example, in one of our experiments, we generated 171,000,000 sequences and, after running Stacks' process_radtags module (no mismatch allowed), we ended up with:
- 58.71% of correct sequences.
- 38% of "ambiguous barcode" sequences.
- 3% of "ambiguous RADtag" sequences.
We also performed demultiplexing allowing 1 mismatch, and it did not improve the results much (63% of correct sequences).
This was more or less expected, as we only found a few over-represented indexes among our "ambiguous barcode" sequences, and most of them can be explained by a small drop in sequence quality at cycle 3.
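For anyone who wants to repeat this kind of check, here is a minimal Python sketch of how the leading bases of rejected reads can be tallied and compared with the expected barcodes. It is only an illustration under assumptions: the reads are supplied as plain sequence strings, and the barcodes, the barcode length, and the function names are all hypothetical and not part of Stacks.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def tally_prefixes(reads, barcode_len):
    """Count how often each leading barcode-length prefix occurs."""
    return Counter(r[:barcode_len] for r in reads if len(r) >= barcode_len)


def nearest_barcode(prefix, barcodes):
    """Return the closest expected barcode and its Hamming distance."""
    best = min(barcodes, key=lambda b: hamming(prefix, b))
    return best, hamming(prefix, best)


def mismatch_positions(prefix, barcode):
    """1-based cycles at which the observed prefix differs from the barcode."""
    return [i + 1 for i, (x, y) in enumerate(zip(prefix, barcode)) if x != y]


# Toy data (hypothetical): two expected 5-nt barcodes and three rejected reads.
barcodes = ["ACGTA", "TGCAT"]
reads = ["ACNTATGCAGGAAA", "ACNTATGCAGGTTT", "GGGGGTGCAGGCCC"]

counts = tally_prefixes(reads, barcode_len=5)
for prefix, n in counts.most_common():
    barcode, dist = nearest_barcode(prefix, barcodes)
    print(prefix, n, barcode, dist, mismatch_positions(prefix, barcode))
```

Prefixes that sit one mismatch away from an expected barcode, always at the same cycle, point to a sequencing-quality drop at that cycle; prefixes far from every expected barcode are the ones that remain unexplained.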
But this leaves us with 30,000,000 sequences with an ambiguous barcode that we cannot explain.
Does anyone know what causes:
- A high percentage (15 to 40%) of "ambiguous barcode" sequences?
- A high percentage (10 to 30%) of "ambiguous RADtag" sequences?
Is it due to DNA quality or integrity? To an issue with our adaptors?
Is it possible to diagnose problems with the DNA preparation or library construction just by looking at these partial Stacks results?
Thanks in advance for your answers.