#1 |
Junior Member
Location: Vienna Join Date: Jan 2018
Posts: 2
Hello everyone,
I have encountered a rather strange contamination in our DNA libraries and would be thankful for help in finding the source, the proper way to trim it, and advice on how to avoid it in the future.

Background: Our plant DNA samples are prepared in-house following a modified KAPA protocol with P5-P7 indexes (which we mix ourselves; corresponding to Illumina adapters) and AMPure beads. We use dual indexing, attached in a rather short PCR step (8 cycles). Our indexes are 8 bp long, corresponding to TruSeq. Prior to pooling, all samples are size-selected to remove fragments <150 bp, which should remove all traces of adapter dimer. The samples were pooled and sent to a sequencing center, where they were sequenced on a HiSeq V4 PE125. A spike-in from another user was present.

According to the FastQC report, on read 1 the full library seems to include about 6 million reads of unknown source. The sequence is not similar to any index that we used, to the index reported by the other user, or to published DNA sequences from Illumina, but it does 'look like' an artifact:

ACCTTATTCACGCCTAAAAAGTAGACTGACTGTGGGGTGGTCGTGTTTTT

It doesn't BLAST to any known plant sequences (it seems to hit some human sequence, but with a low match; it could be an adapter someone else left in...). No contamination is present on read 2. See attached FastQC screenshot.

The truly inexplicable part is that after successfully demultiplexing (using deML), those 'contaminant' reads are present in all separate sample files in similar numbers.

In addition, trimming those reads using Trimmomatic gave bad results: I added the contaminant sequence to Trimmomatic's adapter file and lost almost 40% of the reads for many samples.

Any hint to direct me in bioinformatic forensics would be very much welcome.

Last edited by gyardeni; 07-13-2018 at 08:45 AM.
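One way to sanity-check the 40% loss is to count, before any trimming, how many reads in a given sample file actually carry the overrepresented sequence. The sketch below is only an illustration and not part of the original workflow: the function names and the 20-base seed length are assumptions, and it looks for an exact match to the start of the sequence, so reads with sequencing errors in the seed are missed.

```python
# Overrepresented sequence as reported by FastQC in the post above.
CONTAMINANT = "ACCTTATTCACGCCTAAAAAGTAGACTGACTGTGGGGTGGTCGTGTTTTT"


def iter_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def count_contaminant_reads(lines, contaminant=CONTAMINANT, seed_len=20):
    """Return (hits, total): reads containing the first seed_len bases
    of the contaminant as an exact substring, and all reads seen.

    seed_len is an arbitrary illustrative choice, not a recommended value.
    """
    seed = contaminant[:seed_len]
    hits = total = 0
    for seq in iter_fastq_sequences(lines):
        total += 1
        if seed in seq:
            hits += 1
    return hits, total
```

The `lines` argument can be any iterable of text lines, for example a file handle from `gzip.open(path, "rt")` on a read 1 file. If the fraction of hits is far below the fraction of reads lost in trimming, the trimming settings are probably removing more than the contaminated reads.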
#2 |
Senior Member
Location: Purdue University, West Lafayette, Indiana Join Date: Aug 2008
Posts: 2,317
No idea. Maybe take a look at them untrimmed to see what adapter type is carrying the insert.
-- Phillip
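A rough way to do that kind of inspection in bulk is to tally what follows the overrepresented sequence in the untrimmed reads; if one short sequence dominates, it can be compared against known adapter sequences. This is only a sketch under assumptions: the function names, the choice of marker, and the 12-base window are all hypothetical, and matching is exact.

```python
from collections import Counter


def downstream_of(seq, marker, width=12):
    """Return up to `width` bases that follow the first occurrence of
    `marker` in `seq`, or None if the marker is absent."""
    pos = seq.find(marker)
    if pos == -1:
        return None
    start = pos + len(marker)
    return seq[start:start + width]


def tally_downstream(seqs, marker, width=12):
    """Count the sequences seen immediately after `marker` across reads.

    Reads where the marker sits at the very end (nothing follows it)
    are skipped.
    """
    counts = Counter()
    for seq in seqs:
        tail = downstream_of(seq, marker, width)
        if tail:
            counts[tail] += 1
    return counts
```

For the case in this thread, `marker` could be the last 20 or so bases of the reported sequence and `seqs` the raw read 1 sequences; `tally_downstream(...).most_common(5)` then lists the most frequent continuations.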
#3 |
Junior Member
Location: Vienna Join Date: Jan 2018
Posts: 2
Hi Phillip,
Thanks for the reply. Since posting, I took a closer look at the demultiplexing step, since it seemed like strange sequences were ending up in the demultiplexed files. It turned out to be the behavior of samtools: when using the -r option to separate reads by the RG field, all the lines that don't have that field end up in the output file, too. I opened an issue and it seems they're planning to fix it (follow here: https://github.com/samtools/samtools/issues/896).
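Until that is resolved, one workaround is to split the records by read group explicitly, so that reads without an RG tag go into their own bucket instead of leaking into every sample. The following is a minimal sketch that works on uncompressed SAM text lines; it is not how samtools does it, and the function names and the `unassigned` label are hypothetical.

```python
from collections import defaultdict


def read_group_of(sam_line):
    """Return the RG:Z: tag value of a SAM alignment line, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM columns.
    for tag in fields[11:]:
        if tag.startswith("RG:Z:"):
            return tag[len("RG:Z:"):]
    return None


def split_by_read_group(sam_lines, unassigned="unassigned"):
    """Group alignment lines by read group.

    Header lines (starting with '@') are skipped. Lines with no RG tag
    are collected under the `unassigned` key rather than being mixed
    into any sample.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        rg = read_group_of(line)
        groups[rg if rg is not None else unassigned].append(line)
    return dict(groups)
```

The input could be the text output of `samtools view -h` on the demultiplexed BAM. The size of the `unassigned` bucket then shows directly how many untagged reads would otherwise have ended up in each sample file.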