**GenoMax** · 09-22-2017, 07:00 AM

#2 must be process controls from the TruSeq kit (you can find the sequences in the Illumina sequence letter for scanning/trimming purposes). #3 could very well be sequence from the genome you are interested in, so you shouldn't just throw it out. Use BBDuk from the BBMap suite to scan and trim your data, then run FastQC again to see how the data looks.

**svitlana** · 09-22-2017, 07:04 AM

Thank you for your response!

The strange thing about this sequence (sorry, I forgot to mention it) is that it is always situated at the beginning of the reads. And I still have it even after trimming the data with BBDuk.

**GenoMax** · 09-22-2017, 07:13 AM

Do you mean to say that sequence in #3 is present at the beginning of all reads? That would certainly be very odd.

**svitlana** · 09-22-2017, 07:24 AM

No, only a certain percentage of reads contain this sequence (I think less than 1%, but I don't have an estimate yet), but in all of those reads the sequence is situated at the beginning.

Actually, I have the same problem as described here. I found the explanation for all the other sequences detected by FastQC (they correspond to Illumina process controls, which are documented on the Illumina website), but I have no idea where this remaining sequence comes from.

**GenoMax** · 09-22-2017, 07:29 AM

If you take out that sequence, does the rest of the read BLAST to the genome of the expected species (or a close relative)? You could either drop those reads altogether (since they are only 1%) or trim that sequence out (with bbduk's literal= option).
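To make the difference between the two options concrete, here is a minimal Python sketch that works on plain read strings held in memory. It is only an illustration and not how BBDuk works internally (BBDuk matches k-mers and can tolerate mismatches, whereas this uses exact string matching). The names `CONTAMINANT`, `drop_reads` and `trim_prefix` are hypothetical, and the sequence is a placeholder rather than the one discussed in this thread.

```python
# Illustrative only: exact-match handling of a known unwanted sequence.
CONTAMINANT = "ACGTACGTAC"  # placeholder, NOT the real sequence from the thread


def drop_reads(reads, contaminant=CONTAMINANT):
    """Option 1: discard every read that contains the sequence anywhere."""
    return [r for r in reads if contaminant not in r]


def trim_prefix(reads, contaminant=CONTAMINANT):
    """Option 2: keep each read but cut the sequence off its start."""
    trimmed = []
    for r in reads:
        if r.startswith(contaminant):
            r = r[len(contaminant):]
        trimmed.append(r)
    return trimmed


reads = ["ACGTACGTACTTTTGGGG", "GGGGCCCCAAAATTTT"]
print(drop_reads(reads))
print(trim_prefix(reads))
```

With real FASTQ records the quality string would have to be shortened by the same number of characters as the read, which this sketch leaves out.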

**svitlana** · 09-22-2017, 10:09 AM

Thank you GenoMax for your suggestion. I just tried to BLAST these reads and approximately one third of them match... the common carp genome! But I am working on an insect (and as far as I know there is no assembly available for any species close to mine). How could this be explained?

I also tried to BLAST the remaining (normal) reads and none of them matched that genome.

And why is that sequence always situated at the beginning of these reads? (Well, I just found 16 reads that have it in the middle, but all the other 52,150 have it at the beginning.)
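A tally like the one above (sequence at the start of the read versus somewhere in the middle) could be produced along these lines. This is a sketch under assumptions: reads are plain strings, matching is exact, and `classify_positions` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
from collections import Counter


def classify_positions(reads, motif):
    """Count reads with the motif at the start, elsewhere, or not at all."""
    counts = Counter()
    for read in reads:
        pos = read.find(motif)  # index of first exact occurrence, -1 if none
        if pos == -1:
            counts["absent"] += 1
        elif pos == 0:
            counts["start"] += 1
        else:
            counts["internal"] += 1
    return counts


# Toy reads with a toy motif, just to show the three categories.
counts = classify_positions(["AAGGTT", "TTAAGG", "CCCC"], "AAGG")
print(dict(counts))
```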

In any case, I suppose that I should remove all these reads from my assembly.

**GenoMax** · 09-22-2017, 10:20 AM

It may be best to remove them altogether. Hopefully you don't have a bigger contamination problem. Take a few of the other "normal" reads and confirm them by BLAST before you dive into the assembly.
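One possible way to pull out a handful of reads for such a spot check is sketched below. It assumes unwrapped four-line FASTQ records and returns FASTA text that could be pasted into a BLAST search; `sample_fastq_as_fasta` is a hypothetical helper name, and the read names and sequences are toy data.

```python
import random


def sample_fastq_as_fasta(fastq_lines, n, seed=0):
    """Pick up to n random records from 4-line FASTQ lines, return FASTA text."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    # Each record is: @name, sequence, '+', qualities.
    records = [(lines[i][1:], lines[i + 1]) for i in range(0, len(lines) - 3, 4)]
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    chosen = rng.sample(records, min(n, len(records)))
    return "".join(f">{name}\n{seq}\n" for name, seq in chosen)


fq = ["@r1", "ACGT", "+", "IIII", "@r2", "GGCC", "+", "IIII"]
fasta = sample_fastq_as_fasta(fq, 2)
print(fasta)
```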

**jdk787** · 09-22-2017, 05:22 PM

Looks like the reverse complement of #3 (GCGGCCGCGATATCCTGCAGATGCATCCAGTACTAGTATGGCCC) matches the last 55 bases of the TruSeq process controls CTA-150bp, CTA-450bp, CTA-550bp, and CTA-850bp.
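A check of this kind can be reproduced with a few lines of Python. The sketch below is only an illustration: `QUERY` is the sequence quoted in the post, while `control` is a made-up placeholder, because the real process control sequences are not given in this thread. `reverse_complement` and `matches_end` are hypothetical helper names.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_end(control, query):
    """True if the control ends with the query or with its reverse complement."""
    return control.endswith(query) or control.endswith(reverse_complement(query))


QUERY = "GCGGCCGCGATATCCTGCAGATGCATCCAGTACTAGTATGGCCC"  # sequence quoted in the post

# Placeholder control built for the demo; a real control sequence must be supplied.
control = "TTTTTTTT" + reverse_complement(QUERY)
print(matches_end(control, QUERY))
```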

**svitlana** · 09-26-2017, 02:12 AM

Originally posted by jdk787

Looks like the reverse complement of #3 (GCGGCCGCGATATCCTGCAGATGCATCCAGTACTAGTATGGCCC) matches the last 55 bases of the TruSeq process controls CTA-150bp, CTA-450bp, CTA-550bp, and CTA-850bp.

You are right jdk787, thank you very much!

**Brian Bushnell** · 09-27-2017, 08:58 AM

Incidentally, that sequence also occurs in:

CTA___650bp, CTA___350bp, CTA___250bp, CTA___750bp

These are all distributed with BBMap in /bbmap/resources/sequencing_artifacts.fa.gz. Their names were anonymized, though, as Illumina required before I could distribute them publicly. Typically, before you do things like assembly, I suggest performing adapter trimming and synthetic artifact removal, e.g.

Code:

bbduk.sh in=in.fq.gz out=trimmed.fq.gz ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=70 ref=adapters ftm=5
bbduk.sh in=trimmed.fq.gz out=filtered.fq.gz k=31 ref=artifacts,phix ordered cardinality

Current versions of BBMap allow you to specify "ref=artifacts", for example, and will then automatically use /bbmap/resources/sequencing_artifacts.fa.gz. The full suggested pipeline is in /bbmap/pipelines/assemblyPipeline.sh, but some of the specific steps may be more relevant to bacterial assembly than to insect assembly.
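If the two commands above are to be run from a script, one option is to assemble them in Python so that the output of the trimming step is guaranteed to be the input of the filtering step. This is a sketch, not part of BBMap: `build_pipeline` is a hypothetical helper, the flags are copied verbatim from the two commands above, and the actual execution line is left commented out because it assumes `bbduk.sh` is on the PATH.

```python
import shlex


def build_pipeline(raw, trimmed, filtered):
    """Return the two bbduk.sh invocations from the post as argument lists."""
    trim = ["bbduk.sh", f"in={raw}", f"out={trimmed}", "ktrim=r", "k=23",
            "mink=11", "hdist=1", "tbo", "tpe", "minlen=70", "ref=adapters",
            "ftm=5"]
    # The second step reads exactly what the first step wrote.
    filt = ["bbduk.sh", f"in={trimmed}", f"out={filtered}", "k=31",
            "ref=artifacts,phix", "ordered", "cardinality"]
    return [trim, filt]


cmds = build_pipeline("in.fq.gz", "trimmed.fq.gz", "filtered.fq.gz")
for cmd in cmds:
    print(shlex.join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # assumes bbduk.sh is installed
```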

**svitlana** · 09-28-2017, 02:03 AM

Thank you Brian for your suggestion!

I only performed the first step (adapter trimming); I wasn't aware that bbduk could filter synthetic artifacts as well. I'll take a look at the suggested pipeline. Thank you again!

Illumina process controls present in input data
