**SEQnovice** · 11-30-2012, 07:51 AM

My questions have not been answered. Could someone kindly reply to some of them or at least direct me to the proper threads where this may have been discussed? I am new to this field and any feedback would be much appreciated!
Thank you,
SEQNovice

**ECO** · 11-30-2012, 08:27 AM

> Originally posted by SEQnovice:
>
> My questions have not been answered. Could someone kindly reply to some of them or at least direct me to the proper threads where this may have been discussed? I am new to this field and any feedback would be much appreciated!
> Thank you,
> SEQNovice

Patience, and searching. Please give your question more than 20 hours before bumping it.

**SEQnovice** · 11-30-2012, 09:06 AM

My apologies, this is my first post here! Thanks for the tip, and if you do have any feedback I would appreciate it though!

**blanco** · 01-17-2013, 03:17 AM

I am also interested to know the answer to some of these questions.

Perhaps to put it more simply: When trimming paired end reads, should the cutadapt command be exactly the same for both forward and reverse reads?

**fkrueger** · 01-17-2013, 03:52 AM

> Originally posted by blanco:
>
> I am also interested to know the answer to some of these questions.
>
> Perhaps to put it more simply: When trimming paired end reads, should the cutadapt command be exactly the same for both forward and reverse reads?

Using the same command on both reads will most likely cause your paired-end files to go out of sync. We have written a small wrapper that calls Cutadapt with (what we think are) sensible parameters (Trim Galore, available here). In its default setting, e.g. trim_galore --paired file1.fq file2.fq, it will trim Illumina adapters from both reads, quality-trim reads to a Phred score of 20, and handle paired-end files as you would expect.
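To illustrate the sync problem, here is a minimal Python sketch. It is not how Cutadapt or Trim Galore are implemented. It assumes the usual cause of desynchronisation: a read that becomes too short after trimming is dropped from one file while its mate stays in the other. The helper names, the exact-match adapter search, the 20 bp length cutoff and the adapter string are all illustrative assumptions.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)

# Assumption: a commonly cited Illumina adapter prefix, used here only as toy input.
ADAPTER = "AGATCGGAAGAGC"


def trim_3prime_adapter(read: Read, adapter: str) -> Read:
    """Cut the read at the first exact adapter match (toy model, no mismatches)."""
    name, seq, qual = read
    pos = seq.find(adapter)
    if pos == -1:
        return read
    return name, seq[:pos], qual[:pos]


def trim_pairs(
    r1_reads: Iterable[Read],
    r2_reads: Iterable[Read],
    adapter: str,
    min_len: int = 20,  # hypothetical cutoff chosen for this example
) -> Iterator[Tuple[Read, Read]]:
    """Trim both mates, then keep or drop them together so the files stay in sync."""
    for r1, r2 in zip(r1_reads, r2_reads):
        t1 = trim_3prime_adapter(r1, adapter)
        t2 = trim_3prime_adapter(r2, adapter)
        # The decision is made per pair, never per read.
        if len(t1[1]) >= min_len and len(t2[1]) >= min_len:
            yield t1, t2


r1 = [
    ("read1/1", "ACGT" * 10, "I" * 40),
    ("read2/1", "ACGTACGTAC" + ADAPTER + "TTTT", "I" * 27),  # only 10 bp left after trimming
    ("read3/1", "TTGCA" * 8, "I" * 40),
]
r2 = [
    ("read1/2", "TGCA" * 10, "I" * 40),
    ("read2/2", "G" * 40, "I" * 40),
    ("read3/2", "CCATG" * 8, "I" * 40),
]

kept = list(trim_pairs(r1, r2, ADAPTER))
kept_names = [(a[0], b[0]) for a, b in kept]
print(kept_names)
```

Filtering each file on its own would have removed read2/1 but kept read2/2, so every later record in the two files would be paired with the wrong mate.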

**blanco** · 01-17-2013, 04:57 AM

Thanks for your quick reply fkrueger - this looks to be something really useful. I have already asked one question in the appropriate thread: http://seqanswers.com/forums/showthr...ht=trim+galore

**Fernando Seixas** · 01-29-2014, 08:03 AM

Hi all,

I saw the first post of this thread and realized that I see exactly the same patterns described in point 1_NB2: increased 5-mer representation in the first 10 base pairs, and GC fluctuations in those first 10 bp as well (although very slight; the same happens in the per base sequence content). Even after adapter trimming with cutadapt at both the 5' and 3' ends and quality trimming (with Trimmomatic), these 'problems' persist. Any ideas what might be causing this?

Also, and I don't know if this relates to the previous question, the per sequence GC content isn't exactly normally distributed: there's a slight bump on the right side of the distribution.

Thanks!
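For readers who want to inspect the per sequence GC distribution outside FastQC, the following is a rough Python sketch. It is not FastQC's implementation. The function names, the 5% bin width and the toy reads are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def gc_percent(seq: str) -> float:
    """GC percentage of one read, ignoring anything that is not A, C, G or T."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def gc_histogram(seqs: Iterable[str], bin_width: int = 5) -> Dict[int, int]:
    """Count reads per GC bin; keys are the lower edge of each bin in percent."""
    hist: Counter = Counter()
    for seq in seqs:
        pct = gc_percent(seq)
        # Reads at exactly 100% GC go into the last bin.
        bin_start = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        hist[bin_start] += 1
    return dict(sorted(hist.items()))


# Toy reads; real input would be the sequence lines of a FASTQ file.
reads = ["ACGTACGT", "GGCCGGCC", "ATATATAT", "GCGCATAT"]
hist = gc_histogram(reads)
for bin_start, count in hist.items():
    print(f"{bin_start:3d}-{bin_start + 5:3d}% GC: {count}")
```

A bump on the right of such a histogram means a subset of reads has higher GC than the bulk; pulling out the reads in those bins and checking what they are is one way to follow up.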

**GenoMax** · 01-29-2014, 08:11 AM

There are several posts here that cover Illumina sequencing and FastQC. Search for "fastqc duplication".

If one of the posts does not answer your question then can you post example plots?

**Fernando Seixas** · 01-30-2014, 10:45 AM

Thanks for the reply. One thing I forgot to mention is the kind of data I have: it's whole genome sequencing data from a HiSeq 2000 machine using TruSeq library prep. And if I'm not wrong (my eyes are tired of so much reading xD), all the explanations I found for the behaviours I mentioned above refer to RNA-seq data, at least for the base content instability in the first 10 bp.

FastQC images of the problematic parameters are attached.

For the k-mer analysis I attached both the 7-mer and 10-mer analyses. I can see a repetitive 7 bp pattern if I align the k-mers (CCTGGCTCCTGGCT), so I looked for all possible 7 bp sequences inside this pattern, but still couldn't associate any of them with adapters/primers.

Thanks!
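The k-mer check described in this post can be sketched in Python as follows. This is only an illustration: the adapter string is a commonly cited Illumina adapter prefix included here as an assumption (the thread does not say which adapter or primer sequences were compared), and the function names are made up for the example.

```python
from typing import Set

PATTERN = "CCTGGCTCCTGGCT"        # repeat reported in the post
ADAPTER_PREFIX = "AGATCGGAAGAGC"  # assumption: example adapter sequence to compare against


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(pattern: str, reference: str, k: int) -> Set[str]:
    """k-mers of the pattern (either strand) that also occur in the reference."""
    candidates = kmers(pattern, k)
    candidates |= {revcomp(kmer) for kmer in candidates}
    return candidates & kmers(reference, k)


pattern_kmers = kmers(PATTERN, 7)
shared = shared_kmers(PATTERN, ADAPTER_PREFIX, 7)
print(sorted(pattern_kmers))
print(shared or "no shared 7-mers with this adapter sequence")
```

To screen against other adapters or primers, pass each full sequence as `reference`; an empty result only rules out exact 7-mer matches, not partial ones.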

**GenoMax** · 01-30-2014, 05:22 PM

The first two plots look ok. Is this a "GC" rich organism? Looks like there is some kind of duplication of sequences. Are the qualities acceptable across the entire read?

**Fernando Seixas** · 01-31-2014, 04:09 AM

But is it really normal to have such slight fluctuations in the first 10 bp? Regarding the GC content, this data is from a mammalian genome. But even when I removed PCR duplicates these problems persisted.
And yes, the quality scores are good across the entire reads.

Another thing I forgot to mention is that this is PE data.

**GenoMax** · 01-31-2014, 04:26 AM

> Originally posted by Fernando Seixas:
>
> But is it really normal to have such slight fluctuations in the first 10 bp?

Yes. Here is a "good" sample example report posted on the FastQC site. http://www.bioinformatics.babraham.a...qc_report.html

> But even when I removed PCR duplicates these problems persisted.
> Another thing I forgot to mention is that this is PE data.

What is the aim of your experiment? Are you trying to do de novo assemblies or is there a closely related genome you can use as a reference?

As Simon (author of FastQC) has mentioned in some past posts here, it is difficult for him to set "limits" for the various tests in FastQC that are universally applicable. So a dataset getting a "fail" in one or more FastQC categories does not automatically mean that there is a problem with the sample.

Have you tried doing analysis with the QC'ed data? How do those results look?

**Fernando Seixas** · 01-31-2014, 05:08 AM

It is for de novo assembly. I understand what you said about the FastQC limits not being universally applicable, but even so, I should worry about the GC content and k-mer plots, no?

And no, I'm still stuck at this stage because I don't feel confident enough to go on to the next steps.

Thanks!

**GenoMax** · 01-31-2014, 06:01 AM

Look at it this way: if there is a problem with the sample/library itself (at this point, if the qualities are good, there is likely no technical issue with the sequencing), you would not be able to do much short of redoing the experiment.

Why not press ahead and give the de novo assembly a try? It may fail, and you would be out some compute cycles/time. Since it is a mammalian genome it is probably large(ish), so you are going to have to deal with a number of other computational challenges. Do you have enough sequence (theoretically) with adequate depth (10-15x or more) for the assembly tests?
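The theoretical depth question can be answered with simple arithmetic: total sequenced bases divided by genome size. Below is a small Python sketch; the read counts, read length and 3 Gb genome size are hypothetical numbers chosen for illustration, not values from this thread.

```python
import math


def expected_depth(n_pairs: int, read_len: int, genome_size: int) -> float:
    """Average depth for paired-end data: two reads per pair, read_len bases each."""
    return 2 * n_pairs * read_len / genome_size


def pairs_needed(target_depth: float, read_len: int, genome_size: int) -> int:
    """Smallest number of read pairs that reaches the target average depth."""
    return math.ceil(target_depth * genome_size / (2 * read_len))


# Hypothetical example: 300 million pairs of 100 bp reads on a ~3 Gb genome.
GENOME_SIZE = 3_000_000_000
depth = expected_depth(300_000_000, 100, GENOME_SIZE)
needed_for_15x = pairs_needed(15, 100, GENOME_SIZE)
print(f"expected depth: {depth:.1f}x")
print(f"pairs needed for 15x: {needed_for_15x:,}")
```

This is raw depth before trimming, duplicate removal or filtering, so the usable depth will be somewhat lower.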

Confusion regarding Illumina Adapter Trimming!