
**choijae3** · 01-07-2014, 02:45 PM

Never mind, I've found useful links that solve my problem here and here.

However, does my approach make sense? Is decreasing the library size a valid way to see whether more pooling would be beneficial?

sorry if I'm derailing the post...

**dpryan** · 01-07-2014, 03:00 PM

FYI, I assume you mean "multiplexing" rather than "pooling". While there is pooling in both cases, the former is probably a more exact description of what you're doing (I assume you're looking for sequence differences between strains or something like that, so being able to separate reads by strain would be useful).

Regarding your strategy, it's often termed "saturation analysis" or "making a saturation/rarefaction curve" or various permutations thereof. It's a very good thing to do and I've seen a few papers (mostly RNAseq) specifically doing that to estimate maximal statistical power. 70x is overkill for a lot of common things, so I wouldn't be surprised if you can get away with throwing more samples on there.
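For the downsampling step of such a saturation analysis, here is a minimal Python sketch that keeps each read pair with a fixed probability. The function names (`read_fastq`, `subsample_pairs`) and the `fraction`/`seed` parameters are made up for illustration, and it assumes both FASTQ files are uncompressed, use plain 4-line records, and list mates in the same order.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as tuples of 4 lines (assumes unwrapped records)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=0):
    """Keep each read pair with probability `fraction`.

    Both mates are kept or dropped together so the two output files stay
    in sync. Returns (pairs_kept, pairs_seen).
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = total = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        total += 1
        if rng.random() < fraction:
            r1_out.writelines(rec1)
            r2_out.writelines(rec2)
            kept += 1
    return kept, total
```

Running this at several fractions (say 0.25, 0.5, 0.75) and repeating the downstream analysis on each subset gives the points of the saturation curve. For gzipped input, the handles could be opened with `gzip.open(path, "rt")` instead of `open(path)`.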

**choijae3** · 01-07-2014, 03:22 PM

Yes, I should have said multiplexing instead of pooling. I'm conducting a population-genomics type of project and trying to sequence as many populations as possible without sacrificing coverage too much.

Thanks dpryan!

**barkasn** · 01-08-2014, 08:14 AM

Hi choijae3,

70X coverage is very high and overkill for most applications, so in general you are better off sequencing more samples as opposed to sequencing the same thing over and over again.
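As a rough back-of-the-envelope check, the trade-off between depth and sample count can be sketched in Python. The helper names are hypothetical, the 20X and 35X targets are example values only (not a recommendation from this thread), and the arithmetic assumes reads are split evenly across samples, which real multiplexed lanes only approximate.

```python
import math


def max_samples_per_lane(current_samples, current_coverage, target_coverage):
    """Samples that fit in one lane if each needs only `target_coverage`.

    Assumes the lane yield is fixed and divided evenly among samples.
    """
    total_coverage = current_samples * current_coverage
    return math.floor(total_coverage / target_coverage)


def downsample_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep to simulate `target_coverage` from existing data."""
    return target_coverage / current_coverage


# One sample at 70X today, hypothetical 20X target per sample:
print(max_samples_per_lane(1, 70, 20))
# Fraction of reads to keep to mimic 35X from a 70X data set:
print(downsample_fraction(70, 35))
```

The second helper gives the fraction to use when subsampling existing reads to test whether a lower depth still supports the analysis.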

With respect to the duplication rate, I would recommend you do not trust FastQC. FastQC estimates the duplication rate by looking at the first and second reads independently. Given your high coverage, it is very likely that you will get first reads starting at the exact same spot by chance. I would recommend you use Picard MarkDuplicates.jar to estimate the duplication rate after alignment, as this takes into account both the first and second reads of each pair.
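To see why looking at one mate alone inflates the estimate, here is a toy Python illustration. It is not the actual algorithm of FastQC or Picard; the coordinates are invented, and `duplication_rate` is a hypothetical helper that simply counts repeated keys.

```python
def duplication_rate(keys):
    """Fraction of items that are extra copies of an already-seen key."""
    keys = list(keys)
    if not keys:
        return 0.0
    return 1.0 - len(set(keys)) / len(keys)


# Invented aligned pairs: (chromosome, read-1 start, read-2 end).
pairs = [
    ("chr1", 100, 450),
    ("chr1", 100, 480),  # same read-1 start, but a different fragment
    ("chr1", 100, 450),  # genuine duplicate of the first pair
    ("chr1", 200, 520),
]

# Judging by read 1 alone treats every pair starting at 100 as a copy.
read1_only = duplication_rate((chrom, start) for chrom, start, _ in pairs)

# Using both ends separates distinct fragments that share a start position.
pair_aware = duplication_rate(pairs)

print(read1_only, pair_aware)
```

In this toy data the read-1-only estimate is twice the pair-aware one, and the gap grows with depth because more independent fragments share a start position by chance.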

**choijae3** · 01-08-2014, 08:27 AM

Hi barkasn

Thanks for the reply! I've been following the best practices from the Broad Institute and have already run the mark-duplicates step. I hadn't paid much attention to its output (I really should have), and I found it to be more helpful. Thanks again for the advice!

How to randomly remove portions of the raw reads from the FASTQ file
