I am about to use Trinity for de novo transcriptome assembly prior to differential expression analyses.
I have 11 individuals (5 control, 6 treated) sampled across 3 tissue types = 33 samples, each with ~20 million ~80 bp single-end reads (after trimming and QC)... so that's about 660 million single-end reads in total!
To reduce what is likely to be a LONG Trinity run, would you suggest using Trinity's in silico normalization script or a similar tool (e.g. khmer's digital normalization) prior to assembly?
Or should I just use a small subset of samples for the assembly?
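For anyone unfamiliar, my understanding of what these tools do is roughly the following: discard a read when the k-mers it contains have already been seen "enough" times, so high-coverage transcripts are down-sampled while rare ones are kept. A minimal sketch of that idea (this is illustrative only, not the actual code of khmer or Trinity; the function name, k-mer size, and coverage cutoff are placeholders):

```python
# Toy digital normalization: keep a read only while the median count of its
# k-mers (among reads kept so far) is below a coverage cutoff.
# Purely illustrative; khmer/Trinity use probabilistic counting structures.
from collections import Counter
from statistics import median

def normalize_reads(reads, k=20, cutoff=20):
    counts = Counter()  # running k-mer abundance over kept reads
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read shorter than k
        if median(counts[km] for km in kmers) < cutoff:
            kept.append(read)
            for km in kmers:
                counts[km] += 1
    return kept

# A highly duplicated read is down-sampled to ~cutoff copies;
# the single rare read survives.
reads = ["ACGTACGTACGTACGTACGTACGT"] * 50 + ["TTTTGGGGCCCCAAAATTTTGGGG"]
kept = normalize_reads(reads, k=20, cutoff=20)
```

In this toy run the 50 identical reads are reduced to 20 copies while the one rare read is retained, which is exactly why normalization shouldn't lose rare transcripts the way subsetting samples might (errors and allelic variants complicate the real picture, of course).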
I don't know how much individual genetic variability there is, so I'm worried that using a subset for assembly would miss rarer transcripts.
Does anyone here have experience with normalization? Are there any downsides to it compared with using a subset of samples?
Any advice or experiences much appreciated!