I have used Abyss and then Trans-abyss for a de novo transcriptome assembly with ~70 million reads on a machine with 20 GB of RAM. From what I've read, Abyss is less RAM-intensive than other assemblers such as Trinity and SOAPdenovo. The memory-intensive phase for Abyss and other assemblers is loading the hash table, which depends on the k-mer size, not the number of reads. For these reasons I don't think memory is your issue. The large number of duplicate reads that you see in one read of the pair but not the other does sound like a possible cause; Abyss has problems when coverage is too high, for example. The Abyss support group is probably a good place to turn. https://groups.google.com/forum/?fro...um/abyss-users
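One rough way to check whether duplicates really are concentrated in one read of the pair is to measure the fraction of exact-duplicate sequences in each FASTQ file separately. The sketch below is only an illustration and is not part of Abyss or Trinity. It assumes plain 4-line FASTQ records (optionally gzipped), the function names are made up for this example, and it holds every distinct sequence in memory, so it may need subsampling for very large files.

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(path):
    """Yield the sequence line of each record in a 4-line FASTQ file."""
    # Assumption: files ending in .gz are gzip-compressed, everything else is plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        for seq in islice(handle, 1, None, 4):
            yield seq.strip()


def duplicate_fraction(path):
    """Return the fraction of reads whose exact sequence was already seen."""
    counts = Counter(read_sequences(path))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every read beyond the first copy of a sequence counts as a duplicate.
    return (total - len(counts)) / total
```

Calling `duplicate_fraction` on each mate file in turn (for example on hypothetical files `reads_1.fq` and `reads_2.fq`) and comparing the two numbers would show whether one file carries far more exact duplicates than the other.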
Trinity-duplicate removal
Dear All,
I am using Trinity for transcriptome assembly. I have a few queries:
1) I have two conditions (control and treated), and each condition has 4 replicates. If I merge these .fq files together, how would the assembly generated from the merged file be better than an assembly generated from a single sample (using only one replicate)?
2) Do I need to remove duplicates from each individual FASTQ file before merging, or after merging them together?
3) I saw there is a script "fasta_remove_duplicates" in the Trinity folder. Is there any chance that the in silico normalization in Trinity already takes care of these duplicate reads?
I would appreciate any explanations.