Hi all. We've previously worked with 454 data and are still relatively new to NGS analysis. Our current dataset is in the form of HiSeq FASTQ files, with each sample having 4 reads of about 40 million ~40 bp sequences per read. These are environmental samples of complex, diverse communities. What software can I use to detect and remove the junk sequences, and then assemble the genomes left in the remainder? I'd appreciate any advice. Thanks in advance.
-
What insert size are the libraries? Are these paired ends? Why did you stop with such short reads? What sort of hardware do you have available?
When you say "4 reads", do you mean "4 runs"? You'll cause less confusion if you restrict the term 'read' to a single string of sequence information off the instrument.
Most assemblers should be able to do something with this, but it is a pretty small dataset for a complex community: 40M reads of 40 bp is only 1.6 Gb of data (3.2 Gb if this is paired-end). Velvet, for example, is very popular; I'm a heavy user of Ray (particularly handy if you have access to a cluster).
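To make the arithmetic above easy to redo with other numbers, here is a minimal Python sketch. The helper names `total_bases` and `to_gigabases` are made up for illustration and do not come from any assembler or library; the only inputs are the read count and read length quoted in this thread.

```python
def total_bases(n_reads: int, read_length_bp: int, paired_end: bool = False) -> int:
    """Return the total number of sequenced bases in one run.

    n_reads is the number of fragments sequenced; for paired-end data
    each fragment yields two reads, so the base count doubles.
    """
    mates = 2 if paired_end else 1
    return n_reads * read_length_bp * mates


def to_gigabases(n_bases: int) -> float:
    """Convert a base count to gigabases (1 Gb = 1e9 bases)."""
    return n_bases / 1e9


# Figures quoted above: 40 million reads of 40 bp each.
single = total_bases(40_000_000, 40)
paired = total_bases(40_000_000, 40, paired_end=True)

print(f"single-end: {to_gigabases(single):.1f} Gb")  # single-end: 1.6 Gb
print(f"paired-end: {to_gigabases(paired):.1f} Gb")  # paired-end: 3.2 Gb
```

Plugging in a different read length or number of runs gives the corresponding total in the same way.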
-
Thanks for the reply, krobinson. These are single-end reads, kept short (50 bp) to allow increased 'depth' while keeping costs down.
I have a few hardware options for analysis: 1) a single PC (Ubuntu, Bio-Linux) with 8 cores (Intel i7, 3.2 GHz) and 24 GB RAM; 2) a few small remote clusters (8-24 cores); 3) some large remote clusters that I have not used before (500-3500 cores, 1-2 GB per core).
Thanks for the correction: each sample was processed and run 4 times, with ~40 million reads per run, and the reads are all 50 bp. I think the library insert size would be 300-500 bp; I know it's the TruSeq DNA library protocol, and from my quick search on TruSeq DNA prep kits, that's the insert size range.
I'll take a look at Ray and Velvet (or is it MetaVelvet?). I've also come across SOAPdenovo. Any tips on what to watch out for when using Ray, or the others? Thanks for the help.