SEQanswers

Old 02-19-2014, 02:06 PM   #1
Dagga
Member
 
Location: Sydney

Join Date: Feb 2014
Posts: 20
Best way to perform assembly with two sets of raw data

Best way to merge data from two separate sequence runs
Hi,

We have performed 2 sequencing runs of a bacterial organism with a genome of about 4.4 Mb. One was performed a few years ago by BGI, and we sequenced the same organism again a few weeks ago. I am trying to determine the best way to go about the assembly.


1) Merge the 4 fastq files and perform the assembly as normal.

2) Map the reads of the second run to a FASTA file of the first assembly.

3) Align the two individually assembled genomes.

4) Are there any other methods I haven't thought of?

Just a side note - from looking at the raw data, I think the first run has less contamination than the second run (these organisms are prone to contamination with heterotrophic bacteria).
Old 02-19-2014, 02:14 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

If the first data set was done a few years ago, you would want to check what format the quality values are in. They may be in "illumina" format (Phred+64) and would need to be converted to "sanger" quality (Phred+33) if you are going to do any combined analysis. (Ref: http://en.wikipedia.org/wiki/FASTQ_format#Encoding)
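As a rough illustration of that check, here is a minimal Python sketch, not a tested pipeline. The function names are made up for this example. It assumes plain 4-line FASTQ records and the Illumina 1.3+ Phred+64 encoding; older Solexa-scaled files use a different quality formula and need more than an offset shift. The offset guess is only a heuristic: characters below ';' (ASCII 59) cannot appear in the +64 encodings, but a file with only high qualities can still be ambiguous.

```python
def guess_phred_offset(quality_lines):
    """Heuristically guess the quality offset (33 or 64) from quality strings."""
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    # ASCII codes below 59 (';') only occur in Phred+33 ("sanger") files.
    return 33 if lowest < 59 else 64


def illumina64_to_sanger(quality):
    """Shift one Phred+64 quality string to Phred+33."""
    if any(ord(ch) < 64 for ch in quality):
        # Anything below '@' is not valid Phred+64 (it may be Solexa or Phred+33).
        raise ValueError("quality string is not Phred+64 encoded")
    return "".join(chr(ord(ch) - 31) for ch in quality)


def convert_fastq(lines):
    """Yield FASTQ lines with every 4th line (the qualities) converted."""
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        yield illumina64_to_sanger(line) if index % 4 == 3 else line


if __name__ == "__main__":
    record = ["@read1", "ACGT", "+", "hhhh"]  # hypothetical Phred+64 read
    print(guess_phred_offset([record[3]]))
    print(list(convert_fastq(record)))
```

In practice an established converter is probably the safer choice; the sketch is only meant to show what the conversion does to each quality character.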

Depending on the amount of data available for each of those runs (and the time you can spend), you could do all the mentioned options in parallel. With a bacterial genome it should not be a very time-consuming affair.
Old 02-20-2014, 08:36 AM   #3
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

Depending on what type of data each run is (SE/PE, 50 bp/100 bp, etc.), it may be worth just ignoring the old data completely. Though you don’t mention it, I’d bet you have absurd coverage, and it is possible to “over assemble” a genome with ridiculous coverage.
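To put a number on "absurd coverage", a back-of-the-envelope estimate is reads × read length ÷ genome size. The Python sketch below uses the 4.4 Mb genome size from the first post; the read count, read length, and 100x target are hypothetical placeholders, not values from this thread or a recommendation.

```python
def estimated_coverage(read_count, read_length, genome_size):
    """Average depth = total sequenced bases / genome size."""
    return read_count * read_length / genome_size


def subsample_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep to reach the target depth (never above 1.0)."""
    return min(1.0, target_coverage / current_coverage)


if __name__ == "__main__":
    genome_size = 4_400_000   # ~4.4 Mb, from the original post
    read_count = 10_000_000   # hypothetical: total reads across both mates
    read_length = 100         # hypothetical: 100 bp reads

    depth = estimated_coverage(read_count, read_length, genome_size)
    keep = subsample_fraction(depth, target_coverage=100)  # hypothetical target
    print(f"estimated depth: {depth:.0f}x, keep fraction: {keep:.2f}")
```

If the estimate comes out far above what the assembler needs, randomly subsampling read pairs to the computed fraction is one way to test whether lower coverage gives a cleaner assembly.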

Like GenoMax said, you might as well do all of them with a bacterial genome, but without more specifics it's hard to recommend which route is likely to be better.
Old 03-06-2014, 04:56 PM   #4
Dagga
Member
 
Location: Sydney

Join Date: Feb 2014
Posts: 20

Hi All,

I have assembled both sets of data individually, and it seems the data from the second run is not as good, as there is some contamination. So I have a further question.

If I were to use my first assembly as a reference, how do I map the reads of the second run to it?

Also - if I use my first assembly as a reference, can the contigs be lengthened using the new reads, or will my contigs be limited in size to the reference and no longer be extendable, even with new reads?

Cheers