SEQanswers


Dagga 02-19-2014 02:06 PM

Best way to perform assembly with two sets of raw data
 
Best way to merge data from two separate sequence runs
Hi,

We have performed two sequencing runs of a bacterial organism with a genome of about 4.4 Mb. One was performed a few years ago by BGI, and we sequenced the same organism again a few weeks ago. I am trying to determine the best way to go about the assembly.


1) Merge the 4 fastq files and perform the assembly as normal.

2) Map the reads from the second run to a FASTA file of the first assembly.

3) Align the two individually assembled genomes.

4) Are there any other methods I haven't thought of?

Just a side note - from looking at the raw data, I think the first run has less contamination than the second run (these organisms are prone to contamination with heterotrophic bacteria).
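For option 1, a minimal sketch of the merge in Python, assuming the four files are two paired-end runs stored as uncompressed FASTQ and that each file ends with a newline. The function name and file names are hypothetical. The point it illustrates is that forward reads are merged with forward reads and reverse with reverse, with the runs in the same order in both calls, so that read pairs stay in step.

```python
import shutil
from pathlib import Path


def concatenate_fastq(inputs, output):
    """Append each input FASTQ file to `output`, in the order given.

    Assumes every input ends with a newline, so records do not run together.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(output)


# Hypothetical file names; keep the run order identical in both calls.
# concatenate_fastq(["run1_R1.fastq", "run2_R1.fastq"], "merged_R1.fastq")
# concatenate_fastq(["run1_R2.fastq", "run2_R2.fastq"], "merged_R2.fastq")
```

Many assemblers also accept several libraries as separate inputs, which avoids merging altogether; whether that applies depends on the assembler used.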

GenoMax 02-19-2014 02:14 PM

If the first data set was done a few years ago, you would want to check what format the quality values are in. They may be in "illumina" format (Phred+64) and would need to be converted to "sanger" quality (Phred+33) if you are going to do any combined analysis. (Ref: http://en.wikipedia.org/wiki/FASTQ_format#Encoding)
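As a rough illustration of that check, the sketch below guesses the ASCII offset from the lowest quality character seen and shifts Phred+64 strings to Phred+33. The function names are hypothetical, and the threshold of 59 is a heuristic taken from the encoding ranges on the linked Wikipedia page: a sample containing only very high qualities is ambiguous, and the older Solexa scale (characters 59 to 63) is not a plain Phred shift, so treat the result as a hint rather than a verdict.

```python
def guess_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from an iterable of quality strings."""
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    # Characters below ASCII 59 can only occur in Phred+33 data.
    return 33 if lowest < 59 else 64


def to_phred33(quality, offset):
    """Re-encode one quality string from `offset` to Phred+33."""
    return "".join(chr(ord(ch) - offset + 33) for ch in quality)


old_run = ["hhhhB", "ggfdB"]        # made-up quality strings, older style
offset = guess_offset(old_run)      # 64 for this sample
converted = [to_phred33(q, offset) for q in old_run]
```

In practice it is worth sampling many reads before trusting the guess, and established conversion tools remain the safer route for real data.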

Depending on the amount of data available for each of those runs (and the time you can spend), you could try all of the options you mention in parallel. With a bacterial genome it should not be a very time-consuming affair.

Wallysb01 02-20-2014 08:36 AM

Depending on what type of data each is in terms of SE/PE or 50bp/100bp, etc., it may be worth it to just completely ignore the old data. Though you don’t mention it, I’d bet you have absurd coverage, and it is possible to “over assemble” a genome with ridiculous coverage.
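To put a number on "absurd coverage", a back-of-the-envelope sketch: expected coverage is the number of reads times the read length divided by the genome size. Only the 4.4 Mb genome size comes from the thread; the read count, read length and the 100x target below are made-up values for illustration, and the function names are hypothetical.

```python
def estimated_coverage(n_reads, read_length, genome_size):
    """Expected average depth: total sequenced bases / genome length."""
    return n_reads * read_length / genome_size


def subsample_fraction(coverage, target):
    """Fraction of reads to keep to reach `target` depth (capped at 1.0)."""
    return min(1.0, target / coverage)


genome_size = 4_400_000      # 4.4 Mb, from the original post
n_reads = 10_000_000         # hypothetical read count
read_length = 100            # hypothetical read length in bp

coverage = estimated_coverage(n_reads, read_length, genome_size)
fraction = subsample_fraction(coverage, target=100)   # hypothetical target
print(f"{coverage:.0f}x coverage, keep {fraction:.2f} of reads")
```

If the estimate comes out far above what the assembler needs, randomly subsampling the reads to that fraction is one way to test whether the extra depth is hurting the assembly.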

Like GenoMax said, with a bacterial genome you might as well try all of them, but without more specifics it's hard to recommend which route is likely to be better.

Dagga 03-06-2014 04:56 PM

Hi All,

I have assembled both sets of data individually, and it seems the data from the second run is not as good because there is some contamination. So I have a further question.

If I were going to use my first assembly as a reference, how would I map the reads of the second run to it?

Also - if I use the assembly from my first run as a reference, can the contigs be lengthened using the new reads, or will my contigs be limited in size to the reference and unable to be extended, even with new reads?

Cheers
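One common way to do the mapping asked about above is BWA, which nobody in this thread names, so treat the choice of tool as an assumption. The sketch below only builds and runs the two basic commands (`bwa index` on the reference, then `bwa mem` with the paired reads) from Python; the function names and file names are hypothetical, and `bwa` must already be installed and on the PATH.

```python
import subprocess


def bwa_commands(reference, reads_1, reads_2):
    """Return the index and alignment commands as argument lists."""
    return [
        ["bwa", "index", reference],
        ["bwa", "mem", reference, reads_1, reads_2],
    ]


def map_reads(reference, reads_1, reads_2, sam_out):
    """Index the reference, then write paired-end alignments to `sam_out`."""
    index_cmd, mem_cmd = bwa_commands(reference, reads_1, reads_2)
    subprocess.run(index_cmd, check=True)
    # bwa mem prints SAM records to standard output, so redirect to a file.
    with open(sam_out, "w") as sam:
        subprocess.run(mem_cmd, stdout=sam, check=True)


# Hypothetical file names:
# map_reads("first_assembly.fasta", "run2_R1.fastq", "run2_R2.fastq", "run2.sam")
```

Mapping on its own only reports where the new reads align to the existing contigs; it does not lengthen them.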

