#1 |
Senior Member
Location: sub-surface moon base Join Date: Apr 2013
Posts: 372
|
Hello,
So, I've been working with some assemblies. Recently, I submitted our paired-end reads to MG-RAST and discovered that only a small percentage (~5%) of them were merged by FastqJoin (-m 8 -p 10). Now, I managed to get my hands on some actual QC data:

[image]

Our metagenomic samples are from deep underground, so low DNA yields are not super surprising. However, I learned that prior to sending, our samples were amplified by MDA, so

Question 1: Perhaps the low amount of DNA is somewhat surprising?

For some reason Macrogen ended up making a TruSeq library from one sample, and Nextera libraries from the other samples:

[image]

To me, Sample 1 looks pretty good, and I'd expect that a large percentage of the reads could be merged later on. However, this was not the case. In fact, I think this sample had the smallest percentage of merged reads in MG-RAST.

Question 2: Is there any rational explanation for this?

[image] [image]

As far as I can tell (I'm really more of a computer guy), all the other samples look rather awful.

Question 3: Does it make sense that merging failed so hard for these samples? I mean, the insert sizes are clearly too large, yes?

Question 4: Sample 3 had the highest concentration and amount of DNA in the beginning, then all of a sudden it became rather bad. Can the blame be assigned to Macrogen, or are there other possible explanations for this?

So yeah, I'd really appreciate it if somebody with more experience could share some thoughts on this whole thing.
#2 |
Senior Member
Location: Connecticut Join Date: Jul 2011
Posts: 162
|
Wow, lots of things going on here, but before I begin an in depth answer I'll need to ask how the libraries were sequenced. If it was HiSeq, which I presume, then was it a 2x100bp run or a 2x150bp run? Regardless, none of the inserts in your libraries appear to be small enough to overlap with a 2x150 run.
Overall I'd say all of the libraries except Sample 5 look good as far as the Bioanalyzer traces go. You have a valid argument for one library being TruSeq with the others being Nextera, but there's technically nothing wrong with doing either. |
#3 | |
Senior Member
Location: sub-surface moon base Join Date: Apr 2013
Posts: 372
|
![]() Quote:
Last edited by rhinoceros; 08-16-2013 at 10:09 AM. |
#4 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
If this was done 2 years ago then I am not sure that 2 x 150 bp was possible at that time.
Last edited by GenoMax; 08-16-2013 at 10:30 AM. |
#5 |
Senior Member
Location: Connecticut Join Date: Jul 2011
Posts: 162
|
#6 | |
Senior Member
Location: Connecticut Join Date: Jul 2011
Posts: 162
|
![]() Quote:
Also, the raw files will easily tell you the read lengths. Overall, I wouldn't expect these reads to overlap, so you'll have to drop that step from your analysis. |
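As a rough illustration of reading the lengths straight from the raw files, here is a minimal Python sketch. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function name `read_length_counts` and the example records are made up for illustration.

```python
from collections import Counter


def read_length_counts(fastq_lines):
    """Tally sequence lengths from an iterable of FASTQ lines.

    Assumes exactly four lines per record (header, sequence, '+', qualities).
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            counts[len(line.strip())] += 1
    return counts


# Tiny made-up example: two 10 bp reads and one 7 bp read.
example = [
    "@read1/1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read2/1", "ACGTACG", "+", "IIIIIII",
    "@read3/1", "ACGTACGTAC", "+", "IIIIIIIIII",
]
counts = read_length_counts(example)
print("longest read:", max(counts))
print("length distribution:", sorted(counts.items()))
```

For a real file, the same function can be fed an open handle, e.g. `gzip.open(path, "rt")` for a gzipped FASTQ. In untrimmed data the longest (and usually only) length is the number of cycles per read.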
#7 |
Senior Member
Location: sub-surface moon base Join Date: Apr 2013
Posts: 372
|
Well, the order date is July 20, 2011. I think a post-QC mean sequence length of ~105 bp means that it had to be 2x150bp? I mean, the post-QC mean sequence length of a 2x100bp run would be a lot shorter than 100bp, no?
#8 | |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]() Quote:
Not necessarily. With good libraries/sequencing you may not lose a single base. |
#9 | ||
Senior Member
Location: sub-surface moon base Join Date: Apr 2013
Posts: 372
|
![]() Quote:
Quote:
Last edited by rhinoceros; 08-16-2013 at 10:59 AM. |
#10 |
Senior Member
Location: sub-surface moon base Join Date: Apr 2013
Posts: 372
|
Yeah, I have the raw data (but not at hand). I never did any QC as that was done long before I started at the job. God, I hate it so much when I can't do all the work from the start..
#11 | |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]() Quote:
You will easily be able to tell the machine/number of cycles as long as you have the "original" data. Post an ID here and we can tell otherwise. |
#12 | |
Senior Member
Location: Connecticut Join Date: Jul 2011
Posts: 162
|
![]() Quote:
As for the Nextera libraries, you really have no control over the fragment size because it's a transposon that fragments the DNA. That's why the distribution of fragments is much wider in the Bioanalyzer traces for the Nextera libraries compared to the TruSeq library, which is fragmented mechanically (e.g. Covaris). This would go a ways to explaining why the TruSeq library had the lowest number of overlapping reads while the Nextera libraries had more.

To put it simply, >98% of the TruSeq fragments being sequenced should be >300bp, so 2x100bp reads will rarely overlap. Conversely, because there's a greater chance of small fragments being generated and sequenced with Nextera, a higher proportion of the reads would be expected to overlap.

As I noted before, your best bet is to stop focusing on read merging, because it doesn't look like it's going to happen. Even though you're doing metagenomics, there have been a lot of papers that have used 2x100bp HiSeq data for assemblies where the reads do not overlap. I'd recommend taking a look at what they're doing instead of just relying on MG-RAST.
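The overlap arithmetic behind this can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fragment_len` is the sequenced insert without adapters (Bioanalyzer sizes include adapters, so the true insert is somewhat shorter), reads are untrimmed, and `min_overlap=8` simply mirrors the `-m 8` setting mentioned in the first post. The function names are made up.

```python
def overlap_bp(fragment_len, read_len):
    """Bases covered by both reads of a pair.

    Zero or negative means the reads do not meet (there is a gap).
    For fragments shorter than one read, the overlap is capped at the
    fragment length.
    """
    return min(fragment_len, 2 * read_len - fragment_len)


def can_merge(fragment_len, read_len, min_overlap=8):
    """True if the pair overlaps by at least min_overlap bases."""
    return overlap_bp(fragment_len, read_len) >= min_overlap


def max_mergeable_fragment(read_len, min_overlap=8):
    """Longest fragment whose two reads still share min_overlap bases."""
    return 2 * read_len - min_overlap


for read_len in (100, 150):
    print(f"2x{read_len}: fragments up to "
          f"{max_mergeable_fragment(read_len)} bp can be merged")

print("300 bp fragment, 2x100:", can_merge(300, 100))
print("300 bp fragment, 2x150:", can_merge(300, 150))
```

Under these assumptions a 300 bp fragment leaves a 100 bp gap with 2x100bp reads and zero overlap with 2x150bp reads, so neither run type would merge it.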
#13 | |
Senior Member
Location: sub-surface moon base Join Date: Apr 2013
Posts: 372
|
![]() Quote: