Hello,
So, I've been working with some assemblies. Recently, I submitted our paired-end reads to MG-RAST and discovered that only a small percentage (~5%) of them were merged by FastqJoin with -m 8 -p 10.
Now, I managed to get my hands on some actual QC data:
Our metagenomic samples are from deep underground, so low DNA yields are not super surprising. However, I learned that prior to sending, our samples were amplified by MDA, so
Question 1: Perhaps the low amount of DNA is somewhat surprising?
For some reason Macrogen ended up making TruSeq library from one sample, and Nextera libraries from the other samples:
To me, Sample 1 looks pretty good, so I'd expect that a large percentage of its reads could be merged later on. However, this was not the case. In fact, I think this sample had the smallest percentage of merged reads in MG-RAST.
Question 2: Is there any rational explanation for this?
As far as I can tell (I'm really more of a computer guy), all the other samples look rather awful.
Question 3: Does it make sense that for these samples merging failed so hard? I mean, the insert sizes are clearly too large, yes?
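To make the reasoning behind Question 3 concrete, here is a minimal sketch of the overlap arithmetic I have in mind. The 100 bp read length and the insert sizes in the loop are placeholder values, not our actual numbers, and the function names are just made up for illustration. The minimum overlap of 8 mirrors the -m 8 setting mentioned above.

```python
def expected_overlap(read_length: int, insert_size: int) -> int:
    """Expected number of insert bases covered by both mates of a pair.

    A negative value means there is a gap between the two reads.
    If the insert is shorter than one read, the overlap is capped
    at the insert size (the rest of each read runs into adapter).
    """
    return min(2 * read_length - insert_size, insert_size)


def can_merge(read_length: int, insert_size: int, min_overlap: int = 8) -> bool:
    """True if the pair overlaps by at least min_overlap bases."""
    return expected_overlap(read_length, insert_size) >= min_overlap


if __name__ == "__main__":
    read_length = 100  # hypothetical read length, not from our run
    for insert_size in (150, 190, 300, 600):  # hypothetical insert sizes
        overlap = expected_overlap(read_length, insert_size)
        print(insert_size, overlap, can_merge(read_length, insert_size))
```

With these assumed numbers, any insert much longer than about twice the read length leaves no overlap at all, which is why I suspect the large inserts explain the failed merging.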
Question 4: Sample 3 had the highest concentration and amount of DNA in the beginning, then all of a sudden it became rather bad. Can the blame be assigned to Macrogen, or are there other possible explanations for this?
So yeah, I'd really appreciate it if somebody with more experience could share some thoughts on this whole thing.