02-04-2017, 11:15 PM   #60
mcmc
Member
 
Location: Midwest, USA

Join Date: Jan 2016
Posts: 14

Quote:
Originally Posted by Brian Bushnell
This appears to be getting killed by the scheduler because it's using more memory than it's allowed to, although usually that happens much faster so I can't be sure. I suggest you reduce the -Xmx argument. However, I don't know what you should reduce it to, since I don't know the memory limit for that node... but you could just try, say, -Xmx50g and see if that works. Otherwise, I suggest you talk to your sysadmin and ask how much memory you are allowed to use on that node, or why the job got killed.
Indeed, reducing it to -Xmx50g solved the problem (on a 64g exclusive node).
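Why a smaller -Xmx helps: -Xmx caps only the Java heap, and the JVM process also uses memory outside the heap, so the heap has to sit somewhat below the node's hard limit. A rough sketch of that arithmetic, assuming a hypothetical 20% headroom (a rule of thumb for illustration, not a BBTools recommendation):

```python
import math


def suggest_xmx_gb(node_gb: float, headroom: float = 0.2) -> int:
    """Heap size in GB that leaves `headroom` (a fraction of node memory)
    for non-heap JVM overhead. The 20% default is an assumed rule of thumb."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return math.floor(node_gb * (1 - headroom))


print(f"-Xmx{suggest_xmx_gb(64)}g")  # heap suggestion for a 64g node
```

On a 64g node this lands close to the -Xmx50g that worked; the right headroom for a given cluster is still a question for the sysadmin.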

I've tried to push my merge rate above ~75%, and, as you suspected, I think poor quality is what limits merging. Here are the gory details:

First, I adapter-trimmed with bbduk: ref=truseq.fa.gz ktrim=r mink=11 hdist=2 k=21 tpe tbo
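For reference, in those bbduk flags ktrim=r removes the first adapter k-mer match and everything to its right, k=21 is the k-mer length, hdist=2 allows up to two mismatches, and mink=11 also looks for shorter k-mers (down to 11 bases) at the read tip; tpe trims both reads of a pair to the same length and tbo trims adapters based on where the pair overlaps. The toy below illustrates only the ktrim=r/mink/hdist idea. It is not BBDuk's implementation: it checks a single adapter, ignores reverse complements, and matches short read-end k-mers only against the start of the adapter. The adapter string is a TruSeq-style prefix used purely as example input.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def ktrim_right(read: str, adapter: str, k: int = 21, mink: int = 11,
                hdist: int = 2) -> str:
    """Toy k-mer right-trimming, for illustration only.

    1. Scan the read left to right; at the first k-mer within `hdist`
       mismatches of any adapter k-mer, drop it and everything after it.
    2. Otherwise test read suffixes shorter than k (longest first, down to
       `mink`) against the start of the adapter, mimicking mink.
    """
    adapter_kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        if any(hamming(window, a) <= hdist for a in adapter_kmers):
            return read[:i]
    for length in range(min(k - 1, len(read), len(adapter)), mink - 1, -1):
        tail = read[len(read) - length:]
        if hamming(tail, adapter[:length]) <= hdist:
            return read[:len(read) - length]
    return read


ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # TruSeq-style adapter prefix
insert = "ACGTTGCA" * 5                          # hypothetical 40-base insert
print(ktrim_right(insert + ADAPTER[:25], ADAPTER))  # full k-mer read-through
print(ktrim_right(insert + ADAPTER[:15], ADAPTER))  # short adapter at the tip
print(ktrim_right(insert, ADAPTER))                 # no adapter present
```

Without the mink step, the second case would keep its 15 adapter bases, because no full 21-mer of adapter fits in the read.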

Then bbmerge:
1. default: 76.661% joined, median=292
2. adapter=default qtrim2=r: 76.672% joined, median=292, adapter expected=11900, adapter found=3188
3. -Xmx50g prealloc=t prefilter=1 extend2=50 k=62 rem adapter=default: 76.564% joined, median=293, adapter expected=11511, adapter found=2805
4. Focusing on the unmerged reads from (1):
  • 4a. qtrim2=r trimq=10,15,20 adapter=default: 34.5% of the unmerged reads now join, median=322
  • 4b. qtrim2=r trimq=10,15,20,25 adapter=default forcetrimright2=25: 41.15% of the unmerged reads now join, median=321
  • 4c. ecct qtrim2=r adapter=default: 10.40% of the unmerged reads now join, median=339
  • 4d. xloose: 33.6% of the unmerged reads now join
5. Sequential merging: run default bbmerge; feed the unmerged reads to bbmerge with qtrim2=r adapter=default trimq=10,15,20; then feed what is still unmerged to bbmerge with forcetrimright2=50.
Result: a total merge rate of 86.8%, leaving 3.57M of 27M unmerged.
6. Combining everything in one command did worse (is the order of operations different?): qtrim2=r trimq=10,15,20 adapter=default forcetrimright2=50 ecct gave 73.8% joined in total.
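The sequential approach in item 5 compounds: each stage sees only what the previous stage left unmerged, so the overall rate is 1 minus the product of the per-stage failure rates. A quick check with the numbers above, assuming the 34.5% from 4a carries over to stage 2 (same parameters) and that 3.57M and 27M count the same unit. Stage 3's own rate was not reported, so the sketch back-calculates it.

```python
from math import prod


def overall_merge_rate(stage_rates):
    """Overall fraction merged when each stage runs on the previous
    stage's unmerged reads: 1 - prod(1 - r_i)."""
    return 1 - prod(1 - r for r in stage_rates)


def stage_rate_needed(prior_rate, final_unmerged_fraction):
    """Rate one extra stage must achieve to go from `prior_rate` overall
    to the given final unmerged fraction."""
    return 1 - final_unmerged_fraction / (1 - prior_rate)


after_two = overall_merge_rate([0.76661, 0.345])  # default, then qtrim2 retry (4a)
final_unmerged = 3.57 / 27                          # reported: 3.57M of 27M
stage3 = stage_rate_needed(after_two, final_unmerged)
print(f"after two stages: {after_two:.1%}")
print(f"implied stage-3 rate (forcetrimright2=50): {stage3:.1%}")
print(f"overall: {1 - final_unmerged:.1%}")
```

This gives roughly 84.7% after two stages and about 13.5% of the remaining reads merged by the forcetrimright2=50 pass, consistent with the reported 86.8% total.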

So it seems that quality trimming, and even force trimming, have the greatest effect?
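Quality trimming plausibly helps here because a degraded 3' tail adds mismatches to the overlap that bbmerge has to score. One common way to pick a 3' trim point is the BWA-style algorithm sketched below; this is an assumed stand-in for illustration, and BBMerge's own trimming may differ in detail. The loop mirrors an ascending threshold list like trimq=10,15,20 to show how each level shortens a read; it does not attempt the merge itself, and the quality values are made up.

```python
def quality_trim_right(quals, trimq):
    """Length to keep after BWA-style 3' quality trimming.

    Walk from the 3' end accumulating (trimq - q); stop once the running
    sum goes negative and cut where the sum was largest.
    """
    keep = len(quals)
    running = best = 0
    for i in range(len(quals) - 1, -1, -1):
        running += trimq - quals[i]
        if running < 0:
            break
        if running > best:
            best, keep = running, i
    return keep


# Hypothetical Phred scores: a good 5' portion and a degraded 3' tail.
quals = [35] * 20 + [18, 22, 14, 16, 12, 8, 11, 5, 3, 2]
for t in (10, 15, 20):  # ascending thresholds, like trimq=10,15,20
    print(f"trimq={t}: keep {quality_trim_right(quals, t)} of {len(quals)} bases")
```

Higher thresholds cut deeper into the tail (25, 24, then 22 bases kept here), which is why retrying at stricter levels can rescue pairs that failed at a lenient one.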

Questions:
- What is the order of operations when qtrim2, forcetrimright2, ecct, and adapter are all specified in a single command? Is it better to run them iteratively on the unmerged reads?
- Are trim operations applied even if the merge fails (i.e., are the unmerged output reads trimmed)?
- Can you force-trim only the R2 read?
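Whatever BBMerge supports natively for the last question, one tool-independent workaround is to trim the R2 file by itself before merging; pairing is preserved because records stay in the same order. A minimal sketch, assuming an uncompressed 4-line-per-record FASTQ; force_trim_right and the file names are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def force_trim_right(fastq_lines: Iterable[str], n: int) -> Iterator[str]:
    """Drop the last `n` bases and qualities from every record of a
    4-line FASTQ stream (intended for the R2 file only)."""
    lines = iter(fastq_lines)
    for header in lines:
        try:
            seq, plus, qual = next(lines), next(lines), next(lines)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        seq, qual = seq.rstrip("\n"), qual.rstrip("\n")
        keep = max(len(seq) - n, 0)
        yield header
        yield seq[:keep] + "\n"
        yield plus
        yield qual[:keep] + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python trim_r2.py reads_R2.fastq 50 > reads_R2.trimmed.fastq
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(force_trim_right(fh, int(sys.argv[2])))
```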

Any other suggestions for dealing with poor quality, mainly in the last 50 bases of R2 (this is a 2x250 HiSeq run)?
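To put a number on "the last 50 bases", a per-cycle mean quality profile of R2 shows where quality falls off, which helps choose a forcetrimright2 or trimq value to try. A minimal sketch, assuming Phred+33 quality strings (the usual encoding for HiSeq data); the function names are hypothetical.

```python
from typing import Iterable, List, Optional


def mean_quality_by_cycle(qual_strings: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for q in qual_strings:
        for pos, ch in enumerate(q.rstrip("\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def first_cycle_below(means: List[float], threshold: float) -> Optional[int]:
    """First 0-based position whose mean quality drops below threshold."""
    return next((i for i, m in enumerate(means) if m < threshold), None)
```

In a FASTQ file the quality strings are every fourth line, so itertools.islice(handle, 3, None, 4) can feed mean_quality_by_cycle directly.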

Thanks very much,
MC
Attached image: insert_size_hist_default.png (19.0 KB)