
**GenoMax** · 11-09-2015, 07:46 AM

See post#4 here. You may want to recombine your fastq files with the method described there.

**ea11** · 11-09-2015, 07:52 AM

Thanks for that, I shall give it a try and see if it works.

The thing is, though, when I run the concatenated file through FastQC, it shows the number of reads you would expect (~80 million). Does that still mean it didn't merge correctly?
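One way to cross-check the FastQC figure is to count the records directly. This is a minimal sketch, assuming standard four-line FASTQ records and that `gzip -dc` (equivalent to `zcat` on most Linux systems) is available. The tiny demo files are made up for illustration.

```shell
# Count reads in a gzipped FASTQ file: 4 lines per record
# (assumes sequence and quality lines are not wrapped).
count_reads() {
    lines=$(gzip -dc "$1" | wc -l | tr -d ' ')
    echo $((lines / 4))
}

# Demo: two tiny made-up files, joined with plain cat as in the thread.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/part1.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n@r3\nGGCA\n+\nIIII\n' | gzip -c > "$tmp/part2.fastq.gz"
cat "$tmp/part1.fastq.gz" "$tmp/part2.fastq.gz" > "$tmp/cat.fastq.gz"

total=$(count_reads "$tmp/cat.fastq.gz")
echo "reads in concatenated file: $total"
```

`gzip` itself reads every member of a concatenated file, so a correct count here only shows that the data is all there. It does not show that every downstream tool will read past the first member.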

**GenoMax** · 11-09-2015, 08:12 AM

Simon (author of FastQC) is probably accounting for that in his code (he was the one who posted this observation first).

When @Brian comes along later today he will comment on BBDuk (he generally takes these kinds of things into account, but there are so many of them and there is only one @Brian).

**ea11** · 11-09-2015, 08:25 AM

Ah right ok, thanks for the clarification. Well I am re-merging the files using the command in the other post, and will hopefully wait to hear back from Brian when he gets round to it.

Thanks

**Brian Bushnell** · 11-09-2015, 10:42 AM

I've concatenated gzipped files before and don't remember ever having a problem with it, and I tried just now and it worked fine... it might be related to the Java version? Do you mind running "java -Xmx20m -version" and posting the output? Also, it could have to do with the program used to do the compression...

Either way, Genomax's post (zcatting them) should solve it.
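A minimal sketch of the zcat approach, assuming it amounts to decompressing all the pieces into one stream and compressing that stream once (the linked post is not reproduced here, so treat the exact command as an assumption). The file names are hypothetical, and `gzip -dc` is used in place of `zcat` for portability.

```shell
# Hypothetical file names; substitute your own lane/chunk files.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/sample_001.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > "$tmp/sample_002.fastq.gz"

# Decompress every piece into a single stream, then compress it once.
# The output is one gzip member instead of several members glued together.
gzip -dc "$tmp"/sample_00?.fastq.gz | gzip -c > "$tmp/sample_merged.fastq.gz"

merged_lines=$(gzip -dc "$tmp/sample_merged.fastq.gz" | wc -l | tr -d ' ')
echo "lines in merged file: $merged_lines"
```

The extra decompress/recompress step is the reason this is slower than plain `cat`.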

And by the way - BBDuk2 is not quite a drop-in replacement for BBDuk. They have somewhat different syntax. In this case, if you are trying to trim to the right using "ref" and to the left using "literal", you need to use the flag "rref" instead of "ref" and "lliteral" instead of "literal", so that it knows to use the ref for right-trimming and the literal for left-trimming. That is what you want to do, correct?
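In command form, the flag change might look like the sketch below. The file names and the literal sequence are hypothetical placeholders, and the command is only printed, not executed, so nothing is assumed about the local BBTools install.

```shell
# Placeholders (hypothetical): replace with your real inputs.
reads="sample_merged.fastq.gz"
adapters="adapters.fa"       # reference used for right-trimming
left_seq="ACGTACGTACGT"      # literal sequence used for left-trimming

# In BBDuk2 the flag name carries the direction:
# rref = right-trim using a reference file, lliteral = left-trim using a literal.
cmd="bbduk2.sh in=$reads out=trimmed.fastq.gz rref=$adapters lliteral=$left_seq"
echo "$cmd"
```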

**ea11** · 11-09-2015, 11:23 AM

Thanks for the reply, Brian. I am zcatting the files at the moment, so hopefully they should work like that, but it would be nice to work out why my files didn't work when I just concatenated them.

java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.1) (rhel-1.45.1.11.1.el6-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

Yes that is what I am trying to do. Thanks for letting me know about the slight differences I need to make in the code. So is BBDuk2 good to use for what I want to do or should I go back to BBDuk?

Thanks

**Brian Bushnell** · 11-09-2015, 11:35 AM

BBDuk2 should work fine if you adjust the parameters as I indicated. That said - personally, I would use 2 passes of BBDuk because BBDuk2 is a bit more confusing and less flexible (you can't use different kmer lengths for left and right trimming, for example). I designed BBDuk2 for integration into pipelines that get written once and then run exactly the same way for years, to achieve maximal efficiency, since it can do all kmer operations in a single pass (filtering, left-trimming, right-trimming, and masking). But actually I never use it because I usually want different values of K and a different hamming distance for the different steps.
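The two-pass alternative could be sketched as follows. File names, the literal sequence, and the `k` and `hdist` values are illustrative assumptions, not recommendations, and the commands are printed rather than run.

```shell
# Placeholders (hypothetical): replace with your real inputs.
reads="sample_merged.fastq.gz"
adapters="adapters.fa"
left_seq="ACGTACGTACGT"

# Pass 1: right-trim against the reference, with its own k and hamming distance.
pass1="bbduk.sh in=$reads out=pass1.fastq.gz ref=$adapters ktrim=r k=23 hdist=1"

# Pass 2: left-trim against the literal, with a different k and hamming distance.
pass2="bbduk.sh in=pass1.fastq.gz out=trimmed.fastq.gz literal=$left_seq ktrim=l k=12 hdist=0"

printf '%s\n%s\n' "$pass1" "$pass2"
```

Each pass gets its own `k` and `hdist`, which is the flexibility a single BBDuk2 run does not offer.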

The issue here is probably that you are running OpenJDK, or version 1.6, or most likely the combination of the two. I only test with Oracle's JDK, using versions 1.7 and 1.8.
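To check a machine for the same combination, the version string and vendor can be pulled out of the `java -version` output. A small sketch, using the output posted earlier in the thread as sample input; the helper name `java_version_of` is made up for illustration.

```shell
# Extract the quoted version string from `java -version` style output.
java_version_of() {
    printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Output posted earlier in the thread, used here as sample input.
sample='java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.1) (rhel-1.45.1.11.1.el6-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)'

ver=$(java_version_of "$sample")
case "$sample" in
    *OpenJDK*) vendor="OpenJDK" ;;
    *)         vendor="other" ;;
esac
echo "version=$ver vendor=$vendor"

# On a live system java prints this to stderr, so redirect it:
#   java_version_of "$(java -Xmx20m -version 2>&1)"
```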

**ea11** · 11-09-2015, 11:40 AM

Ok, I shall give BBDuk a look and see the results of that.

With regards to the Java version, I'm running all my analysis on a computer cluster so the only version of Java installed by the administrators is 1.6. Should zcatting work with version 1.6?

**Brian Bushnell** · 11-09-2015, 11:46 AM

Yes, zcatting should work with any version of Java; the only disadvantage is that it takes longer than cat. But, I recommend that you request your sysadmin upgrade to the latest supported version, which is 1.8 for Oracle and I believe 1.8 for OpenJDK (and I would suggest Oracle's, but that's just a personal preference since I test on Oracle's - they are supposed to be identical). Java is backwards compatible, and 1.6 is quite old now.

**ea11** · 11-09-2015, 11:48 AM

Yep, am currently finding out that zcatting takes longer, which is why I'm running it overnight. I shall put forward a request to upgrade the version of Java that we have.

Thanks for your help, Brian

**GenoMax** · 11-09-2015, 11:52 AM

You could trim the 8 file pairs independently and then combine the BAMs into one at a later step. This would provide some (brute force) parallelization :-)
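The brute-force pattern can be sketched with shell background jobs. `trim_chunk` is a hypothetical stand-in that only recompresses its input; in practice its body would be the real trimming command, and the demo files are made up.

```shell
# Hypothetical stand-in for "trim one chunk"; replace the body with the real command.
trim_chunk() {
    gzip -dc "$1" | gzip -c > "$2"
}

# Made-up demo chunks.
tmp=$(mktemp -d)
mkdir "$tmp/in" "$tmp/out"
for i in 1 2 3; do
    printf '@r%s\nACGT\n+\nIIII\n' "$i" | gzip -c > "$tmp/in/chunk_$i.fastq.gz"
done

# Launch one background job per chunk, then wait for all of them to finish.
for f in "$tmp"/in/chunk_*.fastq.gz; do
    trim_chunk "$f" "$tmp/out/$(basename "$f")" &
done
wait

done_count=$(ls "$tmp/out" | wc -l | tr -d ' ')
echo "chunks processed: $done_count"
# The per-chunk BAMs produced after alignment could then be combined,
# for example with samtools merge (not shown here).
```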

**ea11** · 11-09-2015, 11:58 AM

Yea Geno, that was an option that we talked through, but then as a lab group we decided to merge all the files from the start and work on 2 files per sample rather than 16 per sample.

**GenoMax** · 11-09-2015, 12:04 PM

It is easy enough for whoever you got the sequencing done from to generate the output as a single file, instead of the pieces. Next time you may want to request that.

**ea11** · 11-09-2015, 12:06 PM

Yea that is what I was expecting from the people who did the sequencing, which is why I was shocked when I was given 640 files rather than the 80 I was expecting.

BBDuk2 not trimming whole file