SEQanswers

Old 07-12-2018, 10:24 PM   #301
kokyriakidis
Member
 
Location: Thessaloniki, Greece

Join Date: Jul 2018
Posts: 12
Default Different Libraries

Quote:
Originally Posted by GenoMax View Post
Have you looked at the in-line help for "splitnextera.sh"? The adapters for Mate Pair libraries are in the "adapters.fa" file so you should be able to trim them as usual (Nextera_LMP_Read1_External_Adapter, Nextera_LMP_Read2_External_Adapter)
The SplitNextera guide states that it is different from LMP. Nextera mate pair is not the same as Mate pair library v2, and Mate pair library v2 does not have the LMP adapters you mentioned!

From JGI site:
"SplitNextera splits Nextera LMP libraries into subsets based on linker orientation. It is designed strictly for Nextera LMP (long-mate-pair) reads, not for normal libraries using a Nextera kit. Nextera LMP libraries must be split prior to further processing; they are not usable raw. Adapter-trimming should still be done on Nextera LMP libraries prior to splitting."

Mate SamplePrep V2 Documentation:
https://support.illumina.com/content...15008135_A.pdf
Old 07-13-2018, 10:36 PM   #302
jamie225
Junior Member
 
Location: USA

Join Date: Jul 2018
Posts: 2
Default

hi. i am new here how are you all.
Old 07-18-2018, 10:46 PM   #303
jsena33
Junior Member
 
Location: Santa Fe, New Mexico

Join Date: Jul 2018
Posts: 2
Default

Hi All,

Is it possible to match degenerate sequences as in the example below, trim them, and place the degenerate part in the FASTQ header? I am attempting to trim an adapter with the structure Adapter(21nt)-UMI(16nt)-Adapter(24nt) and place the UMI in the FASTQ header.

Matching degenerate sequences such as primers:
bbduk.sh in=reads.fq out=matching.fq literal=ACGTTNNNNNGTC copyundefined k=13 mm=f

Thank you for your help!
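
For reads where the construct is present without errors, the Adapter-UMI-Adapter layout described above can be sketched outside of BBDuk with a plain regular expression. This is only a minimal Python sketch, not a BBDuk feature: the two adapter sequences, the `move_umi_to_header` helper and the `_UMI` header suffix are hypothetical placeholders, and keeping only the sequence downstream of the second adapter is an assumption about what "trimming" should mean here.

```python
import re

# Hypothetical placeholder adapters (NOT taken from the thread): 21 nt and 24 nt.
LEFT = "GTACTCTGCGTTGATACCACT"
RIGHT = "AGATCGGAAGAGCACACGTCTGAA"
UMI_LEN = 16

# Left adapter, then a captured 16-nt degenerate stretch, then right adapter.
PATTERN = re.compile(f"{LEFT}([ACGTN]{{{UMI_LEN}}}){RIGHT}")


def move_umi_to_header(header, seq, qual):
    """Return (new_header, trimmed_seq, trimmed_qual), or None if no match."""
    match = PATTERN.search(seq)
    if match is None:
        return None
    umi = match.group(1)
    end = match.end()  # first base after the right adapter
    # Append the UMI to the read name, keeping any comment after the space.
    name, _, comment = header.partition(" ")
    new_header = f"{name}_{umi}" + (f" {comment}" if comment else "")
    return new_header, seq[end:], qual[end:]


if __name__ == "__main__":
    seq = LEFT + "AACCGGTTAACCGGTT" + RIGHT + "GATTACA"
    print(move_umi_to_header("@read1 extra", seq, "I" * len(seq)))
```

An exact pattern like this fails as soon as the adapters contain a mismatch or indel, so it is only a starting point for low-error reads.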
Old 07-19-2018, 04:57 AM   #304
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,800
Default

@jsena33: You should take a look at UMI tools for this type of application.
Old 07-19-2018, 07:34 AM   #305
jsena33
Junior Member
 
Location: Santa Fe, New Mexico

Join Date: Jul 2018
Posts: 2
Default

Hi GenoMax,

Thanks for the suggestion! I have used UMI tools, which works OK, but I am working with long reads that have a higher error rate (indel bias) than Illumina reads. The adapter and UMI are therefore likely to contain indels, so the adapter structure may actually look like this: Adapter(19-21nt)-UMI(14-16nt)-Adapter(22-24nt).

Thanks again for your advice!
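
With indels in the adapters, an exact pattern no longer matches, so some form of approximate matching is needed. The following is a minimal Python sketch of one generic approach (an edit-distance search with a free start position in the read); it is not part of BBDuk or UMI tools. The adapter sequences, the function names `best_end` and `extract_umi`, and the `max_edits=3` threshold are hypothetical placeholders.

```python
# Hypothetical placeholder adapters (NOT taken from the thread).
LEFT = "GTACTCTGCGTTGATACCACT"
RIGHT = "AGATCGGAAGAGCACACGTCTGAA"


def best_end(pattern, text):
    """Return (end, edits) for the best approximate occurrence of pattern.

    Edit distance (substitutions, insertions, deletions) is computed with
    a free start in text; `end` is the text index just past the match.
    """
    prev = list(range(len(pattern) + 1))  # column for the empty text prefix
    best = (0, prev[-1])
    for j, tc in enumerate(text, start=1):
        cur = [0]  # a match may start anywhere in text at no cost
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(prev[i - 1] + (pc != tc),  # match / substitution
                           prev[i] + 1,               # extra base in text
                           cur[i - 1] + 1))           # base missing from text
        if cur[-1] < best[1]:
            best = (j, cur[-1])
        prev = cur
    return best


def extract_umi(seq, left=LEFT, right=RIGHT, max_edits=3):
    """Return the bases between the two adapters, or None if either is missing."""
    left_end, left_edits = best_end(left, seq)
    if left_edits > max_edits:
        return None
    rest = seq[left_end:]
    # Searching the reversed strings yields the START of the right adapter.
    rev_end, right_edits = best_end(right[::-1], rest[::-1])
    if right_edits > max_edits:
        return None
    return rest[:len(rest) - rev_end]


if __name__ == "__main__":
    left_del = LEFT[:10] + LEFT[11:]          # one deleted base
    right_ins = RIGHT[:5] + "T" + RIGHT[5:]   # one inserted base
    read = "TTT" + left_del + "AACCGGTTAACCGG" + right_ins + "GATTACA"
    print(extract_umi(read))
```

The UMI is taken as whatever lies between the end of the best left-adapter match and the start of the best right-adapter match, so its length is allowed to vary (for example 14-16 nt). The cost grows with read length times adapter length, which is acceptable for illustration but slow on large long-read datasets.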
Old 10-03-2018, 10:47 PM   #306
FlySquirrelFly
Junior Member
 
Location: USA

Join Date: Oct 2018
Posts: 2
Default

Hi all,

Newcomer to RNA-seq/bbduk/the forum here.. I have a question that's probably really basic, but I have read through the bbduk docs, ctrl+F'ed "maq" and "minavgquality" through all 16 pages of this thread, and tried googling; all to no avail. So here I am.

When using `minavgquality` (`maq`), I'm very puzzled as to how the "average quality" is calculated. I was filtering full-length reads (91bp) by average quality (no trimming involved), and was expecting a very straightforward calculation -- taking the unweighted mean of individual Phred scores across individual bases.

For example, with

@A00325:34:H3FM7DRXX:1:1101:1208:1047 2:N:0:0NACTCTAA
AGTCGTACCGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAGAAAAGTAAACTGCGTTTATACCAATGCGTCCGCGGACAGGCGTTT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,,F,,F,F,,,F,,:F,,,FF:F,F,:,,,,,,,FF:F::,,F:::F,:F

There are 25 `,`, 10 `:`, and 56 `F` characters. Under the Illumina 1.8+ encoding scheme (Phred+33), I was expecting something like (25*(44-33)+10*(58-33)+56*(70-33))/(25+10+56) = 2597/91 = 28.5.

I was shocked when this read got filtered with an `maq` of 20.

I looked through some of the source code mentioning `minAvgQuality`. In BBQC.java and RQCFilter2.java, the default `minAvgQuality` settings seem to be 8 and 5 respectively. This, plus the fact that all (!) my reads were filtered when I tried `maq=30`, made me suspect that bbduk calculates "average quality" differently somehow. Can someone please explain this? (Is this what the "Phred algorithm" alluded to in the bbduk docs refers to?)
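
For comparison, the arithmetic above can be set next to one other common way of averaging quality: converting each Phred score to an error probability, averaging the probabilities, and converting the result back to Phred. The Python sketch below uses the symbol counts stated in this post. Whether bbduk's `maq` uses this second definition is an assumption here, not something confirmed in this thread.

```python
import math

# Symbol counts as stated in the post above; Phred+33 encoding assumed.
counts = {",": 25, ":": 10, "F": 56}
n_bases = sum(counts.values())

# Phred score of each symbol: ASCII code minus the offset of 33.
phred = {ch: ord(ch) - 33 for ch in counts}

# 1) Unweighted arithmetic mean of the Phred scores.
mean_phred = sum(phred[ch] * n for ch, n in counts.items()) / n_bases

# 2) Mean error probability, converted back to the Phred scale.
mean_error = sum(10 ** (-phred[ch] / 10) * n for ch, n in counts.items()) / n_bases
prob_avg_q = -10 * math.log10(mean_error)

print(f"mean of Phred scores:            {mean_phred:.1f}")
print(f"Phred of mean error probability: {prob_avg_q:.1f}")
```

The first value is 28.5, matching the hand calculation. The second is about 16.5, because the 25 bases at Q11 dominate the mean error probability. Under the assumption stated above, that would put the read below `maq=20`.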

(I read here during my googling attempt that "Calculating average Q (Phred) scores is a bad idea". But it's something that our lab routinely does and I think my PI would want me to do it anyways..)

Command I was using (version 38.25):

bbduk.sh in=raw.fastq out=raw_qual-pass.fastq outm=raw_qual-fail.fastq maq=20 ordered=t

(also tried adding `k=91` since all my reads are 91bp, `qin=33`, `qout=33`; no difference whatsoever)

Thanks!
Old 10-04-2018, 05:41 AM   #307
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,800
Default

@FlySquirrelFly: It has become difficult to get a hold of Brian (due to his day job responsibilities) but I will flag your post for him to see if he can respond.

I recall from some past discussion that average quality is calculated as a rolling window average and as soon as it drops below your set value it will trim/filter the rest of the read.

You should also consider this:
Quote:
Note - if neither ktrim nor kmask is set, the default behavior is kfilter.
All three are mutually exclusive.
You may want to explicitly set "qtrim=" if you only want to quality trim. You may also want to use
Quote:
trimq=6 Regions with average quality BELOW this will be trimmed, if qtrim is set to something other than f.
Unless you are doing de novo work, there is generally no need to filter based on quality. If you have a good reference to align to, data as low as Q10 should still be usable.

Last edited by GenoMax; 10-04-2018 at 05:44 AM.
Old 10-06-2018, 05:32 PM   #308
FlySquirrelFly
Junior Member
 
Location: USA

Join Date: Oct 2018
Posts: 2
Default

@GenoMax:

Thanks for your quick reply and for flagging the post for Brian! Much appreciated.

Indeed, since I did not set ktrim or kmask, kfilter should have been carried out (which is what I intended).

I wanted to filter based on the average quality of the full-length read, so I did not use `qtrim` or any of its related options.

I'm doing two separate analyses. One involves the canonical type of transcriptomic analysis (quantification of gene expression, differential expression analysis, etc). For that, like you said, there's probably no need to filter based on quality. The other involves doing some de novo assembly using the raw reads (for antibody V(D)J receptor). I figured that for the latter it'd probably be nice to have an extra layer of QC.
Tags
adapter, bbduk, bbtools, cutadapt, trimmomatic

All times are GMT -8.