SEQanswers

Old 05-05-2017, 12:07 PM   #221
BrianS
Junior Member
 
Location: East Coast, USA

Join Date: May 2017
Posts: 2
Default

That makes sense. Thank you.
Old 05-06-2017, 03:07 PM   #222
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275
Default

This behaviour is a bit un-Unix like?


bbduk.sh in1=R1.fq.gz in2=R2.fq.gz loglog loglogk=31 out=/dev/null

Unspecified format for output /dev/null; defaulting to fastq.

Exception in thread "main" java.lang.AssertionError: /dev/null already exists; please delete it.
Old 05-06-2017, 04:26 PM   #223
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,435
Default

Quote:
Originally Posted by Torst View Post
This behaviour is a bit un-Unix like?
@Brian will have a more official answer but BBTools are pure Java and are coded to be OS agnostic (will run on any OS with Java).

Not specifying an "out" option with most BBTools still produces all of the statistics, just without writing result output (giving you the out=/dev/null effect).
Old 05-06-2017, 04:27 PM   #224
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668
Default

Haha

The syntax would be:

bbduk.sh in1=R1.fq.gz in2=R2.fq.gz loglog loglogk=31 out=stdout.fq > /dev/null

But, you don't need to specify anything, as the default is to not print anything rather than writing to stdout, so just do this:

bbduk.sh in1=R1.fq.gz in2=R2.fq.gz loglog loglogk=31

Edit: @Genomax beat me by a minute
Old 05-17-2017, 12:46 PM   #225
phylloxera
Junior Member
 
Location: Nebraska

Join Date: May 2017
Posts: 2
Default Getting different results with bbduk command line vs geneious plugin

Hi,

I have been searching through the default settings of the command-line version and still haven't identified the source of the discrepancy... Here is my Linux command:

sh ~/bbmap/bbduk.sh in1=~/path/to/forwards.fastq.gz in2=~/path/to/reverses.fastq.gz out=~/path/to/output.fastq.gz ref=~/bbmap/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 minoverlap=24 tbo

result: 3628348 reads, 581467350 bases

And here is my plugin command from the Geneious output:
java.exe -ea -Xmx100m -cp ...\current jgi.BBDukF ktrim=r k=23 hdist=1 edist=0 mink=11 ref=adapters.fa minlength=10 trimbyoverlap=t minoverlap=24 qin=33 in=input1.fastq in2=input2.fastq out=output1.fastq out2=output2.fastq

result: 3628348 reads, 584214527 bases

The plugin command seems consistent with my data and with the defaults in bbduk.sh. Any idea why there are ~3M more bases in the plugin output?

Thanks, Aaron
Old 05-17-2017, 02:08 PM   #226
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668
Default

No idea; those commands should be equivalent, unless the adapters.fa file is different. Can you post the full output of the command so I can see where the bases were lost? Also, you may want to add "stats=stats.txt" which will indicate which adapter sequences are being hit, and should be identical in both cases.
Old 05-17-2017, 03:41 PM   #227
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,435
Default

Perhaps Geneious is using a different adapters.fa file?
Old 05-18-2017, 06:06 AM   #228
phylloxera
Junior Member
 
Location: Nebraska

Join Date: May 2017
Posts: 2
Default re: different results from command line and geneious plugin

Yes, the adapters files are different. Geneious says it is using "All Truseq, Nextera, and PhiX adapters" (152 sequences), while the count in the resources folder of bbmap is 154 sequences. I don't know which 2 are lacking from Geneious, presuming the shared set is 152. I could not find the adapters file Geneious is calling, or the stats.txt file I asked it to produce; I may need to call in Geneious support to help with this.

Among the top stats.txt hits in the Linux output are PCR dimer (16% of reads) and PCR primers (15% of reads); I wonder if these are lacking in Geneious. Geneious is trimming more sequences by overlap (206,591 vs 157,344) and fewer sequences by ktrim (2,630,578 vs 2,657,654). Now... which one is best? The one trimming more?
Old 05-18-2017, 11:16 AM   #229
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668
Default

Probably the one trimming more is better. Geneious is probably trimming more by overlap because overlap trimming happens after adapter-sequence trimming, so if some adapter sequences are missing, those will (usually) get overlap-trimmed rather than sequence-trimmed. But it's hard to say. You could align the first 1m reads to the reference (if you have one) and look at the mapping rates and error rates to see which dataset is better.
Old 06-06-2017, 07:23 AM   #230
jweger1988
Member
 
Location: Paris, France

Join Date: Apr 2017
Posts: 34
Default

Hi Brian,

I'm trying to do some trimming of primer sequences from amplicons that I sequenced.

I have a viral genome that was amplified in 5 large-ish overlapping pieces (~2-3kb) and then fragmented and sequenced on a nextseq (1x150 SE).

I would like to trim only the primer sequences on the ends of the reads. The problem is that my primers are all different lengths; how would you go about encompassing all of the primers in one round of bbduk trimming? Or would you have to do them individually for each k value?

I was thinking something like this
Code:
bbduk.sh in=$f1 out=${file_base}_ff.fq.gz literal=AGTTGTTAGTCTACGTGGA,AGCTGGAAAAACAAAGAACT,GCCAAGAAACAGGATGTCGTTG,GCCGTAATTGGTATCGATACTGGA,GGACATGGGCAGATTGAC,ACTGTGAGGATGGCTATGA,GCAGACAGAAAGTGGTGTTTT,ACAAAACGTGGGCTTACCATG,AGGCCTAACAAAAGGAGGAC,TTTCCCAGCGTCAATATGCT ktrim=l k=24 mink=15 hdist=2 qtrim=rl trimq=30 restrictleft
Where mink=15 encompasses all of the primers shorter than the longest (k=24). Does this seem OK?

Thanks in advance.

Last edited by GenoMax; 06-06-2017 at 07:46 AM. Reason: Added [CODE] tags
Old 06-06-2017, 11:31 AM   #231
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668
Default

Since you have 5 different overlapping sections, if you trim the primers all at once with no precautions, you will break your assembly into 5 segments, since any read overlapping a primer region will have the primer region trimmed. You can mostly eliminate this effect, though, using the "restrictleft" flag as in your command, but you need to specify a position: in this case, "restrictleft=24", or maybe slightly more if there is some uncertainty about where, exactly, sequencing starts on the amplification primer (restrictleft=26 should be fine).

With that change, I think your command will be fine, and it is certainly the most convenient approach. So try it, and see if the virus assembles into one contig.

If it doesn't, you can refine it more. If you have the data split into 5 libraries with different bar codes, I'd recommend running each one independently with just its own sequences and kmer lengths. But if not, you can still get a slight increase in precision by running each primer individually with its own specific kmer length (in which case you should not do quality-trimming until after all primer-trimming is complete). You can also increase precision by turning off "rcomp" when running primers individually (since left-end primer sequences will always be the original sequence rather than reverse-complemented). You might need to play around with this a bit depending on how the sequences are reported (e.g. maybe half of the primers are already reported as reverse-complements). But those steps usually are not necessary with super-short genomes as in this case, particularly with primers that are designed to be unique in the genome; I'm just listing them as examples of what could be done if needed.
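To make the mink/restrictleft interaction concrete, here is a toy Python sketch of left k-mer trimming under those semantics. This is purely illustrative and not BBDuk's actual implementation (which indexes every reference k-mer in a hash table); the function and parameter names mirror the flags discussed above.

```python
# Toy sketch of left k-mer primer trimming (ktrim=l) with mink and
# restrictleft semantics. Illustrative only, NOT BBDuk's real code.
def ktrim_left(read, primers, k=24, mink=15, restrictleft=26):
    trim_to = 0
    for primer in primers:
        # Try the longest k-mer first, then shorter ones down to mink,
        # taken from the right end of the primer (the end that borders
        # the insert when the primer sits at the left of the read).
        for klen in range(min(k, len(primer)), mink - 1, -1):
            kmer = primer[-klen:]
            # Only search within the leftmost `restrictleft` bases.
            pos = read.find(kmer, 0, restrictleft)
            if pos >= 0:
                trim_to = max(trim_to, pos + klen)
    return read[trim_to:]
```

A primer shorter than k (like the 19bp one below) is still caught because mink allows shorter matches, while a read with no primer hit in its left window is left untouched.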
Old 06-06-2017, 12:52 PM   #232
jweger1988
Member
 
Location: Paris, France

Join Date: Apr 2017
Posts: 34
Default

Brian,

Thanks for the quick response.

It seems like splitting it up does give me a little better precision (as judged by a higher % of bases lost), so I'll stick with that.

On a similar note, I'm wondering what your thoughts are on quality trimming. If I trim to q30 (qtrim=rl trimq=30) I lose about half my bases. Do you think this level of stringency is required for variant calling? What would you normally do?
Old 06-06-2017, 02:20 PM   #233
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668
Default

I do not recommend extremely stringent quality-trimming prior to variant-calling (or almost anything else, for that matter). In general, it reduces mapping accuracy and increases various forms of bias, though the magnitude differs between a tiny nonrepetitive virus and human. Also, shorter reads reduce variant quality-score estimates and the ability to detect indels.
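For reference, threshold-based quality trimming is usually not a simple "cut at the first base below trimq"; a common approach is the Phred/bwa-style optimal trim point, which maximizes the badness removed from the trimmed end. The sketch below is illustrative only and not claimed to be BBDuk's exact algorithm:

```python
# Simplified sketch of right-end quality trimming at threshold trimq:
# choose the cut point that maximizes the sum of (trimq - q) over the
# trimmed region (Phred/bwa-style). Illustrative, not BBDuk's code.
def qtrim_right(quals, trimq=30):
    """Return how many leading bases to keep, given Phred scores."""
    best, running, cut = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += trimq - quals[i]
        if running > best:
            best, cut = running, i
    return cut  # keep quals[:cut]
```

With trimq=30, a read whose tail is q2 loses the whole tail, while a uniformly q35 read is untouched; a single high-quality base inside a bad tail does not stop the trim, which is why high trimq values remove so much data.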

BBTools' callvariants.sh can do quality-trimming on mapped reads just prior to variant-calling, so the quality-trimming does not reduce mapping accuracy. The default value is 10. Additionally, the default direction is "qtrim=r", because Illumina quality scores are inaccurate on the left side (artificially low) due to the base-caller; with correctly-calibrated quality scores, though, I would use "qtrim=rl". You can, in fact, recalibrate the quality scores if you want, using calctruequality.sh:

Starting with the raw reads, for variant-calling on single-ended amplicon data from a NextSeq, I'd generally do this:

Code:
bbduk.sh in=raw.fq out=trimmed1.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 minlen=150

bbmap.sh in=trimmed1.fq out=mapped_raw.sam ref=virus.fa ambig=toss slow

calctruequality.sh in=mapped_raw.sam ref=virus.fa callvariants

filterbytile.sh in=raw.fq out=filtered.fq

clumpify.sh in=filtered.fq out=clumped.fq dedupe optical spany adjacent

bbduk.sh in=clumped.fq out=recal.fq recalibrate

bbduk.sh in=recal.fq out=recal_trimmed.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 minlen=30

(primer-trim)

bbduk.sh in=primer_trimmed.fq out=qtrimmed.fq qtrim=rl trimq=8 minlen=30

bbmap.sh in=qtrimmed.fq out=mapped_final.sam ref=virus.fa

callvariants.sh in=mapped_final.sam ref=virus.fa out=vars.vcf ploidy=1
(possible additional variant-calling flags like "rarity=0.05" depending on whether you are interested in minority alleles of a multi-strain viral sample)
Note that quality-score recalibration and filter-by-tile are only effective if you have sufficient reads (millions are recommended) to prevent over-fitting. If you only have 100x data for a 15kbp virus, I don't recommend them (the first 4 steps); without recalibration, though, I don't recommend quality-trimming on the left end.

If the virus is sufficiently divergent from the reference you can alternatively assemble the raw reads and use that assembly as "ref" for recalibration (step 2 and 3).

Last edited by Brian Bushnell; 06-06-2017 at 02:46 PM. Reason: Made more complete
Old 06-07-2017, 11:20 AM   #234
jweger1988
Member
 
Location: Paris, France

Join Date: Apr 2017
Posts: 34
Default

Very interesting. I like the idea of quality score recalibration. If I don't have millions of reads (only between 1000-20000x avg. coverage) can I still do all of the first steps without the filterbytile step? Will qScore recal still help me here?

Also,

"If the virus is sufficiently divergent from the reference you can alternatively assemble the raw reads and use that assembly as "ref" for recalibration (step 2 and 3)."

Do you mean use something like Tadpole to de novo assemble the divergent virus and then generate a fasta file to use as a new reference?

I haven't tried clumpify's consensus feature but it would be cool if it would generate a new fasta file based on a reference as the consensus. Or is this possible already?

One other question related to variant calling. I'm using callvariants.sh and I'm noticing some variants that are clearly not real due to problems at the ends of the reads.

If I want to call variants from only the middle 60% of the read (1x150bp) is this the correct command?

callvariants.sh in=in.bam out=out.vcf ref=ref.fa minedist=30


Thanks again for your help.
Old 06-07-2017, 12:55 PM   #235
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668
Default

Quote:
Originally Posted by jweger1988 View Post
Very interesting. I like the idea of quality score recalibration. If I don't have millions of reads (only between 1000-20000x avg. coverage) can I still do all of the first steps without the filterbytile step? Will qScore recal still help me here?
20000x * 15000bp / 150bp = 2 million reads, which is plenty. 1000x * 15000bp /150bp = 100 thousand reads, which is insufficient. So, it depends. However, you can use the entire lane for filterbytile or recalibration (which will give optimal results). Filterbytile, specifically, does not need any mapping so it does not matter what the reference is; you can just dump a whole lane or flowcell into it and you will get optimal filtering. To use it effectively on a small library, you can run it in 2 stages, one where you generate the quality map using all reads, then a second for filtering just your library of interest:

Code:
filterbytile.sh in=whole_flowcell.fq dump=map.flowcell
filterbytile.sh in=library.fq indump=map.flowcell out=filtered.fq
You can do something similar with recalibration, but that is trickier, unless everything in the lane has the same reference. With >1 million reads recalibration should be useful. With less than that, it shouldn't hurt anything but will be less effective. The maximal amount a quality score is changed is modified by the total number of observations of the context: if you only have 3 examples of a context, a base's quality won't be changed; if you have thousands, it will. Contexts are fairly specific (e.g. quality score 26 for position 17 in a read where the base call is 'A'), so you need many mapped reads to get just one observation of a given context, and many observations of a context to get a useful phred score. However, since NextSeq has binned q-scores, there are fewer possible contexts, so maybe even 100k reads would be enough.
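The context-counting idea can be sketched in a few lines. The toy model below uses a (quality score, position, called base) context as in the example above; the context definition, minimum count, and pseudocounts are my own illustrative choices, not calctruequality.sh's actual model.

```python
import math
from collections import defaultdict

# Toy empirical recalibration: count, per context, how often the
# aligned base was actually wrong, then convert the observed error
# rate back into a Phred score. Illustrative sketch only.
def build_recal_table(observations, min_count=100):
    """observations: iterable of ((q, pos, base), is_error) pairs."""
    counts = defaultdict(lambda: [0, 0])  # context -> [errors, total]
    for ctx, is_error in observations:
        counts[ctx][0] += int(is_error)
        counts[ctx][1] += 1
    table = {}
    for ctx, (errors, total) in counts.items():
        if total < min_count:
            continue  # too few observations: leave the score unchanged
        p = (errors + 1) / (total + 2)  # pseudocounts avoid p == 0
        table[ctx] = round(-10 * math.log10(p))
    return table
```

With 1000 observations of one context and 10 mismatches, the context recalibrates to roughly Q20; with only a handful of observations the context is skipped, which is the over-fitting guard described above.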

Quote:
"If the virus is sufficiently divergent from the reference you can alternatively assemble the raw reads and use that assembly as "ref" for recalibration (step 2 and 3)."

Do you mean use something like Tadpole to de novo assemble the divergent virus and then generate a fasta file to use as a new reference?
Yes, that's what I mean. But that fasta would only be used for recalibration. To recalibrate you need to align to a fasta and count matches/mismatches. The closer that fasta is to the sample, the more accurate the process is, because differences between the sample and its reference will no longer appear as mismatches. So assembling the reads is often the best approach rather than using an existing reference, particularly for a small organism that's easy to assemble, or when there is a large difference between the sample and ref. For the final variant-calling, you still need to use the reference rather than an assembly.

Quote:
I haven't tried Clumpify's consensus feature but it would be cool if it would generate a new fasta file based on a reference as the consensus. Or is this possible already?
Unfortunately... it only produces consensuses per clump, which is not what you want. Tadpole would be the best option.

Quote:
One other question related to variant calling. I'm using callvariants.sh and I'm noticing some variants that are clearly not real due to problems at the ends of the reads.

If I want to call variants from only the middle 60% of the read (1x150bp) is this the correct command?

callvariants.sh in=in.bam out=out.vcf ref=ref.fa minedist=30

Thanks again for your help.
Yes, "minedist=30" should do what you want (ignore variants that are mainly near the ends of reads), but might be too aggressive. For 150bp reads the average distance of a variant from the read end should be 150/4 = 37.5, given a random distribution, so maybe minedist=20 would be safer. You can alternatively use "border=30", which will ignore the outermost 30bp of the reads and thus never consider anything in that area; that allows the whole read to be used for alignment, but only the best part in the middle to be used for variant-calling.

If you use the flags "out=vars.txt vcf=vars.vcf", CallVariants will also produce a tab-delimited text file that you can paste into Excel and manually look at the "edist" column to see whether it looks useful to exclude false positives. You can theoretically do that with VCF also but the tab-delimited one is much more convenient!
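The minedist idea above boils down to a per-observation distance check; here is a minimal sketch with hypothetical names (this is not CallVariants' actual code):

```python
# Sketch of the minedist filter: a variant observation counts only if
# it lies far enough from both ends of the supporting read.
# Illustrative only; names are hypothetical.
def far_from_ends(var_pos, read_start, read_len, minedist=20):
    dist_left = var_pos - read_start
    dist_right = (read_start + read_len - 1) - var_pos
    return min(dist_left, dist_right) >= minedist
```

For a 150bp read starting at position 0, a variant at position 10 or 140 would be rejected with minedist=20, while one in the middle passes; border=30 is the stricter variant of the same idea, masking the outer 30bp entirely.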
Old 06-07-2017, 01:32 PM   #236
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,435
Default

Quote:
Originally Posted by Brian Bushnell View Post
Yes, "minedist=30" should do what you want (ignore variants that are mainly near the ends of reads), but might be too aggressive. For 150bp reads the average distance of a variant from the read end should be 150/4 = 37.5, given a random distribution, so maybe minedist=20 would be safer. You can alternatively use "border=30", which will ignore the outermost 30bp of the reads and thus never consider anything in that area; that allows the whole read to be used for alignment, but only the best part in the middle to be used for variant-calling.

If you use the flags "out=vars.txt vcf=vars.vcf", CallVariants will also produce a tab-delimited text file
Love these tidbits that you mention in your answers. I was going to look for a vcf2tab program for VCFs generated by callvariants.sh. Will try this out tomorrow.
Old 06-27-2017, 05:30 AM   #237
KseniaA
Junior Member
 
Location: Europe

Join Date: Jun 2017
Posts: 3
Default

Hi Brian,

I am trying to QC a sequencing library downloaded from SRA, so I do not know which sequencing machine was used. It seems that BBDuk struggles to recognize the paired reads and stops working without exiting. Details below:

Here is an example of the FASTQ:
@ERR011089.1.1 I58_1_FC30DYKAAXX:8:1:3:1833 length=75
CTTCGGAAGACATTTCTTCAGAAAGTTTTTGGAGAAGATTTTTTAACGAAACCTATGGAGCGGGCTCAGCAGGAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII(IIIIIIIIIIIIIIIIIDIIIII.I
@ERR011089.1.2 I58_1_FC30DYKAAXX:8:1:3:1833 length=75
CTGTCAAATCCNNTTTATCCATATTCTCCATAATGTAAAGCTTCACTTCATTCAAAAATGCCAGATTGGCATCCT
+
IIIIIIIIIII!!IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@ERR011089.2.1 I58_1_FC30DYKAAXX:8:1:3:377 length=75
CGGCCAACCTCGTCGTCACGGACTACGCCTTCTCGCCCGACGAACGGCAGATACTCATCGCCTCGGGCGCCAGGC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII'III
@ERR011089.2.2 I58_1_FC30DYKAAXX:8:1:3:377 length=75
GGCGTCGCGCGNNGCTTCGGCCTCGCGGAGTATGGGCATCAGACTGTTGCCCGAGGCCAGGAAATAATCGGTCGC
+
IIIIIIIIIII!!IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII)IIIIIIIIIIII&II%


And this is the BBDuk message:

BBDuk version 36.64
Set INTERLEAVED to true
Reset INTERLEAVED to false because paired input files were specified.
Initial:
Memory: max=102900m, free=99142m, used=3758m

Added 134628693 kmers; time: 37.045 seconds.
Memory: max=102900m, free=83248m, used=19652m

Input is being processed as paired
Started output streams: 0.239 seconds.
Exception in thread "Thread-16" java.lang.AssertionError:
There appear to be different numbers of reads in the paired input files.
The pairing may have been corrupted by an upstream process. It may be fixable by running repair.sh.
at stream.ConcurrentGenericReadInputStream.pair(ConcurrentGenericReadInputStream.java:480)
at stream.ConcurrentGenericReadInputStream.readLists(ConcurrentGenericReadInputStream.java:345)
at stream.ConcurrentGenericReadInputStream.run(ConcurrentGenericReadInputStream.java:189)
at java.lang.Thread.run(Thread.java:748)

I also tried repair.sh as suggested; it could not fix the problem with the reads either.

I would be grateful for help with this situation. At the very least, would it be possible to change the code so that the program exits in case of such a crash, instead of hanging?

Ksenia
Old 06-27-2017, 10:48 AM   #238
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668
Default

Hi Ksenia,

This is always a tricky problem when people rename the reads. However, it looks like in your case the file is interleaved (that's my guess from the read headers), and thus you should only be using a single input to bbduk and add the "interleaved" flag, but it seems you are giving it two input files. Can you post the bbduk command, and explain exactly what came out of the SRA extraction process? Also, posting the SRA extraction command you use might help (it won't help me, but it might mean something to somebody).
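For files produced by fastq-dump with --readids, interleaving can often be guessed from the .1/.2 suffixes on the read names; a toy Python check (not BBDuk's actual detection logic, which works differently) might look like this:

```python
# Toy interleaving check for fastq-dump --readids headers such as
# @ERR011089.1.1 / @ERR011089.1.2: consecutive records should share a
# spot ID and end in .1 then .2. Illustrative only.
def looks_interleaved(headers):
    if len(headers) % 2 != 0:
        return False  # odd record count cannot be fully paired
    for h1, h2 in zip(headers[0::2], headers[1::2]):
        a, b = h1.split()[0], h2.split()[0]
        if not (a.endswith(".1") and b.endswith(".2")
                and a[:-2] == b[:-2]):
            return False
    return True
```

A file produced with --split-files (mates in separate files, all names ending .1 in one file and .2 in the other) fails this check, which matches the situation described in the post.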
Old 06-27-2017, 10:08 PM   #239
KseniaA
Junior Member
 
Location: Europe

Join Date: Jun 2017
Posts: 3
Default

Thank you, Brian, for the swift reply!

Yes, you are right; I used separate read files, not an interleaved file. Here is my SRA download command:

/sratoolkit.2.8.2-1-ubuntu64/bin/fastq-dump --gzip --skip-technical --readids --dumpbase --split-files --clip "$NAME"

And my BBDuk command:

bbduk.sh -Xmx100g in="$NAME"_1.fastq.gz in2="$NAME"_2.fastq.gz ref=/home/ksenia/ksenia2/Programms/bbmap/resources/adapters.fa,/home/ksenia/ksenia2/Programms/bbmap/resources/phix174_ill.ref.fa.gz out="$ID"_filtered.fastq interleaved=t fastawrap=10000 ktrim=r qtrim=rl minlength=30 maq=10 maxns=0 overwrite=t entropy=0.5 outs="$ID"_singletons.fastq tpe=t

I am quite confident in both of these commands, as they worked perfectly for several hundred libraries; but for several similar libraries (like the fastq example I showed), BBDuk failed. It doesn't crash and exit, though: BBDuk just freezes and hangs the whole pipeline.

These are examples of true input files:

reads1:

@ERR011090.1.1 I328_1_FC30MD2AAXX:3:1:3:296 length=75
TGCCTATCTCATGTGGATTTATGCCATTTTTCTTCATAAGTCTATCAACCAACAATCTGCCCCATNCATCNGNCA
+ERR011090.1.1 I328_1_FC30MD2AAXX:3:1:3:296 length=75
[email protected]<I7I>[email protected]<A0/60C<,A2;93%"E/+2"+"65
@ERR011090.2.1 I328_1_FC30MD2AAXX:3:1:3:894 length=75
TAGGCTATAAATTGTTTTTAGCGGTTGAAACAGGAATTTTCATTACGGCAGGGACACTCCCCTTTNTTTTNTNTT
+ERR011090.2.1 I328_1_FC30MD2AAXX:3:1:3:894 length=75
IIIIIIIIIIIIIIIIIIIIIIII8IH0IG/>[email protected]?A/.0,=111$,#))$+0+#$#""()$-"+"-$


reads2:

@ERR011090.1.2 I328_1_FC30MD2AAXX:3:1:3:296 length=75
ACCAAAGATAGATCCGTTTGAGGGANTTTTTGGNGTTTTTGCAGATAGTCTGCCTGATGGATGGGGCAGATTGTT
+ERR011090.1.2 I328_1_FC30MD2AAXX:3:1:3:296 length=75
[email protected]"HIIIIII"I96IAII18F0B65+/5C5298-:,2,.0<57$)-%2--/.
@ERR011090.2.2 I328_1_FC30MD2AAXX:3:1:3:894 length=75
TTCAGAACGCTGTAAAAGACACACANCAAAATGNAAAAAACATAGGTCTTACTGAAAAAGACAACAAGAACAATC
+ERR011090.2.2 I328_1_FC30MD2AAXX:3:1:3:894 length=75
IIIIIIIIIIIIDIIIIIIIIBIII"IIIIIFI"IIH?II7E>?=D0;E;IBII9A;8<69:G25I5889-3812

When I tried to run repair.sh, it couldn't recognise the paired reads and output everything as singletons; maybe this info can also help. BBDuk actually starts to produce some output as well, sorting some sequences into the interleaved file and the singletons file, but at some point it stops working. And I have several libraries with this problem. Any idea how this can be solved?
Old 06-29-2017, 07:29 PM   #240
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,435
Default

Can you dump these reads with the "-F" option for fastq-dump, to see if you are able to restore the original Illumina fastq headers? repair.sh should (if needed) work after that.
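Conceptually, repair.sh re-pairs reads by name and emits leftovers as singletons; an in-memory toy sketch of that idea (the real tool streams the input and normalizes header variants, so this is only an illustration):

```python
# Toy sketch of name-based re-pairing, the idea behind repair.sh.
# Illustrative only; not the actual (streaming) implementation.
def repair(reads1, reads2):
    """reads1/reads2: dicts mapping read name -> record."""
    pairs = [(reads1[n], reads2[n]) for n in reads1 if n in reads2]
    singletons = ([r for n, r in reads1.items() if n not in reads2]
                  + [r for n, r in reads2.items() if n not in reads1])
    return pairs, singletons
```

This is why restoring the original headers matters: if the names in the two files cannot be matched (e.g. after renaming by fastq-dump), everything falls through to the singletons output.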