SEQanswers



Showing results 1 to 25 of 500
Search: Posts Made By: Brian Bushnell
Forum: Bioinformatics 04-24-2018, 08:42 AM
Replies: 5
Views: 630
Posted By Brian Bushnell
Oh, yep - it's fixed now, sorry for not mentioning that!
Forum: Bioinformatics 04-18-2018, 09:46 AM
Replies: 638
Views: 126,250
Posted By Brian Bushnell
The coordinate system varies across platforms; the physical distance of 1 pixel is much larger on HiSeq 2500 than on HiSeq 3000/4000, for example.

NovaSeq has unique problems, though. The distance...
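To make the coordinate point concrete, here is a minimal Python sketch (not BBTools code) that reads lane, tile, and x/y from Casava 1.8-style Illumina headers and treats two reads as candidate optical duplicates when they sit on the same tile within `max_dist` coordinate units. The function names and `max_dist` are hypothetical. Because one pixel covers a different physical distance on each platform, the threshold would have to be chosen per instrument.

```python
import math


def parse_coords(header):
    """Return (lane, tile, x, y) from a Casava 1.8+ style header such as
    '@M1:1:FC:1:1101:1000:2000 1:N:0:ACGT'."""
    name = header.lstrip("@").split()[0]
    fields = name.split(":")
    # fields: instrument, run, flowcell, lane, tile, x, y
    lane, tile, x, y = (int(v) for v in fields[3:7])
    return lane, tile, x, y


def near_on_flowcell(h1, h2, max_dist):
    """True if both reads share lane and tile and their clusters lie within
    max_dist coordinate units (a hypothetical, platform-dependent cutoff).
    A real duplicate check would also compare the sequences."""
    lane1, tile1, x1, y1 = parse_coords(h1)
    lane2, tile2, x2, y2 = parse_coords(h2)
    if (lane1, tile1) != (lane2, tile2):
        return False
    return math.hypot(x1 - x2, y1 - y2) <= max_dist
```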
Forum: Bioinformatics 04-17-2018, 04:38 PM
Replies: 638
Views: 126,250
Posted By Brian Bushnell
Not exactly. reformat.sh has a "mappedonly" flag, but that would only keep the mapped reads. However, you can use requiredbits and filterbits in 2 passes, like this:

reformat.sh in=data.sam...
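The two-pass approach relies on SAM FLAG bits. The sketch below assumes `requiredbits` keeps records that have all of the given bits set and `filterbits` keeps records that have none of them set. Under that assumption, the per-record test looks like this in Python. The bit values come from the SAM specification. The function name and example flags are illustrative, and a second pass simply applies another such test to the first pass's output.

```python
# SAM FLAG bits (from the SAM specification)
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def keep_record(flag, required_bits=0, filter_bits=0):
    """Assumed semantics: every bit in required_bits must be set,
    and no bit in filter_bits may be set."""
    return (flag & required_bits) == required_bits and (flag & filter_bits) == 0


# Example: keep mapped reads whose mate is unmapped.
flags = [73, 77, 99, 137]
kept = [f for f in flags
        if keep_record(f, required_bits=MATE_UNMAPPED, filter_bits=UNMAPPED)]
```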
Forum: Bioinformatics 04-17-2018, 04:31 PM
Replies: 638
Views: 126,250
Posted By Brian Bushnell
MAPQ is a measure of the probability (estimate) that the mapping location is correct. This can factor in various things, including the number of mismatches... but, for example, with these very...
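For reference, the SAM specification defines MAPQ as -10 * log10 of the probability that the mapping position is wrong, rounded to an integer, with 255 meaning "not available". The Python sketch below shows that conversion. Aligners, BBMap included, estimate the underlying probability in their own ways. Treat the numbers as the spec's scale, not as any particular tool's calibration.

```python
import math


def mapq_to_error_prob(mapq):
    """Probability that the mapping is wrong, per the SAM MAPQ definition."""
    if mapq == 255:
        return None  # 255 means mapping quality is not available
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p_wrong):
    """Inverse conversion, rounded to the nearest integer."""
    return round(-10 * math.log10(p_wrong))
```

For example, MAPQ 20 corresponds to about a 1% chance that the reported location is wrong, and MAPQ 30 to about 0.1%.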
Forum: Bioinformatics 04-11-2018, 03:45 PM
Replies: 125
Views: 44,678
Posted By Brian Bushnell
@chloe - It's normally simplest and most effective to do QC first on the raw data, then anything else (such as merging) later.

@silask - the way you are doing it is currently the most effective...
Forum: Bioinformatics 04-04-2018, 09:10 AM
Replies: 5
Views: 630
Posted By Brian Bushnell
Hi Greg,

Sorry about that, there was a bug that slipped in around v37.87 with regards to multisample VCF names. It's fixed in v37.96 which I will release this week.

-Brian
Forum: Bioinformatics 03-21-2018, 03:24 PM
Replies: 638
Views: 126,250
Posted By Brian Bushnell
The problem here is that minimap uses old-style cigar strings (M symbol instead of = and X) and also does not produce MD tags. I've added the ability to handle reads in that situation and it will be...
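Without MD tags, a tool has to consult the reference to tell matches from mismatches inside an `M` run. The Python sketch below rewrites `M` operations into `=`/`X` runs, given the reference sequence and the 0-based leftmost mapping position. It is an illustration, not the BBTools implementation, and the function name is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def expand_m_ops(cigar, ref, read, ref_start):
    """Rewrite M operations as = (match) and X (mismatch) runs by comparing
    read and reference. ref_start is the 0-based leftmost reference position."""
    runs = []  # [length, op] pairs; adjacent identical ops are merged

    def push(op, n=1):
        if runs and runs[-1][1] == op:
            runs[-1][0] += n
        else:
            runs.append([n, op])

    r, q = ref_start, 0  # reference and read cursors
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "M":
            for _ in range(n):
                push("=" if read[q].upper() == ref[r].upper() else "X")
                q += 1
                r += 1
        elif op in "=X":  # already explicit; consumes both
            push(op, n)
            q += n
            r += n
        elif op in "IS":  # consumes read only
            push(op, n)
            q += n
        elif op in "DN":  # consumes reference only
            push(op, n)
            r += n
        else:  # H and P consume neither
            push(op, n)
    return "".join(f"{n}{op}" for n, op in runs)
```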
Forum: Bioinformatics 03-21-2018, 03:21 PM
Replies: 212
Views: 53,877
Posted By Brian Bushnell
While BBMap was not originally designed for this purpose, I made a version that does a much better job of finding all mappings above some identity threshold: bbmapskimmer.sh. The usage is the same as...
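The difference between reporting the single best hit and reporting every hit above a threshold can be shown with a deliberately naive, ungapped scan in Python. This is not how bbmapskimmer.sh works internally. It only illustrates the output contract of "all placements at or above `min_identity`", and the function name is hypothetical.

```python
def all_ungapped_hits(read, ref, min_identity):
    """Every ungapped placement of read on ref with identity >= min_identity,
    returned as (0-based position, identity) pairs in reference order."""
    n = len(read)
    hits = []
    for pos in range(len(ref) - n + 1):
        window = ref[pos:pos + n]
        matches = sum(a == b for a, b in zip(read, window))
        identity = matches / n
        if identity >= min_identity:
            hits.append((pos, identity))
    return hits
```

A best-hit mapper would report only position 0 for a read that matches two loci at 100% and 90% identity. An "all hits" mode reports both.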
Forum: Bioinformatics 03-21-2018, 03:18 PM
Replies: 212
Views: 53,877
Posted By Brian Bushnell
BBMerge might be able to help in this case, if you have paired reads with a sufficient number of short inserts. You can run it like this:

bbmerge.sh in1=read1.fq in2=read2.fq outa=adapters.fa
...
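Overlapping pairs reveal adapters because, when the insert is shorter than the read, each read runs past the end of the insert into adapter sequence. The Python sketch below assumes error-free reads and an exact overlap. BBMerge itself tolerates mismatches and uses quality scores. The sketch finds the insert length and returns the read 1 bases beyond it as the putative adapter. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def adapter_from_short_insert(read1, read2, min_insert=10):
    """If read1 = insert + adapter1 and read2 = revcomp(insert) + adapter2,
    the first L bases of read1 equal the reverse complement of the first L
    bases of read2, where L is the insert length. Longest L is tried first."""
    max_len = min(len(read1), len(read2))
    for insert_len in range(max_len - 1, min_insert - 1, -1):
        if read1[:insert_len] == revcomp(read2[:insert_len]):
            return insert_len, read1[insert_len:]
    return None  # no short insert detected
```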
Forum: Bioinformatics 03-21-2018, 03:13 PM
Replies: 638
Views: 126,250
Posted By Brian Bushnell
This means 72 percent of the reads mapped with an "N" symbol in the match string, an internal data structure similar to a cigar string. The "N" symbol denotes either an N in the read or an N in the...
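The rule described above (an "N" symbol whenever either the read or the reference has an N) can be sketched for ungapped alignments in Python. The `m` (match) and `S` (substitution) symbols and both function names are illustrative assumptions. This is not BBMap's internal match-string code.

```python
def ungapped_match_string(read, ref_window):
    """Per-base symbols for an ungapped alignment: 'N' where the read or the
    reference has an N, otherwise 'm' for a match or 'S' for a substitution."""
    symbols = []
    for r, g in zip(read.upper(), ref_window.upper()):
        if r == "N" or g == "N":
            symbols.append("N")
        elif r == g:
            symbols.append("m")
        else:
            symbols.append("S")
    return "".join(symbols)


def fraction_with_n(alignments):
    """Fraction of (read, ref_window) pairs whose match string contains 'N'."""
    if not alignments:
        return 0.0
    hits = sum("N" in ungapped_match_string(read, ref) for read, ref in alignments)
    return hits / len(alignments)
```

A high fraction can therefore come from N-rich reads or from reads landing on N-padded reference regions.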
Forum: Bioinformatics 03-20-2018, 09:07 AM
Replies: 22
Views: 14,549
Posted By Brian Bushnell
Reformat won't do that, but you can use partition.sh:

partition.sh in=X.fa out=X%.fa ways=10

That will produce 10 output files with an equal number of sequences and no duplication.
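One simple way to get N groups of near-equal size, with every sequence in exactly one group, is round-robin assignment. The Python sketch below illustrates that idea. It is an assumption about the behavior described, not partition.sh's actual code, and the function name is hypothetical.

```python
def partition_round_robin(records, ways):
    """Split records into `ways` lists; sizes differ by at most one and
    every record lands in exactly one list."""
    if ways < 1:
        raise ValueError("ways must be >= 1")
    parts = [[] for _ in range(ways)]
    for i, record in enumerate(records):
        parts[i % ways].append(record)
    return parts
```

With 25 sequences and 10 ways, this yields five groups of 3 and five groups of 2.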
Forum: Bioinformatics 01-04-2018, 10:27 AM
Replies: 12
Views: 2,149
Posted By Brian Bushnell
I concur; 17 is really too short for this purpose. When trying to estimate genome size, it's important for most of the kmers to be unique (aside from long perfect repeats); so, kmer lengths greater...
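A back-of-the-envelope check shows why. If a genome of G bases were random sequence, one specific k-mer would be expected to occur by chance about 2G / 4^k times (G positions on each strand). The Python sketch below evaluates this for a hypothetical 3 Gbp genome. Real genomes are far from random, so the random model is optimistic about uniqueness.

```python
def expected_chance_hits(genome_size, k):
    """Expected random occurrences of one specific k-mer across both
    strands of a random genome with genome_size bases."""
    return 2 * genome_size / 4 ** k


for k in (17, 21, 25, 31):
    print(f"k={k}: {expected_chance_hits(3e9, k):.3g}")
```

For k=17 this is roughly 0.35, so a large share of k-mers would collide by chance even without real repeats. By k=31 it falls to around 1e-9.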
Forum: Bioinformatics 10-13-2017, 10:14 AM
Replies: 94
Views: 19,528
Posted By Brian Bushnell
Hi Gopo,

I don't particularly recommend Tadpole for diploid (or higher) genomes, as it has absolutely no capability of dealing with heterozygous sites. However, it's really fast, so even with a...
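Heterozygous sites are a problem for a simple k-mer extension assembler for a concrete reason. When both haplotypes are present, the k-mer just before a SNP has two possible next bases, so greedy extension must either stop or pick one. The Python sketch below is a generic illustration, not Tadpole's algorithm, and the haplotype sequences are made up. It finds such branching k-mers.

```python
from collections import defaultdict


def branching_kmers(sequences, k):
    """Map each k-mer that is followed by more than one distinct base
    (across all input sequences) to its set of next bases."""
    successors = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k):
            successors[seq[i:i + k]].add(seq[i + k])
    return {kmer: nxt for kmer, nxt in successors.items() if len(nxt) > 1}
```

Two haplotypes differing by one SNP produce exactly one branch point at k=5 in the test below, which is where a haploid-minded extender would halt.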
Forum: Bioinformatics 10-11-2017, 05:52 PM
Replies: 125
Views: 44,678
Posted By Brian Bushnell
As GenoMax says, trimming to Q30 is not beneficial before merging reads. BBMerge has some internal quality-trimming options, so it can try to merge, then quality-trim if it is unsuccessful, then try...
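The try-merge, quality-trim, retry sequence can be sketched in Python as follows. The sketch assumes an exact-match overlap and a fixed quality cutoff. BBMerge's real overlap detection is probabilistic and its trimming options are more elaborate. All names and thresholds here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def merge_exact(read1, read2, min_overlap=10):
    """Merge if a suffix of read1 exactly equals a prefix of revcomp(read2);
    the longest qualifying overlap wins."""
    rc2 = revcomp(read2)
    for overlap in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        if read1[-overlap:] == rc2[:overlap]:
            return read1 + rc2[overlap:]
    return None


def trim_right(seq, quals, min_q):
    """Drop trailing bases whose quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


def merge_with_retry(read1, quals1, read2, quals2, min_overlap=10, trim_q=10):
    """Try to merge untrimmed reads; on failure, quality-trim both and retry."""
    merged = merge_exact(read1, read2, min_overlap)
    if merged is not None:
        return merged
    return merge_exact(trim_right(read1, quals1, trim_q),
                       trim_right(read2, quals2, trim_q), min_overlap)
```

Trimming only on failure keeps the full-length overlap when it is usable, which is one reason aggressive trimming before merging tends not to help.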
Forum: Bioinformatics 10-11-2017, 01:43 PM
Replies: 60
Views: 16,808
Posted By Brian Bushnell
Actually, "nodisk" does not work with BBSplit... sorry! I'll clarify that in the documentation. It's not like it's impossible to make it work, but it would be pretty complicated; one of those...
Forum: Bioinformatics 10-11-2017, 12:48 PM
Replies: 125
Views: 44,678
Posted By Brian Bushnell
Hi Ashu,

"Ambiguous" means there are multiple possible overlaps. For example, if read 1 and read 2 both end with "ACACACACACACACACACACAC", there are lots of possible overlap frames, none of which...
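Ambiguity can be made concrete by counting how many overlap frames match. With a dinucleotide repeat in the overlap region, several offsets align equally well. The Python sketch below uses exact matching only. It illustrates the idea and is not BBMerge's scoring, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def overlap_frames(read1, read2, min_overlap=10):
    """All overlap lengths at which a suffix of read1 exactly matches a
    prefix of revcomp(read2)."""
    rc2 = revcomp(read2)
    longest = min(len(read1), len(rc2))
    return [n for n in range(min_overlap, longest + 1)
            if read1[-n:] == rc2[:n]]


def is_ambiguous(read1, read2, min_overlap=10):
    """More than one matching frame means the insert size cannot be pinned down."""
    return len(overlap_frames(read1, read2, min_overlap)) > 1
```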
Forum: Illumina/Solexa 10-11-2017, 12:29 PM
Replies: 26
Views: 4,285
Posted By Brian Bushnell
I have not looked into that yet. Actually, I don't even know if we are spiking PhiX into our NovaSeq runs, but that rate is worth examining, after I find out whether there is actually any PhiX...
Forum: Bioinformatics 10-11-2017, 12:24 PM
Replies: 1
Views: 488
Posted By Brian Bushnell
I downloaded NA12878 from NIST, and they also have validated sets of small variations, but I didn't really find them all that useful. If anyone has validated CNV sets for those it would be NIST. ...
Forum: Illumina/Solexa 10-10-2017, 01:07 AM
Replies: 26
Views: 4,285
Posted By Brian Bushnell
It only works for applications that are not sensitive to crosstalk. Personally, I would never multiplex samples of the same genus on a NovaSeq unless all libraries had dual unique barcodes. The...
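Unique dual indexes help because of how index pairs are assigned. With combinatorial indexing, a read whose i5 index hopped from another library can still form a valid pair and be silently assigned to the wrong sample. With unique dual indexes, the hopped pair matches no sample and can be discarded. The Python sketch below uses hypothetical index names and sample sheets.

```python
def assign_sample(i7, i5, sample_sheet):
    """Return the sample for an (i7, i5) index pair, or None if the pair is
    not in the sample sheet (a likely index-hopping or crosstalk read)."""
    return sample_sheet.get((i7, i5))


# Hypothetical 4-sample sheets using placeholder index names.
combinatorial = {("i7a", "i5a"): "S1", ("i7a", "i5b"): "S2",
                 ("i7b", "i5a"): "S3", ("i7b", "i5b"): "S4"}
unique_dual = {("i7a", "i5a"): "S1", ("i7b", "i5b"): "S2",
               ("i7c", "i5c"): "S3", ("i7d", "i5d"): "S4"}

# A read from S1 that picked up the i5 index "i5b" from another library.
hopped = ("i7a", "i5b")
```

Under the combinatorial sheet, the hopped read lands in S2. Under the unique-dual sheet, it is rejected.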
Forum: Illumina/Solexa 10-09-2017, 05:59 PM
Replies: 26
Views: 4,285
Posted By Brian Bushnell
It's interesting to me that Illumina introduced NovaSeq without accompanying adapter kits to enable a high degree of multiplexing. Their current 24-unique-index kit seems targeted at human...
Forum: Illumina/Solexa 10-09-2017, 02:06 PM
Replies: 26
Views: 4,285
Posted By Brian Bushnell
Oh, sorry, I meant OUR HiSeq machines :) Those are 2000/2500/1T. In this specific case I was comparing it to a 2500 run.

To clarify, from isolate random fragment data downsampled to the same...
Forum: Bioinformatics 10-09-2017, 01:57 PM
Replies: 638
Views: 126,250
Posted By Brian Bushnell
Hi Gopo,

Yes, I will add that (as an option). Is that common practice in other variant-callers? Note that callvariants.sh does currently have a "PF" (pass filter) field per sample, but I want to...
Forum: Bioinformatics 10-05-2017, 12:35 PM
Replies: 638
Views: 126,250
Posted By Brian Bushnell
reformat.sh has an option "underscore" which will change whitespace in sequence headers into underscores, if the extra information is important. Alternatively, as GenoMax says, you can use...
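The effect described (whitespace in a header becomes underscores, so downstream tools that truncate at the first space keep the full name) can be sketched in Python. This mimics the described behavior and is not reformat.sh's code. The function name is hypothetical.

```python
import re


def underscore_header(header):
    """Replace each whitespace character in a FASTA/FASTQ header with '_'."""
    return re.sub(r"\s", "_", header)
```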
Forum: Illumina/Solexa 10-05-2017, 12:24 PM
Replies: 26
Views: 4,285
Posted By Brian Bushnell
I don't know anything about the lab issues, but the sequence quality is good. Coverage exhibits slightly more bias than HiSeq for the same libraries. Using unique dual barcodes and performing...
Forum: General 10-05-2017, 12:19 PM
Replies: 15
Views: 2,626
Posted By Brian Bushnell
RAM is often the limiting factor in bioinformatics computing. I would not recommend buying a computer that you plan to use for bioinformatics with only 16 GB RAM unless it will be dedicated to some...

 


All times are GMT -8.

