SEQanswers

Showing results 1 to 25 of 500
Search: Posts Made By: Brian Bushnell
Forum: Bioinformatics 04-24-2018, 08:42 AM
Replies: 5
Views: 756
Posted By Brian Bushnell
Oh, yep - it's fixed now, sorry for not...

Oh, yep - it's fixed now, sorry for not mentioning that!
Forum: Bioinformatics 04-18-2018, 09:46 AM
Replies: 651
Views: 133,940
Posted By Brian Bushnell
The coordinate system varies across platforms;...

The coordinate system varies across platforms; the physical distance of 1 pixel is much larger on HiSeq 2500 than HiSeq 3000/4000, for example.

NovaSeq has unique problems, though. The distance...
Forum: Bioinformatics 04-17-2018, 04:38 PM
Replies: 651
Views: 133,940
Posted By Brian Bushnell
Not exactly. reformat.sh has a "mappedonly"...

Not exactly. reformat.sh has a "mappedonly" flag, but that would only keep the mapped reads. However, you can use requiredbits and filterbits in 2 passes, like this:

reformat.sh in=data.sam...
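
As a rough illustration of the flag-bit logic behind this kind of two-pass filtering, here is a minimal Python sketch. It assumes that requiredbits keeps only SAM lines with all of the given flag bits set, and that filterbits drops lines with any of the given bits set. The function name and the example flag values are hypothetical and not part of reformat.sh.

```python
# Illustrative only: models flag-bit filtering on SAM FLAG values.
UNMAPPED = 0x4       # SAM flag bit: this read is unmapped
MATE_UNMAPPED = 0x8  # SAM flag bit: the mate is unmapped

def passes(flag, required_bits=0, filter_bits=0):
    """Keep a record if every required bit is set and no filter bit is set."""
    return (flag & required_bits) == required_bits and (flag & filter_bits) == 0

# Hypothetical FLAG values from a paired-end SAM file.
flags = [99, 147, 73, 133, 77, 141]

# Pass 1: discard anything with the unmapped bit set.
mapped_only = [f for f in flags if passes(f, filter_bits=UNMAPPED)]
# Pass 2: keep only records that have the unmapped bit set.
unmapped_only = [f for f in flags if passes(f, required_bits=UNMAPPED)]
```

Each pass is a single bitwise test per record, so the two passes split the file into complementary sets.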
Forum: Bioinformatics 04-17-2018, 04:31 PM
Replies: 651
Views: 133,940
Posted By Brian Bushnell
MAPQ is a measure of the probability (estimate)...

MAPQ is a measure of the probability (estimate) that the mapping location is correct. This can factor in various things, including the number of mismatches... but, for example, with these very...
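
For reference, the SAM specification defines MAPQ on a Phred scale as -10*log10 of the probability that the mapping position is wrong. The short Python sketch below converts in both directions. The cap of 60 is only an assumed example value, and how an aligner arrives at the probability estimate in the first place differs from tool to tool.

```python
import math

def mapq_from_error(p_wrong, cap=60):
    """Phred-scale the estimated probability that the mapping location is wrong.

    `cap` is an assumed maximum; aligners choose their own upper bound.
    """
    if p_wrong <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))

def error_from_mapq(mapq):
    """Inverse conversion: estimated probability that the location is wrong."""
    return 10 ** (-mapq / 10)

print(mapq_from_error(0.01))   # 1-in-100 chance of a wrong location
print(error_from_mapq(30))     # MAPQ 30
```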
Forum: Bioinformatics 04-11-2018, 03:45 PM
Replies: 126
Views: 47,213
Posted By Brian Bushnell
@chloe - It's normally simplest and most...

@chloe - It's normally simplest and most effective to do QC first on the raw data, then anything else (such as merging) later.

@silask - the way you are doing it is currently the most effective...
Forum: Bioinformatics 04-04-2018, 09:10 AM
Replies: 5
Views: 756
Posted By Brian Bushnell
Hi Greg, Sorry about that, there was a bug...

Hi Greg,

Sorry about that, there was a bug that slipped in around v37.87 with regard to multisample VCF names. It's fixed in v37.96, which I will release this week.

-Brian
Forum: Bioinformatics 03-21-2018, 03:24 PM
Replies: 651
Views: 133,940
Posted By Brian Bushnell
The problem here is that minimap uses old-style...

The problem here is that minimap uses old-style cigar strings (M symbol instead of = and X) and also does not produce MD tags. I've added the ability to handle reads in that situation and it will be...
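
To show the difference between the two cigar styles, here is a small Python sketch that rewrites old-style M runs as = and X runs. Without an MD tag the only way to tell matches from mismatches is to compare against the reference, so the function takes the reference bases starting at the alignment position. The function name and inputs are hypothetical, and this is not the code used in BBTools.

```python
import re

def expand_m(cigar, read, ref):
    """Rewrite M runs as =/X by comparing read bases with reference bases.

    `ref` is the reference sequence starting at the alignment position.
    """
    ops = []   # list of [op, length], adjacent identical ops are merged
    r = g = 0  # current offsets in the read and in the reference

    def emit(op, n):
        if ops and ops[-1][0] == op:
            ops[-1][1] += n
        else:
            ops.append([op, n])

    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "M":
            # Decide base by base whether this is a match or a mismatch.
            for _ in range(n):
                emit("=" if read[r].upper() == ref[g].upper() else "X", 1)
                r += 1
                g += 1
        else:
            emit(op, n)
            if op in "IS":      # consume read bases only
                r += n
            elif op in "DN":    # consume reference bases only
                g += n
            elif op in "=X":    # consume both
                r += n
                g += n
    return "".join(f"{n}{op}" for op, n in ops)

print(expand_m("5M1I3M", "ACGAAGCCT", "ACGTACCT"))  # prints 3=1X1=1I3=
```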
Forum: Bioinformatics 03-21-2018, 03:21 PM
Replies: 212
Views: 58,117
Posted By Brian Bushnell
While BBMap is not originally designed for this...

While BBMap was not originally designed for this purpose, I made a version that does a much better job at finding all mappings above some identity threshold, bbmapskimmer.sh. The usage is the same as...
Forum: Bioinformatics 03-21-2018, 03:18 PM
Replies: 212
Views: 58,117
Posted By Brian Bushnell
BBMerge might be able to help in this case, if...

BBMerge might be able to help in this case, if you have paired reads with a sufficient number of short inserts. You can run it like this:

bbmerge.sh in1=read1.fq in2=read2.fq outa=adapters.fa
...
Forum: Bioinformatics 03-21-2018, 03:13 PM
Replies: 651
Views: 133,940
Posted By Brian Bushnell
This means 72 percent of the reads mapped with an...

This means 72 percent of the reads mapped with an "N" symbol in the match string, an internal data structure similar to a cigar string. The "N" symbol denotes either an N in the read or an N in the...
Forum: Bioinformatics 03-20-2018, 09:07 AM
Replies: 25
Views: 15,786
Posted By Brian Bushnell
Reformat won't do that, but you can use...

Reformat won't do that, but you can use partition.sh:

partition.sh in=X.fa out=X%.fa ways=10

That will produce 10 output files with an equal number of sequences and no duplication.
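
For intuition, splitting into equal parts with no duplication can be done by dealing sequences out round-robin. The Python sketch below shows that idea on an in-memory list. It is an assumed approach for illustration, not necessarily how partition.sh assigns sequences internally, and the record names are made up.

```python
def partition(records, ways):
    """Deal records into `ways` bins round-robin; each record lands in exactly one bin."""
    bins = [[] for _ in range(ways)]
    for i, rec in enumerate(records):
        bins[i % ways].append(rec)
    return bins

# Hypothetical sequence names standing in for FASTA records.
records = [f"seq{i}" for i in range(25)]
bins = partition(records, 10)
sizes = [len(b) for b in bins]
```

When the number of sequences is not a multiple of the number of bins, bin sizes differ by at most one.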
Forum: Bioinformatics 01-04-2018, 10:27 AM
Replies: 12
Views: 2,327
Posted By Brian Bushnell
I concur; 17 is really too short for this...

I concur; 17 is really too short for this purpose. When trying to estimate genome size, it's important for most of the kmers to be unique (aside from long perfect repeats); so, kmer lengths greater...
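
A quick way to see the effect of kmer length on uniqueness is to count, for a given sequence, what fraction of distinct kmers occur exactly once. This Python sketch does that on a toy repetitive string. The function name and the toy sequence are hypothetical, and real genome-size estimation works from read kmer histograms rather than an assembled sequence.

```python
from collections import Counter

def unique_kmer_fraction(seq, k):
    """Fraction of distinct kmers of length k that occur exactly once in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)

toy = "ACGTACGTAC"  # a short tandem repeat, chosen so short kmers collide
for k in (2, 5, 9):
    print(k, unique_kmer_fraction(toy, k))
```

On this toy input the unique fraction rises from 0.0 at k=2 to 0.5 at k=5 and 1.0 at k=9.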
Forum: Bioinformatics 10-13-2017, 10:14 AM
Replies: 97
Views: 21,503
Posted By Brian Bushnell
Hi Gopo, I don't particularly recommend...

Hi Gopo,

I don't particularly recommend Tadpole for diploid (or higher) genomes, as it has absolutely no capability of dealing with heterozygous sites. However, it's really fast, so even with a...
Forum: Bioinformatics 10-11-2017, 05:52 PM
Replies: 126
Views: 47,213
Posted By Brian Bushnell
As GenoMax says, trimming to Q30 is not...

As GenoMax says, trimming to Q30 is not beneficial before merging reads. BBMerge has some internal quality-trimming options, so it can try to merge, then quality-trim if it is unsuccessful, then try...
Forum: Bioinformatics 10-11-2017, 01:43 PM
Replies: 62
Views: 19,136
Posted By Brian Bushnell
Actually, "nodisk" does not work with BBSplit... ...

Actually, "nodisk" does not work with BBSplit... sorry! I'll clarify that in the documentation. It's not like it's impossible to make it work, but it would be pretty complicated; one of those...
Forum: Bioinformatics 10-11-2017, 12:48 PM
Replies: 126
Views: 47,213
Posted By Brian Bushnell
Hi Ashu, "Ambiguous" means there are...

Hi Ashu,

"Ambiguous" means there are multiple possible overlaps. For example, if read 1 and read 2 both end with "ACACACACACACACACACACAC", there are lots of possible overlap frames, none of which...
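
To make the idea of multiple overlap frames concrete, here is a toy Python sketch that lists every overlap length at which the end of one read exactly matches the start of the other. It assumes read 2 has already been oriented to the same strand as read 1 and only considers perfect matches, so it is far simpler than BBMerge's actual overlap scoring. All names and sequences are made up.

```python
def overlap_frames(r1, r2, min_overlap=4):
    """Overlap lengths where the last n bases of r1 equal the first n bases of r2."""
    longest = min(len(r1), len(r2))
    return [n for n in range(min_overlap, longest + 1) if r1[-n:] == r2[:n]]

# A low-complexity repeat allows many equally good overlap frames.
repeat_frames = overlap_frames("GATTC" + "AC" * 8, "AC" * 8 + "TTGCA")

# A non-repetitive overlap has a single frame.
unique_frames = overlap_frames("GATTCAGGCTAAC", "GCTAACTTGCA")

print(repeat_frames)  # [4, 6, 8, 10, 12, 14, 16]
print(unique_frames)  # [6]
```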
Forum: Illumina/Solexa 10-11-2017, 12:29 PM
Replies: 26
Views: 4,832
Posted By Brian Bushnell
I have not looked into that yet. Actually, I...

I have not looked into that yet. Actually, I don't even know if we are spiking PhiX into our NovaSeq runs, but that rate is worth examining, after I find out whether there is actually any PhiX...
Forum: Bioinformatics 10-11-2017, 12:24 PM
Replies: 1
Views: 542
Posted By Brian Bushnell
I downloaded NA12878 from NIST, and they also...

I downloaded NA12878 from NIST, and they also have validated sets of small variations, but I didn't really find them all that useful. If anyone has validated CNV sets for those it would be NIST. ...
Forum: Illumina/Solexa 10-10-2017, 01:07 AM
Replies: 26
Views: 4,832
Posted By Brian Bushnell
It only works for applications that are not...

It only works for applications that are not sensitive to crosstalk. Personally, I would never multiplex samples of the same genus on a NovaSeq unless all libraries had dual unique barcodes. The...
Forum: Illumina/Solexa 10-09-2017, 05:59 PM
Replies: 26
Views: 4,832
Posted By Brian Bushnell
It's interesting to me that Illumina introduced...

It's interesting to me that Illumina introduced NovaSeq without accompanying adapter kits to enable a high degree of multiplexing. Their current 24-unique-index kit seems targeted at human...
Forum: Illumina/Solexa 10-09-2017, 02:06 PM
Replies: 26
Views: 4,832
Posted By Brian Bushnell
Oh, sorry, I meant OUR HiSeq machines :) Those...

Oh, sorry, I meant OUR HiSeq machines :) Those are 2000/2500/1T. In this specific case I was comparing it to a 2500 run.

To clarify, from isolate random fragment data downsampled to the same...
Forum: Bioinformatics 10-09-2017, 01:57 PM
Replies: 651
Views: 133,940
Posted By Brian Bushnell
Hi Gopo, Yes, I will add that (as an...

Hi Gopo,

Yes, I will add that (as an option). Is that common practice in other variant-callers? Note that callvariants.sh does currently have a "PF" (pass filter) field per sample, but I want to...
Forum: Bioinformatics 10-05-2017, 12:35 PM
Replies: 651
Views: 133,940
Posted By Brian Bushnell
reformat.sh has an option "underscore" which will...

reformat.sh has an option "underscore" which will change whitespace in sequence headers into underscores, if the extra information is important. Alternatively, as GenoMax says, you can use...
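
For anyone who wants to see what that header change amounts to, here is a minimal Python sketch for FASTA input. It assumes each whitespace character in a header becomes one underscore, which may differ in detail from what reformat.sh does. The function name and sample records are hypothetical.

```python
import re

def underscore_header(line):
    """Replace whitespace in a FASTA header line with underscores; leave other lines alone."""
    if not line.startswith(">"):
        return line
    return re.sub(r"\s", "_", line.rstrip("\n"))

# Hypothetical FASTA lines.
lines = [">seq1 length=100 sample A", "ACGTACGT", ">seq2\tcontig"]
fixed = [underscore_header(l) for l in lines]
print(fixed)
```

Keeping the whole header as one token matters because many tools treat everything after the first whitespace as a comment and drop it.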
Forum: Illumina/Solexa 10-05-2017, 12:24 PM
Replies: 26
Views: 4,832
Posted By Brian Bushnell
I don't know anything about the lab issues, but...

I don't know anything about the lab issues, but the sequence quality is good. Coverage exhibits slightly more bias than HiSeq for the same libraries. Using unique dual barcodes and performing...
Forum: General 10-05-2017, 12:19 PM
Replies: 15
Views: 3,610
Posted By Brian Bushnell
RAM is often the limiting factor in...

RAM is often the limiting factor in bioinformatics computing. I would not recommend buying a computer that you plan to use for bioinformatics with only 16 GB RAM unless it will be dedicated to some...
All times are GMT -8.