
**Michael.James.Clark** · 01-28-2010, 05:10 PM

Thanks for posting this, Nils.

I look forward to any comments about our work.

**krobison** · 02-05-2010, 07:02 AM

Nils & Michael:

I've been meaning to write something longer on my blog, but it seems to be stuck on the procrastination non-express.

It would appear you were much more successful at finding short indels than the Sanger paper, which used SOLiD; in particular, their automated pipeline failed to find a known oncogenic 2-nt deletion.

How much of the credit do you think is due to BFAST and how much to longer read lengths (2x50 vs 2x25)? Any other factors?

Keith R.

**Michael.James.Clark** · 02-05-2010, 10:28 AM

Hi Keith,

Glad the paper piqued your interest. Both of the factors you mentioned increased our sensitivity to indels. I'd also credit our relatively high coverage for part of our success in identifying small indels.
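To put the coverage point in rough numbers, here is a minimal sketch under a simple binomial model: a heterozygous variant has an allele fraction of 0.5, each read covering the site carries either allele with equal probability, and a caller needs some minimum number of supporting reads. The threshold of 3 reads is a hypothetical choice, not a setting from the paper. The model ignores sequencing error, reference bias, and indel-containing reads that fail to align, so it gives an upper bound on sensitivity rather than a description of the actual pipeline.

```python
from math import comb


def detection_probability(depth: int, min_alt_reads: int, allele_fraction: float = 0.5) -> float:
    """P(at least min_alt_reads of `depth` reads carry the variant) under a binomial model."""
    # Sum the upper tail P(X >= min_alt_reads) for X ~ Binomial(depth, allele_fraction).
    return sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


# Hypothetical threshold of 3 supporting reads, evaluated at several depths.
for depth in (5, 10, 20, 40):
    print(depth, round(detection_probability(depth, min_alt_reads=3), 3))
```

Under this toy model, a 3-read threshold is met exactly half the time at 5x depth (16/32) but about 95% of the time at 10x (968/1024).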

BFAST in particular is quite sensitive to indels. You may also have noticed that we were able to detect some relatively large indels (up to 21 bases in length). BFAST was able to correctly align over these events.
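Aligning over an indel means the aligner scores a gapped alignment for the read instead of losing it at the indel. The toy sketch below is a plain base-space Smith-Waterman local alignment with a linear gap penalty. It is not BFAST's implementation: BFAST aligns SOLiD reads in color space with its own scoring, and the match, mismatch, and gap weights here are hypothetical.

```python
def smith_waterman(read: str, ref: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of `read` against `ref` (linear gap penalty)."""
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]  # DP matrix; row/column 0 stay at 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # read base vs. gap: insertion in the read
            left = score[i][j - 1] + gap  # reference base vs. gap: deletion in the read
            # Local alignment: a path may restart (score 0) anywhere.
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


ref = "GATCCTGAAGCTTACGCATG"
read = ref[:10] + ref[12:]  # the same sequence with a 2-nt deletion
print(smith_waterman(read, ref))
```

For this 18-base read, the path that opens a two-column gap over the deletion scores at least 2 x 18 - 2 x 2 = 32, which outscores aligning either flank alone (at most 20).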

For an explanation of why BFAST may handle this better than alternative aligners, see the BFAST paper, particularly the supplemental materials, which explain it in great detail.

Michael

**NGSfan** · 02-08-2010, 03:46 AM

Thanks for posting it - this is a very exciting paper for those of us working on cancer cell line sequencing! Especially since it is using BFAST, an aligner that we are very keen on using for somatic mutation detection.

From this paper I have also become interested in the SeqWare pipeline. I see that it can automatically annotate gene mutations (e.g., calling SNVs and indels and labeling them as "frameshift", "start-codon loss", etc.), and it does this with UCSC KnownGene tables! Looking forward to trying it.
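The "frameshift" label in particular follows from simple arithmetic: an indel inside a coding sequence shifts the downstream reading frame unless its net length change is a multiple of 3. A minimal sketch, assuming VCF-style alleles that share a leading anchor base and a variant already known to fall in a coding exon. `classify_coding_indel` is a hypothetical helper, not SeqWare's code; labels such as start-codon loss additionally need transcript coordinates like those in the UCSC knownGene table.

```python
def classify_coding_indel(ref_allele: str, alt_allele: str) -> str:
    """Label a coding-region variant by its net length change (hypothetical helper)."""
    net_change = len(alt_allele) - len(ref_allele)
    if net_change == 0:
        return "substitution"  # same length, so not an indel
    # A net change that is not a multiple of 3 shifts every downstream codon.
    return "frameshift" if net_change % 3 != 0 else "in-frame"


print(classify_coding_indel("ACG", "A"))   # 2-nt deletion
print(classify_coding_indel("ACGT", "A"))  # 3-nt deletion
```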

Lastly, I would like to practice my alignments and analysis on the exon-captured Illumina sequencing data, but I am a bit confused: which of the three datasets is the Illumina exon capture?

Whole Genome Sequencing of U87MG Glioblastoma Cell Line by SOLiD - SRA - NCBI

http://www.ncbi.nlm.nih.gov/sites/entrez?db=sra&term=SRX015657&report=full

**Michael.James.Clark** · 02-08-2010, 09:30 AM


Yeah, the SeqWare database is really interesting and useful, and I encourage you to play around with it.

As for the Illumina pull-down data, it looks like it hasn't been uploaded to SRA yet! Those three sets are the SOLiD data.

For resources from the paper, I strongly suggest that anyone interested visit http://genome.ucla.edu/U87, which has many useful links, including direct links to the variant files.

**NGSfan** · 02-10-2010, 06:33 AM

Thanks for reminding me to check out the paper's webpage - that has a lot of useful info!

Please let us know when the Illumina reads come out.

Btw, I didn't catch it in the paper, but as part of the variant detection, did you guys try recalibrating the quality scores using the GATK software? They make some pretty convincing arguments that this improves variant calls. If I get the Illumina data to practice on, I will try recalibration with GATK out of curiosity and see what happens.
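The core idea of quality-score recalibration can be sketched as follows: bin bases by their reported quality, count how often they disagree with the reference at sites assumed not to be real variants, and convert that observed error rate back to a Phred score with Q = -10 log10(p). This minimal sketch handles only the reported-quality covariate and uses a hypothetical add-one smoothing. GATK's recalibrator also models other covariates (such as machine cycle and sequence context) and masks known variant sites, so treat this as an illustration of the idea rather than GATK's algorithm.

```python
import math
from collections import defaultdict


def empirical_quality(observations):
    """Map each reported Phred score to an empirical Phred score.

    observations: iterable of (reported_q, is_mismatch) pairs, taken from bases
    at positions assumed NOT to be true variants.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += int(is_mismatch)
        counts[reported_q][1] += 1
    table = {}
    for reported_q, (mismatches, total) in counts.items():
        # Add-one smoothing keeps a zero-mismatch bin from yielding an infinite score.
        error_rate = (mismatches + 1) / (total + 2)
        table[reported_q] = -10 * math.log10(error_rate)
    return table


# Hypothetical bin: 1,000 bases reported as Q30, 2 of which mismatch the reference.
observations = [(30, False)] * 998 + [(30, True)] * 2
print(empirical_quality(observations))
```

In this example the smoothed error rate is 3/1002, roughly Q25, so bases reported as Q30 would be treated as over-confident.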

**Michael.James.Clark** · 02-10-2010, 10:15 AM

Everything we did is described in the paper.

Like you, though, I have heard that GATK is good for variant calling, and we are looking into it for current projects.

**Michael.James.Clark** · 03-02-2010, 01:43 PM

Originally posted by NGSfan

Please let us know when the Illumina reads come out.

Just FYI, the Illumina Exon Pull-Down Data is now available on the U87MG page:

https://secure.genome.ucla.edu/index.php/U87#Exon_Pull-down_Data

The BAM file was aligned as described in the paper (using BFAST). The raw FASTQ is also provided.

There's been a lot of demand for that data in particular.

**NGSfan** · 03-03-2010, 05:22 PM

Excellent!! Thank you kindly for the update! I'm looking forward to practicing paired-end alignments with the BFAST program! It will give me a head start on our own data.

**Chipper** · 03-26-2010, 12:25 AM

Nils,

I am trying to figure out which BFAST settings you used and how the alignments were filtered. In the paper you write "We choose the “best scoring” alignment, accepting an alignment only if it was at least the equivalent edit distance of two color errors away from the next best alignment". Is this the same as the -A 2 or -A 3 option?
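The acceptance rule quoted from the paper amounts to a margin test between the two best candidate alignments for a read. A minimal sketch, assuming higher scores are better and that the margin is given in score units. `pick_unique_best` and the per-color-error penalty in the example are hypothetical, and this is not a description of what BFAST's `-A` option does internally.

```python
def pick_unique_best(scores, min_margin):
    """Return the index of the best-scoring alignment if it beats the runner-up
    by at least `min_margin`; otherwise return None (the read is ambiguous)."""
    if not scores:
        return None
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = ranked[0]
    if len(ranked) == 1:
        return best  # a single candidate has no competitor
    runner_up = ranked[1]
    return best if scores[best] - scores[runner_up] >= min_margin else None


COLOR_ERROR_PENALTY = 10  # hypothetical score cost of one color error
print(pick_unique_best([310, 290, 150], min_margin=2 * COLOR_ERROR_PENALTY))
print(pick_unique_best([310, 300], min_margin=2 * COLOR_ERROR_PENALTY))
```

With that hypothetical penalty, the first read keeps its top alignment (a margin of 20), while the second returns `None` because its two best placements are too close to call.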

**nilshomer** · 03-26-2010, 09:21 AM


We used "-A 3" in "bfast postprocess", then applied a minimum mapping quality of 20, assuming you left "-q" in "bfast localalign" at the default.

Nils
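The mapping-quality cutoff is easy to apply to SAM text after alignment. A minimal sketch, assuming uncompressed SAM input (a BAM file can be converted first, e.g. with `samtools view -h`). `filter_sam_by_mapq` is a hypothetical helper that keeps header lines and mapped records whose MAPQ (column 5) is at least 20, similar in spirit to `samtools view -q 20`.

```python
def filter_sam_by_mapq(sam_lines, min_mapq=20):
    """Yield SAM header lines plus mapped alignment lines with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip malformed records (SAM has 11 mandatory fields)
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x4:
            continue  # FLAG bit 0x4: the read is unmapped
        if mapq >= min_mapq:
            yield line


sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t10\t4M\t*\t0\t0\tACGT\tIIII",
]
for kept in filter_sam_by_mapq(sam):
    print(kept)
```

For a file, something like `sys.stdout.writelines(filter_sam_by_mapq(open("aligned.sam")))` would stream the filtered records; the filename is a placeholder.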



U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell L
