SEQanswers

Old 01-12-2010, 06:18 AM   #1
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default Tablet - Next Generation Sequence Assembly Visualization

Hi all,

Some of you may be aware of this already, but here at SCRI we've been working on producing a visualization tool for NGS assembly data called Tablet. Our main goal was to create something that's not only fast and provides nice visualizations, but is also easy to use and easy to install.

Tablet was designed to aid our work here, which is primarily on plant data, but we've had some success loading in some human stuff too (just for fun obviously!).

We've attempted to get working parsers for a wide range of common formats (ace, afg, maq, soap) and we currently have experimental support for sam too (until we can get the Picard API working properly).

Please give it a try, and let us know what you think, either here, or by emailing us directly (there's an option within Tablet itself to do this). It's available in both 32 and 64bit versions, for all the usual suspects (Windows, OS X, Linux, Solaris), and can be downloaded from the following URL:

http://bioinf.scri.ac.uk/tablet

We also have a paper in advance (open) access with Bioinformatics, which is linked to from the above.

Tablet is still very much work in progress though, so do feel free to suggest any enhancements and improvements you'd like to see.

Iain

(ps: I think these attached pictures are going to be tiny, but the full versions are on the web site)
Attached Images
File Type: jpg tablet1.jpg (19.4 KB, 533 views)
File Type: jpg tablet2.jpg (19.1 KB, 245 views)
File Type: jpg tablet3.jpg (19.8 KB, 204 views)
File Type: jpg tablet4.jpg (19.5 KB, 134 views)
File Type: jpg tablet5.jpg (19.8 KB, 125 views)

Last edited by imilne; 01-12-2010 at 07:36 AM.
Old 01-12-2010, 09:02 AM   #2
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116
Default

this is very slick and unlike every other short-read viewer I have ever used it worked right out of the box
__________________
--
Jeremy Leipzig
Bioinformatics Programmer
--
My blog
Twitter
Old 01-12-2010, 10:03 AM   #3
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

For human data, it would be essential to support BAM natively. By "natively", I mean loading BAM alignments directly, without converting them to other formats. Most human alignments are available in BAM on the NCBI ftp site, and it is awkward and inefficient to convert a compressed file of >100GB. In addition, Tablet does not seem to support insertions in SAM/BAM, probably because, unlike with ACE/CAF, the viewer must compute the padding itself.

Apart from these, Tablet looks great!
Old 01-12-2010, 10:46 PM   #4
maasha
Senior Member
 
Location: Denmark

Join Date: Apr 2009
Posts: 153
Default

How does it compare to magicviewer:

http://bioinformatics.zj.cn/magicviewer/

?

Martin
Old 01-12-2010, 11:49 PM   #5
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Quote:
Originally Posted by lh3 View Post
For human data, it would be essential to support BAM natively. By "natively", I mean loading BAM alignments directly, without converting them to other formats. Most human alignments are available in BAM on the NCBI ftp site, and it is awkward and inefficient to convert a compressed file of >100GB. In addition, Tablet does not seem to support insertions in SAM/BAM, probably because, unlike with ACE/CAF, the viewer must compute the padding itself.

Apart from these, Tablet looks great!
Yeah, we're aware of the issues with BAM. We've been having trouble getting the Java API for SAM/BAM working properly (which is why we knocked out a quick and basic SAM-only parser). We're not sure yet what the problem is: Picard might be too strict on the files it accepts, some of the assemblers might not be producing valid files, we might not be using the API correctly, etc. To complicate matters further, we don't actually have anybody here at SCRI generating data in SAM/BAM format either.

Things have been looking a bit more hopeful this week though, so maybe we'll have it working soon...
Old 01-31-2010, 12:28 AM   #6
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

We now have a new release out - 1.10.01.28 - with the following new, changed, or fixed features:

- NEW: Added a right-click "save a summary of read information (per contig)" option to the contigs table.
- NEW: Tablet can now read from http streams (supported on the command line and the Open Assembly dialog).
- NEW: Tablet can now read from compressed gzip data streams (http or file).
- NEW: Features outside the scope of the current contig are now highlighted in red in the Features Table.
- NEW: Features with a Name= tag in a GFF3 file will now show this name in the Features Table.
- NEW: Added an option to toggle on or off the use of regular expressions in searches.
- CHG: The Importing Assembly dialog now shows its progress in MB/s too.
- CHG: Removed support for consensus tags in ACE files. GFF3 formatted files are now the only way to import features.
- BUG: Loading the same features file twice will no longer result in duplicate entries.

Support for BAM is also progressing well, and should be available in the next release if all goes to plan.
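As a rough illustration of the transparent gzip handling mentioned in the list above (a Python sketch, not Tablet's own code; the helper name open_maybe_gzip is hypothetical), a reader can peek at the first two bytes of a stream and wrap it in a decompressor only when they match the gzip magic number:

```python
import gzip
import io

# The first two bytes of every gzip member.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(raw):
    """Return a binary stream that yields decompressed data.

    `raw` is any readable binary stream (a file, an in-memory buffer, ...).
    Hypothetical helper for illustration only.
    """
    buffered = io.BufferedReader(raw)
    # peek() does not advance the stream, so nothing is lost for the reader.
    head = buffered.peek(2)[:2]
    if head == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=buffered, mode="rb")
    return buffered


if __name__ == "__main__":
    data = b"@HD\tVN:1.0\n"
    print(open_maybe_gzip(io.BytesIO(data)).read())
    print(open_maybe_gzip(io.BytesIO(gzip.compress(data))).read())
```

Because the helper only needs a readable binary stream, the same check should also work on an http response object, which is presumably why one detection step can serve both the file and http cases.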

Iain
Old 02-08-2010, 02:15 AM   #7
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Another update out today - 1.10.02.08

- NEW: Added (experimental) support for BAM, using in-memory only reading for now, NOT indexed support.
- NEW: Support for most CIGAR operations, with the positions of Insertions being tagged as features for now.
- NEW: Displayed consensus sequences now use 8 bytes per base less memory.
- NEW: Added a (prefs-file only) option to disable all disk caching for maximum performance at the expense of memory.
- NEW: The Contigs Table now lists read vs consensus mismatch percentage information.
- CHG: Some miscellaneous changes to cache support to speed up read retrieval.
- BUG: Fixed a memory-leak problem with display-time objects not being removed after closing a contig.
- BUG: Tablet now only deletes the cache files it creates, rather than every file in its cache directory.

This is the first version of Tablet to support BAM, but please note that it is just experimental support. We are currently reading BAM by parsing the entire file (as we do with all other assembly formats within Tablet). This means a large multi-GB file is read entirely by Tablet before display. Although this lets you work with BAM files, it's not ideal, so the next step is to get proper indexed BAM support working. We think we can do this without affecting how Tablet works or the number of assembly formats that it supports.
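To make the "insertions tagged as features" idea concrete, here is a minimal Python sketch (not Tablet's implementation; the function name insertion_features is made up) that walks a CIGAR string and reports where each insertion falls on the reference. It assumes POS is 1-based, as in SAM, and reports the reference base immediately before the inserted bases:

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
CONSUMES_REF = set("MDN=X")


def insertion_features(pos, cigar):
    """Return a list of (ref_pos, length) tuples, one per I operation.

    ref_pos is the 1-based reference position just before the insertion.
    Hypothetical helper for illustration only.
    """
    features = []
    ref = pos  # next reference position to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I":
            features.append((ref - 1, n))
        elif op in CONSUMES_REF:
            ref += n
        # S, H and P consume no reference bases.
    return features


if __name__ == "__main__":
    print(insertion_features(7, "8M2I4M1D3M"))
```

A viewer could list such tuples in a features table without having to decide how the inserted bases should be drawn.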

Iain
Old 02-08-2010, 04:54 AM   #8
NSTbioinformatics
Member
 
Location: netherlands

Join Date: Apr 2009
Posts: 24
Default

Tablet cannot open the Eland alignment file s_*_sorted.txt. It is a pity!

It would be very nice if Tablet could handle more input file formats.

By the way, is there any tool that could convert the s_*_sorted.txt Eland alignment file to a format which Tablet can read? I failed to convert it with MAQ's export2maq.
Old 02-08-2010, 05:07 AM   #9
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Quote:
Originally Posted by NSTbioinformatics View Post
Tablet cannot open the Eland alignment file s_*_sorted.txt. It is a pity!

It would be very nice if Tablet could handle more input file formats.
Tablet already handles ACE, AFG, SOAP, MAQ, SAM, BAM, FASTA, FASTQ, and GFF files. That's more than a lot of software out there. Of course, there's always room for more, but there are also only so many hours in the day. If you can send me details of the format's specification (or point me in their direction) though, we'll see what we can do about it.

Quote:
Originally Posted by NSTbioinformatics View Post
By the way, is there any tool that could convert the s_*_sorted.txt Eland alignment file to a format which Tablet can read? I failed to convert it with MAQ's export2maq.
I've seen messages talking about converting it into sam, so perhaps samtools can manage?

Iain
Old 02-08-2010, 05:47 AM   #10
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,168
Default

Quote:
Originally Posted by NSTbioinformatics View Post
Tablet cannot open the Eland alignment file s_*_sorted.txt. It is a pity!
According to the latest Illumina documentation ("CASAVA Software 1.6 User Guide", which also covers GERALD) the s_N_sorted.txt format is being deprecated so you probably shouldn't be basing your pipeline around it. Sorry to be the bearer of bad news.
Old 02-08-2010, 06:26 AM   #11
NSTbioinformatics
Member
 
Location: netherlands

Join Date: Apr 2009
Posts: 24
Default

Some of our alignments were done with a previous version of Eland. We are happy with those alignments and would like to view them, so we still need to convert the format for viewing in Tablet, or use other tools.

SAMtools may work. I will try it.
Old 02-08-2010, 06:48 AM   #12
NSTbioinformatics
Member
 
Location: netherlands

Join Date: Apr 2009
Posts: 24
Default

Actually, the format of s_N_sorted.txt is the same as that of the s_N_export.txt file; s_N_sorted.txt just contains only the reads that passed purity filtering and have a unique alignment. I think it is very useful for us. Otherwise, we would have to parse s_N_export.txt to fetch these reads.
Old 02-08-2010, 07:00 AM   #13
NSTbioinformatics
Member
 
Location: netherlands

Join Date: Apr 2009
Posts: 24
Default

The format of s_N_sorted.txt and s_N_export.txt is, for example:
HW201 91113 5 100 1124 1381 0 1 TTTATCAAGATAATTTTTCGACTCATCAGAAATATCCGAAAGTGTTAACTTCTGCGTCATGGAAGCGATAAAACTC abbbbbbbbbabbbabbabbababaaaaa\aaaaaaaaa``aTa^a`a_a____a`_]^^^PY_]RV^UL[BBBBB phiv2.fa 1 R 76 394 645 5246 F
*****
Machine name\tRun number\tLane\tTile\tX coordinate of cluster\tY coordinate of cluster\tIndex value\tRead number\tRead\tQuality string\tMatch chromosome\tMatch contig\tMatch position\tMatch strand (F/R)\tMatch descriptor\tSingle-read alignment score\tPaired-read alignment score\tPartner chromosome (for second pair only)\tPartner contig (for second pair)\tPartner offset (for second pair)\tPartner strand (for second pair)\tFiltering (Y/N)\n

*****************
If imilne could make Tablet read the Eland format, it would be very useful for us, and for other users who align with Eland.

We are looking forward to the update of tablet for that.
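For anyone who wants to pick these files apart in the meantime, the 22 tab-separated columns listed above can be mapped to names with a few lines of Python. This is only a sketch: the field names are my own shorthand for the column descriptions above, the helper names are hypothetical, and the sample record in the usage part is made up rather than taken from a real run.

```python
# Shorthand names for the 22 tab-separated columns, in file order.
EXPORT_FIELDS = [
    "machine", "run_number", "lane", "tile", "x", "y", "index",
    "read_number", "read", "quality", "match_chromosome", "match_contig",
    "match_position", "match_strand", "match_descriptor",
    "single_read_score", "paired_read_score", "partner_chromosome",
    "partner_contig", "partner_offset", "partner_strand", "filtering",
]


def parse_export_line(line):
    """Split one export/sorted line into a dict keyed by EXPORT_FIELDS."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(EXPORT_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(EXPORT_FIELDS), len(values))
        )
    return dict(zip(EXPORT_FIELDS, values))


def passed_filter(record):
    """True when the Filtering column is 'Y'."""
    return record["filtering"] == "Y"


if __name__ == "__main__":
    # Made-up record; empty strings stand for blank columns.
    sample = ["HW1", "1", "5", "100", "1124", "1381", "0", "1", "ACGT",
              "abcd", "phiv2.fa", "", "394", "R", "4", "76", "", "", "",
              "", "", "Y"]
    record = parse_export_line("\t".join(sample) + "\n")
    print(record["match_chromosome"], record["match_position"])
```

Splitting on tabs only (rather than on any whitespace) matters here, because blank columns would otherwise shift every later field.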
Old 02-10-2010, 08:48 PM   #14
orcy
Junior Member
 
Location: Brisbane

Join Date: Jan 2010
Posts: 8
Default

Any chance of some way to change the scales on the coverage plots?

I really like this browser, and can load enough reads from a SAM to keep me happy, but there just seem to be too few "tweaks" available for the viewer.

Certainly much more memory efficient than IGV in my hands.

cheers
Old 02-11-2010, 07:07 AM   #15
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Quote:
Originally Posted by orcy View Post
Any chance of some way to change the scales on the coverage plots?
Yep, that's something we're actively working on. It might not appear for a version or two, but it's definitely on the drawing board.

Iain
Old 02-12-2010, 04:56 AM   #16
NSTbioinformatics
Member
 
Location: netherlands

Join Date: Apr 2009
Posts: 24
Default

Finally, I modified export2sam.pl from the samtools package into a new script, sorted2sam.pl, to process s_N_sorted.txt (Eland alignments). It works well.
Now I am able to use Tablet to view my alignments. It looks cool.

Two features I would like to have:
1. It would be better if the base ruler showed the position numbers, even though the mouse already shows the current position.
2. It would be very nice if we could sort reads by read name.
We would see SNPs between samples much more clearly if reads from the same sample could be grouped together.

By the way, if someone wants sorted2sam.pl, please email me: jifeng.tang@keygene.com
Note: I have tested it on single-read alignments only, not on paired-end alignments.
Old 02-23-2010, 10:06 AM   #17
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Someone was complaining to me that he could not see insertions reported by bwa-sw. I have not tried though. Does tablet show insertions?
Old 02-23-2010, 11:35 PM   #18
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Quote:
Originally Posted by lh3 View Post
Someone was complaining to me that he could not see insertions reported by bwa-sw. I have not tried though. Does tablet show insertions?
CIGAR insertions?

If so, then the answer is "kind of". In the current version, Tablet will tag them as features and list their locations in the Features Table for the contig in question. Why? Because we don't really know how to handle them properly (and would certainly welcome any insights) but this seemed like the best short-term solution. It's also why we state support for sam/bam in Tablet is still experimental.

We did fire off a few emails to folks we hoped could help, including to the samtools mailing list, but so far no one has replied (we're isolated enough up here on the NE coast of Scotland as it is). I'll ask Gordon (our other Tablet programmer) to post what the issues were again though...

Iain

Last edited by imilne; 02-23-2010 at 11:51 PM.
Old 02-24-2010, 12:49 AM   #19
NSTbioinformatics
Member
 
Location: netherlands

Join Date: Apr 2009
Posts: 24
Default

Li Heng is right. Tablet does not show insertions. At least, that is what happened with my dataset.
Also, neither soft clipping nor hard clipping is shown in Tablet. I would prefer clipped bases to be shown in lowercase letters or in a lighter color.

I still think Tablet is a good viewer that is fast and user-friendly... It needs more work and effort to make it better.
Old 03-01-2010, 07:39 AM   #20
GStephen
Junior Member
 
Location: Dundee, UK

Join Date: Mar 2010
Posts: 9
Default

I'm the other Tablet programmer that imilne mentioned. As imilne said I had previously asked for clarification on how to handle inserts in SAM/BAM on the samtools mailing list, but I got no response there. It would be good if lh3, or somebody else who is relatively close to the SAM/BAM project could clarify the way in which CIGAR insertions should be handled.

Given the following example (adapted from the SAM/BAM format paper) what is the correct output? I've kept the example simple to save space.

Code:
ref  - AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCATGGAT

SAM file:
@HD	VN:1.0	SO:Sorted
@SQ	SN:ref	LN:60
r001	0	ref	7	30	8M2I4M1D3M	*	0	0	TTAGATAAAGGATACTG	!!!!!!!!!!!!!!!!!
r002	0	ref	9	30	3S6M1P1I4M	*	0	0	AAAAGATAAGGATA	!!!!!!!!!!!!!!
The SAM format paper proposes that the result should look something like the following:
Code:
ref  - AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCATGGAT

r001 -       TTAGATAAAGGATA*CTG
r002 -      aaaAGATAA*GGATA
Whereas Samtools tview 0.1.7 on Windows produces (the change is in the r002 line):
Code:
ref  - AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCATGGAT

r001 -       TTAGATAAAGGATA*CTG
r002 -      aaaAGATAA**GATA
And IGV produces:
Code:
ref  - AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCATGGAT

r001 -       TTAGATAA|GATA*CTG
r002 -      aaaAGATAA|GATA
Where | represents an insertion marker. Note also that the reference has not had the two * characters inserted.

I could list a few more viewers as there seem to be subtle differences between the way CIGAR insertions are handled between all viewers.

Would it be possible for somebody to clarify which of the above is the intended outcome? We would really appreciate a clarification so that CIGAR insertions can be displayed as intended in the SAM/BAM format. As an interim measure we have a system in Tablet whereby we mark the insertions as features, which can be navigated to via our features table. Ideally we would like to display the insertions in whichever manner is deemed correct by those who work on the SAM/BAM format. We suspect the correct way to display CIGAR insertions is either the way seen in the SAM format paper or the way seen in Samtools tview, but we would really appreciate confirmation of this.
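For reference while discussing this, here is a small Python sketch of one way to compute a padded layout from POS, CIGAR and SEQ. It is not Tablet, samtools or IGV code, and the function names are invented. It reproduces the layout quoted from the format paper above for r001 and r002, which does not by itself show that this layout is the intended one. Assumptions: reads lie within the reference, insertions sit between aligned bases, and trailing soft clips are simply dropped.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def layout_read(pos, cigar, seq):
    """Return (leading_clip, cells); each cell is [ref_pos, char, gap_after].

    gap_after holds inserted bases and explicit pads that follow ref_pos.
    """
    clip, cells = "", []
    ref, i = pos, 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            for _ in range(n):
                cells.append([ref, seq[i], ""])
                ref += 1
                i += 1
        elif op in "DN":
            for _ in range(n):
                cells.append([ref, "*", ""])
                ref += 1
        elif op == "I":
            if cells:  # insertions before the first aligned base are skipped
                cells[-1][2] += seq[i:i + n]
            i += n
        elif op == "P":
            if cells:
                cells[-1][2] += "*" * n
        elif op == "S":
            if not cells:  # only the leading soft clip is kept, in lowercase
                clip += seq[i:i + n].lower()
            i += n
        # H consumes neither read nor reference bases.
    return clip, cells


def padded_view(ref_seq, reads):
    """reads: list of (pos, cigar, seq). Returns [ref_row, read_row, ...]."""
    laid = [layout_read(*read) for read in reads]
    width = {}  # ref_pos -> widest gap content seen after that position
    for _, cells in laid:
        for ref_pos, _, gap in cells:
            width[ref_pos] = max(width.get(ref_pos, 0), len(gap))
    rows = ["".join(base + "*" * width.get(p, 0)
                    for p, base in enumerate(ref_seq, start=1))]
    column, col = {}, 0  # ref_pos -> column in the padded reference row
    for p in range(1, len(ref_seq) + 1):
        column[p] = col
        col += 1 + width.get(p, 0)
    for clip, cells in laid:
        if not cells:
            rows.append("")
            continue
        text = clip
        for k, (ref_pos, char, gap) in enumerate(cells):
            if k < len(cells) - 1:  # pad short gaps so later bases line up
                gap = gap.ljust(width.get(ref_pos, 0), "*")
            text += char + gap
        indent = column[cells[0][0]] - len(clip)
        if indent < 0:  # clip would start before column 0: trim it
            text, indent = text[-indent:], 0
        rows.append(" " * indent + text)
    return rows


if __name__ == "__main__":
    ref = "AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCATGGAT"
    for row in padded_view(ref, [(7, "8M2I4M1D3M", "TTAGATAAAGGATACTG"),
                                 (9, "3S6M1P1I4M", "AAAAGATAAGGATA")]):
        print(row)
```

In this sketch the P operation is taken literally, so r002 gets a pad before its inserted G; a viewer that ignores P or left-aligns inserted bases would place them differently, which may be part of why the viewers disagree.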


On a related note, what is the position on reads which extend beyond the reference sequence at the start or end of the sequence? The Picard API in particular doesn't seem to like reads which either start before the reference sequence, or end after it. Are these reads valid in SAM/BAM or not?
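Whatever the answer turns out to be, such reads are easy to detect before handing a file to a parser. The following Python sketch (hypothetical helper names, not part of Picard or samtools) computes the reference span of a read from POS and CIGAR and compares it with the LN value of the @SQ line:

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) covered on the reference."""
    consumed = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                   if op in "MDN=X")
    return pos, pos + consumed - 1


def overhangs(pos, cigar, ref_length):
    """True if the read starts before base 1 or ends after ref_length."""
    start, end = reference_span(pos, cigar)
    return start < 1 or end > ref_length


if __name__ == "__main__":
    print(reference_span(7, "8M2I4M1D3M"))
    print(overhangs(7, "8M2I4M1D3M", 60))
```

This only flags the reads; it says nothing about whether the format considers them valid, which is the open question.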

Gordon.