SEQanswers

Old 03-01-2010, 07:46 AM   #21
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

It should be (i.e. the samtools paper gives the right view):

Code:
NNNNNNNNNNNNNN**NNNNNN
      TTAGATAAAGGATA*CTG
        AGATAA*GGATA
Samtools tview does not support padding (P) and thus fails to give the right view. Like tablet, IGV does not show inserted bases at the moment. It is being implemented so far as I know. I guess Gambit is the only viewer that supports padding and insertion (never tried, though). Gap5 should also work with these operations, but it does not directly work with BAM (am I right?).
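As a rough illustration of the padded view above, here is a minimal Python sketch that renders a read from its SEQ and CIGAR fields. The function name `render_padded` is made up for this example, and the sequences and CIGAR strings are assumed to be those of the r001 and r002 example records in the SAM specification, which correspond to the two reads shown above. It only handles the layout within a read, not the read's offset against the reference.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def render_padded(seq, cigar):
    """Render a read as it would appear in a padded alignment view."""
    out = []
    pos = 0  # index of the next unused base in seq
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=XI":
            # Aligned and inserted bases are shown as they are.
            out.append(seq[pos:pos + n])
            pos += n
        elif op == "S":
            # Soft-clipped bases are present in SEQ but not displayed.
            pos += n
        elif op in "DP":
            # Deletions and pads occupy a column but carry no base.
            out.append("*" * n)
        elif op == "N":
            # Skipped reference region; '.' is an arbitrary choice here.
            out.append("." * n)
        # 'H' (hard clip) consumes nothing and shows nothing.
    return "".join(out)

print(render_padded("TTAGATAAAGGATACTG", "8M2I4M1D3M"))
print(render_padded("AAAAGATAAGGATA", "3S6M1P1I4M"))
```

The `I` operation contributes bases and the `P` operation contributes a `*`, which is why a viewer that ignores either one cannot line the two reads up in the same columns.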

As for reads that extend beyond the reference, we are inclined to treat them as invalid. But it would be good for a tool to be robust to that.

Last edited by lh3; 03-01-2010 at 07:53 AM.
Old 03-02-2010, 06:52 AM   #22
GStephen
Junior Member
 
Location: Dundee, UK

Join Date: Mar 2010
Posts: 9
Default

Many thanks for your clarification on these points - and speedy response - lh3. This will be very helpful when it comes to expanding our current support for insertions in SAM/BAM.

Gordon.
Old 03-04-2010, 07:25 AM   #23
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

It's update time again, and most importantly, Tablet now has proper, native support for BAM assemblies!

It's worth reading the new section in the help that specifically covers BAM, as it obviously works a little differently to all of the other assembly types that Tablet supports (prepare to meet the "bambambar").

As for the rest of the changes, they're as follows:

- NEW: Added support for indexed BAM assemblies.
- NEW: Added a new navigation bar for moving the viewing window around within a larger BAM assembly.
- NEW: Implemented background (multicore) thread support for calculating auxiliary display data, meaning large contigs can now be viewed instantly.
- NEW: Various implementation changes mean less memory is now used for large contigs.
- NEW: The Open Assembly dialog is now significantly quicker at detecting the assembly type.
- NEW: The page left/right controls now hide when the edges of the display are reached.
- NEW: Added options to the Preferences dialog to enable/disable disk caching (where supported).
- NEW: Loading a very large number of features is now much faster.
- NEW: The bp coordinates (data set/BAM window and current view) are now shown on the overview images.
- CHG: Updated the mismatch code to ignore regions of reads that extend beyond the left or right ends of the consensus.
- BUG: Fixed a problem with files being ignored when passed to Tablet on the command line.
- BUG: Fixed a bug which caused Tablet to not exit correctly when a filter had been applied in the Contigs Table.
- BUG: Fixed multiple issues with AFG parsing related to clr and gap tags.
- BUG: The unpadded read length was no longer being shown in the popup information box.

A few other points:

- The issues with CIGAR parsing remain (for now).

- We have an Eland parser ready to go; we're just waiting on confirmation of a few things before we release it. (We don't use Eland here, so we have no data files ya'see).

- I'd also like to take this opportunity to thank the Picard/samtools folks for their Java API. Without it, proper BAM support in Tablet would probably never have happened, so we (and hopefully lots of Tablet users too) are very grateful to the developers for making it available.

Iain
Old 03-04-2010, 10:17 AM   #24
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Another feature you may consider is FTP support. Most human alignments sit on the NCBI FTP site. If I want to look at a small region across multiple samples, I have to download the full alignments, which is a lot of data. The best approach would be to open BAMs directly over FTP without downloading the entire file, as UCSC does now. IGV allows you to open alignments hosted on the IGV server. I know someone is thinking about opening BAMs from the Amazon/Google cloud.

I am just throwing ideas. Do not take them seriously.
Old 03-10-2010, 05:43 AM   #25
jeny
Member
 
Location: france

Join Date: Mar 2010
Posts: 16
Default

Hi all,

I am working on Illumina GAIIx single read data.
To test Tablet, I used the NSTbioinformatics sorted2sam.pl script on my data (s_N_sorted.txt) to generate s_N_sorted.sam (SAM format). That worked fine.
But when I used s_N_sorted.sam as the assembly file and phix.fa as the reference file in Tablet, it displayed a window with only the reference sequence and 0 reads.
The trouble is that s_N_sorted.sam contains 119,795 reads matched to the reference genome.
There is no information in output.log.
Could you give me a clue as to how to solve this problem?
Any help appreciated.

Cheers,
Jennifer
Old 03-10-2010, 05:49 AM   #26
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Quote:
Originally Posted by jeny
Hi all,

I am working on Illumina GAIIx single read data.
To test Tablet, I used the NSTbioinformatics sorted2sam.pl script on my data (s_N_sorted.txt) to generate s_N_sorted.sam (SAM format). That worked fine.
But when I used s_N_sorted.sam as the assembly file and phix.fa as the reference file in Tablet, it displayed a window with only the reference sequence and 0 reads.
The trouble is that s_N_sorted.sam contains 119,795 reads matched to the reference genome.
There is no information in output.log.
Could you give me a clue as to how to solve this problem?
Any help appreciated.

Cheers,
Jennifer

0 reads in Tablet probably means that the contig names assigned to each read do not match the contig names in the reference file, i.e. Tablet has (probably) seen the reads, but doesn't know what to associate them with.

Have a look at the first few lines of the SAM file and see if the contig name on each line (3rd column) matches the name of any of the reference sequences in the FASTA file.
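The same check can be scripted. The following is a minimal Python sketch, not anything Tablet ships: the function names are invented for illustration, and it assumes that a reference name is the first whitespace-delimited token of each FASTA header, which may not be exactly how Tablet compares names.

```python
def fasta_names(fasta_lines):
    """Collect the first token of every FASTA header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names

def sam_reference_names(sam_lines):
    """Collect the RNAME field (3rd column) of every alignment line."""
    names = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] != "*":
            names.add(fields[2])
    return names

def unmatched_names(sam_lines, fasta_lines):
    """Names used by reads that have no counterpart in the FASTA file."""
    return sam_reference_names(sam_lines) - fasta_names(fasta_lines)

# Made-up example data: the read refers to "phix.fa", the FASTA says "phiX174".
sam = ["@HD\tVN:1.0\n",
       "read1\t0\tphix.fa\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n"]
fasta = [">phiX174 example header\n", "ACGT\n"]
print(unmatched_names(sam, fasta))
```

With real files, pass `open("s_N_sorted.sam")` and `open("phix.fa")` instead of the lists. Any name that is reported needs to be renamed on one side or the other before the reads will show up.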

Iain
Old 03-10-2010, 07:16 AM   #27
jeny
Member
 
Location: france

Join Date: Mar 2010
Posts: 16
Default

Hi Iain,

You are right. The problem is that the s_N_export.txt and s_N_sorted.txt files contain the name of the matched chromosome (or genome) rather than a contig name. So Tablet can't match the chromosome names (in the export or sorted files) to the contig names in the reference file. I will have a look and get back to you when I find a solution.
Any help appreciated!

Jeny
Old 05-06-2010, 05:22 AM   #28
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Hi folks,

Thought I'd post a quick message to let you know that Tablet has been updated again. It's now at version 1.10.05.06 (special UK election day version or something). Changes are as follows:

- NEW: Added a new colouring scheme based on read direction/orientation.
- NEW: The overview displays can now be "subsetted", forcing them to show only an overview of whatever region you define for them.
- NEW: Variant highlighting (in red) can now be turned on or off for the overviews or popups.
- NEW: Added the ability to search for sequence substrings within reads.
- NEW: Added a new graphical layer that performs "read shadowing", highlighting any reads under a given column or mouse position.
- NEW: Example files (linked to on the web) can now be opened directly from within Tablet.
- NEW: The nucleotide text for each base can now be shown or hidden.
- NEW: Added (optional) support for trimming reads in ACE files based on the QA tags.
- CHG: Redesigned and added additional options to the control ribbon.
- CHG: The ACE parser now ignores base quality information, making it more compatible with badly formatted files.
- BUG: The SAM parser now looks for all possible header tags to determine if a file is SAM or not.
- BUG: Fixed various issues with searching for reads.
- BUG: Fixed some miscellaneous rendering issues when switching between contigs.

As usual, it should auto-update the next time you run it, or you can grab the download manually from http://bioinf.scri.ac.uk/tablet

Iain
__________________
Our software: Tablet | Flapjack | Strudel | CurlyWhirly | TOPALi
Old 05-18-2010, 04:37 AM   #29
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Another update today, mainly to fix a few bugs and make things a bit more robust when dealing with BAM files.

- NEW: Implemented a new, hopefully more robust, method of determining the version number for update purposes.
- NEW: Added an option to set Picard's SAMFileReader validation stringency to lenient to suppress SamValidation errors.
- NEW: Added a parse-time option to force DNA ambiguity codes to be read as N.
- CHG: Changed the protein translator to treat any base with an N as ok (codon will translate to X) rather than having it skip the base.
- BUG: Fixed a problem copying any of the reverse strand protein translations to the clipboard.
- BUG: Fixed problems relating to Tablet not re-initialising its BAM reader after encountering a SamValidation error.
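
To make the translator change in the list above concrete, here is a minimal Python sketch of the behaviour described, where a codon containing an N yields X instead of being skipped. This is not Tablet's code: the names are invented for illustration and the standard genetic code is assumed.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate in frame 1; unknown codons (e.g. containing N) become X."""
    protein = []
    for start in range(0, len(dna) - 2, 3):
        codon = dna[start:start + 3].upper()
        # Any codon not made purely of A, C, G, T falls back to X.
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)

print(translate("ATGGCNTAA"))
```

Keeping the X in place means downstream residues stay in the same frame, which is the practical difference from skipping the base.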

Proper visual support for paired-end data might be next. Please do let us know what you'd like to see...

Iain
Old 05-19-2010, 09:59 AM   #30
vamin
Junior Member
 
Location: Gainesville, FL

Join Date: May 2010
Posts: 1
Default

Would it be possible to implement some sort of features track to make it easier to see pileups over exons, etc?
Old 05-28-2010, 04:54 AM   #31
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

I was wondering if it would be possible to show IUPAC codes in consensus.
As I frequently use MIRA to assemble EST datasets, I'd prefer the correct IUPAC code over '?' in the consensus ;-)

cheers,
Sven
Old 05-28-2010, 05:55 AM   #32
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Quote:
Originally Posted by vamin
Would it be possible to implement some sort of features track to make it easier to see pileups over exons, etc?

We are actively working on that, but it's tricky and time consuming unfortunately, and there isn't much demand for it from people in my group. Tablet is an assembly viewer, and to turn it into a genome browser (which I think it would have to be to display the features properly) isn't something we can really commit to.

Attached, however, is a sneak preview of what we do have working in one of the builds. At best it'll be one feature "type" per track. If stuff overlaps then that'll be tough luck - for this version at least

Iain
Attached Images
File Type: png features.png (5.5 KB, 52 views)
Old 05-28-2010, 06:03 AM   #33
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Quote:
Originally Posted by sklages
I was wondering if it would be possible to show IUPAC codes in consensus.
As I frequently use MIRA to assemble EST datasets, I'd prefer the correct IUPAC code over '?' in the consensus ;-)

In the short term, it's unlikely. One of Tablet's original optimizations was to create its own internal alphabet of supported characters. This alphabet was purposely designed to be tiny, mainly so it wouldn't use up much memory. All raw data is translated into a compressed bytestream version of this, allowing us to display data very very quickly. When we moved to caching data on disk, we thought maybe it wouldn't be needed any more, but it turns out it was still useful because the amount of data we need to read and write to disk is subsequently much smaller too (and therefore aids creation/retrieval time), meaning the overhead of using a hard drive over memory isn't that bad.

So for now, we're locked into the way we currently do things, with an alphabet that can't support much more than the basic set of A, C, G, T, and some others we use to help speed up rendering. Ultimately it's currently limited to 16 characters (2^4).
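
To show why a 16-symbol limit follows from a 4-bit code, here is a minimal Python sketch that packs two symbols into each byte. It is purely illustrative: the alphabet below and the `pack`/`unpack` names are hypothetical, and the thread does not describe Tablet's actual encoding.

```python
# Hypothetical alphabet; at most 16 symbols fit in 4 bits (2**4).
ALPHABET = "ACGTN*?"
CODE = {symbol: index for index, symbol in enumerate(ALPHABET)}
assert len(ALPHABET) <= 16

def pack(seq):
    """Pack a sequence into bytes, two 4-bit codes per byte."""
    out = bytearray()
    for i in range(0, len(seq), 2):
        high = CODE[seq[i]]
        # Pad the final byte with code 0 when the length is odd.
        low = CODE[seq[i + 1]] if i + 1 < len(seq) else 0
        out.append((high << 4) | low)
    return bytes(out)

def unpack(data, length):
    """Recover the first `length` symbols from packed bytes."""
    symbols = []
    for byte in data:
        symbols.append(ALPHABET[byte >> 4])
        symbols.append(ALPHABET[byte & 0x0F])
    return "".join(symbols[:length])

packed = pack("ACGTN")
print(len(packed), unpack(packed, 5))
```

Adding the full set of IUPAC ambiguity codes on top of the gap and rendering symbols would need more than 16 values, so each symbol would no longer fit in half a byte.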

I'm told (correct me if I'm wrong) that MIRA is one of few assemblers that even outputs ambiguity codes, and even then it can be told not to?

Iain
Old 05-28-2010, 11:38 AM   #34
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Quote:
Originally Posted by imilne
I'm told (correct me if I'm wrong) that MIRA is one of few assemblers that even outputs ambiguity codes, and even then it can be told not to?
Iain

Yes, it is one of the assemblers that does ... but that's not a drawback but a feature, especially for de novo cDNA assemblies without a reference ...

thanks,
Sven
Old 06-07-2010, 02:43 AM   #35
agc
Member
 
Location: Jerusalem

Join Date: May 2010
Posts: 26
Default SAMSequenceDictionary

Hi,

While attempting to view a sorted BAM file (that has been indexed) with a fasta reference on Tablet, I receive the following error:

java.lang.IllegalArgumentException: Cannot add sequence that already exists in SAMSequence dictionary

Any ideas on what the problem might be?
Old 06-07-2010, 03:47 AM   #36
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Quote:
Originally Posted by agc
Hi,

While attempting to view a sorted BAM file (that has been indexed) with a fasta reference on Tablet, I receive the following error:

java.lang.IllegalArgumentException: Cannot add sequence that already exists in SAMSequence dictionary

Any ideas on what the problem might be?

That's not one of our errors, so it must be from Picard, the API we use to read BAM files. It probably means there's something wrong with the underlying BAM file itself, or something about it that Picard can't handle. You might be able to confirm by trying one of the other Java viewers - if you get the same error, it'll be the file; if you don't, then maybe it is Tablet that's at fault.

Someone on the Picard or samtools mailing lists may also be able to help further (or will hopefully read this)...

Iain
Old 06-19-2010, 11:34 PM   #37
agc
Member
 
Location: Jerusalem

Join Date: May 2010
Posts: 26
Default

Yes, there was a problem with the BAM file - several sequence names were left blank (due to my reference fasta file including a space between the '>' sign and the sequence name, i.e. '> chr07' instead of '>chr07'), and therefore were seen as duplicates.
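
For anyone hitting the same error, here is a minimal Python sketch that flags FASTA headers whose name would come out blank. It assumes that tools take the name to be the text between '>' and the first whitespace; the function names are made up for this example.

```python
import re

def header_name(header):
    """Return the text between '>' and the first whitespace (may be empty)."""
    return re.match(r">(\S*)", header).group(1)

def blank_name_headers(lines):
    """List (line number, header) pairs whose sequence name is empty."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(">") and header_name(line) == "":
            bad.append((number, line.rstrip("\n")))
    return bad

def fix_header(header):
    """Drop whitespace directly after '>' so the name is no longer blank."""
    return ">" + header[1:].lstrip()

# Made-up example reproducing the '> chr07' case.
fasta = [">chr01\n", "ACGT\n", "> chr07\n", "ACGT\n"]
print(blank_name_headers(fasta))
print(fix_header("> chr07"))
```

Two or more headers of this kind all yield the same empty name, which would explain why the sequence dictionary reports a duplicate.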

Thanks!
Old 07-15-2010, 11:49 PM   #38
Adamo
Member
 
Location: Paris

Join Date: Jun 2010
Posts: 28
Default

Hi,

Tablet gives me the following error message while trying to load a sorted bam file and its corresponding genome:

java.lang.Exception: java.lang.RuntimeException: SAM validation error: ERROR: Record 9187, Read name SRR033843.58925, Zero-length read without CS or CQ tag

Do you know where this might come from?
I've aligned reads with Blat, used psl2sam and processed the file with samtools.

I wonder why *this* read in particular...?
Old 07-15-2010, 11:56 PM   #39
imilne
Member
 
Location: JHI, Dundee, UK

Join Date: Jan 2010
Posts: 68
Default

Quote:
Originally Posted by Adamo
SAM validation error: ERROR: Record 9187, Read name SRR033843.58925, Zero-length read without CS or CQ tag

Do you know where it can come from?

I'll leave this one for the assembler/SAM experts.

With Tablet we've always felt it best to leave these kinds of errors in rather than have it ignore the reads. That way you can be informed when there's something (potentially) not right with the data.

Iain
Old 07-16-2010, 12:03 AM   #40
Adamo
Member
 
Location: Paris

Join Date: Jun 2010
Posts: 28
Default

Thank you anyway.

This problem happens with all the reads, not only the one mentioned. It may be psl2sam or Blat which causes this error.
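
One way to check how widespread this is would be to scan the SAM file before converting it to BAM. The Python sketch below assumes that the validation error is raised for records whose SEQ field (10th column) is `*` and which carry no CS or CQ tag. That is a reading of the error message rather than something confirmed in this thread, and the function name is invented for illustration.

```python
def zero_length_reads(sam_lines):
    """Return names of reads with SEQ '*' and neither a CS nor a CQ tag."""
    hits = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        # Optional fields look like TAG:TYPE:VALUE; keep just the tag names.
        tags = {field[:2] for field in fields[11:]}
        if fields[9] == "*" and not tags & {"CS", "CQ"}:
            hits.append(fields[0])
    return hits

# Made-up records: the first has no sequence, the second does.
sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "SRR033843.58925\t0\tchr1\t10\t255\t4M\t*\t0\t0\t*\t*\n",
    "SRR033843.58926\t0\tchr1\t20\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]
print(zero_length_reads(sam))
```

If every read is reported, the converter is probably not writing sequences at all, and the reads would need to be added back (or validation relaxed) before the file will load.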