
  • #16
    Finally, I modified export2sam.pl from the samtools package into a new script, sorted2sam.pl, to process s_N_sorted.txt (ELAND alignment). It works well.
    Now I am able to use the Tablet tool to view my alignment. It looks cool.

    Two features I would like to have:
    1. It would be better if the base ruler showed the position numbers, although the mouse pointer already shows the current position.
    2. It would be very nice if we could sort reads by read name.
    SNPs between samples would be much easier to see if reads from the same sample could be ordered together.

    By the way, if someone wants sorted2sam.pl, please email me:[email protected]
    Note: I have tested on single read alignment and not paired end read alignment.

    Comment


    • #17
      Someone was complaining to me that he could not see insertions reported by bwa-sw. I have not tried though. Does tablet show insertions?

      Comment


      • #18
        Originally posted by lh3
        Someone was complaining to me that he could not see insertions reported by bwa-sw. I have not tried though. Does tablet show insertions?
        CIGAR insertions?

        If so, then the answer is "kind of". In the current version, Tablet will tag them as features and list their locations in the Features Table for the contig in question. Why? Because we don't really know how to handle them properly (and would certainly welcome any insights) but this seemed like the best short-term solution. It's also why we state support for sam/bam in Tablet is still experimental.

        We did fire off a few emails to folks we hoped could help, including to the samtools mailing list, but so far no one has replied (we're isolated enough up here on the NE coast of Scotland as it is). I'll ask Gordon (our other Tablet programmer) to post what the issues were again though...

        Iain
        Last edited by imilne; 02-24-2010, 12:51 AM.
        Our software: Tablet | Flapjack | Strudel | CurlyWhirly | TOPALi

        Comment


        • #19
          Li Heng is right. Tablet does not show insertions. At least that is what happened with my dataset.
          Also, neither soft clipping nor hard clipping is shown in Tablet. I would prefer to show clipped bases in lowercase letters or in a lighter colour.

          I still think Tablet is a good viewer, which is fast and user-friendly... It needs much more work and effort to make it better.

          Comment


          • #20
            I'm the other Tablet programmer that imilne mentioned. As imilne said I had previously asked for clarification on how to handle inserts in SAM/BAM on the samtools mailing list, but I got no response there. It would be good if lh3, or somebody else who is relatively close to the SAM/BAM project could clarify the way in which CIGAR insertions should be handled.

            Given the following example (adapted from the SAM/BAM format paper) what is the correct output? I've kept the example simple to save space.

            Code:
            ref  - AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCATGGAT
            
            SAM file:
            @HD	VN:1.0	SO:Sorted
            @SQ	SN:ref	LN:60
            r001	0	ref	7	30	8M2I4M1D3M	*	0	0	TTAGATAAAGGATACTG	!!!!!!!!!!!!!!!!!
            r002	0	ref	9	30	3S6M1P1I4M	*	0	0	AAAAGATAAGGATA	!!!!!!!!!!!!!!
            The SAM format paper proposes that the result should look something like the following:
            Code:
            ref  - AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCATGGAT
            
            r001 -       TTAGATAAAGGATA*CTG
            r002 -      aaaAGATAA*GGATA
            Whereas Samtools tview 0.1.7 on Windows produces the following (the change is in r002, where a second * appears in place of the inserted G):
            Code:
            ref  - AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCATGGAT
            
            r001 -       TTAGATAAAGGATA*CTG
            r002 -      aaaAGATAA**GATA
            And IGV produces:
            Code:
            ref  - AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCATGGAT
            
            r001 -       TTAGATAA|GATA*CTG
            r002 -      aaaAGATAA|GATA
            Where | represents an insertion marker. Note also that the reference has not had the two * characters inserted.

            I could list a few more viewers, as there seem to be subtle differences in the way CIGAR insertions are handled across viewers.

            Would it be possible for somebody to clarify which of the above is the intended outcome? We would really appreciate a clarification so that CIGAR insertions can be displayed as intended in the SAM/BAM format. As an interim measure we have a system in Tablet whereby we mark the insertions as features, which can be navigated to via our Features Table. Ideally we would like to display the insertions in whichever manner is deemed correct by those who work on the SAM/BAM format. We suspect the correct way to display CIGAR insertions is either the way seen in the SAM format paper or the way seen in Samtools tview, but we would really appreciate confirmation of this.


            On a related note, what is the position on reads which extend beyond the reference sequence at the start or end of the sequence? The Picard API in particular doesn't seem to like reads which either start before the reference sequence or end after it. Are these reads valid in SAM/BAM or not?

            Gordon.

            Comment


            • #21
              It should be (i.e. the samtools paper gives the right view):

              Code:
              NNNNNNNNNNNNNN**NNNNNN
                    TTAGATAAAGGATA*CTG
                      AGATAA*GGATA
              Samtools tview does not support padding (P) and thus fails to give the right view. Like Tablet, IGV does not show inserted bases at the moment. It is being implemented, so far as I know. I guess Gambit is the only viewer that supports padding and insertion (never tried, though). Gap5 should also work with these operations, but it does not directly work with BAM (am I right?).
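A minimal Python sketch of how the padded view above could be derived from POS and CIGAR. It is only an illustration, not how samtools, Tablet or any other viewer implements it. The function names `walk_cigar` and `padded_view` are made up for this example. It assumes every read lies inside the reference and has at least one aligned base, that every insertion is followed by a reference position, that soft-clipped bases are simply not drawn (as in the view above), and that an insertion shorter than the widest one at the same place is right-padded with `*` when no explicit P is given.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def walk_cigar(pos, cigar, seq):
    """Split one read into aligned bases and insertions.

    aligned  maps a 1-based reference position to the character drawn there
             ('*' for a deleted or skipped reference base).
    inserted maps the reference position that FOLLOWS an insertion to the
             inserted string (one '*' per padding operation P).
    """
    aligned, inserted = {}, {}
    rpos, qpos = pos, 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            for k in range(n):
                aligned[rpos + k] = seq[qpos + k]
            rpos += n
            qpos += n
        elif op in "DN":
            for k in range(n):
                aligned[rpos + k] = "*"
            rpos += n
        elif op == "I":
            inserted[rpos] = inserted.get(rpos, "") + seq[qpos:qpos + n]
            qpos += n
        elif op == "P":
            inserted[rpos] = inserted.get(rpos, "") + "*" * n
        elif op == "S":
            qpos += n  # soft-clipped bases are skipped, not drawn
        # H consumes neither read bases nor reference positions
    return aligned, inserted

def padded_view(ref, reads):
    """Return the padded reference row followed by one row per read.

    reads is a list of (pos, cigar, seq) tuples with 1-based pos.
    """
    walked = [walk_cigar(pos, cigar, seq) for pos, cigar, seq in reads]

    # Width of the padded gap in front of each reference position.
    gap = {}
    for _, inserted in walked:
        for p, s in inserted.items():
            gap[p] = max(gap.get(p, 0), len(s))

    # Output column (0-based) of every reference base.
    col, shift = {}, 0
    for p in range(1, len(ref) + 1):
        shift += gap.get(p, 0)
        col[p] = p - 1 + shift
    width = len(ref) + shift

    row = [" "] * width
    for p in range(1, len(ref) + 1):
        g = gap.get(p, 0)
        row[col[p] - g:col[p]] = "*" * g
        row[col[p]] = ref[p - 1]
    rows = ["".join(row)]

    for aligned, inserted in walked:
        row = [" "] * width
        first, last = min(aligned), max(aligned)
        for p, base in aligned.items():
            row[col[p]] = base
        for p, g in gap.items():
            # Fill the gap if the read has an insertion here or spans it.
            if p in inserted or first < p <= last:
                row[col[p] - g:col[p]] = inserted.get(p, "").ljust(g, "*")
        rows.append("".join(row).rstrip())
    return rows
```

Calling `padded_view` with the reference and the two reads r001 and r002 from the example in post #20 gives the reference row with `**` inserted and the two read rows shown above, without the soft-clipped `aaa`.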

              As to reads extending beyond the reference, we are inclined to treat them as invalid. But it would be good for a tool to be robust to them.
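For tools that want to be robust to such reads, here is a small Python sketch of how an overhang could be detected from POS, CIGAR and the reference length (the LN value of the @SQ line). The helper names `reference_span` and `overhang` are hypothetical, and this is not taken from Picard or samtools. It assumes that M, D, N, = and X are the operations that consume reference positions.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) a read covers on the reference."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")
    return pos, pos + length - 1

def overhang(pos, cigar, ref_len):
    """Return how many reference columns hang off the (left, right) ends."""
    start, end = reference_span(pos, cigar)
    left = max(0, 1 - start)        # positions before reference base 1
    right = max(0, end - ref_len)   # positions after the last reference base
    return left, right
```

For r001 from the example (POS 7, CIGAR 8M2I4M1D3M) the span is 7 to 22, so the read sits inside a reference of length 60 but would overhang a reference of length 20 by two positions on the right. A viewer could clip or ignore the overhanging part rather than reject the file.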
              Last edited by lh3; 03-01-2010, 08:53 AM.

              Comment


              • #22
                Many thanks for your clarification on these points - and speedy response - lh3. This will be very helpful when it comes to expanding our current support for insertions in SAM/BAM.

                Gordon.

                Comment


                • #23
                  It's update time again, and most importantly, Tablet now has proper, native support for BAM assemblies!

                  It's worth reading the new section in the help that specifically covers BAM as it obviously works a little differently to all of the other assembly types that Tablet supports (prepare to meet the "bambambar").

                  As for the rest of the changes, they're as follows:

                  - NEW: Added support for indexed BAM assemblies.
                  - NEW: Added a new navigation bar for moving the viewing window around within a larger BAM assembly.
                  - NEW: Implemented background (multicore) thread support for calculating auxiliary display data, meaning large contigs can now be viewed instantly.
                  - NEW: Various implementation changes mean less memory is now used for large contigs.
                  - NEW: The Open Assembly dialog is now significantly quicker at detecting the assembly type.
                  - NEW: The page left/right controls now hide when the edges of the display are reached.
                  - NEW: Added options to the Preferences dialog to enable/disable disk caching (where supported).
                  - NEW: Loading a very large number of features is now much faster.
                  - NEW: The bp coordinates (data set/BAM window and current view) are now shown on the overview images.
                  - CHG: Updated the mismatch code to ignore regions of reads that extend beyond the left or right ends of the consensus.
                  - BUG: Fixed a problem with files being ignored when passed to Tablet on the command line.
                  - BUG: Fixed a bug which caused Tablet to not exit correctly when a filter had been applied in the Contigs Table.
                  - BUG: Fixed multiple issues with AFG parsing related to clr and gap tags.
                  - BUG: The unpadded read length was no longer being shown in the popup information box.

                  A few other points:

                  - The issues with CIGAR parsing remain (for now).

                  - We have an Eland parser ready to go, we're just waiting on confirmation of a few things before we release it. (We don't use Eland here, so we have no data files ya'see).

                  - I'd also like to take this opportunity to thank the Picard/samtools folks for their Java API. Without it, proper BAM support in Tablet would probably never have happened, so we (and hopefully lots of Tablet users too) are very grateful to the developers for making it available.

                  Iain
                  Our software: Tablet | Flapjack | Strudel | CurlyWhirly | TOPALi

                  Comment


                  • #24
                    Another feature you may consider is FTP support. Most human alignments sit on the NCBI FTP site. If I want to look at a small region for multiple samples, I need to download the full alignments, which is a lot of data. The best way is to open BAMs directly over FTP without downloading the entire file, as UCSC does now. IGV allows you to open alignments on the IGV server. I know someone is thinking of opening BAMs from the Amazon/Google cloud.

                    I am just throwing out ideas. Do not take them seriously.

                    Comment


                    • #25
                      Hi all,

                      I am working on Illumina GAIIx single read data.
                      To test Tablet, I used NSTbioinformatics sorted2sam.pl script on my data (s_N_sorted.txt) to generate s_N_sorted.sam (SAM format). That worked fine.
                      But using s_N_sorted.sam as assembly file and phix.fa as reference file in Tablet, it displayed a window with only reference sequence and 0 read.
                      The trouble is that in s_N_sorted.sam there are 119,795 reads matched to the reference genome.
                      There is no info in output.log.
                      Could you give me some clue to solve this problem?
                      Any help appreciated.

                      Cheers,
                      Jennifer

                      Comment


                      • #26
                        Originally posted by jeny
                        Hi all,

                        I am working on Illumina GAIIx single read data.
                        To test Tablet, I used NSTbioinformatics sorted2sam.pl script on my data (s_N_sorted.txt) to generate s_N_sorted.sam (SAM format). That worked fine.
                        But using s_N_sorted.sam as assembly file and phix.fa as reference file in Tablet, it displayed a window with only reference sequence and 0 read.
                        The trouble is that in s_N_sorted.sam there are 119,795 reads matched to the reference genome.
                        There is no info in output.log.
                        Could you give me some clue to solve this problem?
                        Any help appreciated.

                        Cheers,
                        Jennifer
                        0 reads in Tablet probably means that the contig names assigned to each read do not match the contig names that are in the reference file, ie, Tablet has (probably) seen the reads, but doesn't know what to associate them with.

                        Have a look at the first few lines of the SAM file and see if the contig name for each line (3rd column) matches the name(s) of any of the reference sequence(s) in the FASTA file.
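The check described above can be sketched in a few lines of Python. This is only an illustration and not part of Tablet. The function names are made up for this example, and taking the first whitespace-delimited token of each FASTA header as the sequence name is an assumption; Tablet may compare names differently.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines (text after '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names

def sam_reference_names(lines):
    """Collect the reference names used in column 3 of SAM alignment lines."""
    names = set()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > 2 and fields[2] != "*":
            names.add(fields[2])
    return names

def unmatched_names(sam_lines, fasta_lines):
    """Names used by reads in the SAM file that the FASTA file does not define."""
    return sam_reference_names(sam_lines) - fasta_names(fasta_lines)
```

Passing two open file handles, for example `unmatched_names(open("s_N_sorted.sam"), open("phix.fa"))`, returns the set of names that reads refer to but the reference file lacks. An empty set means the names line up and the cause of the 0 reads lies elsewhere.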

                        Iain
                        Our software: Tablet | Flapjack | Strudel | CurlyWhirly | TOPALi

                        Comment


                        • #27
                          Hi Iain,

                          You are right. The problem is that the s_N_export.txt and s_N_sorted.txt files contain the matched chromosome (or genome) name and not the contig name. So Tablet can't match the chromosome names (in the export or sorted files) to the contig names in the reference file. I will have a look and get back to you when I find a solution.
                          Any help appreciated!

                          Jeny

                          Comment


                          • #28
                            Hi folks,

                            Thought I'd post a quick message to let you know that Tablet has been updated again. It's now at version 1.10.05.06 (special UK election day version or something). Changes are as follows:

                            - NEW: Added a new colouring scheme based on read direction/orientation.
                            - NEW: The overview displays can now be "subsetted", forcing them to show only an overview of whatever region you define for them.
                            - NEW: Variant highlighting (in red) can now be turned on or off for the overviews or popups.
                            - NEW: Added the ability to search for sequence substrings within reads.
                            - NEW: Added a new graphical layer that performs "read shadowing", highlighting any reads under a given column or mouse position.
                            - NEW: Example files (linked to on the web) can now be opened directly from within Tablet.
                            - NEW: The nucleotide text for each base can now be shown or hidden.
                            - NEW: Added (optional) support for trimming reads in ACE files based on the QA tags.
                            - CHG: Redesigned and added additional options to the control ribbon.
                            - CHG: The ACE parser now ignores base quality information, making it more compatible with badly formatted files.
                            - BUG: The SAM parser now looks for all possible header tags to determine if a file is SAM or not.
                            - BUG: Fixed various issues with searching for reads.
                            - BUG: Fixed some miscellaneous rendering issues when switching between contigs.
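As an illustration of the SAM detection fix in the list above, a format sniffer along these lines could be written in Python. This is a guess at the general idea, not Tablet's actual code. The function name `looks_like_sam` is hypothetical. It assumes the header record types are @HD, @SQ, @RG, @PG and @CO, and that a headerless file is recognised by 11 or more tab-separated fields with numeric FLAG, POS and MAPQ columns.

```python
SAM_HEADER_TAGS = ("@HD", "@SQ", "@RG", "@PG", "@CO")

def looks_like_sam(first_line):
    """Guess from the first line of a file whether it is SAM."""
    line = first_line.rstrip("\r\n")
    # A header line is one of the known record types followed by a tab
    # (or by nothing at all).
    if line[:3] in SAM_HEADER_TAGS and line[3:4] in ("", "\t"):
        return True
    # Otherwise accept a headerless file whose first line has the shape
    # of an alignment record.
    fields = line.split("\t")
    return (
        len(fields) >= 11
        and fields[1].isdigit()   # FLAG
        and fields[3].isdigit()   # POS
        and fields[4].isdigit()   # MAPQ
    )
```

Checking every header record type rather than only @HD matters because @HD is optional, so a file can legitimately start with @SQ or @PG. Requiring the tab after the tag keeps a FASTQ read name that starts with `@` from being mistaken for a header.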

                            As usual, it should auto-update the next time you run it, or you can grab the download manually from http://bioinf.scri.ac.uk/tablet

                            Iain
                            Our software: Tablet | Flapjack | Strudel | CurlyWhirly | TOPALi

                            Comment


                            • #29
                              Another update today, mainly to fix a few bugs and make things a bit more robust when dealing with BAM files.

                              - NEW: Implemented a new, hopefully more robust, method of determining the version number for update purposes.
                              - NEW: Added an option to set Picard's SAMFileReader validation stringency to lenient to suppress SamValidation errors.
                              - NEW: Added a parse-time option to force DNA ambiguity codes to be read as N.
                              - CHG: Changed the protein translator to treat any base with an N as ok (codon will translate to X) rather than having it skip the base.
                              - BUG: Fixed a problem copying any of the reverse strand protein translations to the clipboard.
                              - BUG: Fixed problems relating to Tablet not re-initialising its BAM reader after encountering a SamValidation error.
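The protein translator change in the list above can be illustrated with a short Python sketch. This is not Tablet's code. It assumes the standard genetic code, and the name `translate` is made up for this example. Any codon that is not in the table, such as one containing an N, is reported as X instead of being skipped, so the translation stays in frame.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )
```

For example, `translate("ATGNCTTAA")` keeps three codons (ATG, NCT, TAA) and yields one residue for each, with X standing in for the codon that contains the N.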

                              Proper visual support for paired end data (might) be next. Please do let us know what you'd like to see...

                              Iain
                              Our software: Tablet | Flapjack | Strudel | CurlyWhirly | TOPALi

                              Comment


                              • #30
                                Would it be possible to implement some sort of features track to make it easier to see pileups over exons, etc?

                                Comment
