
  • #31
    Hi Heng,
    I did do "samtools sort" first. Here is what I did:

    samtools import homo_sapiens.fasta.fai S1.sam S1.bam

    samtools sort S1.bam S1_sorted

    samtools pileup -f homo_sapiens.fasta -c S1_sorted.bam

    Q

    Comment


    • #32
      Is there a running list somewhere that shows which programs produce (or take as input) the SAM/BAM format? It would help us make a final decision on which format to use for the mapping results we produce here at Complete Genomics.

      The negative gap structure will probably be handled with special tags (GS, GQ and GC), so software that wants to make optimal use of all the data should use those tags, but most software should work "out of the box" if it supports SAM/BAM...
      Thon
      __________________________________
      Thon de Boer, Ph.D.
      Director of Product Management, Software
      Strand Life Sciences
      548 Market Street, Suite 82804
      San Francisco, CA 94104, USA
      [email protected]
      www.strandls.com
      Pioneers in Discovery Research Informatics
      _______________________________________

      Comment


      • #33
        On the subject of tools that use SAMtools, I'm very interested in adding support to my project, which is written in Java. I'm aware that Java-based tools exist for reading and writing this format, but I'm unable to find any documentation for them. Has anyone come across any information on how to use the Java SAM tools code?
        The more you know, the more you know you don't know. —Aristotle

        Comment


        • #34
          To thondeboer: currently BWA natively generates alignments in the SAM format. BFAST also generates SAM. We also provide converters for SOAP, Bowtie, Export, novoalign and even blast. However, most of these converters are incomplete in that they sometimes cannot convert all of the information, because the source formats are poorly documented, especially for short indels. As far as I know, each aligner generates its own format. SAM is probably the first effort to unify the alignment format, in particular for alignments of the new sequencing data.

          To apfejes: I am not able to comment much on the Java implementation. I know the I/O part is complete and actually does more nice things than the C version of samtools. You may send an email to the mailing list to ask for the documentation.

          Comment


          • #35
            This is an excellent, and very exciting idea. With a standard alignment format (SAM), and a standard raw read format (Phred/Sanger fastq) we can drastically reduce the time most of us spend writing our own file format parsers and converters, and eliminate a common source of error in data analysis (incorrect parsing).

            It's great to see the bioinformatics community coming together in this way.

            To anyone developing alignment tools: Please include support for this format in future versions of your software!
            @1
            NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
            +
            """"""""""""""""""""""""""""""""""""

            Comment


            • #36
              Can SAMtools convert SAM back to MAQ?

              Hi lh3,
              I am glad that SAMtools can do maq2sam, but will it be easy to do sam2maq?
              The reason I ask is that I want to use MAQ to call SNPs and indels. BWA cannot do that (yet).

              thanks
              g

              Comment


              • #37
                SAMtools has a SNP caller based on the same code as MAQ. See this page for more information: http://samtools.sourceforge.net/cns0.shtml. What is missing in SAMtools is a SNP filtration script like "maq.pl SNPfilter", but it is easy to write your own at the moment.

                SAMtools' indel caller uses a different algorithm. It outperforms MAQ.
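
As a rough illustration of the "write your own" filter mentioned above, here is a minimal sketch. It assumes the consensus layout produced by "samtools pileup -c" as seen later in this thread (chromosome, position, reference base, consensus base, consensus quality, SNP quality, mapping quality, depth, ...). The function name and both thresholds are hypothetical, and this is far simpler than "maq.pl SNPfilter".

```python
def filter_snps(lines, min_snp_qual=20, min_depth=3):
    """Yield (chrom, pos, ref, consensus, snp_qual, depth) for candidate SNPs.

    Thresholds are illustrative assumptions, not recommended values.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # not a consensus pileup line
        chrom, pos, ref, cons = fields[0], int(fields[1]), fields[2], fields[3]
        if ref == "*":
            continue  # indel call line: skip, this sketch handles SNPs only
        snp_qual, depth = int(fields[5]), int(fields[7])
        is_variant = cons.upper() != ref.upper()
        if is_variant and snp_qual >= min_snp_qual and depth >= min_depth:
            yield (chrom, pos, ref, cons, snp_qual, depth)
```

It could be fed the lines of a saved pileup file, for example `list(filter_snps(open("S1.pileup")))`, where the file name is only a placeholder.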

                Comment


                • #38
                  lh3,
                  Actually, I tried SAMtools before, but somehow "pileup" does not output anything, so I am thinking of going back to MAQ.
                  Did I run the program correctly? (use the example files which come with samtools package)
                  examples> ../samtools pileup -f ex1.fa ex1.sam
                  --> return nothing
                  examples> ../samtools pileup ex1.sam
                  --> return nothing

                  thanks again!
                  g

                  Comment


                  • #39
                    It should be: ../samtools pileup -t ex1.fa.fai ex1.sam or ../samtools pileup ex1.bam. I have added a Makefile.

                    Note that there is a companion format called BAM, which is the binary representation of SAM. Most samtools commands work on BAM only. I know having two formats is a bit confusing, but this is necessary for faster parsing.

                    Comment


                    • #40
                      lh3,
                      Thanks for quick response, pileup is working now!
                      Here are some questions about the pileup format from my first look at it:
                      (1) What is a "*" in the read bases? It is not documented in "http://samtools.sourceforge.net/pileup.shtml".
                      (2) Is it okay for the same position to appear twice in the pileup? (chr1 1949878 occurs twice in my first pileup output.)
                      thanks,
                      Below is the piece of pileup output where I found the problem:
                      chr1 1949878 A A 142 0 60 55 C$....,,,...,,.C..,.,,..,...,,,.,.,.+1C,,,,..,.,........,^F
                      ,^], &5,2IIII5I<II+%5=I+II(II8@CII3*I0I+IIII,I$@IIAAIIDI@I*5
                      chr1 1949878 * */+C 38 38 * +C 13 4 30 8
                      chr1 1949879 A A 150 0 60 54 ....,G,...,,.-1G..-1G.,.,,..,...,,,.,.,.,,,,..,.,........,,
                      , II*IIII;I.II&(&1I3II%9III:II7&I&@&IIII5I"6IIIIG.I33I$6
                      chr1 1949879 * -G/* 481 481 -G * 6 9 30 9
                      chr1 1949880 G G 25 0 60 55 .$A$..,A,..A,,*.*A,A,,A.,A..,,,.,A,.,,,,.+1A.,.,........A,,
                      ^], +(,.III)8-II8%D.I0II#,I@5III.$I+I,IIII$I2EIIIIIIIIIE&?/
                      chr1 1949880 * */+A 350 350 * +A 24 3 19 9
                      chr1 1949881 A A 162 0 60 53 ..,,,...,,....,.,,..,...,,,.,.,.,,,,..,.,........G,,, 6DI
                      II3I$II81D)I%II+.III'III$I2I+IIII+II<II46IIAHI.2IB

                      Comment


                      • #41
                        You are invoking pileup with "-c" and you should also read this page:



                        A read base "*" means a deletion. The second line at "chr1 1949878" shows the indel call. In principle this is not part of pileup.
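
To make the "two lines per position" point concrete, here is a small sketch that pairs each base line with the indel line at the same coordinate. It assumes only what is visible in the output quoted above: an indel line has "*" in the reference column and the genotype (for example "*/+C") in the next column. The function name is hypothetical, and only the leading columns of the quoted lines are used.

```python
from collections import defaultdict


def group_by_site(lines):
    """Map (chrom, pos) to the consensus base and, if present, the indel genotype."""
    sites = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # too short to be a "pileup -c" line
        key = (fields[0], int(fields[1]))
        # A "*" in the reference column marks the extra indel line.
        kind = "indel" if fields[2] == "*" else "base"
        sites[key][kind] = fields[3]
    return dict(sites)


# Leading columns copied from the output quoted in post #40.
example = [
    "chr1 1949878 A A 142 0 60 55",
    "chr1 1949878 * */+C 38 38 * +C 13 4 30 8",
    "chr1 1949879 A A 150 0 60 54",
]
grouped = group_by_site(example)
```

With this grouping, a position that appears twice shows up as one entry with both a "base" and an "indel" value rather than as a duplicate.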

                        Comment


                        • #42
                          I have a working patch to view ABI SOLiD color space using samtools (http://samtools.sourceforge.net/) text viewer. For example, some of the features using output from BFAST (in SAM format), which includes the "CS" and "CQ" tags, are:

                          - option to display colors instead of nucleotides.
                          - option to color bases/colors based on the color value, similar to coloring bases according to the base called.
                          - option to color bases/colors based on color quality.
                          - the "." (dot) option when displaying color space will only show those colors that were corrected during alignment (i.e. the color errors).
                          - option to remove all insertions in the current display (in some regions, spurious insertions can cause a headache when viewing that region).

                          PM me and I can supply you with a source version.

                          Comment


                          • #43
                            Hi, I'm a novice geneticist who is interested in using the 1000 Genomes Project data available on NCBI, and I can't quite figure out how to obtain sequence information from the BAM file. The SAMtools website is of little help. I am wondering if anyone knows a good place to get information for this kind of work.

                            (Offtopic, anyone know why the 1000 Genome project has a log-in but no register option?)

                            Comment


                            • #44
                              The first thing you may want to try is:

                              samtools view -h aln.bam | less -S
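
Once the text output of the command above is visible, the read sequences can be pulled out with a few lines of script. The following is a minimal sketch, assuming the standard tab-separated SAM layout with 11 mandatory columns (read name, flag, reference name, position, mapping quality, CIGAR, three mate-related columns, sequence, quality) followed by optional TAG:TYPE:VALUE fields. The class and function names are hypothetical and not part of samtools.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    seq: str
    qual: str
    tags: Dict[str, str]


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line of SAM text into a SamRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated SAM columns")
    tags = {}
    for item in fields[11:]:
        # Optional fields look like TAG:TYPE:VALUE; the value may contain ':'.
        tag, _type, value = item.split(":", 2)
        tags[tag] = value
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[9], fields[10], tags)


def read_sam(stream: Iterable[str]) -> Iterator[SamRecord]:
    """Yield records from SAM text, skipping '@' header lines and blank lines."""
    for line in stream:
        if line.startswith("@") or not line.strip():
            continue
        yield parse_sam_line(line)
```

One way to use it would be to pipe "samtools view -h aln.bam" into a script that calls `read_sam(sys.stdin)` and prints `rec.qname` and `rec.seq` for each record.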

                              Comment


                              • #45
                                Understanding samtools pileup output

                                Hi,

                                I'm having trouble trying to parse the samtools output. In the example below, at position 60, I have 108 reads. As I understand it, 8 reads terminate (since there are 8 '$'s), and there are 2 new reads (marked by the '^') on the next line.
                                So the next line - line 61 - should have 108-8+2=102 reads.
                                Instead, it has 99.
                                What am I missing here?
                                This is the 40th line of input with 40bp reads, and this is the first instance where '$' appears. Other lines seem to work out fine.


                                seq1 60 a 108 .$,$,$,$,$g$....ggt*.G,g,,.,,+2tt,+3agcG.,+4atgc,+4ttgcg.c,c.$.,,,..$,+4aggat+6ccgttt,..,tt,.,,..,.
                                +7CTGCCTG,.,.,,.,,..,.,..,.,.,,.,,..,,.,.,.,.,.,,.,.,.,.,,^].^],^], CBB=ABBA>BBCBB7BB<BBBBCBBBBBCBB@BB@BBCB:CBBAACC>ABCBBBBBBBBBCC9BBABB@B
                                B<BBB7CBBBBABBBBBCBBBBB@BB;BBBC@CBCBCB
                                seq1 61 g 99 A$.$.$.$t$c$t$,..,,a,A,,$..,.a,,.,+4gcag,,.,,$,A.,,+7ctgtttg,t$A,a..,.,.,.,,.,,..,.,..,.,.,,.,,..,,
                                .,.,.,.,.,,.,.,.,.,,.,,^].^]t BCAABB@B1BB<BBBBBBBBBBBBBACBB@BBABBBBBBBBBCBBCCBBBBBBBBCCBBB6BBBBBBBBBBBABCBBBBBBBBBBBBCAC?BCCBCBB@
                                Last edited by mhc; 05-01-2009, 10:30 AM.
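
For tallies like the one above, it may help to tokenize the read-base column rather than count characters by eye. The sketch below assumes the pileup conventions discussed in this thread: "$" follows the base of a read that ends at this position, "^" is followed by one mapping-quality character and precedes the first base of a new read, "+N"/"-N" is followed by N indel bases, and "*" stands for a deleted base. None of the marker characters is a read on its own. The function name is hypothetical.

```python
import re


def count_pileup_reads(bases: str) -> dict:
    """Tally reads, read starts, read ends and indel markers in a read-base string."""
    counts = {"reads": 0, "starts": 0, "ends": 0, "indels": 0}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read start: skip '^' and the mapping-quality character after it.
            counts["starts"] += 1
            i += 2
        elif ch == "$":
            # Read end: belongs to the base just before it.
            counts["ends"] += 1
            i += 1
        elif ch in "+-":
            # Indel: '+N' or '-N' followed by N bases that are not read columns.
            match = re.match(r"\d+", bases[i + 1:])
            if match is None:
                raise ValueError(f"indel marker without a length at offset {i}")
            counts["indels"] += 1
            i += 1 + len(match.group()) + int(match.group())
        else:
            # '.', ',', a mismatch letter, or '*' each represent one read.
            counts["reads"] += 1
            i += 1
    return counts


# Synthetic string, not taken from the output above.
example = count_pileup_reads(".$,^]G+2ac.*")
```

Comparing the "reads" tally with the depth column on each line, and the "ends" and "starts" tallies between consecutive lines, is one way to check where a hand count diverges.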

                                Comment
