  • #31
    Hi Heng,
    I did do "samtools sort" first. Here is what I did:

    samtools import homo_sapiens.fasta.fai S1.sam S1.bam

    samtools sort S1.bam S1_sorted

    samtools pileup -f homo_sapiens.fasta -c S1_sorted.bam
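Pileup expects alignments in coordinate order, which is the order "samtools sort" produces, so one quick sanity check on an intermediate SAM file is to confirm that order directly. The Python sketch below is illustrative only. The function name is hypothetical, and it assumes the caller supplies the reference names in header or .fai order, because a SAM file converted with "samtools import" may have no @SQ lines of its own.

```python
from typing import Iterable, List


def is_coordinate_sorted(sam_lines: Iterable[str], ref_names: List[str]) -> bool:
    """Return True if SAM alignment lines are in coordinate order.

    ref_names lists the reference sequences in header (or .fai) order.
    Placed records must be ordered by (reference index, POS); records with
    RNAME '*' (unplaced, unmapped reads) must come after all of them.
    """
    ref_index = {name: i for i, name in enumerate(ref_names)}
    last_key = (-1, -1)
    seen_unplaced = False
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        rname, pos = cols[2], int(cols[3])
        if rname == "*":
            seen_unplaced = True
            continue
        if seen_unplaced:
            return False  # a placed record after an unplaced one
        if rname not in ref_index:
            raise ValueError(f"reference {rname!r} not in ref_names")
        key = (ref_index[rname], pos)
        if key < last_key:
            return False
        last_key = key
    return True
```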



    • #32
      Is there a running list somewhere that shows which programs produce (or take as input) SAM/BAM format? It would help us make a final decision on which format to use for the mapping results we produce here at Complete Genomics.

      The negative gap structure will probably be handled with special tags (GS, GQ and GC), so software that wants to make optimal use of all the data should use those tags, but most software should work "out of the box" if it supports SAM/BAM...
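Such tags live in SAM's optional fields, written as TAG:TYPE:VALUE after the eleven mandatory columns, which is why software that does not know a tag can still read the record. Here is a minimal Python sketch of reading those fields generically. The helper name is hypothetical, and it makes no assumption about what GS, GQ or GC actually contain.

```python
from typing import Dict, List, Union

TagValue = Union[str, int, float, bytes, List[Union[int, float]]]


def parse_optional_fields(sam_line: str) -> Dict[str, TagValue]:
    """Parse the TAG:TYPE:VALUE fields that follow the 11 mandatory SAM columns."""
    tags: Dict[str, TagValue] = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, typ, value = field.split(":", 2)  # VALUE itself may contain ':'
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        elif typ == "H":
            tags[tag] = bytes.fromhex(value)
        elif typ == "B":
            # Numeric array: the first element is the subtype (c, C, s, S, i, I or f).
            subtype, *items = value.split(",")
            convert = float if subtype == "f" else int
            tags[tag] = [convert(x) for x in items]
        else:
            tags[tag] = value  # 'A' (single character) and 'Z' (string)
    return tags
```

For example, a record whose optional fields are "NM:i:0" and "GS:Z:ACGT" (made-up values, tab-separated) would give {"NM": 0, "GS": "ACGT"}.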
      Thon
      __________________________________
      Thon de Boer, Ph.D.
      Director of Product Management, Software
      Strand Life Sciences
      548 Market Street, Suite 82804
      San Francisco, CA 94104, USA
      www.strandls.com
      Pioneers in Discovery Research Informatics
      _______________________________________



      • #33
        On the subject of tools that use SAMtools, I'm very interested in adding SAM support to my project, which is based in Java. I'm aware that Java-based tools exist for reading and writing this format, but I'm unable to find any documentation on the software. Has anyone come across any information on how to use the Java SAM tools code?
        The more you know, the more you know you don't know. —Aristotle



        • #34
          To thondeboer: currently BWA natively generates alignments in the SAM format. BFAST also generates SAM. We also provide converters for SOAP, Bowtie, Export, novoalign and even BLAST. However, most of these converters are incomplete in that sometimes they cannot convert all of the information, due to a lack of documentation, especially for short indels. As far as I know, every aligner generates its own format. SAM is probably the first effort to unify the alignment format, in particular for alignments of the new sequencing data.

          To apfejes: I am not able to comment much on the Java implementation. I know the I/O part is complete and actually does more nice things than the C version of samtools. You may send an email to the mailing list to ask for the documentation.



          • #35
            This is an excellent and very exciting idea. With a standard alignment format (SAM) and a standard raw read format (Phred/Sanger fastq), we can drastically reduce the time most of us spend writing our own file format parsers and converters, and eliminate a common source of error in data analysis (incorrect parsing).

            It's great to see the bioinformatics community coming together in this way.

            To anyone developing alignment tools: Please include support for this format in future versions of your software!
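To make the "Phred/Sanger fastq" point concrete: in that encoding each quality character is the Phred score plus 33, as an ASCII code. Here is a minimal Python sketch of a reader, assuming four-line records without line wrapping. The function name is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (name, sequence, Phred scores) from 4-line FASTQ records.

    Assumes Sanger/Phred+33 encoding and no line wrapping inside a record.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it)
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: the quality score is the ASCII code minus 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]
```

In the record below, for example, each quality character '"' is ASCII 34 and therefore Phred 1.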
            @1
            NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
            +
            """"""""""""""""""""""""""""""""""""



            • #36
              Can SAMtools convert SAM back to MAQ?

              Hi lh3,
              I am glad that SAMtools can do maq2sam, but would it be easy to do sam2maq?
              The reason I ask is that I want to use MAQ to generate SNPs and indels. BWA cannot do that (yet).

              thanks
              g



              • #37
                SAMtools has a SNP caller, based on the same code as MAQ. See this page for more information: http://samtools.sourceforge.net/cns0.shtml. What is missing in SAMtools is a SNP filtration script like "maq.pl SNPfilter", but for now it is easy to write your own.
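As a possible starting point for such a script, here is a minimal Python sketch that filters "pileup -c" output. It assumes the ten-column consensus layout documented for samtools pileup -c (chromosome, position, reference base, consensus base, consensus quality, SNP quality, maximum mapping quality, read depth, read bases, base qualities) and skips indel lines, whose reference column is "*". The function name and the thresholds are illustrative choices for this sketch, not the rules used by "maq.pl SNPfilter".

```python
from typing import Iterable, Iterator


def filter_snps(
    lines: Iterable[str],
    min_snp_q: int = 20,
    min_map_q: int = 20,
    min_depth: int = 3,
    max_depth: int = 100,
) -> Iterator[str]:
    """Yield consensus-pileup lines that look like confident SNP calls.

    Assumed columns: chrom, pos, ref, consensus, consensus Q, SNP Q,
    max mapping Q, depth, read bases, base qualities.
    """
    for line in lines:
        cols = line.split()
        if len(cols) < 8 or cols[2] == "*":
            continue  # skip short lines and indel lines (reference column '*')
        ref, cons = cols[2].upper(), cols[3].upper()
        if cons == ref:
            continue  # consensus agrees with the reference: not a SNP
        snp_q, map_q, depth = int(cols[5]), int(cols[6]), int(cols[7])
        if snp_q >= min_snp_q and map_q >= min_map_q and min_depth <= depth <= max_depth:
            yield line.rstrip("\n")
```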

                SAMtools' indel caller uses a different algorithm. It outperforms MAQ.



                • #38
                  lh3,
                  Actually, I tried SAMtools before, but somehow "pileup" is not outputting anything, so I am thinking of going back to MAQ.
                  Did I run the program correctly? (I used the example files that come with the samtools package.)
                  examples> ../samtools pileup -f ex1.fa ex1.sam
                  --> return nothing
                  examples> ../samtools pileup ex1.sam
                  --> return nothing

                  thanks again!
                  g



                  • #39
                    It should be: ../samtools pileup -t ex1.fa.fai ex1.sam or ../samtools pileup ex1.bam. I have added a Makefile.
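The "-t" argument is a FASTA index as produced by "samtools faidx". Each line of a .fai file has five tab-separated columns (sequence name, length, byte offset of the first base, bases per line, bytes per line), and the reference names and lengths are what a SAM file without a header is missing. Here is a small Python sketch, with hypothetical function names, that reads the index and formats equivalent @SQ header lines.

```python
from typing import Iterable, List, Tuple


def read_fai(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (name, length) pairs from a .fai index, in file order."""
    refs: List[Tuple[str, int]] = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2 or not cols[0]:
            continue  # ignore blank or malformed lines
        # Columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
        refs.append((cols[0], int(cols[1])))
    return refs


def sq_header(refs: List[Tuple[str, int]]) -> str:
    """Format (name, length) pairs as SAM @SQ header lines."""
    return "".join(f"@SQ\tSN:{name}\tLN:{length}\n" for name, length in refs)
```

For instance, opening ex1.fa.fai and passing the file handle to read_fai, then printing sq_header of the result, would show one @SQ line per reference sequence.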

                    Note that there is a companion format called BAM, which is the binary representation of SAM. Most samtools commands work on BAM only. I know having two formats is a bit confusing, but it is necessary for faster parsing.
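Because the two formats are easy to mix up (as with "pileup ex1.sam" above), it can help to check which one a file actually is. A BAM file is BGZF-compressed, which is gzip-compatible, and its decompressed content begins with the magic bytes "BAM\1"; a SAM file is plain text. Here is a minimal Python heuristic; the function names are hypothetical.

```python
import zlib


def looks_like_bam(head: bytes) -> bool:
    """Heuristic: gzip magic bytes, and decompressed data starting with b'BAM\\x01'."""
    if not head.startswith(b"\x1f\x8b"):
        return False  # not gzip/BGZF, e.g. a plain-text SAM file
    try:
        # wbits=16+MAX_WBITS makes zlib expect a gzip header; a partial first
        # block is fine because only the first four output bytes are needed.
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        start = d.decompress(head, max_length=4)
    except zlib.error:
        return False
    return start == b"BAM\x01"


def file_looks_like_bam(path: str) -> bool:
    """Read the start of a file and apply looks_like_bam to it."""
    with open(path, "rb") as fh:
        return looks_like_bam(fh.read(65536))  # a BGZF block is at most 64 KiB
```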



                    • #40
                      lh3,
                      Thanks for the quick response, pileup is working now!
                      Here are some questions about the pileup format from my first look at it:
                      (1) What does a "*" in the read bases mean? It is not documented at http://samtools.sourceforge.net/pileup.shtml.
                      (2) Is it okay for the same position to appear twice? (chr1 1949878 occurs twice in my first pileup output.)
                      Thanks,
                      Below is the piece of pileup output where I found the problem:
                      chr1 1949878 A A 142 0 60 55 C$....,,,...,,.C..,.,,..,...,,,.,.,.+1C,,,,..,.,........,^F,^], &5,2IIII5I<II+%5=I+II(II8@CII3*I0I+IIII,I$@IIAAIIDI@I*5
                      chr1 1949878 * */+C 38 38 * +C 13 4 30 8
                      chr1 1949879 A A 150 0 60 54 ....,G,...,,.-1G..-1G.,.,,..,...,,,.,.,.,,,,..,.,........,,, II*IIII;I.II&(&1I3II%9III:II7&I&@&IIII5I"6IIIIG.I33I$6
                      chr1 1949879 * -G/* 481 481 -G * 6 9 30 9
                      chr1 1949880 G G 25 0 60 55 .$A$..,A,..A,,*.*A,A,,A.,A..,,,.,A,.,,,,.+1A.,.,........A,,^], +(,.III)8-II8%D.I0II#,I@5III.$I+I,IIII$I2EIIIIIIIIIE&?/
                      chr1 1949880 * */+A 350 350 * +A 24 3 19 9
                      chr1 1949881 A A 162 0 60 53 ..,,,...,,....,.,,..,...,,,.,.,.,,,,..,.,........G,,, 6DIII3I$II81D)I%II+.III'III$I2I+IIII+II<II46IIAHI.2IB



                      • #41
                        You are invoking pileup with "-c", so you should also read this page:

                        A read base "*" means a deletion. The second line at "chr1 1949878" shows an indel call; strictly speaking, it is not part of the pileup.
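Following that explanation, a repeated coordinate in "-c" output is expected: one line for the base consensus and a second line, with "*" in the reference column, for the indel call. Here is a short Python sketch that groups lines per position and separates the two kinds. The helper name is hypothetical, and the sketch assumes whitespace-separated columns.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

Key = Tuple[str, int]


def group_by_position(lines: Iterable[str]) -> Dict[Key, Dict[str, List[str]]]:
    """Map (chrom, pos) to {"base": [...], "indel": [...]} pileup lines."""
    groups: DefaultDict[Key, Dict[str, List[str]]] = defaultdict(
        lambda: {"base": [], "indel": []}
    )
    for line in lines:
        cols = line.split()
        if len(cols) < 3:
            continue  # not a pileup line
        kind = "indel" if cols[2] == "*" else "base"  # '*' reference = indel call
        groups[(cols[0], int(cols[1]))][kind].append(line.rstrip("\n"))
    return dict(groups)
```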



                        • #42
                          I have a working patch to view ABI SOLiD color space in the samtools (http://samtools.sourceforge.net/) text viewer. Using output from BFAST (in SAM format), which includes the "CS" and "CQ" tags, some of the features are:

                          - option to display colors instead of nucleotides.
                          - option to color bases/colors by color. This works like coloring bases by the base itself.
                          - option to color bases/colors based on color quality.
                          - the "." (dot) option when displaying color space will only show those colors that were corrected during alignment (i.e. the color errors).
                          - option to remove all insertions in the current display (in some regions, spurious insertions can cause a headache when viewing that region).

                          PM me and I can supply you with a source version.
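For readers unfamiliar with color space: in SOLiD two-base encoding, with A, C, G and T coded as 0 to 3, the color between two adjacent bases is the XOR of their codes, so a color read can be translated back to bases once the first (primer) base is known. The Python sketch below is an illustration, not part of the patch. It assumes a CS value made of a leading primer base followed by color digits, which is assumed here to match BFAST's CS tag, and its function names are hypothetical.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def decode_colors(cs: str) -> str:
    """Translate primer base + color digits (e.g. 'T0123') into bases.

    Each base is the previous base XOR the color (A=0, C=1, G=2, T=3);
    the primer base itself is not part of the returned read.
    """
    prev = BASE_TO_CODE[cs[0].upper()]
    bases = []
    for color in cs[1:]:
        if color not in "0123":
            raise ValueError(f"cannot decode color {color!r}")
        prev ^= int(color)
        bases.append(CODE_TO_BASE[prev])
    return "".join(bases)


def encode_bases(primer: str, read: str) -> str:
    """Inverse of decode_colors: primer and bases to primer + colors."""
    codes = [BASE_TO_CODE[b] for b in (primer + read).upper()]
    return primer + "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))
```

Because each base depends on the one before it, a single wrong color changes every base decoded after it, which is one reason to look at the colors themselves when inspecting SOLiD alignments.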



                          • #43
                            Hi, I'm a novice geneticist interested in using the 1000 Genomes Project data available at NCBI, and I can't quite figure out how to obtain sequence information from the BAM files; the SAMtools website is of little help. Does anyone know a good place to find information on this kind of work?

                            (Off-topic: does anyone know why the 1000 Genomes Project site has a log-in but no register option?)



                            • #44
                              The first thing you may want to try is:

                              samtools view -h aln.bam | less -S
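Once "samtools view" prints the alignments as text, the read sequence is the tenth tab-separated column (SEQ). Here is a minimal Python sketch for pulling it out. The column names follow the current SAM specification, and the helper names are hypothetical. For reads aligned to the reverse strand (FLAG bit 0x10), SEQ is stored reverse-complemented, so the sketch undoes that to recover the read as sequenced.

```python
from typing import Dict, Iterable, Iterator

SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the 11 mandatory fields of each alignment line as a dict."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # header lines (printed with -h) and blank lines
        fields = line.rstrip("\n").split("\t")
        yield dict(zip(SAM_COLUMNS, fields[:11]))


def original_read_sequence(rec: Dict[str, str]) -> str:
    """Undo the reverse complement applied to reverse-strand reads."""
    seq = rec["SEQ"]
    if int(rec["FLAG"]) & 0x10:
        return seq.translate(COMPLEMENT)[::-1]
    return seq
```

For example, a small script that loops over sam_records(sys.stdin) and prints ">" + QNAME followed by original_read_sequence(rec) could be run as "samtools view aln.bam | python extract_reads.py" (the script name is hypothetical).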



                              • #45
                                Understanding samtools pileup output

                                Hi,

                                I'm having trouble parsing the samtools pileup output. In the example below, at position 60, I have 108 reads. As I understand it, 8 reads terminate (since there are 8 '$'s), and there are 2 new reads (marked by '^') on the next line.
                                So the next line (position 61) should have 108 - 8 + 2 = 102 reads.
                                Instead, it has 99.
                                What am I missing here?
                                This is the 40th line of input with 40 bp reads, and it is the first place where '$' appears. Other lines seem to work out fine.


                                seq1 60 a 108 .$,$,$,$,$g$....ggt*.G,g,,.,,+2tt,+3agcG.,+4atgc,+4ttgcg.c,c.$.,,,..$,+4aggat+6ccgttt,..,tt,.,,..,.+7CTGCCTG,.,.,,.,,..,.,..,.,.,,.,,..,,.,.,.,.,.,,.,.,.,.,,^].^],^], CBB=ABBA>BBCBB7BB<BBBBCBBBBBCBB@BB@BBCB:CBBAACC>ABCBBBBBBBBBCC9BBABB@BB<BBB7CBBBBABBBBBCBBBBB@BB;BBBC@CBCBCB
                                seq1 61 g 99 A$.$.$.$t$c$t$,..,,a,A,,$..,.a,,.,+4gcag,,.,,$,A.,,+7ctgtttg,t$A,a..,.,.,.,,.,,..,.,..,.,.,,.,,..,,.,.,.,.,.,,.,.,.,.,,.,,^].^]t BCAABB@B1BB<BBBBBBBBBBBBBACBB@BBABBBBBBBBBCBBCCBBBBBBBBCCBBB6BBBBBBBBBBBABCBBBBBBBBBBBBCAC?BCCBCBB@
                                Last edited by mhc; 05-01-2009, 10:30 AM.
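One way to check counts like these is to tally the symbols in the read-base column mechanically. The Python sketch below follows the pileup conventions as we understand them: "^" marks a read start and is followed by one mapping-quality character, "$" marks a read end, "+N"/"-N" is followed by N inserted or deleted bases that belong to the preceding read, and every remaining symbol (".", ",", a base letter, or "*") is one read covering the position. It only counts symbols and does not by itself explain the discrepancy above; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BaseColumnCounts:
    depth: int = 0   # base symbols: '.', ',', letters and '*' (deletions)
    starts: int = 0  # '^' markers
    ends: int = 0    # '$' markers
    indels: int = 0  # '+N...' / '-N...' entries


def count_read_bases(bases: str) -> BaseColumnCounts:
    """Tally the symbols in one pileup read-base column."""
    counts = BaseColumnCounts()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            counts.starts += 1
            i += 2  # skip '^' and its mapping-quality character
        elif c == "$":
            counts.ends += 1
            i += 1
        elif c in "+-":
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            counts.indels += 1
            i = j + int(bases[i + 1:j])  # skip the length and the indel bases
        else:
            counts.depth += 1
            i += 1
    return counts
```

Comparing counts.depth with the depth column, and the '$' count at one position with the change in depth at the next, is one way to narrow down where the expected and reported numbers diverge.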
