Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #46
    I was wondering if I could get some help on the SAM specification.

    I get the error:

    samtools: bam_lpileup.c:115: tview_func: Assertion `l == tv->n_pre' failed
    When I page the viewer to my reads containing indels. A sample of my data is

    suis_indels_30749_30917_a0b7e 147 Streptococcus_suis 30782 150 60M7D15M * 0 0 CAGATTGTTGCGGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCA YX[QRQZETFQLZT[FQMX]_``\SZ```_`^``_[``^XZ```^``__``X]T[```_^`_````````````` NM:i:7 MD:Z:60M7D15M
    suis_indels_30862_31024_a3384 131 Streptococcus_suis 30801 150 41M7D34M * 0 0 AGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTT `````H^_^^_```Z]_``````]``^]``_]```]]X````]_]^````]Z_URYT]]\\]_X``_^][U^``\ NM:i:7 MD:Z:41M7D34M
    suis_indels_30874_31073_a388e 131 Streptococcus_suis 30813 150 29M7D46M * 0 0 CGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGAAAA ```^`\[`````````````_X][U]RIST\`XDDLIHGK]J\XMFVER^]`````[POHLGTU]^_Z``]]]KV NM:i:7 MD:Z:29M7D46M
    suis_indels_30890_31082_a9685 131 Streptococcus_suis 30829 150 13M7D62M * 0 0 TAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGAAAAATTGAATGAAGCTGGT `]__`ZOY^``ZUV``]TTY`_``````^LU```_`_``XV[````^\`\\^UU[`XHOPKITNEUNIRRNDWW` NM:i:7 MD:Z:13M7D62M
    suis_indels_16281_16438_b9354 131 Streptococcus_suis 16224 150 6M1D69M * 0 0 AAATGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTGAG`[``````_`_I_]O[__`__Y__VFMY\````^Z```^Z\U``_``````]S`\Y^[_]^```^]`_\Y^ZQXH NM:i:2 MD:Z:6M1D69M
    suis_indels_2007371_2007505_bf820 147 Streptococcus_suis 2007364 150 71M3D4M * 0 0 GTTCTGACTTTTATTCACATCTTGTGGAAAACTTTTCTTAACAGTGTGGATTTTAAAAATTATCTGTGGAATTTT MRT^\_]RZ__`]]YP[^`ZT`T]VH``^]^TS]]``]^^````````_W_`_\\```_````````````\^`^ NM:i:3 MD:Z:71M3D4M
    suis_indels_30783_30934_c09ed 147 Streptococcus_suis 30799 150 43M7D32M * 0 0 TAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCT U]PRKMJTRKIK___`Y`\]`\^```ZXVYMZX]`_```Z``_``WWQ]_````_]``]````````````Z_W` NM:i:7 MD:Z:43M7D32M
    suis_indels_16171_16358_c4122 147 Streptococcus_suis 16227 150 3M1D72M * 0 0 TGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTGAGAGGNFMU``_`_`NTPV]^__^]YYQLX^ZK[^[[[^^``]RUX``Z]Y\Z````````^``_`]U\`^_```]```` NM:i:1 MD:Z:3M1D72M
    suis_indels_16164_16353_ce374 147 Streptococcus_suis 16222 150 8M1D67M * 0 0 CCAAATGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTG[]]^[]\X[^`\ZZX`^_^\_``````\`_JL[\R]X```_`````_``_Y[]`````[\````Z```````^_H NM:i:2 MD:Z:8M1D67M
    suis_indels_30842_31012_d14fb 131 Streptococcus_suis 30781 150 61M7D14M * 0 0 CCAGATTGTTGCGGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTC ```_`_`````]`````_```^_VUW^```_^_ZXZT^`````__]YNXTEKV`_YXNRW\\REHDDLSGMFURD NM:i:7 MD:Z:61M7D14M
    suis_indels_30721_30932_d9806 147 Streptococcus_suis 30797 150 45M7D30M * 0 0 TATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCAT PPJ[TFRJ``````TVU`__`^^`_]_`````GFPZTWY\VX`_````__``````^`````````````````` NM:i:7 MD:Z:45M7D30M
    suis_indels_30785_30945_e17b9 147 Streptococcus_suis 30810 150 32M7D43M * 0 0 CGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGA TFWOOKDUNQZ`^RU^^`STTSVW^_``````^`_V``_``````````\[[`XR_```_W\]O_``__`_T^`` NM:i:7 MD:Z:32M7D43M
    suis_indels_30737_30928_e6336 147 Streptococcus_suis 30793 150 49M7D26M * 0 0 GGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATT LMRUPOTQPY[V]]^``]TS\^`\^`UXX^S^```_^Z``\\``_```^Y^Q]```````V\]^``^``K^```` NM:i:7 MD:Z:49M7D26M
    And I'm using novoalign format where I've modified the novo2sam.pl script to produce a 'correct' CIGAR format. Most samtools operation work except for the tview

    Comment


    • #47
      I have also had a lot of problems when viewing locations with many insertions and deletions. I have been trying to find the bug in the source code, but so far no luck.

      Nils

      Comment


      • #48
        Please check out the latest version (0.1.3-9 r256) from SVN. I fixed a kind of typo but I am not sure if this also fixes this assertion failure. It is not easy to see this. Please let me know if this works. Many thanks.
        Last edited by lh3; 05-04-2009, 10:22 AM.

        Comment


        • #49
          Hi,
          The bwa aln is fast (few hours),
          but somehow bwa samse(or sampe) is slow.
          This program have been running for almost a week:
          bwa samse -n 255 mouse_genome.fa data.sai data.fq >data.sam

          My data.fq has about 80 million reads. Is that to much reads for samse(pe)? Should I split them and merge?

          Comment


          • #50
            Bwa and samtools are two separated packages. I will start a new thread about questions related to bwa.

            I answer your question here. I guess you are using reads shorter than 30bp. BWT based algorithms are generally slow for enumerating multiple hits and very short reads are more likely to have many hits. Therefore, BWA's 'sampe' and 'samse -n' are inefficient for these short reads. Probably non-BWT algorithms are better for such alignment. If you really want to align very short reads, you may consider use smaller -n with samse and smaller -o (say 200) with sampe. Using smaller -o will make pairing less effective, though.

            The memory of bwa is almost independent of the number of input reads and the speed is linear in that. You can separate input into smaller batches.
            Last edited by lh3; 05-04-2009, 11:03 AM.

            Comment


            • #51
              Originally posted by lh3 View Post
              Please check out the latest version (0.1.3-9 r256) from SVN. I fixed a kind of typo but I am not sure if this also fixes this assertion failure. It is not easy to see this. Please let me know if this works. Many thanks.
              I have checked out the 'latest' version from SVN:

              Program: samtools (Tools for alignments in the SAM format)
              Version: 0.1.3-16 (r267)


              This seems to be a later version, would it contain the fix?

              Comment


              • #52
                Feature request

                I find it cumbersome that samtools import *requires* an external index file to tell it what reference sequences are present. This is even true if there is a header in the .sam file.

                For my own sanity I knocked up this simple, albeit slow, perl script:

                Code:
                #!/usr/bin/perl -w
                
                # Reads a sam file and generates an index suitable for "samtools import".
                
                use strict;
                
                #--- Build index
                my %ref; #key=name, value=length
                while (<>) {
                    my @F=split(" ", $_);
                    my $rname = $F[2];
                
                    next if $rname eq "*";
                    my $end = $F[3] + length($F[9]);
                
                    $ref{$rname} = $end if (!exists($ref{$rname}) || $ref{$rname} < $end);
                }
                
                #--- Output input to stdout
                foreach (sort keys %ref) {
                    print "$_\t$ref{$_}\n";
                }
                It generates a file suitable for passing as the 1st argument after "samtools import". Admittedly it doesn't know the true ending of the reference sequence, but for de-novo assemblies the reference is just the consensus anyway so it's faked.

                Note that the above is an estimate. It doesn't consider the fact that the sequences may have CIGAR string possibly altering the mapped coordinates of the reference, but it's a start at least.

                Comment


                • #53
                  The latest SVN parses the header @SQ lines. You need to use the "view" command to do this.

                  Comment


                  • #54
                    On behalf of the Java API developers: the Java APIs and command-line tools are formally released here (also linked from http://samtools.sourceforge.net):



                    The Java implementation is more flexible than SAMtools-C on sorting/viewing/merging/rmdup. Developers may also find Java APIs are easier to use than the C APIs.

                    By the way, Lincoln Stein has implemented initial Perl bindings for the SAM/BAM format, based on the C APIs. This Perl module will become part of BioPerl as well as GBrowse.

                    Comment


                    • #55
                      Is there a way to convert a SAM consensus output (using -c option for pileup) to the old maq-style .cns consensus?

                      I have some maq-based pipelines I would like to use on my BWA results.

                      Comment


                      • #56
                        I read through the latest SAM Format Specification on optional fields. But I still don’t understand some cases in the .sam file I got. For instances:
                        XT:A:R NM:i:0 X0:i:30406 XM:i:0 XO:i:0 XG:i:0
                        XT:A:R NM:i:1 X0:i:2 X1:i:1 XM:i:1 XO:i:0 XG:i:0 MD:Z:3G32
                        I just can’t figure out what do those X? fileds mean? I know these are are reserved for local use and re not be formally defined. But what is the point here? What are the types A, i, Z etc for?
                        Any hint/ideas would be appreciated.

                        Comment


                        • #57
                          X? tags are for aligner specific information. You need to read the documentation of the program that generates these tags.

                          Comment


                          • #58
                            Originally posted by zee View Post
                            Is there a way to convert a SAM consensus output (using -c option for pileup) to the old maq-style .cns consensus?

                            I have some maq-based pipelines I would like to use on my BWA results.
                            Yes, I am looking for the same. Heng li cud u help out plz?

                            Comment


                            • #59
                              I would like to generate bam file from blast result of EST.

                              Firstly, I made sam file from blast result using blsat2sam.pl in samtools packages.
                              Secondly, I have tried to convert from sam file into bam file using "samtools import" but I've got following error.

                              Parse error at 1: CIGAR and sequence length are inconsistent.

                              seq1 16 chr20 18289430 255 23H426M760H * 0 0 * * AS:i:844 EV:Z:0.0

                              Could you give me any idea please ?

                              Comment


                              • #60
                                @corthay

                                You can convert with "blast2sam.pl -s" to save the sequence in SAM. Currently, samtools cannot parse SAM without sequence, although the specification allows this.

                                @jess and zee

                                Sorry. I am afraid that you cannot easily convert pileup output to maq-cns. You can call SNPs/indels directly from samtools and in SVN samtools.pl now filters SNPs/indels in the maq-like way. .cns has been replaced with glf, although they contain different information.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Advancing Precision Medicine for Rare Diseases in Children
                                  by seqadmin




                                  Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
                                  12-16-2024, 07:57 AM
                                • seqadmin
                                  Recent Advances in Sequencing Technologies
                                  by seqadmin



                                  Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

                                  Long-Read Sequencing
                                  Long-read sequencing has seen remarkable advancements,...
                                  12-02-2024, 01:49 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 12-17-2024, 10:28 AM
                                0 responses
                                33 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 12-13-2024, 08:24 AM
                                0 responses
                                48 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 12-12-2024, 07:41 AM
                                0 responses
                                34 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 12-11-2024, 07:45 AM
                                0 responses
                                46 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X