Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Originally posted by seqsyd View Post
    can any 1 tell me please how can i install SAMtools on my windows operating system...Help Please...thanks
    Have you tried the provided Windows binaries on sourceforge? i.e. samtools-0.1.12a_i386-win32.zip

    [I'd suggest getting hold of a Linux (or Mac OS X) machine if you can, since not that many of the NGS tools will work on Windows]

    Comment


    • optional fields in SAM format

      Could anybody advise me the best way to parse optional fields in SAM file? I'm interested in tags X0 and XA, and the problem is that they do not have a permanent place in the output string (e.g., sometimes 13th or 14th, 18th, 19th or 20th). Certainly, I can do it myself in perl, but such parsing would take too much time and resources.
      What I'm looking for is:
      1) either a way to control output tags (their number and order)
      2) or (better) some bioperl module, like blast parser, that could parse it effectively.
      thanks!

      Comment


      • For perl, very easy and efficient:

        perl -ne 'print "$1\n" if /XA:i\d+)/'

        Comment


        • Originally posted by lh3 View Post
          For perl, very easy and efficient:

          perl -ne 'print "$1\n" if /XA:i\d+)/'
          It's exactly what I want to avoid - checking each line with IF condition.

          Comment


          • This is fast. If present, the ordering of tags are fixed. You can

            if /X0:i\d+).*XA:Z\S+)/

            to grep out everything with a single pass. You need to read and split after all. Regex matching should be of similar speed, if not faster. The bad implementation is to loop through the optional tags and then try to find matching.
            Last edited by lh3; 04-15-2011, 06:59 AM.

            Comment


            • Thank you, Heng!

              Comment


              • I converted some ELAND export files to SAM and now tried to make BAM files out of those, but it is complaining about @SQ headers not being there?!
                Does the export2sam.pl tool not create a SAM file format that is correct??

                Comment


                • I am trying to view a portion of a sorted, indexed bam file and I am getting the error:
                  [sam_header_read2] 26 sequences loaded.
                  [main_samview] random alignment retrieval only works for indexed BAM files.

                  It is already sorted and indexed.... I have no idea how to resolve this.

                  Comment


                  • @oudacontrol: You should post the command you used. It seems like the command may have been mis-formatted, but there's no way to tell if you don't include it.

                    Comment


                    • Dear all,

                      I'm had the same problem as oudacontrol above and Jeckow earlier in this thread.

                      Trying to get samtools view to extract one chromosome from a sorted and indexed bam file fails:
                      Code:
                      [sam_header_read2] 84 sequences loaded.
                      [main_samview] random alignment retrieval only works for indexed BAM files.
                      It took me a long while to figure out what caused it, so I thought I'd post my simple solution for poor sobs like me...

                      Code:
                      samtools sort in.bam in.sorted
                      Code:
                      samtools index in.sorted.bam
                      Code:
                      samtools view -bh -t human_g1k_v37.fa.fai -o in.sorted.chr9.bam in.sorted.bam 9
                      As it turned out, I should not have used the -t option. Without it, view finished as it should have, leaving me with a nice chr9-only bam file! (at least, it's a 279M file, I'll have to check later if trackster or igv will load it )

                      Cheers,
                      Bruins

                      Comment


                      • Does anyone have nice perl snippet to calculate the end point of the read in a sam file?

                        Comment


                        • maq2sam-long problems

                          Dear all,

                          Originally posted by xmluo View Post
                          I used maq2sam-long to convert maq output to sam format, but all pairing information is missed in the results: MRNM is "*", and MPOS and ISIZE are 0. Can you recommend how to get these information? Thanks, Mei

                          I run into the same questions.

                          Another question is about the statistics for the results of out.map and out.bam.

                          1. how can I calculate the mapping rate?
                          95%(1903176 / 2000000) from maq or 95.68%(1820937 / 1903176) from the converted bam?

                          2. how can I calculate the unique mapped pair-end reads number?
                          I have conceived a method: according to the sam format spcification, the TAG "H0:i:1 H1:i:0", give the Number of perfect hits (H0) and Number of 1-di erence hits (H1), so we can deduce that one pair is uniquely mapped if and only if any one end's H0:i:1 or "H0:i:0 and H1:i:1", can it work?

                          3. how to get the MRNM, MPOS and ISIZE?
                          In my case, the ISIZE is normal. I found that the flag value in the second column is normal too. The mate read also given at another line, so we can fill the MRNM and MPOS columns accordingly. can it work?

                          Code:
                          ##suppose we got the output file of "maq map" is out.map. then
                          ##got out.sm
                          samtools-0.1.18/misc/maq2sam-long out.map  > out.sam
                          ##got the sorted bam file out.bam
                          samtools-0.1.18/samtools view -ut reference.fa.fai out.sam | samtools  sort - out
                          ##got the statistics results
                          ##maq
                          maq-0.7.1/maq.pl statmap nohup.out > out.map.stat
                          samtools flagstat out.bam > out.bam.stat
                          
                          -bash-3.2$ more out.map.stat 
                          
                          -- == statmap report ==
                          
                          -- # single end (SE) reads: 0
                          -- # mapped SE reads: 0 (/ 0 = NA%)
                          -- # paired end (PE) reads: 2000000
                          -- # mapped PE reads: 1903176 (/ 2000000 = 95.15%)
                          -- # reads that are mapped in pairs: 1756127 (/ 1903176 = 92.27%)
                          -- # Q>=30 reads that are moved to meet mate-pair requirement: 2962 (/ 1756127 = 0.16%)
                          -- # Q<30 reads that are moved to meet mate-pair requirement: 203102 (11.56%)
                          
                          -bash-3.2$ more out.bam.stat 
                          1903176 + 0 in total (QC-passed reads + QC-failed reads)
                          0 + 0 duplicates
                          1820937 + 0 mapped (95.68%:nan%)
                          1903176 + 0 paired in sequencing
                          951588 + 0 read1
                          951588 + 0 read2
                          1673888 + 0 properly paired (87.95%:nan%)
                          1738698 + 0 with itself and mate mapped
                          82239 + 0 singletons (4.32%:nan%)
                          1738698 + 0 with mate mapped to a different chr
                          1471758 + 0 with mate mapped to a different chr (mapQ>=5)
                          Any suggestions are appreciated, Thank you all.

                          pengchy

                          Comment


                          • Hi,
                            I have found the solution.
                            1. the mapping rate is 1820937/2000000

                            2. the unique mapped reads can be got from:
                            for the single end: "H0:i:1 H1:i:A" or "H0:i:0 H1:i:1" or "H0:i:0 H1:i:0" ##here, "A" stand for any number
                            for the pair end: any end has above tags.

                            3. Yes, it is.
                            Last edited by pengchy; 10-21-2011, 05:51 PM. Reason: a mistake

                            Comment


                            • I have the same problem now, do you have any solution now?
                              Thanks Alexey

                              Comment


                              • Originally posted by dtc View Post
                                I have the same problem now, do you have any solution now?
                                Thanks Alexey
                                I did my own version of this in the end using CIGAR string. However it does not take into account 'X', '=', 'N' or 'P' tags.

                                Code:
                                sub last_pos {
                                  my @read = @_;
                                  local $_;
                                  my ($match,$del) = (0,0);
                                  while ($read[$CIGAR] =~ m/(\d+)M/g) {
                                    $match += $1;
                                  }
                                  while ($read[$CIGAR] =~ m/(\d+)D/g) {
                                    $del += $1;
                                  }
                                  return $read[$POS] + $match + $del -1;
                                }
                                So counting the match/mismatch and deletion tags should give the number of nucleotides it will span in reference.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM
                                • seqadmin
                                  Techniques and Challenges in Conservation Genomics
                                  by seqadmin



                                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                  Avian Conservation
                                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                  03-08-2024, 10:41 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 06:37 PM
                                0 responses
                                7 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, Yesterday, 06:07 PM
                                0 responses
                                7 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-22-2024, 10:03 AM
                                0 responses
                                49 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-21-2024, 07:32 AM
                                0 responses
                                66 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X