Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Originally posted by seqsyd View Post
    can any 1 tell me please how can i install SAMtools on my windows operating system...Help Please...thanks
    Have you tried the provided Windows binaries on sourceforge? i.e. samtools-0.1.12a_i386-win32.zip

    [I'd suggest getting hold of a Linux (or Mac OS X) machine if you can, since not that many of the NGS tools will work on Windows]

    Comment


    • optional fields in SAM format

      Could anybody advise me the best way to parse optional fields in SAM file? I'm interested in tags X0 and XA, and the problem is that they do not have a permanent place in the output string (e.g., sometimes 13th or 14th, 18th, 19th or 20th). Certainly, I can do it myself in perl, but such parsing would take too much time and resources.
      What I'm looking for is:
      1) either a way to control output tags (their number and order)
      2) or (better) some bioperl module, like blast parser, that could parse it effectively.
      thanks!

      Comment


      • For perl, very easy and efficient:

        perl -ne 'print "$1\n" if /XA:i\d+)/'

        Comment


        • Originally posted by lh3 View Post
          For perl, very easy and efficient:

          perl -ne 'print "$1\n" if /XA:i\d+)/'
          It's exactly what I want to avoid - checking each line with IF condition.

          Comment


          • This is fast. If present, the ordering of tags are fixed. You can

            if /X0:i\d+).*XA:Z\S+)/

            to grep out everything with a single pass. You need to read and split after all. Regex matching should be of similar speed, if not faster. The bad implementation is to loop through the optional tags and then try to find matching.
            Last edited by lh3; 04-15-2011, 06:59 AM.

            Comment


            • Thank you, Heng!

              Comment


              • I converted some ELAND export files to SAM and now tried to make BAM files out of those, but it is complaining about @SQ headers not being there?!
                Does the export2sam.pl tool not create a SAM file format that is correct??

                Comment


                • I am trying to view a portion of a sorted, indexed bam file and I am getting the error:
                  [sam_header_read2] 26 sequences loaded.
                  [main_samview] random alignment retrieval only works for indexed BAM files.

                  It is already sorted and indexed.... I have no idea how to resolve this.

                  Comment


                  • @oudacontrol: You should post the command you used. It seems like the command may have been mis-formatted, but there's no way to tell if you don't include it.

                    Comment


                    • Dear all,

                      I'm had the same problem as oudacontrol above and Jeckow earlier in this thread.

                      Trying to get samtools view to extract one chromosome from a sorted and indexed bam file fails:
                      Code:
                      [sam_header_read2] 84 sequences loaded.
                      [main_samview] random alignment retrieval only works for indexed BAM files.
                      It took me a long while to figure out what caused it, so I thought I'd post my simple solution for poor sobs like me...

                      Code:
                      samtools sort in.bam in.sorted
                      Code:
                      samtools index in.sorted.bam
                      Code:
                      samtools view -bh -t human_g1k_v37.fa.fai -o in.sorted.chr9.bam in.sorted.bam 9
                      As it turned out, I should not have used the -t option. Without it, view finished as it should have, leaving me with a nice chr9-only bam file! (at least, it's a 279M file, I'll have to check later if trackster or igv will load it )

                      Cheers,
                      Bruins

                      Comment


                      • Does anyone have nice perl snippet to calculate the end point of the read in a sam file?

                        Comment


                        • maq2sam-long problems

                          Dear all,

                          Originally posted by xmluo View Post
                          I used maq2sam-long to convert maq output to sam format, but all pairing information is missed in the results: MRNM is "*", and MPOS and ISIZE are 0. Can you recommend how to get these information? Thanks, Mei

                          I run into the same questions.

                          Another question is about the statistics for the results of out.map and out.bam.

                          1. how can I calculate the mapping rate?
                          95%(1903176 / 2000000) from maq or 95.68%(1820937 / 1903176) from the converted bam?

                          2. how can I calculate the unique mapped pair-end reads number?
                          I have conceived a method: according to the sam format spcification, the TAG "H0:i:1 H1:i:0", give the Number of perfect hits (H0) and Number of 1-di erence hits (H1), so we can deduce that one pair is uniquely mapped if and only if any one end's H0:i:1 or "H0:i:0 and H1:i:1", can it work?

                          3. how to get the MRNM, MPOS and ISIZE?
                          In my case, the ISIZE is normal. I found that the flag value in the second column is normal too. The mate read also given at another line, so we can fill the MRNM and MPOS columns accordingly. can it work?

                          Code:
                          ##suppose we got the output file of "maq map" is out.map. then
                          ##got out.sm
                          samtools-0.1.18/misc/maq2sam-long out.map  > out.sam
                          ##got the sorted bam file out.bam
                          samtools-0.1.18/samtools view -ut reference.fa.fai out.sam | samtools  sort - out
                          ##got the statistics results
                          ##maq
                          maq-0.7.1/maq.pl statmap nohup.out > out.map.stat
                          samtools flagstat out.bam > out.bam.stat
                          
                          -bash-3.2$ more out.map.stat 
                          
                          -- == statmap report ==
                          
                          -- # single end (SE) reads: 0
                          -- # mapped SE reads: 0 (/ 0 = NA%)
                          -- # paired end (PE) reads: 2000000
                          -- # mapped PE reads: 1903176 (/ 2000000 = 95.15%)
                          -- # reads that are mapped in pairs: 1756127 (/ 1903176 = 92.27%)
                          -- # Q>=30 reads that are moved to meet mate-pair requirement: 2962 (/ 1756127 = 0.16%)
                          -- # Q<30 reads that are moved to meet mate-pair requirement: 203102 (11.56%)
                          
                          -bash-3.2$ more out.bam.stat 
                          1903176 + 0 in total (QC-passed reads + QC-failed reads)
                          0 + 0 duplicates
                          1820937 + 0 mapped (95.68%:nan%)
                          1903176 + 0 paired in sequencing
                          951588 + 0 read1
                          951588 + 0 read2
                          1673888 + 0 properly paired (87.95%:nan%)
                          1738698 + 0 with itself and mate mapped
                          82239 + 0 singletons (4.32%:nan%)
                          1738698 + 0 with mate mapped to a different chr
                          1471758 + 0 with mate mapped to a different chr (mapQ>=5)
                          Any suggestions are appreciated, Thank you all.

                          pengchy

                          Comment


                          • Hi,
                            I have found the solution.
                            1. the mapping rate is 1820937/2000000

                            2. the unique mapped reads can be got from:
                            for the single end: "H0:i:1 H1:i:A" or "H0:i:0 H1:i:1" or "H0:i:0 H1:i:0" ##here, "A" stand for any number
                            for the pair end: any end has above tags.

                            3. Yes, it is.
                            Last edited by pengchy; 10-21-2011, 05:51 PM. Reason: a mistake

                            Comment


                            • I have the same problem now, do you have any solution now?
                              Thanks Alexey

                              Comment


                              • Originally posted by dtc View Post
                                I have the same problem now, do you have any solution now?
                                Thanks Alexey
                                I did my own version of this in the end using CIGAR string. However it does not take into account 'X', '=', 'N' or 'P' tags.

                                Code:
                                sub last_pos {
                                  my @read = @_;
                                  local $_;
                                  my ($match,$del) = (0,0);
                                  while ($read[$CIGAR] =~ m/(\d+)M/g) {
                                    $match += $1;
                                  }
                                  while ($read[$CIGAR] =~ m/(\d+)D/g) {
                                    $del += $1;
                                  }
                                  return $read[$POS] + $match + $del -1;
                                }
                                So counting the match/mismatch and deletion tags should give the number of nucleotides it will span in reference.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Essential Discoveries and Tools in Epitranscriptomics
                                  by seqadmin




                                  The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                                  04-22-2024, 07:01 AM
                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 08:47 AM
                                0 responses
                                14 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                60 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                60 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 09:21 AM
                                0 responses
                                54 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X