Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • KAP
    Member
    • Mar 2010
    • 12

    Calculate number of multi-mapped reads?

    Hi, does anyone know how to calculate the number of multi-mapped reads from your bam file?

    This seems like an obvious/silly question but the latest versions of Cufflinks no longer provides this statistic in the basic output and I have been looking everywhere (log files, samtools stats, this website, etc) but can't figure out how to find this out!

    Thanks!
  • chadn737
    Senior Member
    • Jan 2009
    • 392

    #2
    Although primarily to tally up read counts for genes, HT-seq count also adds up the multimap reads "alignment not unique."

    Comment

    • KAP
      Member
      • Mar 2010
      • 12

      #3
      Thanks! I'll look into it.

      Comment

      • polyatail
        Member
        • Dec 2010
        • 25

        #4
        Something like this might work.

        Code:
        samtools view -F 4 file.bam | awk '{printf $1"\n"}' | sort | uniq -d | wc -l
        
        Explanation
        -----------
        samtools view -F 4 file.bam
            print all lines in SAM format except those flagged as unmapped
        
        awk '{printf $1"\n"}'
            print the first field in the line (the read name)
        
        sort | uniq -d | wc -l
            sort the read names
            print only the duplicates (i.e. those appearing more than once)
            count the lines

        Comment

        • KAP
          Member
          • Mar 2010
          • 12

          #5
          Thanks, great idea and thanks for the detailed, explanation! I'll give it a try

          Comment

          • Johnwang
            Member
            • Jun 2016
            • 17

            #6
            Retrieve mapped reads to certain regions (eg. between 120-140 in a gene)?

            I have my own unannotated reference data base. With it, the reads were mapped with Bowtie, now I need to retrieve all reads which mapped to certain region between 120-140 nt/each gene, how can I do it with samtools ? Thanks in advance.

            Comment

            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              Make a BED file with the gene locations, flank that with bedtools (or awk), and then use it with the "-L" option to "samtools view".

              Comment

              • Johnwang
                Member
                • Jun 2016
                • 17

                #8
                more specifics

                Would you please give me more details ? Thanks.

                Comment

                • dpryan
                  Devon Ryan
                  • Jul 2011
                  • 3478

                  #9
                  Which part in particular requires more detail?

                  Comment

                  • Johnwang
                    Member
                    • Jun 2016
                    • 17

                    #10
                    example

                    >Sg01_138835_A_G
                    TGCGAAGTATGCCACCAACATGTTTCACCGCATAACCTTGATCACCTTGGACATTCTGCCACAAGCAGACGACGAGTTCCTCAGAAAATTCATGGACAAATTTGGGCGCACGACCATCTA [GATCGAAAATCTTAATCCCG]AGCTAGTGTTGCATTCCATGCTCCTCGTCTTACACTCCAACAAGTTTGCAGAAAGTTTGTCAAAAATCCACTCGGTAGCATGGACGAGCTGCACGAGTGAGCCAAGGGTTACATATAGATGGAAGAAATGTCCAAATTCTGGAAAGAAGTCCGTCAAGCCGAACACAAGCACGAAAAGTGCAAAGCAAACTCCAAGGCCGACTTGCACAAGTCAAACAAGAGGCCCAATTCAGACAAGCGCTAACCTCTCCCCAAAGAGC

                    I want to retrieve reads aligned to [GATCGAAAATCTTAATCCCG] within gene above.

                    Comment

                    • dpryan
                      Devon Ryan
                      • Jul 2011
                      • 3478

                      #11
                      If you aligned to the gene itself, then "samtools view alignments.bam Sg01_138835_A_G:120-140". If you aligned to the genome then find out where in the genome that is.

                      Comment

                      • Johnwang
                        Member
                        • Jun 2016
                        • 17

                        #12
                        how about multiple genes in reference

                        >Sg01_138835_A_G
                        TGCGAAGTATGCCACCAACATGTTTCACCGCATAACCTTGATCACCTTGGACATTCTGCCACAAGCAGACGACGAGTTCCTCAGAAAATTCATGGACAAATTTGGGCGCACGACCATCTAGATCGAAAATCTTAATCCCGAGCTAGTGTTGCATTCCATGCTCCTCGTCTTACACTCCAACAAGTTTGCAGAAAGTTTGTCAAAAATCCACTCGGTAGCATGGACGAGCTGCACGAGTGAGCCAAGGGTTACATATAGATGGAAGAAATGTCCAAATTCTGGAAAGAAGTCCGTCAAGCCGAACACAAGCACGAAAAGTGCAAAGCAAACTCCAAGGCCGACTTGCACAAGTCAAACAAGAGGCCCAATTCAGACAAGCGCTAACCTCTCCCCAAAGAGC
                        >Sg01_281790_T_C
                        CATCTATCTATTGCTTCAACCATGAAAATAATTATTTTCAAAGGCTTGAGTAATAATGTTTACCAATATAATGCCTATAATAATGTTACAAAATTCTAGTTTATATACCAAGTTTGATCGATGTTTGTAAAATTTAGCATATTCTTGCCTTGAATATCCTTTTGACACACACAAATACCATATAACTAACCCACAATATTCTGACTTATATGGGTGCAGCGGATCATTTGAGCGAAGCTGTGGGCGACAACGAATCCAAGGAACTTTTTGAATCATTTCGAGTCTTCTGCAATAGAAAAAGTGTTAGTACTAATGCATTTTCATCATTCAACAATTAATATAAATATATATATATATAGAGAGAGAGAGAGAGAGAGAAGTGCACTCATGAACAAAATCC
                        >Sg01_334575_A_G
                        ATTTTCTTAGTCCGCAAAAGTTAAACAGTATGTTAAGAATATAGGTTAAGAAACTTAAAAGTAAAAATATTTTTATTAAGAGGTGAAAAATTTTGTACAATAACTTGACTTGGAAAATTCTGCAGTTCCTGTTCTTCCAAATGTAATCTATGGACTATGGCAACAACAAAACTCAGAGACGCTACGTCCATTACACTTCACTTGGACAGTAAGTTAGACATCAACCAGGGGCCAAATTCTTGATGAAAAAAGCCTGCACGAACAGCATCTGGGATTAAAACATTCCAGACCTCCTTAGCCGCATAACATTTGGATGGATCATATAATTATATGGATAGATAACTATATGGATCATAATTAATTCAGCATAATTTTTTTTAATACTCTCTAAGCGCCTCTC
                        >Sg01_402061_A_C
                        CATAACCTGATAATATATTTTATATATCTACTATGAAAATAAATAGAATATCATATATACTAAGTAAGAAACGATCTGGTGTGTATCAAACAATGAATAATAAAATGTTGTTATCCAATTCTTATATAATATCTTATTTTAAGCAACAATCAAACAATGAATATTTTAAGAACTTATTTATCATGTTCTTATGGAGGCCTCTAGGGTAAGTCGTGACTGTATAGCATAGAAAATAAAGCTGCAGGAGCGTGGTATATCTTGTCAGAGCATGTATGTGTCTTGAGTGTGAAAGAGGGTTGGAGAAAGCAAAGGAATGTTGGAGGTCATTATTGGCTTCAAGCAGAGAGTGCTTACGAACCTATTAAAGTTGGAGCAGTGAGATATATAGTATAGTTGAGCT

                        If I want retrieve reads mapped to same region on multiple (several thousands) genes, do I need to list all, like this, Sg01_138835_A_G:120-140, Sg01_281790_T_C:120-140 ... ?

                        Comment

                        • dpryan
                          Devon Ryan
                          • Jul 2011
                          • 3478

                          #13
                          Make a BED file of the regions and use that with the "-L" option.

                          Comment

                          • Johnwang
                            Member
                            • Jun 2016
                            • 17

                            #14
                            thanks.

                            I will do that.

                            Comment

                            Latest Articles

                            Collapse

                            • GATTACAT
                              Reply to Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                              by GATTACAT
                              Love this - good data definitely starts from good input, and poor input can only give relatively poor data. I particularly like the mention of Nanodrop/absorbance based methods for quantification. It's such a toss up if you'll get an accurate reading or what amounts to a randomly generated number, and a lot of library/sequencing related issues can be traced back to poor quant.
                              Yesterday, 11:43 AM
                            • SEQadmin2
                              Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                              by SEQadmin2


                              I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                              Here are nine questions we think about, in roughly the order they matter, before...
                              06-18-2026, 07:11 AM
                            • SEQadmin2
                              From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                              by SEQadmin2


                              Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                              The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                              ...
                              06-02-2026, 10:05 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by SEQadmin2, 06-30-2026, 05:37 AM
                            0 responses
                            9 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 06-26-2026, 11:10 AM
                            0 responses
                            18 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 06-17-2026, 06:09 AM
                            0 responses
                            52 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 06-09-2026, 11:58 AM
                            0 responses
                            110 views
                            0 reactions
                            Last Post SEQadmin2  
                            Working...