Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate number of multi-mapped reads?

    Hi, does anyone know how to calculate the number of multi-mapped reads from your bam file?

    This seems like an obvious/silly question but the latest versions of Cufflinks no longer provides this statistic in the basic output and I have been looking everywhere (log files, samtools stats, this website, etc) but can't figure out how to find this out!

    Thanks!

  • #2
    Although primarily to tally up read counts for genes, HT-seq count also adds up the multimap reads "alignment not unique."

    Comment


    • #3
      Thanks! I'll look into it.

      Comment


      • #4
        Something like this might work.

        Code:
        samtools view -F 4 file.bam | awk '{printf $1"\n"}' | sort | uniq -d | wc -l
        
        Explanation
        -----------
        samtools view -F 4 file.bam
            print all lines in SAM format except those flagged as unmapped
        
        awk '{printf $1"\n"}'
            print the first field in the line (the read name)
        
        sort | uniq -d | wc -l
            sort the read names
            print only the duplicates (i.e. those appearing more than once)
            count the lines

        Comment


        • #5
          Thanks, great idea and thanks for the detailed, explanation! I'll give it a try

          Comment


          • #6
            Retrieve mapped reads to certain regions (eg. between 120-140 in a gene)?

            I have my own unannotated reference data base. With it, the reads were mapped with Bowtie, now I need to retrieve all reads which mapped to certain region between 120-140 nt/each gene, how can I do it with samtools ? Thanks in advance.

            Comment


            • #7
              Make a BED file with the gene locations, flank that with bedtools (or awk), and then use it with the "-L" option to "samtools view".

              Comment


              • #8
                more specifics

                Would you please give me more details ? Thanks.

                Comment


                • #9
                  Which part in particular requires more detail?

                  Comment


                  • #10
                    example

                    >Sg01_138835_A_G
                    TGCGAAGTATGCCACCAACATGTTTCACCGCATAACCTTGATCACCTTGGACATTCTGCCACAAGCAGACGACGAGTTCCTCAGAAAATTCATGGACAAATTTGGGCGCACGACCATCTA [GATCGAAAATCTTAATCCCG]AGCTAGTGTTGCATTCCATGCTCCTCGTCTTACACTCCAACAAGTTTGCAGAAAGTTTGTCAAAAATCCACTCGGTAGCATGGACGAGCTGCACGAGTGAGCCAAGGGTTACATATAGATGGAAGAAATGTCCAAATTCTGGAAAGAAGTCCGTCAAGCCGAACACAAGCACGAAAAGTGCAAAGCAAACTCCAAGGCCGACTTGCACAAGTCAAACAAGAGGCCCAATTCAGACAAGCGCTAACCTCTCCCCAAAGAGC

                    I want to retrieve reads aligned to [GATCGAAAATCTTAATCCCG] within gene above.

                    Comment


                    • #11
                      If you aligned to the gene itself, then "samtools view alignments.bam Sg01_138835_A_G:120-140". If you aligned to the genome then find out where in the genome that is.

                      Comment


                      • #12
                        how about multiple genes in reference

                        >Sg01_138835_A_G
                        TGCGAAGTATGCCACCAACATGTTTCACCGCATAACCTTGATCACCTTGGACATTCTGCCACAAGCAGACGACGAGTTCCTCAGAAAATTCATGGACAAATTTGGGCGCACGACCATCTAGATCGAAAATCTTAATCCCGAGCTAGTGTTGCATTCCATGCTCCTCGTCTTACACTCCAACAAGTTTGCAGAAAGTTTGTCAAAAATCCACTCGGTAGCATGGACGAGCTGCACGAGTGAGCCAAGGGTTACATATAGATGGAAGAAATGTCCAAATTCTGGAAAGAAGTCCGTCAAGCCGAACACAAGCACGAAAAGTGCAAAGCAAACTCCAAGGCCGACTTGCACAAGTCAAACAAGAGGCCCAATTCAGACAAGCGCTAACCTCTCCCCAAAGAGC
                        >Sg01_281790_T_C
                        CATCTATCTATTGCTTCAACCATGAAAATAATTATTTTCAAAGGCTTGAGTAATAATGTTTACCAATATAATGCCTATAATAATGTTACAAAATTCTAGTTTATATACCAAGTTTGATCGATGTTTGTAAAATTTAGCATATTCTTGCCTTGAATATCCTTTTGACACACACAAATACCATATAACTAACCCACAATATTCTGACTTATATGGGTGCAGCGGATCATTTGAGCGAAGCTGTGGGCGACAACGAATCCAAGGAACTTTTTGAATCATTTCGAGTCTTCTGCAATAGAAAAAGTGTTAGTACTAATGCATTTTCATCATTCAACAATTAATATAAATATATATATATATAGAGAGAGAGAGAGAGAGAGAAGTGCACTCATGAACAAAATCC
                        >Sg01_334575_A_G
                        ATTTTCTTAGTCCGCAAAAGTTAAACAGTATGTTAAGAATATAGGTTAAGAAACTTAAAAGTAAAAATATTTTTATTAAGAGGTGAAAAATTTTGTACAATAACTTGACTTGGAAAATTCTGCAGTTCCTGTTCTTCCAAATGTAATCTATGGACTATGGCAACAACAAAACTCAGAGACGCTACGTCCATTACACTTCACTTGGACAGTAAGTTAGACATCAACCAGGGGCCAAATTCTTGATGAAAAAAGCCTGCACGAACAGCATCTGGGATTAAAACATTCCAGACCTCCTTAGCCGCATAACATTTGGATGGATCATATAATTATATGGATAGATAACTATATGGATCATAATTAATTCAGCATAATTTTTTTTAATACTCTCTAAGCGCCTCTC
                        >Sg01_402061_A_C
                        CATAACCTGATAATATATTTTATATATCTACTATGAAAATAAATAGAATATCATATATACTAAGTAAGAAACGATCTGGTGTGTATCAAACAATGAATAATAAAATGTTGTTATCCAATTCTTATATAATATCTTATTTTAAGCAACAATCAAACAATGAATATTTTAAGAACTTATTTATCATGTTCTTATGGAGGCCTCTAGGGTAAGTCGTGACTGTATAGCATAGAAAATAAAGCTGCAGGAGCGTGGTATATCTTGTCAGAGCATGTATGTGTCTTGAGTGTGAAAGAGGGTTGGAGAAAGCAAAGGAATGTTGGAGGTCATTATTGGCTTCAAGCAGAGAGTGCTTACGAACCTATTAAAGTTGGAGCAGTGAGATATATAGTATAGTTGAGCT

                        If I want retrieve reads mapped to same region on multiple (several thousands) genes, do I need to list all, like this, Sg01_138835_A_G:120-140, Sg01_281790_T_C:120-140 ... ?

                        Comment


                        • #13
                          Make a BED file of the regions and use that with the "-L" option.

                          Comment


                          • #14
                            thanks.

                            I will do that.

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Strategies for Sequencing Challenging Samples
                              by seqadmin


                              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                              03-22-2024, 06:39 AM
                            • seqadmin
                              Techniques and Challenges in Conservation Genomics
                              by seqadmin



                              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                              Avian Conservation
                              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                              03-08-2024, 10:41 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, Yesterday, 06:37 PM
                            0 responses
                            8 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, Yesterday, 06:07 PM
                            0 responses
                            8 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-22-2024, 10:03 AM
                            0 responses
                            49 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-21-2024, 07:32 AM
                            0 responses
                            66 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X