Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate number of multi-mapped reads?

    Hi, does anyone know how to calculate the number of multi-mapped reads from your bam file?

    This seems like an obvious/silly question but the latest versions of Cufflinks no longer provides this statistic in the basic output and I have been looking everywhere (log files, samtools stats, this website, etc) but can't figure out how to find this out!

    Thanks!

  • #2
    Although primarily to tally up read counts for genes, HT-seq count also adds up the multimap reads "alignment not unique."

    Comment


    • #3
      Thanks! I'll look into it.

      Comment


      • #4
        Something like this might work.

        Code:
        samtools view -F 4 file.bam | awk '{printf $1"\n"}' | sort | uniq -d | wc -l
        
        Explanation
        -----------
        samtools view -F 4 file.bam
            print all lines in SAM format except those flagged as unmapped
        
        awk '{printf $1"\n"}'
            print the first field in the line (the read name)
        
        sort | uniq -d | wc -l
            sort the read names
            print only the duplicates (i.e. those appearing more than once)
            count the lines

        Comment


        • #5
          Thanks, great idea and thanks for the detailed, explanation! I'll give it a try

          Comment


          • #6
            Retrieve mapped reads to certain regions (eg. between 120-140 in a gene)?

            I have my own unannotated reference data base. With it, the reads were mapped with Bowtie, now I need to retrieve all reads which mapped to certain region between 120-140 nt/each gene, how can I do it with samtools ? Thanks in advance.

            Comment


            • #7
              Make a BED file with the gene locations, flank that with bedtools (or awk), and then use it with the "-L" option to "samtools view".

              Comment


              • #8
                more specifics

                Would you please give me more details ? Thanks.

                Comment


                • #9
                  Which part in particular requires more detail?

                  Comment


                  • #10
                    example

                    >Sg01_138835_A_G
                    TGCGAAGTATGCCACCAACATGTTTCACCGCATAACCTTGATCACCTTGGACATTCTGCCACAAGCAGACGACGAGTTCCTCAGAAAATTCATGGACAAATTTGGGCGCACGACCATCTA [GATCGAAAATCTTAATCCCG]AGCTAGTGTTGCATTCCATGCTCCTCGTCTTACACTCCAACAAGTTTGCAGAAAGTTTGTCAAAAATCCACTCGGTAGCATGGACGAGCTGCACGAGTGAGCCAAGGGTTACATATAGATGGAAGAAATGTCCAAATTCTGGAAAGAAGTCCGTCAAGCCGAACACAAGCACGAAAAGTGCAAAGCAAACTCCAAGGCCGACTTGCACAAGTCAAACAAGAGGCCCAATTCAGACAAGCGCTAACCTCTCCCCAAAGAGC

                    I want to retrieve reads aligned to [GATCGAAAATCTTAATCCCG] within gene above.

                    Comment


                    • #11
                      If you aligned to the gene itself, then "samtools view alignments.bam Sg01_138835_A_G:120-140". If you aligned to the genome then find out where in the genome that is.

                      Comment


                      • #12
                        how about multiple genes in reference

                        >Sg01_138835_A_G
                        TGCGAAGTATGCCACCAACATGTTTCACCGCATAACCTTGATCACCTTGGACATTCTGCCACAAGCAGACGACGAGTTCCTCAGAAAATTCATGGACAAATTTGGGCGCACGACCATCTAGATCGAAAATCTTAATCCCGAGCTAGTGTTGCATTCCATGCTCCTCGTCTTACACTCCAACAAGTTTGCAGAAAGTTTGTCAAAAATCCACTCGGTAGCATGGACGAGCTGCACGAGTGAGCCAAGGGTTACATATAGATGGAAGAAATGTCCAAATTCTGGAAAGAAGTCCGTCAAGCCGAACACAAGCACGAAAAGTGCAAAGCAAACTCCAAGGCCGACTTGCACAAGTCAAACAAGAGGCCCAATTCAGACAAGCGCTAACCTCTCCCCAAAGAGC
                        >Sg01_281790_T_C
                        CATCTATCTATTGCTTCAACCATGAAAATAATTATTTTCAAAGGCTTGAGTAATAATGTTTACCAATATAATGCCTATAATAATGTTACAAAATTCTAGTTTATATACCAAGTTTGATCGATGTTTGTAAAATTTAGCATATTCTTGCCTTGAATATCCTTTTGACACACACAAATACCATATAACTAACCCACAATATTCTGACTTATATGGGTGCAGCGGATCATTTGAGCGAAGCTGTGGGCGACAACGAATCCAAGGAACTTTTTGAATCATTTCGAGTCTTCTGCAATAGAAAAAGTGTTAGTACTAATGCATTTTCATCATTCAACAATTAATATAAATATATATATATATAGAGAGAGAGAGAGAGAGAGAAGTGCACTCATGAACAAAATCC
                        >Sg01_334575_A_G
                        ATTTTCTTAGTCCGCAAAAGTTAAACAGTATGTTAAGAATATAGGTTAAGAAACTTAAAAGTAAAAATATTTTTATTAAGAGGTGAAAAATTTTGTACAATAACTTGACTTGGAAAATTCTGCAGTTCCTGTTCTTCCAAATGTAATCTATGGACTATGGCAACAACAAAACTCAGAGACGCTACGTCCATTACACTTCACTTGGACAGTAAGTTAGACATCAACCAGGGGCCAAATTCTTGATGAAAAAAGCCTGCACGAACAGCATCTGGGATTAAAACATTCCAGACCTCCTTAGCCGCATAACATTTGGATGGATCATATAATTATATGGATAGATAACTATATGGATCATAATTAATTCAGCATAATTTTTTTTAATACTCTCTAAGCGCCTCTC
                        >Sg01_402061_A_C
                        CATAACCTGATAATATATTTTATATATCTACTATGAAAATAAATAGAATATCATATATACTAAGTAAGAAACGATCTGGTGTGTATCAAACAATGAATAATAAAATGTTGTTATCCAATTCTTATATAATATCTTATTTTAAGCAACAATCAAACAATGAATATTTTAAGAACTTATTTATCATGTTCTTATGGAGGCCTCTAGGGTAAGTCGTGACTGTATAGCATAGAAAATAAAGCTGCAGGAGCGTGGTATATCTTGTCAGAGCATGTATGTGTCTTGAGTGTGAAAGAGGGTTGGAGAAAGCAAAGGAATGTTGGAGGTCATTATTGGCTTCAAGCAGAGAGTGCTTACGAACCTATTAAAGTTGGAGCAGTGAGATATATAGTATAGTTGAGCT

                        If I want retrieve reads mapped to same region on multiple (several thousands) genes, do I need to list all, like this, Sg01_138835_A_G:120-140, Sg01_281790_T_C:120-140 ... ?

                        Comment


                        • #13
                          Make a BED file of the regions and use that with the "-L" option.

                          Comment


                          • #14
                            thanks.

                            I will do that.

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Current Approaches to Protein Sequencing
                              by seqadmin


                              Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                              04-04-2024, 04:25 PM
                            • seqadmin
                              Strategies for Sequencing Challenging Samples
                              by seqadmin


                              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                              03-22-2024, 06:39 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, 04-11-2024, 12:08 PM
                            0 responses
                            30 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 04-10-2024, 10:19 PM
                            0 responses
                            32 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 04-10-2024, 09:21 AM
                            0 responses
                            28 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 04-04-2024, 09:00 AM
                            0 responses
                            53 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X