  • Working with BAM files

    Hi everybody!

    This is my first thread in this forum.
    Recently, I started an internship in a bioinformatics research group. Unfortunately, I have only a little experience with programming and bioinformatics data handling.
    I have basic programming skills in Bash, Python and R, but that's it.

    My task is to inspect three BAM files (more than 1 million reads). The three BAM files were generated using different methods. I want to find out which reads the BAM files have in common, which reads are only in BAM file 1, which reads are missing from BAM file 3, and so on.

    Can you give me some advice on how to approach this task? Do you have experience with handling BAM files?

    Many greetings!

  • #2
    more details

    Maybe I should add some information:
    We took one sample and generated the BAM files using three different pipelines.
    At the moment, we are only interested in the read names (the first column, QNAME, when the BAM files are viewed as SAM text) and want to find out which reads are present in all BAM files and which are present only in file 1, only in file 2, only in file 3, and so on.
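    As a minimal sketch of this goal, assuming the read names of each file have already been loaded into Python sets, every read can be assigned to the exact combination of files it occurs in (the seven regions of a three-way Venn diagram). The function name `partition_by_membership`, the labels `file1` to `file3` and the toy read names are hypothetical.

```python
from typing import Dict, FrozenSet, Set


def partition_by_membership(name_sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group every read name by the exact combination of files it occurs in."""
    groups: Dict[FrozenSet[str], Set[str]] = {}
    # Union of all names seen in any file
    all_names: Set[str] = set().union(*name_sets.values())
    for name in all_names:
        # Labels of the files that contain this read name
        present_in = frozenset(label for label, names in name_sets.items() if name in names)
        groups.setdefault(present_in, set()).add(name)
    return groups


# Toy example with made-up read names (hypothetical data)
example = {
    "file1": {"r1", "r2", "r3"},
    "file2": {"r2", "r3", "r4"},
    "file3": {"r3", "r5"},
}
groups = partition_by_membership(example)
# Print the largest combinations (reads shared by most files) first
for labels, names in sorted(groups.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(sorted(labels), sorted(names))
```

    Each key is a frozenset of file labels, so "present in all three files" and "only in file 1" are simply different keys of the same dictionary.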



    • #3
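      You could simply extract the names (field 1, as you already note), run them through sort | uniq in bash, and compare the results with "comm". Note that comm compares two sorted files at a time, so for three files you would run it pairwise. This should be enough if your aim is just to find which reads are present in all three files.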
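      comm works because both inputs are sorted: it walks the two files in lockstep and puts each line in column 1 (only in the first file), column 2 (only in the second) or column 3 (in both). A minimal Python sketch of the same merge idea, assuming both inputs are sorted in byte order (for example with `LC_ALL=C sort -u`) and contain no duplicates; Python compares strings by code point, which matches byte order for ASCII read names. The function name `comm_pairs` is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def comm_pairs(left: Iterable[str], right: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (column, name) pairs in the spirit of `comm`.

    Column 1: only in left, column 2: only in right, column 3: in both.
    Both inputs must be sorted in the same (byte/code-point) order and deduplicated.
    """
    left_iter, right_iter = iter(left), iter(right)
    a: Optional[str] = next(left_iter, None)
    b: Optional[str] = next(right_iter, None)
    # Advance whichever side has the smaller current name, like a merge step
    while a is not None and b is not None:
        if a == b:
            yield 3, a
            a, b = next(left_iter, None), next(right_iter, None)
        elif a < b:
            yield 1, a
            a = next(left_iter, None)
        else:
            yield 2, b
            b = next(right_iter, None)
    # One input is exhausted; everything left in the other is unique to it
    while a is not None:
        yield 1, a
        a = next(left_iter, None)
    while b is not None:
        yield 2, b
        b = next(right_iter, None)


pairs = list(comm_pairs(["a", "b", "d"], ["b", "c", "d", "e"]))
print(pairs)
```

      Only one name from each input is held at a time, so memory use does not grow with file size. To run it on two name files, pass `(line.rstrip("\n") for line in handle)` for each open file handle.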



      • #4
        progress

        Thank you for the answer!

        I already extracted the read names from all files separately using:

        > samtools view bam_filename | awk -F "\t" '{print $1}' | LC_ALL=C sort -u > output_filename
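        A Python equivalent of the awk step, as a sketch: it reads SAM text lines (for example piped from `samtools view`), skips any `@` header lines, and keeps each read name once. Deduplication matters because the two mates of a paired-end read share a name and secondary or supplementary alignments repeat it. The function name `read_names` and the toy SAM lines are hypothetical.

```python
from typing import Iterable, Set


def read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect unique read names (QNAME, column 1) from SAM text lines."""
    names: Set[str] = set()
    for line in sam_lines:
        # Header lines start with '@' (only present with `samtools view -h`)
        if line.startswith("@"):
            continue
        qname = line.split("\t", 1)[0].rstrip("\n")
        if qname:  # skip blank lines
            names.add(qname)
    return names


# Toy SAM lines (hypothetical): r1 appears twice, once per mate of a pair
sam = [
    "@HD\tVN:1.6\tSO:queryname\n",
    "r1\t99\tchr1\t100\n",
    "r1\t147\tchr1\t300\n",
    "r2\t0\tchr2\t50\n",
]
print(sorted(read_names(sam)))
```

        In a script, `read_names(sys.stdin)` would accept the output of `samtools view bam_filename` piped into it.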

        Now, my supervisor suggested using Python for the rest of the task...
        Or can you recommend another approach?
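        If the rest is done in Python, the name files produced by the samtools step can be loaded into sets and compared with set operators (`&` for intersection, `|` for union, `-` for difference). A minimal sketch, assuming each input file holds one read name per line; `load_names`, `compare_three`, the result labels and the file name in the comment are hypothetical.

```python
from typing import Dict, Set


def load_names(path: str) -> Set[str]:
    """Read a file with one read name per line into a set, ignoring blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def compare_three(f1: Set[str], f2: Set[str], f3: Set[str]) -> Dict[str, Set[str]]:
    """Some of the comparisons asked about in the thread, as plain set algebra."""
    return {
        "in_all_three": f1 & f2 & f3,
        "only_in_1": f1 - f2 - f3,
        "only_in_2": f2 - f1 - f3,
        "only_in_3": f3 - f1 - f2,
        "missing_from_3": (f1 | f2) - f3,
    }


# Toy sets; on real data you would pass e.g. load_names("file1_names.txt") (hypothetical file name)
results = compare_three({"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"})
for label, names in results.items():
    print(label, len(names), sorted(names))
```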

        Greetings



        • #5
          I looked up the "comm" command. It sounds promising, but I am not sure whether it works for such big files with more than 1 million reads. Do you have an idea for a smart Python-based solution?
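          Python sets holding a few million short read names typically fit in memory on a workstation, but if memory is tight, the comm idea extends to any number of files: merge the sorted name files and group identical names. A sketch using the standard library's `heapq.merge` and `itertools.groupby`, assuming every input is sorted in byte order and deduplicated (for example with `LC_ALL=C sort -u`); the names `membership_stream` and `_tag` are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple


def _tag(source: Iterable[str], index: int) -> Iterator[Tuple[str, int]]:
    """Pair each (newline-stripped) name with the index of the input it came from."""
    for line in source:
        yield line.rstrip("\n"), index


def membership_stream(sorted_inputs: List[Iterable[str]]) -> Iterator[Tuple[str, Tuple[int, ...]]]:
    """Yield (read_name, indices of the inputs containing it) in sorted order.

    Every input must be sorted in byte order and deduplicated; the merge keeps
    only one pending name per input in memory at a time.
    """
    merged = heapq.merge(*(_tag(src, i) for i, src in enumerate(sorted_inputs)))
    # Identical names from different inputs are now adjacent, so group them
    for name, group in groupby(merged, key=itemgetter(0)):
        yield name, tuple(index for _, index in group)


stream = list(membership_stream([["a", "b", "d"], ["b", "c", "d"], ["d", "e"]]))
print(stream)
```

          To count reads per combination of files, something like `Counter(present for _, present in membership_stream([f1, f2, f3]))` over three open, sorted name files should work; `(0, 1, 2)` then means "present in all three files".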

          Nevertheless, I will also try it using comm.

          Greetings



          • #6
            If this is an assignment, then use whatever you are required to, but comm should work here: it reads the two sorted files line by line instead of loading them into memory, and you are working only with read names (if you are not, then you should be).
