Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Editing BAM file

    The below is the BAM file produced by running tophat.

    samtools view -h accepted_hits.bam
    @HD VN:1.0 SO:coordinate
    @SQ SN:Mt:3452:Mt3.5.1Chr1 LN:34297684
    @SQ SN:Mt:3453:Mt3.5.1Chr2 LN:34376035
    @SQ SN:Mt:3454:Mt3.5Chr3 LN:43359096
    @SQ SN:Mt:3455:Mt3.5.1Chr4 LN:47685368
    ...
    ...
    ...
    @SQ SN:Mt:3460:AC130806.58 LN:141498
    @SQ SN:Mt:3461:AC229707.11 LN:3289
    @SQ SN:Mt:3462:AC135504.61 LN:132492
    ...
    ...
    ...
    @PG ID:TopHat VN:2.0.4 CL:/usr/local/share/tophat-2.0.4.Linux_x86_64/tophat -p 6 -o IR_t1_Mt -G Mt.gff Mt 121108_I7001139_FCD1FBHACXX_L4_GSL_0040.06_1.fq,121108_I7001139_FCD1FBHACXX_L5_GSL_0040.06_1.fq,121108_I7001139_FCD
    1FBHACXX_L6_GSL_0040.06_1.fq,121108_I7001139_FCD1FBHACXX_L7_GSL_0040.06_1.fq,121108_I7001139_FCD1FBHACXX_L8_GSL_0040.06_1.fq
    FCD1FBHACXX:5:1114:18853:38567#GCCAATA 0 Mt:3452:Mt3.5.1Chr1 150172 50 100M * 0 0 CCCGGGCGCGTATTGCCGAAAAGAAGGTTTAGCTTCAGTTGTGCTACTCTAGACCTGCGGAAGACAGGTGCTGCATGCAAAGGCCTAAGTCCCGCGATGT bbbeeeeegggggi
    hhfigiiehdfffaegdfhiihhiiifcgegggeeeeeeddbddaaaYaa`bca__bbcbcbbcccccbaabb_bcdcbaX[aaa[ AS:i:0 X
    ...
    ...
    ...

    However, because of too many colons existing between SN and LN, it seems that cufflinks does not work.
    So, I think I will have to remove "Mt:3452:" part.

    My question is that "Could I use sed command to edit BAM file?"

  • #2
    Not directly. Something like

    Code:
    samtools view data.bam | sed whatever | samtools view -bSh > corrected.bam
    Would work.

    Comment


    • #3
      Originally posted by swbarnes2 View Post
      Not directly. Something like

      Code:
      samtools view data.bam | sed whatever | samtools view -bSh > corrected.bam
      Would work.
      Dear swbarnes2,

      Oh! Really helpful!! Thank you so much!!!!

      Comment


      • #4
        Originally posted by swbarnes2 View Post
        Not directly. Something like

        Code:
        samtools view data.bam | sed whatever | samtools view -bSh > corrected.bam
        Would work.

        When I put the command below,
        samtools view acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh > corrected.bam

        the result is the follwing:
        Usage: samtools view [options] <in.bam>|<in.sam> [region1 [...]]

        Options: -b output BAM
        -h print header for the SAM output
        -H print header only (no alignments)
        -S input is SAM
        -u uncompressed BAM output (force -b)
        -x output FLAG in HEX (samtools-C specific)
        -X output FLAG in string (samtools-C specific)
        -c print only the count of matching records
        -t FILE list of reference names and lengths (force -S) [null]
        -T FILE reference sequence file (force -S) [null]
        -o FILE output file name [stdout]
        -R FILE list of read groups to be outputted [null]
        -f INT required flag, 0 for unset [0]
        -F INT filtering flag, 0 for unset [0]
        -q INT minimum mapping quality [0]
        -l STR only output reads in library STR [null]
        -r STR only output reads in read group STR [null]
        -? longer help


        I don't think it works.
        Could you see something wrong?

        Comment


        • #5
          Originally posted by syintel87 View Post
          However, because of too many colons existing between SN and LN, it seems that cufflinks does not work.
          So, I think I will have to remove "Mt:3452:" part.
          If you are right and editing the reference names like that fixes things, you've found a bug in cufflinks - If so please report it so that cufflinks can be fixed.

          Comment


          • #6
            Originally posted by maubp View Post
            If you are right and editing the reference names like that fixes things, you've found a bug in cufflinks - If so please report it so that cufflinks can be fixed.
            Yes, while running cufflinks, the error message appeared:
            Error: sort order of reads in BAMs must be the same

            According to the tips given in another posting, I am trying editing BAM file.
            (http://seqanswers.com/forums/showthread.php?t=9304)

            Though I sent an email to cufflinks, I haven't received a reply yet.

            Comment


            • #7
              Originally posted by syintel87 View Post
              When I put the command below,
              samtools view acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh > corrected.bam


              I don't think it works.
              Could you see something wrong?
              Have you tried doing the three steps independently instead of a single command line?

              Comment


              • #8
                Originally posted by syintel87 View Post
                When I put the command below,
                samtools view acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh > corrected.bam

                the result is the follwing:
                Usage: samtools view [options] <in.bam>|<in.sam> [region1 [...]]

                Options: -b output BAM
                -h print header for the SAM output
                -H print header only (no alignments)
                -S input is SAM
                -u uncompressed BAM output (force -b)
                -x output FLAG in HEX (samtools-C specific)
                -X output FLAG in string (samtools-C specific)
                -c print only the count of matching records
                -t FILE list of reference names and lengths (force -S) [null]
                -T FILE reference sequence file (force -S) [null]
                -o FILE output file name [stdout]
                -R FILE list of read groups to be outputted [null]
                -f INT required flag, 0 for unset [0]
                -F INT filtering flag, 0 for unset [0]
                -q INT minimum mapping quality [0]
                -l STR only output reads in library STR [null]
                -r STR only output reads in read group STR [null]
                -? longer help


                I don't think it works.
                Could you see something wrong?
                If you're still having a problem with this, I believe that you need to specify for samtools view to take input from STDIN at the third step. This is done by a single '-' IIRC. So like this...

                samtools view acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh - > corrected.bam

                Comment


                • #9
                  It's likely the problem is due to trying to create a BAM file from input with no header (try a "samtools view -h acc_hits.bam" at the beginning of the command). You can't create a BAM file without a header.

                  Comment


                  • #10
                    It seems that errors do not occur by running the command below.

                    samtools view -h acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh - > corrected.bam

                    However, seeing "corrected.bam", there was no change.
                    "sed" command does not seem to be applied to editing bam file.

                    Comment


                    • #11
                      Yes, I accidentally left out the dash in the second samtools view command. If nothing is changing, there's something wrong with the sed command. In general, piping in and out of samtools is the way to edit .bam files, it's better than making gigantic intermediate sam files that you are going to recompress anyway.

                      Comment


                      • #12
                        Originally posted by swbarnes2 View Post
                        Yes, I accidentally left out the dash in the second samtools view command. If nothing is changing, there's something wrong with the sed command. In general, piping in and out of samtools is the way to edit .bam files, it's better than making gigantic intermediate sam files that you are going to recompress anyway.
                        Right.

                        Originally posted by syintel87 View Post
                        It seems that errors do not occur by running the command below.

                        samtools view -h acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh - > corrected.bam

                        However, seeing "corrected.bam", there was no change.
                        "sed" command does not seem to be applied to editing bam file.
                        I am surprised that nothing changes. This "sed" command seems correct to me and it works on the text from your first post. Note that this "sed" command after the first pipe does not exactly edit the bam file but the text flow from the "samtools view" command.

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Current Approaches to Protein Sequencing
                          by seqadmin


                          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                          04-04-2024, 04:25 PM
                        • seqadmin
                          Strategies for Sequencing Challenging Samples
                          by seqadmin


                          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                          03-22-2024, 06:39 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, 04-11-2024, 12:08 PM
                        0 responses
                        27 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 10:19 PM
                        0 responses
                        30 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 09:21 AM
                        0 responses
                        26 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-04-2024, 09:00 AM
                        0 responses
                        52 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X