Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filter BAM/SAM based on read length

    Dear All,

    I have a BAM file that contains reads with different length.
    Could you please tell me how can I filter only those reads that are longer/shorter than a given value and write it in a new BAM file?

    I have checked existing forum topics, but I did not find any that could help me.
    I also tried NGSutils, but it just does not work ( installation failed, the commands/programs i.e. bamutils are "not found").

    Unfortunately I have no programming/bioinfo background. I'm just a biologist who have to do everything , So if it is possible please be a bit more detailed.

    Thank you in advance,

  • #2
    You can do that with Reformat:

    reformat.sh in=x.sam out=y.sam minlength=50 maxlength=200

    It works with bam as well if samtools is installed and in the path. Reformat is in the BBMap package. To install that, you just unzip it and untar it.

    Comment


    • #3
      Hi,

      Thank you for the quick response.
      I downloaded the BBMap tar from here: http://sourceforge.net/projects/bbma...e=typ_redirect

      I used tar -xzf BBMap.tar.gz command.

      Do I have to install it somehow? Because when i try to run it the way you wrote is not working. (I have samtools and Java installed )

      I tried this command ( in the directory where the program was extracted):
      sh /home/user/Downloads/bbmap/reformat.sh in=/home/user/Desktop/allbams/file.bam out=/home/user/Desktop/fileout.bam minlength=1 maxlength=41
      then it gave me the following error:
      /home/user/Downloads/bbmap/reformat.sh: 4: /home/user/Downloads/bbmap/reformat.sh: Syntax error: "(" unexpected

      Do I have to put these programs to PATH? Or how exacty do I have to run them?

      Sorry for the trouble,

      Comment


      • #4
        No install is required for BBMap tools. Brian uses bash shell by default.

        Can you try:

        Code:
        $ /home/user/Downloads/bbmap/reformat.sh in=/home/user/Desktop/allbams/file.bam out=/home/user/Desktop/fileout.bam minlength=1 maxlength=41
        If you use "sh" (this is ubuntu?) then it will use dash.

        Comment


        • #5
          Thank you for the response GenoMax.
          The code helped. However, I have samtools installed but the program gave an error message with my BAM files. Now i try to convert them to SAM to see if it works.

          Comment


          • #6
            Ah yes. You have to convert the BAM file to SAM. I only focused on the error you had seen.

            Comment


            • #7
              Originally posted by krapulaxdoctor View Post
              Thank you for the response GenoMax.
              The code helped. However, I have samtools installed but the program gave an error message with my BAM files. Now i try to convert them to SAM to see if it works.
              Can you post the error message? If samtools is installed and in the path, it should work fine.

              Comment


              • #8
                Yes, Brian, here it is:

                java -ea -Xmx200m -cp /home/user/Downloads/bbmap/current/ jgi.ReformatReads in=/home/user/Desktop/allbams/RNA.bam out=/home/user/Desktop/RNAout.bam minlength=1 maxlength=41
                Executing jgi.ReformatReads [in=/home/user/Desktop/allbams/RNA.bam, out=/home/user/Desktop/RNA.bam, minlength=1, maxlength=41]

                Found samtools.
                Input is being processed as unpaired
                [E::sam_parse1] missing SAM header
                [W::sam_read1] parse error at line 3
                [main_samview] truncated file.
                Exception in thread "Thread-4" java.lang.RuntimeException: java.io.IOException: Broken pipe
                at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:31)
                Caused by: java.io.IOException: Broken pipe
                at java.io.FileOutputStream.writeBytes(Native Method)
                at java.io.FileOutputStream.write(FileOutputStream.java:345)
                at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
                at stream.ReadStreamByteWriter.writeSam(ReadStreamByteWriter.java:533)
                at stream.ReadStreamByteWriter.processJobs(ReadStreamByteWriter.java:86)
                at stream.ReadStreamByteWriter.run2(ReadStreamByteWriter.java:41)
                at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:27)
                Exception in thread "main" java.lang.RuntimeException: Writing to a terminated thread.
                at stream.ConcurrentGenericReadOutputStream.write(ConcurrentGenericReadOutputStream.java:198)
                at stream.ConcurrentGenericReadOutputStream.addDisordered(ConcurrentGenericReadOutputStream.java:193)
                at stream.ConcurrentGenericReadOutputStream.add(ConcurrentGenericReadOutputStream.java:97)
                at jgi.ReformatReads.process(ReformatReads.java:892)
                at jgi.ReformatReads.main(ReformatReads.java:46)

                Hope it helps.

                Comment


                • #9
                  Those errors indicate that the bam file is broken. Most of that text comes from Reformat, but these 3 lines:
                  Code:
                  [E::sam_parse1] missing SAM header
                  [W::sam_read1] parse error at line 3
                  [main_samview] truncated file.
                  ...come from samtools. Are you able to convert it to a sam successfully?

                  Comment


                  • #10
                    Yes,
                    samtools can convert them into SAM with the:
                    $ samtools view -h -o out.sam in.bam
                    BBmap can deal with these SAM files.

                    Comment


                    • #11
                      Actually, you were right. There must be a problem with the header. It can be converted to SAM but I cant convert it back to BAM. Also, I cant convert it into BED. Do you know any way to fix the headers? ( unfortunately I have only these files with the crappy header ) Or to generate new header for the SAM files from scratch?
                      Sorry if I ask too many questions.

                      Comment


                      • #12
                        Can you post the first few lines of your sam file?

                        Code:
                        $ samtools view -h your.bam | more

                        Comment


                        • #13
                          @SQ SN:chr1 LN:195471971
                          @SQ SN:chr2 LN:182113224
                          @SQ SN:chr3 LN:160039680
                          @SQ SN:chr4 LN:156508116
                          @SQ SN:chr5 LN:151834684
                          @SQ SN:chr6 LN:149736546
                          @SQ SN:chr7 LN:145441459
                          @SQ SN:chr8 LN:129401213
                          @SQ SN:chr9 LN:124595110
                          @SQ SN:chr10 LN:130694993
                          @SQ SN:chr11 LN:122082543
                          @SQ SN:chr12 LN:120129022
                          @SQ SN:chr13 LN:120421639
                          @SQ SN:chr14 LN:124902244
                          @SQ SN:chr15 LN:104043685
                          @SQ SN:chr16 LN:98207768
                          @SQ SN:chr17 LN:94987271
                          @SQ SN:chr18 LN:90702639
                          @SQ SN:chr19 LN:61431566
                          @SQ SN:chrM LN:16299
                          @SQ SN:chrX LN:171031299
                          @SQ SN:chrY LN:91744698
                          HISEQLN:122:HCW3JADXX:2:2109:6425:52017 272 chr1 3001923 0
                          43M * 0 0 TCTGATTATTATGTGTCAGGAGGAATTTCTTTTCTGGTCCAAT
                          JJJJJJJJIIIJJJJJJJJJJJJJJJJIJJHHHHHFFFFFCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0
                          XG:i:0 NM:i:0 MD:Z:43 YT:Z:UU NH:i:20 CC:Z:= CP:i:48081495 XS:A:- HI:i:0

                          Comment


                          • #14
                            Well, it's missing the HD line. Try adding this to the beginning of one of the sam files:
                            "@HD VN:1.3 SO:unsorted"

                            However, that's not required, so I'm not sure what the problem is. It might help if you encapsulate what you pasted with [ code ] [ / code ] (without the spaces) to make the tabs come through:
                            Code:
                            hello	world
                            vs
                            Code:
                            hello world

                            Comment


                            • #15
                              And also, when I try to convert these BAM files into BED files, it generates a file, but the program (bedtools bamtobed)gives this error message:

                              terminate called after throwing an instance of 'std :: out_of_range'
                              what(): vector::_M_range_check
                              Aborted (core dumped)

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM
                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              27 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              30 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              26 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              52 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X