Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • krapulaxdoctor
    Member
    • May 2015
    • 22

    Filter BAM/SAM based on read length

    Dear All,

    I have a BAM file that contains reads with different length.
    Could you please tell me how can I filter only those reads that are longer/shorter than a given value and write it in a new BAM file?

    I have checked existing forum topics, but I did not find any that could help me.
    I also tried NGSutils, but it just does not work ( installation failed, the commands/programs i.e. bamutils are "not found").

    Unfortunately I have no programming/bioinfo background. I'm just a biologist who have to do everything , So if it is possible please be a bit more detailed.

    Thank you in advance,
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    You can do that with Reformat:

    reformat.sh in=x.sam out=y.sam minlength=50 maxlength=200

    It works with bam as well if samtools is installed and in the path. Reformat is in the BBMap package. To install that, you just unzip it and untar it.

    Comment

    • krapulaxdoctor
      Member
      • May 2015
      • 22

      #3
      Hi,

      Thank you for the quick response.
      I downloaded the BBMap tar from here: http://sourceforge.net/projects/bbma...e=typ_redirect

      I used tar -xzf BBMap.tar.gz command.

      Do I have to install it somehow? Because when i try to run it the way you wrote is not working. (I have samtools and Java installed )

      I tried this command ( in the directory where the program was extracted):
      sh /home/user/Downloads/bbmap/reformat.sh in=/home/user/Desktop/allbams/file.bam out=/home/user/Desktop/fileout.bam minlength=1 maxlength=41
      then it gave me the following error:
      /home/user/Downloads/bbmap/reformat.sh: 4: /home/user/Downloads/bbmap/reformat.sh: Syntax error: "(" unexpected

      Do I have to put these programs to PATH? Or how exacty do I have to run them?

      Sorry for the trouble,

      Comment

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        No install is required for BBMap tools. Brian uses bash shell by default.

        Can you try:

        Code:
        $ /home/user/Downloads/bbmap/reformat.sh in=/home/user/Desktop/allbams/file.bam out=/home/user/Desktop/fileout.bam minlength=1 maxlength=41
        If you use "sh" (this is ubuntu?) then it will use dash.

        Comment

        • krapulaxdoctor
          Member
          • May 2015
          • 22

          #5
          Thank you for the response GenoMax.
          The code helped. However, I have samtools installed but the program gave an error message with my BAM files. Now i try to convert them to SAM to see if it works.

          Comment

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Ah yes. You have to convert the BAM file to SAM. I only focused on the error you had seen.

            Comment

            • Brian Bushnell
              Super Moderator
              • Jan 2014
              • 2709

              #7
              Originally posted by krapulaxdoctor View Post
              Thank you for the response GenoMax.
              The code helped. However, I have samtools installed but the program gave an error message with my BAM files. Now i try to convert them to SAM to see if it works.
              Can you post the error message? If samtools is installed and in the path, it should work fine.

              Comment

              • krapulaxdoctor
                Member
                • May 2015
                • 22

                #8
                Yes, Brian, here it is:

                java -ea -Xmx200m -cp /home/user/Downloads/bbmap/current/ jgi.ReformatReads in=/home/user/Desktop/allbams/RNA.bam out=/home/user/Desktop/RNAout.bam minlength=1 maxlength=41
                Executing jgi.ReformatReads [in=/home/user/Desktop/allbams/RNA.bam, out=/home/user/Desktop/RNA.bam, minlength=1, maxlength=41]

                Found samtools.
                Input is being processed as unpaired
                [E::sam_parse1] missing SAM header
                [W::sam_read1] parse error at line 3
                [main_samview] truncated file.
                Exception in thread "Thread-4" java.lang.RuntimeException: java.io.IOException: Broken pipe
                at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:31)
                Caused by: java.io.IOException: Broken pipe
                at java.io.FileOutputStream.writeBytes(Native Method)
                at java.io.FileOutputStream.write(FileOutputStream.java:345)
                at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
                at stream.ReadStreamByteWriter.writeSam(ReadStreamByteWriter.java:533)
                at stream.ReadStreamByteWriter.processJobs(ReadStreamByteWriter.java:86)
                at stream.ReadStreamByteWriter.run2(ReadStreamByteWriter.java:41)
                at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:27)
                Exception in thread "main" java.lang.RuntimeException: Writing to a terminated thread.
                at stream.ConcurrentGenericReadOutputStream.write(ConcurrentGenericReadOutputStream.java:198)
                at stream.ConcurrentGenericReadOutputStream.addDisordered(ConcurrentGenericReadOutputStream.java:193)
                at stream.ConcurrentGenericReadOutputStream.add(ConcurrentGenericReadOutputStream.java:97)
                at jgi.ReformatReads.process(ReformatReads.java:892)
                at jgi.ReformatReads.main(ReformatReads.java:46)

                Hope it helps.

                Comment

                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  #9
                  Those errors indicate that the bam file is broken. Most of that text comes from Reformat, but these 3 lines:
                  Code:
                  [E::sam_parse1] missing SAM header
                  [W::sam_read1] parse error at line 3
                  [main_samview] truncated file.
                  ...come from samtools. Are you able to convert it to a sam successfully?

                  Comment

                  • krapulaxdoctor
                    Member
                    • May 2015
                    • 22

                    #10
                    Yes,
                    samtools can convert them into SAM with the:
                    $ samtools view -h -o out.sam in.bam
                    BBmap can deal with these SAM files.

                    Comment

                    • krapulaxdoctor
                      Member
                      • May 2015
                      • 22

                      #11
                      Actually, you were right. There must be a problem with the header. It can be converted to SAM but I cant convert it back to BAM. Also, I cant convert it into BED. Do you know any way to fix the headers? ( unfortunately I have only these files with the crappy header ) Or to generate new header for the SAM files from scratch?
                      Sorry if I ask too many questions.

                      Comment

                      • GenoMax
                        Senior Member
                        • Feb 2008
                        • 7142

                        #12
                        Can you post the first few lines of your sam file?

                        Code:
                        $ samtools view -h your.bam | more

                        Comment

                        • krapulaxdoctor
                          Member
                          • May 2015
                          • 22

                          #13
                          @SQ SN:chr1 LN:195471971
                          @SQ SN:chr2 LN:182113224
                          @SQ SN:chr3 LN:160039680
                          @SQ SN:chr4 LN:156508116
                          @SQ SN:chr5 LN:151834684
                          @SQ SN:chr6 LN:149736546
                          @SQ SN:chr7 LN:145441459
                          @SQ SN:chr8 LN:129401213
                          @SQ SN:chr9 LN:124595110
                          @SQ SN:chr10 LN:130694993
                          @SQ SN:chr11 LN:122082543
                          @SQ SN:chr12 LN:120129022
                          @SQ SN:chr13 LN:120421639
                          @SQ SN:chr14 LN:124902244
                          @SQ SN:chr15 LN:104043685
                          @SQ SN:chr16 LN:98207768
                          @SQ SN:chr17 LN:94987271
                          @SQ SN:chr18 LN:90702639
                          @SQ SN:chr19 LN:61431566
                          @SQ SN:chrM LN:16299
                          @SQ SN:chrX LN:171031299
                          @SQ SN:chrY LN:91744698
                          HISEQLN:122:HCW3JADXX:2:2109:6425:52017 272 chr1 3001923 0
                          43M * 0 0 TCTGATTATTATGTGTCAGGAGGAATTTCTTTTCTGGTCCAAT
                          JJJJJJJJIIIJJJJJJJJJJJJJJJJIJJHHHHHFFFFFCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0
                          XG:i:0 NM:i:0 MD:Z:43 YT:Z:UU NH:i:20 CC:Z:= CP:i:48081495 XS:A:- HI:i:0

                          Comment

                          • Brian Bushnell
                            Super Moderator
                            • Jan 2014
                            • 2709

                            #14
                            Well, it's missing the HD line. Try adding this to the beginning of one of the sam files:
                            "@HD VN:1.3 SO:unsorted"

                            However, that's not required, so I'm not sure what the problem is. It might help if you encapsulate what you pasted with [ code ] [ / code ] (without the spaces) to make the tabs come through:
                            Code:
                            hello	world
                            vs
                            Code:
                            hello world

                            Comment

                            • krapulaxdoctor
                              Member
                              • May 2015
                              • 22

                              #15
                              And also, when I try to convert these BAM files into BED files, it generates a file, but the program (bedtools bamtobed)gives this error message:

                              terminate called after throwing an instance of 'std :: out_of_range'
                              what(): vector::_M_range_check
                              Aborted (core dumped)

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                by SEQadmin2


                                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                                Here are nine questions we think about, in roughly the order they matter, before...
                                Today, 07:11 AM
                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM
                              • SEQadmin2
                                Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                by SEQadmin2


                                With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                Introduction

                                Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                05-22-2026, 06:42 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Yesterday, 06:09 AM
                              0 responses
                              16 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              37 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              42 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              49 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...