
  • #16
    Oh, wait, it says "truncated", so presumably the problem is at the end of the file. Can you run "tail" on the file and post the last two lines?



    • #17
      Originally posted by Brian Bushnell
      Oh, wait, it says "truncated", so presumably the problem is at the end of the file. Can you run "tail" on the file and post the last two lines?
      How do I run "tail"?
      Sorry, I'm a beginner...



      • #18
        "tail file.sam"

        That will print the last 10 lines to the console.
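To get only the last two lines, as requested earlier in the thread, `tail` accepts `-n` followed by a line count. A minimal sketch; `demo.sam` is a made-up stand-in file created just so the example runs anywhere, not the poster's data:

```shell
# Build a tiny three-line stand-in file (hypothetical content);
# in practice, point tail at your own SAM file instead.
printf 'read1\nread2\nread3\n' > demo.sam

# Print only the last two lines (without -n, tail prints ten).
tail -n 2 demo.sam
```

With a real file this would simply be `tail -n 2 file.sam`.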



        • #19
          HISEQHI:525:HCYWJADXX:2:2213:8924:55099 256 * 942639 0 43M * 0 0 CAAAGGGCTGAGAAGCACTTGAAAAAATGTTCAACATCCTTAA CCCFFFFFHHHHHJJJJJJJJJJJJJJJJIIJJJJJJJJJJJJ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:43 YT:Z:UU NH:i:20 CC:Z:chrX CP:i:128687718 XS:A:+ HI:i:17
          HISEQLN:122:HCW3JADXX:2:2207:7052:25724 272 * 944767 0 43M * 0 0 TACTTACATATAATAAATAAATAAATAAATATTTTTTAAAAAA IFIIGJIJIIIGGIJIJIGFFCIHGIGIIHDHFFHFFDDF@@@ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:43 YT:Z:UU NH:i:11 CC:Z:chr6 CP:i:52981629 XS:A:- HI:i:9
          HISEQLN:121:HCYV3ADXX:1:1203:18633:64996 0 * 949324 043M * 0 0 CAGAACCCCTGAAATTGGCAAGATAGACGTCAGTGTTAGCAGA CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:5G37 YT:Z:UU NH:i:20 CC:Z:chr6 CP:i:6419658 XS:A:+ HI:i:12
          HISEQLN:122:HCW3JADXX:1:1112:13385:80114 272 * 949722 043M * 0 0 GGTGTCCGCTAGTGTCCTGAGGCCTGAGCGAGGGGCTCCTCTC ##A7'?DFD;BD:3GGDDDIHG@EFFEFADB?<7DD::@=1 AS:i:-2 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:11T31 YT:Z:UU NH:i:20 CC:Z:chr6 CP:i:71166409 XS:A:- HI:i:15
          These are the last few lines...



          • #20
            Assuming all of the things that look like spaces are actually tabs (sorry, tabs often get replaced by spaces on the console), I don't see anything wrong with the sam file and I don't know what the problem is. It may have something to do with a negative number being detected where a positive number is expected, but I'm just speculating.

            You could try Picard rather than Samtools, and see if you have better luck. Or, try the most recent version of Samtools, or else v0.1.19. Sometimes there's a problem with a specific version.
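One way to test the tab assumption directly on the file, rather than on console output, is to count tab-separated fields per line. The following is only a rough sanity check, not what Samtools itself does: `check_sam` is a hypothetical helper name, and it assumes nothing beyond the SAM layout of at least 11 tab-separated mandatory fields, with SEQ in column 10 and QUAL in column 11.

```shell
# Hypothetical helper: report alignment lines that have fewer than 11
# tab-separated fields, or whose SEQ and QUAL lengths differ.
check_sam() {
    awk -F'\t' '
        /^@/ { next }    # skip header lines
        NF < 11 {
            printf "line %d: only %d tab-separated fields\n", NR, NF
            next
        }
        $10 != "*" && $11 != "*" && length($10) != length($11) {
            printf "line %d: SEQ length %d != QUAL length %d\n", NR, length($10), length($11)
        }
    ' "$1"
}
```

Running `check_sam file.sam` prints nothing when every alignment line passes both checks. A line whose tabs were converted to spaces shows up as having a single field.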



            • #21
              OK, I'll give it a try. Thank you for all your help.



              • #22
                What version of samtools are you using?



                • #23
                  Hi,

                  I am using:
                  Version: 1.2 (using htslib 1.2.1)



                  • #24
                    Hi,

                    Sorry to revive this thread, but I have a similar desire to filter based on length and was excited to learn about reformat!

                    I've run into some issues, but I'm pretty dumb so I'm sure I've just confused something simple.

                    I've downloaded bbmap and have tried to get reformat to work but I'm not having any luck.

                    When I try the following:

                    sh ~/tools/bbmap/reformat.sh in=input.bam out=output.bam minlength=1 maxlength=100

                    I get the following error message:

                    Found samtools.
                    Input is being processed as unpaired
                    [samopen] SAM header is present: 84 sequences.
                    java.lang.AssertionError
                    at stream.SamLine.toShortMatch(SamLine.java:1257)
                    at stream.SamLine.toRead(SamLine.java:1879)
                    at stream.SamLine.toRead(SamLine.java:1749)
                    at stream.SamReadInputStream.toReadList(SamReadInputStream.java:119)
                    at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:90)
                    at stream.SamReadInputStream.nextList(SamReadInputStream.java:74)
                    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:656)
                    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)
                    Input: 110600 reads 16384426 bases
                    Short Read Discards: 110034 reads (99.49%) 16340390 bases (99.73%)
                    Output: 566 reads (0.51%) 44036 bases (0.27%)

                    Time: 1.287 seconds.
                    Reads Processed: 110k 85.94k reads/sec
                    Bases Processed: 16384k 12.73m bases/sec
                    Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
                    at jgi.ReformatReads.process(ReformatReads.java:1098)
                    at jgi.ReformatReads.main(ReformatReads.java:43)


                    I'm still really excited by the potential of reformat, any advice would be greatly appreciated.



                    • #25
                      Do you still get an error if you remove the minlength=1 directive?



                      • #26
                        Wow! Thanks for the quick reply GenoMax!

                        Sadly that doesn't alleviate my issue:

                        Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
                        at jgi.ReformatReads.process(ReformatReads.java:1098)
                        at jgi.ReformatReads.main(ReformatReads.java:43)



                        • #27
                          It appears that there was some problem processing the line's MD tag. In this case, since you are just filtering based on length, that should not matter and you can just add the flag "-da" to ignore the error, which does not affect the output in this case. I added code to print out the problematic line when that happens in the future. If it's a very small bam file you could email it to me so I can see what the problem is.



                          • #28
                            Brian,

                            Would it be possible to use reformat.sh to filter on the fragment length rather than the read length? I'm looking for a way to split paired-end ATAC-Seq .sam files into "nucleosome-free" and "nucleosome-bound" regions based on size of the fragment, and the proposed solutions I've found elsewhere have been a dead end. Thanks!
