SEQanswers

Old 05-11-2015, 12:35 PM   #21
krapulaxdoctor
Member
 
Location: Netherlands

Join Date: May 2015
Posts: 20

OK, I'll give it a try. Thank you for all your help.
Old 05-11-2015, 01:46 PM   #22
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,780

What version of samtools are you using?
Old 08-29-2015, 02:23 AM   #23
krapulaxdoctor
Member
 
Location: Netherlands

Join Date: May 2015
Posts: 20

Hi,

I am using:
Version: 1.2 (using htslib 1.2.1)
Old 10-28-2016, 04:29 AM   #24
DumbOrchid
Junior Member
 
Location: Brisbane

Join Date: Oct 2016
Posts: 2

Hi,

Sorry to revive this thread, but I have a similar desire to filter based on length and was excited to learn about reformat!

I've run into an issue, but I'm pretty dumb so I'm sure I've just confused something simple.

I've downloaded bbmap and have tried to get reformat to work but I'm not having any luck.

When I try the following:

sh ~/tools/bbmap/reformat.sh in=input.bam out=output.bam minlength=1 maxlength=100

I get the following error message:

Found samtools.
Input is being processed as unpaired
[samopen] SAM header is present: 84 sequences.
java.lang.AssertionError
at stream.SamLine.toShortMatch(SamLine.java:1257)
at stream.SamLine.toRead(SamLine.java:1879)
at stream.SamLine.toRead(SamLine.java:1749)
at stream.SamReadInputStream.toReadList(SamReadInputStream.java:119)
at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:90)
at stream.SamReadInputStream.nextList(SamReadInputStream.java:74)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:656)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)
Input: 110600 reads 16384426 bases
Short Read Discards: 110034 reads (99.49%) 16340390 bases (99.73%)
Output: 566 reads (0.51%) 44036 bases (0.27%)

Time: 1.287 seconds.
Reads Processed: 110k 85.94k reads/sec
Bases Processed: 16384k 12.73m bases/sec
Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
at jgi.ReformatReads.process(ReformatReads.java:1098)
at jgi.ReformatReads.main(ReformatReads.java:43)


I'm still really excited by the potential of reformat, any advice would be greatly appreciated.
Old 10-28-2016, 04:42 AM   #25
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,780

Do you still get an error if you remove the minlength=1 directive?
Old 10-28-2016, 05:32 AM   #26
DumbOrchid
Junior Member
 
Location: Brisbane

Join Date: Oct 2016
Posts: 2

Wow! Thanks for the quick reply GenoMax!

Sadly that doesn't alleviate my issue:

Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
at jgi.ReformatReads.process(ReformatReads.java:1098)
at jgi.ReformatReads.main(ReformatReads.java:43)
Old 10-28-2016, 08:28 AM   #27
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

It appears there was a problem processing a line's MD tag. Since you are only filtering by length, that does not matter: you can add the flag "-da" (which disables Java assertions) to ignore the error, and the output is not affected in this case. I have added code to print the problematic line when this happens in the future. If it's a very small BAM file, you could email it to me so I can see what the problem is.
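
For anyone scripting this, a minimal Python sketch of the same call with "-da" appended might look like the following. The script path, file names, and length limits are the ones from the post above; the helper name build_reformat_cmd is made up for illustration, and the actual run is left commented out so nothing executes unless reformat.sh really lives at that path.

```python
import os
import subprocess  # only needed if you uncomment the run() call below


def build_reformat_cmd(in_path, out_path, maxlength, minlength=None,
                       script="~/tools/bbmap/reformat.sh"):
    """Hypothetical helper: assemble the reformat.sh call from the thread, plus -da."""
    # subprocess does not expand "~", so resolve the home directory here.
    cmd = ["sh", os.path.expanduser(script), f"in={in_path}", f"out={out_path}"]
    if minlength is not None:
        cmd.append(f"minlength={minlength}")
    cmd.append(f"maxlength={maxlength}")
    # -da is the flag suggested above to ignore the assertion error.
    cmd.append("-da")
    return cmd


cmd = build_reformat_cmd("input.bam", "output.bam", maxlength=100, minlength=1)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run reformat.sh
```

Passing minlength=None simply leaves that parameter out of the command.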
Old 05-29-2018, 10:13 AM   #28
andrewbcaldwell
Junior Member
 
Location: Encinitas, CA

Join Date: Apr 2018
Posts: 1

Brian,

Would it be possible to use reformat.sh to filter on the fragment length rather than the read length? I'm looking for a way to split paired-end ATAC-Seq .sam files into "nucleosome-free" and "nucleosome-bound" regions based on fragment size, and the proposed solutions I've found elsewhere have been dead ends. Thanks!
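
Whether reformat.sh can do this is not answered in this thread. Independently of it, one way to sketch the split is to read the SAM text directly and bin records by the absolute value of the TLEN field (column 9 of the SAM format). This is only an illustration under stated assumptions: the function name is hypothetical, and the 100 bp cutoff is a placeholder, not a recommended nucleosome-free threshold.

```python
# Placeholder cutoff (an assumption): fragments shorter than this go to the "short" bin.
DEFAULT_CUTOFF = 100


def split_by_fragment_length(sam_lines, cutoff=DEFAULT_CUTOFF):
    """Split SAM text lines into (short, long) lists by |TLEN|.

    Header lines (starting with "@") are copied to both outputs.
    Records with TLEN == 0 (no fragment size available) are dropped.
    """
    short_frags, long_frags = [], []
    for line in sam_lines:
        if line.startswith("@"):
            short_frags.append(line)
            long_frags.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        # TLEN is signed (positive for the leftmost mate, negative for the
        # rightmost), so both mates of a pair share the same absolute value.
        tlen = abs(int(fields[8]))
        if tlen == 0:
            continue
        if tlen < cutoff:
            short_frags.append(line)
        else:
            long_frags.append(line)
    return short_frags, long_frags
```

Because both mates carry the same absolute TLEN, a pair always lands in the same bin. For BAM input, the text records could be produced with `samtools view -h` and fed to the function line by line.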