SEQanswers (
-   Bioinformatics (
-   -   Trim BAM file reads from 5' ends (

FiReaNG3L 05-25-2012 04:10 AM

Trim BAM file reads from 5' ends
For a specific application I need to display the 5' ends of reads in a genome browser - so far I haven't found a way to easily trim all the reads in a BAM file down to the first basepairs. I obviously need to retain mapping information so I can't just trim the reads in FASTQ.

arvid 05-25-2012 04:37 AM

Define "retain mapping information"; do you need the information provided by the CIGARs? If you don't, such trimming is straightforward with a simple script chopping off the bases and qualities in the BAM (pipe "samtools view -h" into a script which skips over the headers, then for each line cuts down columns 10 and 11 to the desired length, and just replaces the CIGAR field with a *, piped into "samtools view -bS").
If you need to retain the information in the CIGARs properly, it becomes more messy, as you'd want to think about how to handle soft-clippings and indels.

FiReaNG3L 05-25-2012 04:47 AM

Ok, so something along the lines of what's suggested in

I don't think SeqMonk cares about the CIGAR information, so it should be fine.

arvid 05-25-2012 05:01 AM

Yes, basically. I'd really replace the CIGAR with a * though, to make the SAM/BAM file standardized. You might want to check how to decide on which side of the read sequence in the BAM is 5'/3' (in the read, not alignment sense as I understand the SAM spec) however, didn't think about that.

All times are GMT -8. The time now is 06:39 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.