  • modified Bam to bed algorithm

    I modified the BAM-to-BED code I found here http://sourceforge.net/apps/mediawik...s_.28Picard.29, removing the command-line handling to turn it into a simple conversion function that takes a file and passes a file back. Is anything wrong with it? I'm new to Java, so it may be riddled with errors.

    Calling code

    Code:
           //ALT if BAM
            if (bamButton.isSelected()) {
                // doWork is an instance method, so create a BamToBed first. No holder variable is needed:
                // the right-hand side is fully evaluated before selectedFile is reassigned.
                selectedFile = new BamToBed().doWork(selectedFile);
            }

            //ALT if not BAM just continue with other stuff


    Conversion code

    Code:
    package gui;


    import net.sf.picard.io.IoUtil;
    import net.sf.samtools.*;

    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.Iterator;

    /**
     * Converts a BAM or SAM file to a BED file.
     */
    public class BamToBed {

        /** This method contains the main logic of the conversion */
        protected File doWork(File INPUT) { //ALT takes the (BAM) file inputted and returns the BED file
            IoUtil.assertFileIsReadable(INPUT);

            //ALT File to pass back. It must be created before writing; the name used here
            // (input path plus ".bed") is only an assumption, pick whatever location suits the GUI.
            final File convertedBAM = new File(INPUT.getPath() + ".bed");

            final SAMFileReader reader = new SAMFileReader(INPUT);

            // Open the writer once, before the loop. Opening it per record would truncate
            // the file each time and leave only the last record in it.
            try (BufferedWriter out = new BufferedWriter(new FileWriter(convertedBAM))) {

                // The range query (SEQUENCE, START, END) went away with the command-line args,
                // so every record in the file is read.
                final Iterator<SAMRecord> iterator = reader.iterator();

                while (iterator.hasNext()) {
                    final SAMRecord record = iterator.next();
                    if (record.getReadUnmappedFlag()) {
                        continue;
                    }

                    //Output is redirected from System.out to a File to be passed back to calling function
                    out.write(record.getReferenceName() + "\t" +
                            (record.getAlignmentStart() - 1) + "\t" + //subtract 1 to shift from one-based to zero-based
                            (record.getAlignmentEnd() - 1 + 1) + "\t" + //subtract 1 to shift from one-based to zero-based, and
                                                                        // then add 1 to shift from inclusive to exclusive
                            record.getReadName() + "\t" +
                            record.getMappingQuality() + "\t" +
                            (record.getReadNegativeStrandFlag()? "-": "+") );
                    out.newLine(); // one BED record per line
                }
            } catch (IOException e) {
                // Rethrown unchecked so the calling code does not need its own try/catch
                throw new RuntimeException("Could not write " + convertedBAM, e);
            } finally {
                reader.close();
            }

            return convertedBAM;
        }


    }
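
    To check the coordinate arithmetic in the comments above without a BAM file: SAM/BAM alignment positions are 1-based and inclusive, while BED is 0-based and half-open. Here is a minimal sketch in plain Java with no Picard dependency. The class and method names (BedCoords, toBedLine) are made up for illustration and are not part of any library.

```java
/** Hypothetical helper (not part of Picard): formats one alignment as a six-column BED line. */
final class BedCoords {

    /**
     * @param start1 1-based inclusive alignment start, as SAMRecord.getAlignmentStart() reports it
     * @param end1   1-based inclusive alignment end, as SAMRecord.getAlignmentEnd() reports it
     */
    static String toBedLine(String chrom, int start1, int end1,
                            String name, int mapq, boolean negativeStrand) {
        int bedStart = start1 - 1; // one-based -> zero-based
        int bedEnd = end1;         // (end1 - 1) + 1: zero-based, then inclusive -> exclusive
        return chrom + "\t" + bedStart + "\t" + bedEnd + "\t"
                + name + "\t" + mapq + "\t" + (negativeStrand ? "-" : "+");
    }

    public static void main(String[] args) {
        // A 50 bp read aligned to chr1 at positions 101..150 (1-based, inclusive)
        System.out.println(toBedLine("chr1", 101, 150, "read1", 60, false));
    }
}
```

    With half-open coordinates, end minus start equals the aligned length on the reference (150 - 100 = 50 here), which is a quick sanity check on any converted line.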

  • #2
    It seems like you should deal with the CIGAR string in there: spliced alignments, deletions, etc. BED files have a way of annotating those as well (the block columns of the 12-column format). Either that, or a single BAM line will sometimes have to be split into multiple BED lines.
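
    A rough sketch of the split-into-multiple-lines option, in plain Java with no Picard dependency. The class and method names (CigarBlocks, blocks) are made up for illustration. Treating D as part of a block and letting only N open a gap is an assumption about what you want; if deletions should split the read too, handle D the same way as N.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Picard): splits an alignment into reference blocks using its CIGAR string. */
final class CigarBlocks {

    /**
     * @param start0 0-based alignment start, i.e. getAlignmentStart() - 1
     * @param cigar  CIGAR string such as "10M200N15M"
     * @return one {start, end} pair per block, 0-based and half-open like BED
     */
    static List<int[]> blocks(int start0, String cigar) {
        List<int[]> result = new ArrayList<>();
        int pos = start0;        // current position on the reference
        int blockStart = start0; // start of the block being built
        int length = 0;          // length of the operation being parsed
        for (char c : cigar.toCharArray()) {
            if (Character.isDigit(c)) {
                length = length * 10 + (c - '0');
                continue;
            }
            switch (c) {
                case 'M': case '=': case 'X': case 'D':
                    pos += length; // consumes reference and stays in the current block
                    break;
                case 'N':
                    // Skipped region (e.g. an intron): close the current block, jump the gap
                    if (pos > blockStart) {
                        result.add(new int[] {blockStart, pos});
                    }
                    pos += length;
                    blockStart = pos;
                    break;
                default:
                    break; // I, S, H, P do not consume reference
            }
            length = 0;
        }
        if (pos > blockStart) {
            result.add(new int[] {blockStart, pos});
        }
        return result;
    }

    public static void main(String[] args) {
        // A spliced read: 10 aligned bases, a 200 bp skipped region, then 15 aligned bases
        for (int[] b : blocks(100, "10M200N15M")) {
            System.out.println("chr1\t" + b[0] + "\t" + b[1]);
        }
    }
}
```

    Each returned pair can be written as its own BED line, or the pairs can be folded into the block columns of a single 12-column line.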
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
