  • Intersect BAM files from alignments to human and mouse

    Hi,

    I have searched the forums (the closest match was seqanswers.com/forums/showthread.php?t=31625, which has no replies) and done a lot of trial and error on my own, but I can't come up with a good solution, so I'm hoping someone here has an idea!

    I'm working with xenograft models, so I have aligned my paired-end reads separately to the human and mouse genomes. I want to estimate the level of mouse contamination and then decide on the best strategy for dealing with the mouse reads.

    So I have two BAM files (one for human, one for mouse). For each of them I've extracted the mapped and unmapped reads with samtools view (-F 4 and -f 4, respectively).
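    For reference, samtools' -f and -F options test bits of the SAM FLAG field, and bit 0x4 means "segment unmapped": -f 4 keeps records with that bit set (unmapped) and -F 4 keeps records without it (mapped). The minimal stdlib-only Python sketch below mirrors that split on SAM text lines (for example the output of `samtools view -h`); the function name `split_by_mapping` is hypothetical and not part of samtools.

```python
UNMAPPED = 0x4  # SAM FLAG bit 0x4: segment unmapped


def split_by_mapping(sam_lines):
    """Partition SAM alignment lines like `samtools view -F 4` / `-f 4`.

    Header lines (starting with '@') and blank lines are skipped.
    Returns a tuple (mapped, unmapped).
    """
    mapped, unmapped = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # not an alignment record
        flag = int(line.split("\t")[1])  # FLAG is the 2nd SAM column
        if flag & UNMAPPED:
            unmapped.append(line)  # what -f 4 would keep
        else:
            mapped.append(line)  # what -F 4 would keep
    return mapped, unmapped
```

    Note that for paired-end data the test applies to each mate separately; a read whose mate is unmapped (bit 0x8) is still kept by -F 4.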

    For a given sample, I now want to compare/intersect the two alignments, for example to find the reads that map to both human and mouse. Something like intersectBed, but for two BAM files (bedtools only seems to accept one BAM file plus one BED file).
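    One way to intersect two alignments of the same reads is to key each primary record by read name plus mate (/1 or /2) and compare those keys between files, instead of comparing records position by position. The stdlib-only Python sketch below assumes both BAMs have been converted to SAM text (e.g. with `samtools view -h`); `primary_status` and `classify` are hypothetical helper names. Secondary (0x100) and supplementary (0x800) records are skipped so each read contributes one status per genome.

```python
UNMAPPED, FIRST_MATE, SECOND_MATE = 0x4, 0x40, 0x80
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def primary_status(sam_lines):
    """Return {(read name, mate): True if the primary record is mapped}."""
    status = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep only one (primary) record per read
        # mate 1 or 2 for paired-end data, 0 for single-end reads
        mate = 1 if flag & FIRST_MATE else 2 if flag & SECOND_MATE else 0
        status[(name, mate)] = (flag & UNMAPPED) == 0
    return status


def classify(human_lines, mouse_lines):
    """Group read keys by the genome(s) their primary record maps to."""
    human = primary_status(human_lines)
    mouse = primary_status(mouse_lines)
    groups = {"both": set(), "human_only": set(),
              "mouse_only": set(), "neither": set()}
    for key in human.keys() | mouse.keys():
        # a read absent from one file is treated as unmapped there
        in_human, in_mouse = human.get(key, False), mouse.get(key, False)
        if in_human and in_mouse:
            groups["both"].add(key)
        elif in_human:
            groups["human_only"].add(key)
        elif in_mouse:
            groups["mouse_only"].add(key)
        else:
            groups["neither"].add(key)
    return groups
```

    Because missing keys count as unmapped, the sketch also works on the -F 4 (mapped-only) files, although the "neither" group will then be empty. It ignores mapping quality, so reads in "both" may still need a MAPQ or alignment-score comparison to decide which genome they really come from.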

    I have tried Picard's CompareSAMs, but for each read it just reports that the two files don't match ("read name ceases agreeing"); it doesn't seem to search the second file for the same read:

    Code:
    java -jar CompareSAMs.jar mapped_to_human.sorted.bam mapped_to_mouse.sorted.bam
    Any hints would be much appreciated!
    Thanks

    PS: I'm also running Xenome in parallel, but I want to do this manually as well, as a sanity check.

  • #2
    CmpBams

    I wrote a tool named CmpBams ( https://github.com/lindenb/jvarkit/wiki/CmpBams ) that might help you. It takes two or more BAM files and shows the differences for each read of a pair (/1 and /2).

    Code:
    #READ-Name	tmp1.sam tmp2.sam|tmp1.sam tmp3.sam|tmp2.sam tmp3.sam	tmp1.sam	tmp2.sam	tmp3.sam
    HWI-1KL149:20:C1CU7ACXX:1:1101:17626:32431/1	EQ|EQ|EQ	K01:2136=83/43M3I54M	K01:2136=83/43M3I54M	K01:2136=83/43M3I54M
    HWI-1KL149:20:C1CU7ACXX:1:1101:17626:32431/2	EQ|EQ|EQ	K01:2059=163/100M	K01:2059=163/100M	K01:2059=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1102:16831:71728/1	EQ|EQ|EQ	K01:2133=83/100M	K01:2133=83/100M	K01:2133=83/100M
    HWI-1KL149:20:C1CU7ACXX:1:1102:16831:71728/2	EQ|EQ|EQ	K01:2059=163/100M	K01:2059=163/100M	K01:2059=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1105:3309:27760/1	EQ|EQ|EQ	K01:2213=83/100M	K01:2213=83/100M	K01:2213=83/100M
    HWI-1KL149:20:C1CU7ACXX:1:1105:3309:27760/2	EQ|EQ|EQ	K01:2081=163/100M	K01:2081=163/100M	K01:2081=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1106:2914:12111/1	EQ|EQ|EQ	K01:2136=83/43M3I54M	K01:2136=83/43M3I54M	K01:2136=83/43M3I54M
    HWI-1KL149:20:C1CU7ACXX:1:1106:2914:12111/2	EQ|EQ|EQ	K01:2059=163/100M	K01:2059=163/100M	K01:2059=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1107:11589:17295/1	EQ|EQ|EQ	K01:2123=83/56M3I41M	K01:2123=83/56M3I41M	K01:2123=83/56M3I41M
    HWI-1KL149:20:C1CU7ACXX:1:1107:11589:17295/2	EQ|EQ|EQ	K01:1990=163/100M	K01:1990=163/100M	K01:1990=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1110:14096:95943/1	EQ|EQ|EQ	K01:2123=83/56M3I41M	K01:2123=83/56M3I41M	K01:2123=83/56M3I41M
    HWI-1KL149:20:C1CU7ACXX:1:1110:14096:95943/2	EQ|EQ|EQ	K01:1990=163/100M	K01:1990=163/100M	K01:1990=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1110:15369:59046/1	EQ|EQ|EQ	K01:2213=83/100M	K01:2213=83/100M	K01:2213=83/100M
    You could pipe the output into awk to select the reads that have been (un)mapped in one or more genomes.
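    As a stdlib-only Python alternative to awk, the sketch below keeps the rows of CmpBams output whose pairwise-comparison column (the second tab-separated field) is not all `EQ`. It assumes the tab-separated layout and `#`-prefixed header shown in the example output; since only `EQ` appears there, any other token is treated as a difference rather than relying on a specific label. `discordant_reads` is a hypothetical name.

```python
import sys


def discordant_reads(lines):
    """Yield read names whose CmpBams comparison column is not all 'EQ'."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip the header and blank lines
        fields = line.rstrip("\n").split("\t")
        comparisons = fields[1].split("|")  # one token per pair of BAMs
        if any(token != "EQ" for token in comparisons):
            yield fields[0]


if __name__ == "__main__":
    # e.g. pipe the CmpBams output into this script on stdin
    for name in discordant_reads(sys.stdin):
        print(name)
```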

    • #3
      Thanks, I'll give that a go!
