Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • sam file integrity

    Hi,

    Is there away to check a sam file integrity? I generated one after mapping with bwa, but have problems on rebuilding a bam file from it.
    In fact, when I use
    $ samtools view -F12 -Sb /input_file.sam > new_bams/output_file.bam
    the bam created is empty (4 kb, coming from a 113 Mb sam).
    I tried many different ways, so I am wondering now if the problem is with the sam file.

    Info:
    -The starting bam file I used are paired-end ones of about 50 Mb and 500000 reads.
    -The creation of fastq file goes well, generating 2 files for the two reads, with the same number of reads of the bam, and size of 53 Mb.
    -The sai file generated from the mapping process with BWA are around 1 Mb.
    -The new sam file is generated through this command:
    $ bwa sampe /ref_genome.fa /bwa_read1.sai /bwa_read2.sai /read1.fastq /read2.fastq > /output.sam

    -The header and first lines of the sam are these:

    @SQ SN:gi|9633069|ref|NC_000898.1| LN:162114
    @SQ SN:gi|224020395|ref|NC_001664.2| LN:159322
    @SQ SN:gi|82503188|ref|NC_007605.1| LN:171823
    @SQ SN:gi|139424470|ref|NC_009334.1| LN:172764
    @PG ID:bwa PN:bwa VN:0.6.1-r104
    SRR360611.2626045 77 * 0 0 * * 0 0 GATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGAC 7:C;7@?9C7A@@C@D@A>/;CCGBBADBBBBBEAEE9GHEEFHGKGKJIJKHIIIJKIFJIIGJCC?JKKLKKKJKHJKKJCKCKKJKKKIGHG>EEDB
    SRR360611.2626045 141 * 0 0 * * 0 0 TGGTTCCTACTTCAGGGCCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCACGATGGATCACAGGTCTATCACCCTATTAACCACTCA 8DDEEGHIGGGGHIIEIIJIHHJJJJJKIJJGDJGKKIFHCCIIJKKKKIJKADJIGJFKI>JIAJGGEDB&:0EB>@7C8B6=;.>FACACB:@E=G@:
    SRR360611.18087377 77 * 0 0 * * 0 0 ATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACG <<???@A@ACC?<<C<B??D?GFBDEBAECBEEFAIBEKDJJHJIJHGIJKGIFIJKIF@IIIHCJKIJKKLKKGIJJKKJDJCKIKLIJIHHH?FDDB;
    SRR360611.18087377 141 * 0 0 * * 0 0 GGAAGCTTTCTGTTGGCTCACATTTGGTTTATTGATGTAATGTATTGATGCTTCCCATAACGCCCTAAGTTCACACATCAACTGCAACTCCAAAGCCACC 8ECFGGFFFHIHF<ADIJIGDHHHBIIGFF@C?G@CE6BEGGEB>GDEEGCF>HK<F?<F&653&=2:?5?6/5=0B><:GCAGBEB+GF?D:@BB=B;@
    SRR360611.8887247 77 * 0 0 * * 0 0 ATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACG 5<;:=@A@ACA?=<C>FA:ABBFBDEB<BCBBECAIBHKGJHF?FJKGIJKIGIIJKIGIGIGHAF<IJKKLKKHIIGKKJDKCKKKLKKJHHH?FEDA;



    -I am working on a Mac Terminal (OSX 10.5, or OSX 10.6, or on Linux based cluster)

    Thanks in advance!
    Last edited by marco12345; 03-28-2013, 04:31 AM.

  • #2
    Try this: http://genome.sph.umich.edu/wiki/BamUtil:_validate

    Comment


    • #3
      Well, why don't you start by believing what your data tells you. You are filtering away unmapped reads, and almost nothing comes out at the end, maybe nothing mapped?

      77 = 64+8+4+1. That didn't map.

      141=128 + 8+4+1, also didn't map.

      It looks like you are aligning to herpes virus, but your reads seem to align to human mitochondria.

      Comment


      • #4
        I agree, believing the datas should be the first answer, but unfortunately I am quite sure the problem isn't that.
        I am using full adult human genomes, and aligning to 4 viruses, each of which is present in 98% of adult population. I tried this with almost 10 individuals and had the same result, so unless I found a really weird group of people, I guess I made a mistake somewhere, just can't understand where!

        Comment


        • #5
          I am using full adult human genomes
          Are you 100% sure the mitochrondria are there? Because those reads look like mitochrondria, and they didn't map.

          Have you sat down and spot checked a bunch of reads, to see that they are about what you expect?

          Have you used grep to see if you can find some examples of the reads you are looking for?

          Your .bam is telling you that at most, a tiny % of your reads align to herpes virus. Do you have empirical evidence telling you this is not actually true?

          bwa and samtools didn't crash, and they are not returning errors. The fault is very unlikely to be in your files. Either your libraries are not what you believed them to be, or the alignment was not carried out the way you think it was.

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Current Approaches to Protein Sequencing
            by seqadmin


            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
            04-04-2024, 04:25 PM
          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 04-11-2024, 12:08 PM
          0 responses
          18 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 10:19 PM
          0 responses
          22 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 09:21 AM
          0 responses
          16 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-04-2024, 09:00 AM
          0 responses
          47 views
          0 likes
          Last Post seqadmin  
          Working...
          X