Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • GSNAP and SAM header

    Dear all,

    I'm currently trying to align reads on a reference sequence with gsnap with this code :

    gsnap --gunzip -D ./ref/cristata -d cristata ./data/cristata/Pig-GC01_AGTTCC_L005_R1.fastq.gz ./data/cristata/Pig-GC01_AGTTCC_L005_R2.fastq.gz -t 8 -B 4 -A sam --nofails --clip-overlap --split-output GC01
    I'm working with very short reads, it's why i'm using the clip-overlap command.

    Work is finishing without error message, but my sam header is not complete, so I can't use my results files.

    For example :

    head ./GC01.concordant_uniq

    @SQ SN:GC01-(1934) LN:17090
    HWI-ST314:2581WBYACXX:5:1101:3036:2447 99 GC01-(1934) 11568 40 58M42H = 11626 116 GACCCTTCCTACCCACATCACGTCCATCCAAAATTCAACCACACGAGAGCACCTCCTT CCCFFFFFHHHHHJJJJJJIJJHIIIGJJJJJIJJJJJJJIJJJGJJJJJJJJJJJIJ MD:Z:58 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
    [...]
    I only have the @SQ line, and no @HD line in the header.

    Do you see something I missed ?

    Thanks a lot in advance for your help !!

    Jade

  • #2
    You can just add it when converting to BAM. Just make a file with the @HD line and then
    Code:
    cat HD.txt file.sam | samtools view -Sbo file.bam -

    Comment


    • #3
      The @HD line is just the header and not mandatory for any downstream applications (as far as I know). BWA is also not creating it. If you need it, just add it like dpryan suggested, if not, just leave it.

      Comment


      • #4
        Ok. Thanks a lot for your answers.

        If I don't need it, it's ok. But I thought my next error message was due to this lack.

        In fact, when I do this :

        for file in ./data/cristata/GC01.*.sam
        do
        samtools view -Sb $file > $file.bam
        done
        or this :

        for file in ./data/cristata/GC01.*.sam
        do
        samtools view -Sb -T ./ref/cristata/cristata.fasta $file > $file.bam
        done
        My job fails, with such a message :

        [samopen] SAM header is present: 1 sequences.
        [sam_read1] reference 'SN:GC01-(1934) LN:17090
        ' is recognized as '*'.
        [main_samview] truncated file.
        [...]
        But it doesn't seem to be truncated :

        tail -n 3 GC01.concordant_uniq

        HWI-ST314:2581WBYACXX:5:2316:14327:100908 163 GC01-(1934) 3724 40 52M48H = 3776 105 ACCAACTTATACACCTTCTCTGAAAGAGCTTCTTGCCACTAACACTGGCACT @C@FFFFFHGHHHJJGIIIIJJGHIJJJIJJJIJGIJIFHIICGIIIII)?D MD:Z:49C2 NH:i:1 HI:i:1 NM:i:1 SM:i:40 XQ:i:40 X2:i:0
        HWI-ST314:2581WBYACXX:5:2316:17260:100910 83 GC01-(1934) 2574 40 44H56M = 2518 -112 GTTCGTTTGTTCAACGATTAATAGTCCTACGTGATCTGAGTTCAGACCGGAGTAAT IJJGJJJJJJJJJIHGDFDIIIJJJJJJIIHHHIGJIJIJIGIFHHGHFDFFFCCC MD:Z:56 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
        HWI-ST314:2581WBYACXX:5:2316:17260:100910 163 GC01-(1934) 2518 40 56M44H = 2574 112 GGTTTACGACCTCGATGTTGGATCAGGACATCCTAATGGTGCAGCAGCTGTTAAGG @CBFFFFFHHHHHJGGIHJJJEHHIIJGIIIJJJIGJJIDGGHIGGHII3=EFGHI MD:Z:49A6 NH:i:1 HI:i:1 NM:i:1 SM:i:40 XQ:i:40 X2:i:0
        Any ideas ?

        Comment


        • #5
          What samtools version are you using?
          Was there something like "missing EOF marker" in the "error message"?

          This is a quite common bug in version 0.1.19 (I think it has been resolved in the newer ones?!). It's actually just a warning and means nothing if you know that your input file was not truncated

          Comment


          • #6
            I'm actually using the 0.1.19-44428cd one.

            But, I don't have something like "missing EOF marker", only lines as above (for files which are not empty), and "Your job has been killed."

            I have bam files finally, but I'm not sure I can use them...

            Comment


            • #7
              Ok, maybe the following helps...

              1) Is there a particular reason for using the "-T" parameter. I never use it and I actually don't what it is useful for. Although it don't think it will change anything, you could try the conversion without that.

              2) Your "GC01.concordant_uniq" file is a sam file created from one of the bam files using samtools view -h file.sam > GC01.concordant_uniq, isn't it? A more accurate comparison would be to use "diff file1 file2". If the output is empty, your files are equal.

              3) If (1) doesn't help and (2) doesn't give you enough confidence, upgrade samtools to version 1.1 and check if the warning remains.

              Comment


              • #8
                @WhatsOEver: -T is useful when a file is completely missing a header. Otherwise, it doesn't do much.

                @JadeB: The "[sam_read1] reference 'SN:GC01-(1934) LN:17090" means that samtools is seeing part of your header as a read. This is pretty unusual, but you're correct that this will only happen if it's not parsing the header correctly (internally, sam_hdr_read() moves the file pointer to just past the header). We're up to samtools 1.1 now, so go ahead and upgrade in any case, though I suspect there's something wrong with the header (maybe spaces rather than tabs) that will also prevent the newer versions from working. If 1.1 also doesn't work, post the first 100 or so lines (just "head -n 100 file.sam > file.txt") or so as a text file and I can have a look.

                Comment


                • #9
                  Originally posted by WhatsOEver View Post
                  2) Your "GC01.concordant_uniq" file is a sam file created from one of the bam files using samtools view -h file.sam > GC01.concordant_uniq, isn't it? A more accurate comparison would be to use "diff file1 file2". If the output is empty, your files are equal.
                  No, my sam files are gsnap output, not made with samtools view, then I'm trying to have bam files...

                  I'm currently trying an other thing : error could maybe be linked with different names between my fasta file and inside my fasta file, in the header. I homogenize everything, and if it doesn't change anything, I'll ask for an update of samtools version to the bioinformatics server I use...

                  Thanks again !

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Strategies for Sequencing Challenging Samples
                    by seqadmin


                    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                    03-22-2024, 06:39 AM
                  • seqadmin
                    Techniques and Challenges in Conservation Genomics
                    by seqadmin



                    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                    Avian Conservation
                    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                    03-08-2024, 10:41 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, 03-27-2024, 06:37 PM
                  0 responses
                  13 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-27-2024, 06:07 PM
                  0 responses
                  11 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-22-2024, 10:03 AM
                  0 responses
                  53 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-21-2024, 07:32 AM
                  0 responses
                  69 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X