Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting ELAND export -> SAM -> BAM

    Hi everyone,

    I apologize because a variation of this question has been asked several times before, but I still can't seem to figure out what's wrong.

    I have ELAND export files that I converted to SAM (I believe), using Samtools 0.1.18 and the command

    perl samtools-0.1.18/misc/export2sam.pl --read1=exportfile > exportfile.sam

    When I try to make a bam file using

    samtools view -Sbh exportfile.sam > exportfile.bam

    I get the following error message:
    [samopen] no @SQ lines in the header.
    [sam_read1] missing header? Abort!

    This is day 1 for me, so I'd appreciate any pointers.

    Thanks

  • #2
    My guess is that the sam didn't get made with headers. Find out what they are supposed to look like, and add them yourself.

    Comment


    • #3
      Are these unaligned reads? If so it is legitimate to have a SAM file with no @SQ lines.

      Comment


      • #4
        They are aligned reads (using ELAND). I am now trying to make my own header, as suggested, but additionally I suppose I would like to understand WHY there is no @SQ header -- is there something I need to supply to the export2sam.pl script that I am not supplying?

        Comment


        • #5
          I don't believe eland2sam.pl will give the header. For one, it doesn't know the length of the sequences - so at best it could give you a length as long as the largest alignment position. Also, it would need to go through the Eland file twice: once to calculate the header and the second to write the reads.

          This post suggests using the FASTA reference file to get the headers. One issue would be if your data has splice junctions - since the "chromosome" in the Eland alignment would be represented as a concatenation of exon coordinates. I think you would be able to create a legitimate BAM file, but the splice junctions will have coordinates relative to the exon junction rather than the chromosome.

          Justin

          Comment


          • #6
            Thank you, that is very helpful. This is DNA sequence, so I think there should be no issues.

            So what I need is a ref.fa file. Do I make this myself by concatenating the FASTA files for all chromosomes?

            Comment


            • #7
              Originally posted by knostrov View Post
              Thank you, that is very helpful. This is DNA sequence, so I think there should be no issues.

              So what I need is a ref.fa file. Do I make this myself by concatenating the FASTA files for all chromosomes?
              Yes, you can do that. You shouldn't need a single reference fasta to make the header.

              Is it possible that lying around somewhere is an old sample that someone analyzed the right way? You can grab the headers off of that. It's something you'll only need once.

              Comment


              • #8
                So this is basically a shot in the dark, but what I did is to make a ref.fa file by concatenating fasta files for all chromosomes.
                Then I ran

                samtools faidx chromFa/ref.fa

                and got back a file that looked like this:
                chr1 249250621 6 50 51
                chr10 135534747 254235647 50 51
                chr11 135006516 392481096 50 51
                chr12 133851895 530187750 50 51
                chr13 115169878 666716690 50 51
                chr14 107349540 784189973 50 51
                chr15 102531392 893686511 50 51
                chr16 90354753 998268538 50 51
                chr17 81195210 1090430394 50 51
                chr18 78077248 1173249516 50 51

                etc.

                Then I ran

                samtools view -t chromFa/ref.fa.fai -bSh exportfile.sam -o exportfile.bam

                and got back this very impressive error message (verbatim!):
                [sam_header_read2] 24 sequences loaded.
                ?1`?-!??2?p??yF??b?ZQ?_TbT???W???ge?g?g???c?Z?ZT???X?Z?_?????VR?X??T\???S?k?g?gh???Y???b???nQjb????_???a?s?{??{????y|?!T????u?@??Qd?`?|??
                ߳~??x?
                aQw?`<?48ϔ??Q?3cpܒ?
                [main_samview] random alignment retrieval only works for indexed BAM files.
                ?3??p?Ƴ`p?]?Y2L?ic?:̈??m?aF
                ?x?
                ????`<#??0m?
                rS?rC9&
                )'R`S??\P??5?1?cΐ?\? ?X0????r,??˂q"?:”E2???

                Really!

                If there are any comments, thank you. Unfortunately I don't have any files to steal headers from...

                Comment


                • #9
                  Then I ran

                  samtools view -t chromFa/ref.fa.fai -bSh exportfile.sam -o exportfile.bam
                  Why? What you need to do is make header lines, and append them to the top of your .sam file.

                  For each chromosome in your reference, you need a line like

                  @SQ SN:chr10 LN:135534747
                  You can make the header file, and then cat it to your .sam file.

                  Comment


                  • #10
                    I think you are on the home stretch. According to the samtools reference, you can use the .fai file as input for the -t option. I think the message you see being printed out is actually the BAM output.

                    I would try this command instead

                    PHP Code:
                    samtools view -t chromFa/ref.fa.fai -bSh exportfile.sam exportfile.bam 
                    To check the bam file, you can do

                    PHP Code:
                    samtools view -h exportfile.bam less -
                    You should see the header lines (starting with the @ symbol) followed by the alignments.

                    Justin

                    Comment


                    • #11
                      Yes! It worked! Thank you very much!

                      Ah, the satisfaction...

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Essential Discoveries and Tools in Epitranscriptomics
                        by seqadmin




                        The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                        04-22-2024, 07:01 AM
                      • seqadmin
                        Current Approaches to Protein Sequencing
                        by seqadmin


                        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                        04-04-2024, 04:25 PM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, Yesterday, 11:49 AM
                      0 responses
                      15 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-24-2024, 08:47 AM
                      0 responses
                      16 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-11-2024, 12:08 PM
                      0 responses
                      61 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 10:19 PM
                      0 responses
                      60 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X