  • Hi,

    I am quite new to NGS data. I work with CLCbio's commercial software, Genomic Workbench, which also offers its own mapping algorithm.

    I want to convert the SAM output from the software to BAM so that I can use samtools functions such as pileup.

    I get the following error when I run the command on Ubuntu:

    >./samtools view -huS -o DATA/test.bam DATA/s_2_1_sequence_SS200_LAwMM.sam
    [samopen] SAM header is present: 24 sequences.
    Parse error at line 113: CIGAR and sequence length are inconsistent
    Aborted

    I read somewhere in this thread that samtools currently cannot process a SAM file without the reference sequence. Is that what is causing the problem? If so, can anyone point me to how to generate the correct reference sequence file? I read through the manual but could not find anything describing how the reference file should be formatted. I am working with the whole human reference genome, as 24 GenBank (.gbk) files from NCBI.

    Any help is appreciated.

    Thanks
    Sol
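On the reference question: `samtools view -t` accepts a tab-delimited file with the reference name in column 1 and its length in column 2, and a FASTA index (`.fai`) made by `samtools faidx` has that layout. It is mainly needed for SAM input without `@SQ` header lines. Here the output says the header is present with 24 sequences, so the parse error points at a CIGAR/SEQ mismatch rather than a missing reference. For completeness, a minimal sketch that computes such a name/length list, assuming the GenBank files have already been converted to FASTA (not shown) and that the names match the RNAME values in the SAM file. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (name, length) per FASTA record; name is the first word after '>'."""
    name: Optional[str] = None
    length = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:].split()[0], 0
        else:
            length += len(line)  # sequence may be wrapped over several lines
    if name is not None:
        yield name, length


def reference_list(lines: Iterable[str]) -> str:
    """Format as 'name<TAB>length' lines, the two columns samtools view -t reads."""
    return "".join(f"{name}\t{length}\n" for name, length in fasta_lengths(lines))


if __name__ == "__main__":
    example = [">chr1 example record", "ACGTACGT", "ACG", ">chr2", "GGGG"]
    print(reference_list(example), end="")
```

For a real genome, pass an open file handle (e.g. `open("genome.fa")`, a hypothetical path) instead of the example list.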



    • Originally posted by Solyris:
      "Parse error at line 113: CIGAR and sequence length are inconsistent"
      samtools performs some sanity checks on the CIGAR string, and it is telling you that something is not right. Have you looked at that particular alignment to confirm whether the CIGAR is correct?
      -drd
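Following that suggestion, a small script can list every record whose CIGAR and SEQ disagree, not only the first one samtools aborts on. Per the SAM specification, only the operations M, I, S, = and X consume query bases, so their lengths must add up to the length of SEQ (D, N, H and P do not). A minimal sketch; the function names are hypothetical, and the line numbers it reports count header lines, which may not match samtools's own numbering exactly.

```python
import re
from typing import Iterable, Iterator, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # operations that consume bases of SEQ


def cigar_query_length(cigar: str) -> int:
    """Sum the lengths of the query-consuming CIGAR operations."""
    ops = CIGAR_OP.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR {cigar!r}")
    return sum(int(n) for n, op in ops if op in QUERY_CONSUMING)


def find_inconsistent_records(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, message) for records whose CIGAR and SEQ lengths disagree."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            yield lineno, "fewer than 11 mandatory fields"
            continue
        cigar, seq = fields[5], fields[9]
        if cigar == "*" or seq == "*":
            continue  # nothing to compare
        try:
            qlen = cigar_query_length(cigar)
        except ValueError as err:
            yield lineno, str(err)
            continue
        if qlen != len(seq):
            yield lineno, f"CIGAR {cigar} implies {qlen} bases but SEQ has {len(seq)}"
```

Usage: open the SAM file and loop over `find_inconsistent_records(fh)`, printing each line number and message, then inspect those records by hand.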



      • Why do deletions in the pileup file have a quality attached?

        Hi guys,

        Does anyone know why deletions in the pileup file have a quality attached? How can a deletion have a quality,
        and how is it calculated?

        For example:

        YHet 23690 N 1 a-1n Q
        YHet 23691 N 1 * [
        YHet 23692 N 1 c [


        or

        YHet 25409 N 5 AAA-2NNa-2nnA-2NN VTW`a
        YHet 25410 N 5 A$A$*** USR`a
        YHet 25411 N 3 *** SG`


        best ro
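One way to read the rows above: the quality column holds exactly one character per read counted in the depth column, and a '*' (a read with a deletion at that position) counts toward the depth, so it gets a character too. Which base's quality samtools reports for a deleted position is not stated in this thread. A minimal sketch that checks the one-character-per-read layout and decodes the characters, assuming the Phred+33 encoding used by SAM (older Illumina data may be Phred+64). The function name is hypothetical.

```python
def pileup_row_qualities(row: str, offset: int = 33) -> list[int]:
    """Return integer base qualities for one pileup row.

    Assumes whitespace-separated columns: chrom, pos, ref, depth, read bases, qualities.
    """
    cols = row.split()
    depth, quals = int(cols[3]), cols[5]
    if len(quals) != depth:
        raise ValueError(f"{len(quals)} quality characters for depth {depth}")
    return [ord(ch) - offset for ch in quals]  # Phred = ASCII code - offset


for example in ("YHet 23691 N 1 * [", "YHet 25411 N 3 *** SG`"):
    print(example, "->", pileup_row_qualities(example))
```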



        • If an insertion or deletion occurs at the end of the pileup read-bases string, there doesn't seem to be the extra character after the '\+[0-9]+[ACGTNacgtn]+' pattern.

          For example:
          chr1 2263 C 4 ,$.$.,+1t CC9C FFFF.

          Am I missing something? The pileup format description mentions the in/del pattern '\+[0-9]+[ACGTNacgtn]+', but there appears to be an extra character in the examples given on that page:

          seq2 156 A 11 .$......+2AG.+2AG.+2AGGG <975;:<<<<<

          That extra character appears to be missing if the in/del occurs at the end of the read-bases string. Counting that extra character as part of the insertion/deletion makes the number of read bases match the read count.
          Last edited by jeffhsu3; 04-05-2010, 12:03 PM. Reason: Made more clear and added examples.
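A possible explanation: in the pileup format, the number after '+' or '-' gives the exact length of the inserted or deleted sequence, so a greedy pattern like '\+[0-9]+[ACGTNacgtn]+' can swallow bases belonging to the following reads. In '.+2AGGG', '+2AG' is the insertion and the trailing 'GG' are two more read bases; the '.' after '+2AG' elsewhere is likewise the next read's base. A minimal tokenizer that reads exactly N characters after the length, assuming the old pileup conventions ('^' plus one mapping-quality character at a read start, '$' at a read end, '*' for a deleted base). Names are hypothetical.

```python
import re
from typing import List, Tuple

_LENGTH = re.compile(r"[0-9]+")


def parse_read_bases(bases: str) -> List[Tuple[str, str]]:
    """Split a pileup read-bases column into (base, indel) pairs, one per covering read.

    'indel' is '' or a string such as '+2AG' / '-1n' that follows that read's base.
    """
    out: List[Tuple[str, str]] = []
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            i += 2  # read start: skip '^' and its mapping-quality character
        elif ch == "$":
            i += 1  # read end marker, not a base
        elif ch in "+-":
            m = _LENGTH.match(bases, i + 1)
            if m is None or not out:
                raise ValueError(f"malformed indel at offset {i} in {bases!r}")
            n = int(m.group())
            seq = bases[m.end(): m.end() + n]  # consume exactly n characters
            if len(seq) != n:
                raise ValueError(f"truncated indel at offset {i} in {bases!r}")
            base, _ = out[-1]
            out[-1] = (base, ch + m.group() + seq)
            i = m.end() + n
        else:
            out.append((ch, ""))  # '.', ',', a mismatch base, or '*'
            i += 1
    return out
```

With this, both examples from the post give as many tokens as the depth column: 4 for ',$.$.,+1t' and 11 for '.$......+2AG.+2AG.+2AGGG'.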



          • So, is it already possible to convert SOAP aligner output to SAM or BAM format?
            Best,
            Javi

            Originally posted by lh3:
            To corthay:

            You are quick. I am planning a new bwa release as I realized that I could improve it a little without much work (PS: the new version is released now). Wgsim, wgsim_eval.pl and converters for soap and bowtie are available from SVN only:

            svn co https://samtools.svn.sourceforge.net...s/dev/samtools samtools



            • FLAGS for fusion detection

              Let's say I have paired-end RNA-Seq data and I want to find out whether the mates map more than 1 Mb apart on the same chromosome or map to two different chromosomes. How do I determine that from the FLAGS?



              • Originally posted by RockChalkJayhawk:
                "Let's say I have RNA-Seq data (Paired-End) and I want to find out if the mates are mapped > 1 Mb on the same chromosome or map to 2 different chromosomes. How do I determine that from the FLAGS?"
                You can use the MRNM and MPOS fields in the SAM file.



                • Originally posted by nilshomer:
                  "You can use the MRNM and MPOS fields in the SAM file."
                  So in that case, either MRNM does not equal "=", or MRNM equals "=" and the difference between POS and MPOS is greater than 1 million.

                  Is this correct?
                  Last edited by RockChalkJayhawk; 04-13-2010, 01:17 PM. Reason: Incorrect assumption



                  • Originally posted by RockChalkJayhawk:
                    "So in that case, my MRNM does not equal '=' OR MRNM equals '=' and the difference between POS and MPOS > 1 million. Is this correct?"
                    Perfect!



                    • Originally posted by nilshomer:
                      "Perfect!"
                      Thanks Nils! You're the best!
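The rule agreed on above can be written as a short per-record filter. In later versions of the SAM specification, MRNM and MPOS are named RNEXT and PNEXT (columns 7 and 8). This minimal sketch also uses FLAG bits 0x1 (paired), 0x4 (read unmapped) and 0x8 (mate unmapped) to skip records where the test is meaningless; the 1 Mb threshold comes from the question, and the function name is hypothetical. Because RNEXT may contain the chromosome name itself instead of "=", both are treated as "same chromosome".

```python
from typing import Sequence


def is_candidate_fusion_pair(fields: Sequence[str], min_distance: int = 1_000_000) -> bool:
    """True if this SAM record's mate maps to another chromosome,
    or more than min_distance bp away on the same chromosome."""
    flag = int(fields[1])
    if not flag & 0x1 or flag & 0x4 or flag & 0x8:
        return False  # not paired, or read or mate unmapped
    rname, pos = fields[2], int(fields[3])
    rnext, pnext = fields[6], int(fields[7])
    if rnext == "*":
        return False  # mate reference not available
    if rnext != "=" and rnext != rname:
        return True  # mates on different chromosomes
    return abs(pnext - pos) > min_distance  # same chromosome, far apart
```

Apply it to each alignment line split on tabs; each mate of a qualifying pair will normally be reported once.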



                      • Non-unique reads

                        Hello,
                        My SAM file contains both uniquely and non-uniquely mapped reads. What happens to the non-unique reads when I call SNPs from the SAM file? Are they included in the SNP-calling process?

                        thanks
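Whether non-unique reads contribute to SNP calls depends on the caller and its settings, which this thread does not settle. One common approach is to remove them beforehand by mapping quality (MAPQ, column 5), assuming the aligner gives multi-mapping reads a low MAPQ (BWA, for example, assigns 0 to reads that map equally well to several places). A minimal sketch; the threshold and function name are assumptions. Per the SAM specification MAPQ 255 means "not available", so those reads are handled explicitly.

```python
from typing import Iterable, Iterator


def filter_by_mapq(lines: Iterable[str], min_mapq: int = 1,
                   keep_unavailable: bool = False) -> Iterator[str]:
    """Yield header lines plus alignment lines whose MAPQ is at least min_mapq."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        mapq = int(line.split("\t")[4])
        if mapq == 255:  # MAPQ not available
            if keep_unavailable:
                yield line
        elif mapq >= min_mapq:
            yield line
```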



                        • De novo assembly from SAM format

                          Dear all,

                          I have alignment results in a BAM file that include paired-end and mate-pair reads of different lengths (101, 35 and 36 bp). Does anybody know whether SOAP or another de novo assembler can handle BAM format directly, or do I have to use the raw read files?

                          Many thanks!
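If the assembler needs FASTQ, the reads can be recovered from the alignments, with one catch from the SAM specification: reads aligned to the reverse strand (FLAG 0x10) are stored reverse-complemented, so SEQ must be reverse-complemented and QUAL reversed to get the original read back. A minimal sketch, assuming the file is SAM text (a BAM can be turned into SAM with `samtools view`) and that primary records carry SEQ and QUAL. Secondary (0x100) and supplementary (0x800) records are skipped so each read appears once, and the "/1" "/2" suffixes are one common naming convention, not a requirement. Names are hypothetical.

```python
from typing import Iterable, Iterator

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield one 4-line FASTQ record per primary SAM record."""
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        name, flag, seq, qual = f[0], int(f[1]), f[9], f[10]
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        if seq == "*" or qual == "*":
            continue  # sequence or qualities not stored; skipped here
        if flag & 0x10:
            seq = seq.translate(_COMPLEMENT)[::-1]  # undo reverse-complement
            qual = qual[::-1]
        if flag & 0x40:
            name += "/1"  # first read of the pair
        elif flag & 0x80:
            name += "/2"  # second read of the pair
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

Unmapped reads are included as long as they are present in the file, which matters for assembly since they may be the most interesting ones.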



                          • Hello All,

                            Does anybody know how I can sort a .sam file by the first column, i.e. the column containing the unique read identifiers? Right now it's sorted by the 3rd.

                            Thanks
                            Abhijit



                            • Originally posted by gen2prot:
                              "Does anybody know how I can sort the .sam file on the basis of the first column?"
                              SAMtools and Picard will both sort by read name. See their documentation.
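For small files the same idea fits in a few lines. This is a minimal in-memory sketch, not how SAMtools or Picard implement it (they handle files larger than memory). It keeps header lines first and uses plain lexicographic order on QNAME, which may differ from the exact order those tools produce; Python's sort is stable, so records sharing a name (e.g. both mates) keep their original relative order. If the header has an `@HD` line, its `SO` tag should afterwards say `queryname`; that is not done here. The function name is hypothetical.

```python
from typing import List


def sort_sam_by_name(lines: List[str]) -> List[str]:
    """Return header lines, then alignment lines sorted by QNAME (column 1)."""
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if not line.startswith("@")]
    records.sort(key=lambda rec: rec.split("\t", 1)[0])  # stable sort on read name
    return header + records
```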



                              • Hello nilshomer,

                                I downloaded Picard and have the .jar files on Mac OS X 10.6, saved on the Desktop, but they won't open. How do I run them?

                                Thanks
                                Abhijit
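The Picard .jar files are command-line Java programs rather than double-click applications, so they are run from Terminal with `java -jar`. A minimal sketch that builds and runs a query-name sort, assuming a recent Picard release distributed as a single picard.jar with its `-I`/`-O`/`-SO` option syntax (older releases shipped separate jars such as SortSam.jar and took INPUT=/OUTPUT=/SORT_ORDER= arguments). The jar path and file names are hypothetical.

```python
import subprocess
from typing import List


def picard_sort_by_name_cmd(jar: str, in_sam: str, out_sam: str) -> List[str]:
    """Build the command line for Picard SortSam with query-name order."""
    return ["java", "-jar", jar, "SortSam",
            "-I", in_sam, "-O", out_sam, "-SO", "queryname"]


def run_picard_sort(jar: str, in_sam: str, out_sam: str) -> None:
    """Run the command; raises CalledProcessError if Picard exits non-zero."""
    subprocess.run(picard_sort_by_name_cmd(jar, in_sam, out_sam), check=True)
```

The equivalent Terminal command would be `java -jar ~/Desktop/picard.jar SortSam -I in.sam -O sorted.sam -SO queryname`.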
