  • [bwa_read_seq] the maximum barcode length is 15

    Hi

    I am trying to align some Sanger sequencing reads to the D. simulans assembly with BWA, and I'm getting the error "the maximum barcode length is 15" when converting from BWA output format to SAM. Specifically, I'm trying to align the same reads that were used when this genome was first assembled back onto the droSim1 reference assembly (downloadable from the UCSC Genome Browser).

    Here are the commands I've used to align the reads and convert to SAM format:

    bwa bwasw -t 16 -f c1674_clean.sam droSim1bwaidx c1674_clean.fq
    bwa samse -f c1674_clean.sam droSim1bwaidx c1674_clean.sai c1674_clean.fq


    It is the last command that outputs
    [bwa_read_seq] the maximum barcode length is 15
    It happens almost immediately once the command is executed, so I don't think it is an out-of-memory issue as other posts seem to suggest.


    I'm using BWA version 0.5.9-r16 on a RHEL 5.5 machine with 16 processors and 48GB of RAM. As far as I can tell, no barcoding was done with the Sanger sequencing.

    The input files are large (659MB), but I can put them up somewhere for download temporarily.

    Any help is greatly appreciated.
    Thanks,
    David

  • #2

    David,

    Perhaps I am overlooking something, but shouldn't the sequence of commands be this (<db.fasta> is your "reference")?

    1. bwa index -a bwtsw <db.fasta> (build index)
    2. bwa aln -t 16 <db.fasta> c1674_clean.fq > c1674_clean.sai (do alignment)
    3. bwa samse <db.fasta> c1674_clean.sai c1674_clean.fq > c1674_clean.sam (convert to SAM)



    • #3
      GenoMax,

      Thanks for your response. I omitted the indexing step. Here are the three bwa commands I executed for completeness:

      Code:
      bwa index -p droSim1bwaidx -a bwtsw droSim1.fa
      bwa bwasw -t 16 -f c1674_clean.sam droSim1bwaidx c1674_clean.fq
      bwa samse -f c1674_clean.sam droSim1bwaidx c1674_clean.sai c1674_clean.fq
      Also of note, I tried redirecting the output from "bwa bwasw" and "bwa samse" as you recommended, but that also fails. Earlier posts on Seqanswers.com suggested that this type of error might be caused by the ">" redirection symbol, hence my use of the explicit "-f" option in my commands.

      Best,
      David



      • #4
        Have you tried putting the command in a file and then executing that file instead of typing the full command?

        I do that with LSF since the ">" poses a problem. I guess you are not using a queue manager since this appears to be a standalone server.

        If you want to PM me with download info for a small subset of your sequences, I can try to replicate this locally.
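
A minimal sketch of the "put the command in a file" suggestion, under stated assumptions: `write_job_script` and the file name `run_bwa.sh` are hypothetical, and the bwa line is simply the bwasw command quoted earlier in this thread. The helper only writes the script; running it (or handing it to a queue manager) is a separate step.

```shell
#!/bin/sh
# write_job_script is a hypothetical helper (not part of bwa or LSF).
# It saves a bwa command from this thread into a script file, so that the
# file is executed instead of a command line typed or passed through a
# queue manager.
write_job_script() {
    script=$1   # path of the script file to create
    # One printf argument per line of the generated script.
    printf '%s\n' \
        '#!/bin/sh' \
        'set -e' \
        'bwa bwasw -t 16 -f c1674_clean.sam droSim1bwaidx c1674_clean.fq' \
        > "$script"
    chmod +x "$script"
}
```

After `write_job_script run_bwa.sh`, the commands live in `run_bwa.sh` and can be run with `./run_bwa.sh`; any redirection or option handling then happens inside the file rather than on the submitting command line.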



        • #5
          The files are temporarily available at http://compgen.bscb.cornell.edu/~jm8.../c1674.tar.bz2
          The problem reproduces fairly quickly with "bwa samse".

          This archive should include the:
          1. c1674_clean.fq file
          2. c1674_clean.sai file
          3. BWA droSim1bwaidx* files
          4. BWA executable (for x86_64 Linux machines)
          5. droSim1.fa (if index needs to be rebuilt)


          Thanks,
          David
          Last edited by jaavedm; 07-01-2011, 09:16 AM.



          • #6
            Originally posted by jaavedm View Post
            The files are temporarily available at http://compgen.bscb.cornell.edu/~jm889/perm/c1674.tar.bz2
            The problem reproduces fairly quickly with "bwa samse".

            This archive should include the:
            1. c1674_clean.fq file
            2. c1674_clean.sai file
            3. BWA droSim1bwaidx* files
            4. BWA executable (for x86_64 Linux machines)
            5. droSim1.fa (if index needs to be rebuilt)


            Thanks,
            David
            This link requires credentials to complete the download. Can you send those via a PM?



            • #7
              Sorry. Can you try the following address instead:

              http://compgen.bscb.cornell.edu/~jm8.../c1674.tar.bz2



              • #8
                That worked. Thanks.



                • #9
                  Documenting the solution for someone doing a search in future:

                  BWA implements two separate alignment algorithms. One is for short reads and requires "aln" followed by "samse" (single-end) or "sampe" (paired-end).

                  The other ("bwasw") is for long reads. Invoking bwa with "bwasw" produces the .sam file in one step, so no "samse" step is needed, and it only works for single-end reads.
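
The two workflows can be sketched side by side as a dry-run helper that prints the commands rather than running them. This is only an illustration: `print_bwa_plan` and its "short"/"long" mode names are hypothetical, the thread count of 16 is taken from the original post, and the bwa subcommands and options are the ones already quoted in this thread.

```shell
#!/bin/sh
# print_bwa_plan is a hypothetical dry-run helper: it prints the bwa
# commands for one workflow and does not execute bwa itself.
print_bwa_plan() {
    mode=$1     # "short" (aln + samse) or "long" (bwasw); names are assumptions
    prefix=$2   # index prefix or reference, e.g. droSim1bwaidx
    reads=$3    # FASTQ file, e.g. c1674_clean.fq
    base=${reads%.fq}   # strip the .fq suffix to name the outputs
    case "$mode" in
        short)
            # Short reads: two steps, .sai intermediate then SAM.
            echo "bwa aln -t 16 $prefix $reads > $base.sai"
            echo "bwa samse $prefix $base.sai $reads > $base.sam"
            ;;
        long)
            # Long reads: bwasw writes SAM directly, no samse step.
            echo "bwa bwasw -t 16 -f $base.sam $prefix $reads"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

For example, `print_bwa_plan long droSim1bwaidx c1674_clean.fq` prints the single bwasw command, while the "short" mode prints the aln and samse pair.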



                  • #10
                    bwa bwasw output format is sam!

                    Hi GenoMax, thank you very much. I was stuck at this step (bwa bwasw) because this behavior (SAM output) is not documented on the bwa website.



                    • #11
                      This is a post on the minimum & maximum size a barcode can be:

                      http://beforeitsnews.com/business/2013/02/how-big-or-small-can-my-barcode-label-be-2487584/barcode-image
