  • [bwa_read_seq] the maximum barcode length is 15

    Hi

    I am trying to align some Sanger sequencing reads to the D. simulans assembly with BWA, and I'm getting the error "the maximum barcode length is 15" when converting from BWA output format to SAM. Specifically, I'm trying to align the same reads that were used when this genome was first assembled back onto the droSim1 reference assembly (downloadable from the UCSC Genome Browser).

    Here are the commands I've used to align the reads and convert to SAM format:

    bwa bwasw -t 16 -f c1674_clean.sam droSim1bwaidx c1674_clean.fq
    bwa samse -f c1674_clean.sam droSim1bwaidx c1674_clean.sai c1674_clean.fq


    It is the last command that outputs
    [bwa_read_seq] the maximum barcode length is 15
    It happens almost immediately once the command is executed, so I don't think it is an out-of-memory issue like other posts seem to suggest.


    I'm using BWA version 0.5.9-r16 on a RHEL 5.5 machine with 16 processors and 48GB of RAM. As far as I can tell, no barcoding was done with the Sanger sequencing.

    The input files are large (659MB), but I can put them up somewhere for download temporarily.

    Any help is greatly appreciated.
    Thanks,
    David

  • #2
    David,

    Perhaps I am overlooking something, but shouldn't the sequence of commands be this (<db.fasta> is your "reference"):

    1. bwa index -a bwtsw <db.fasta> (*build index*)
    2. bwa aln -t 16 <db.fasta> c1674_clean.fq > c1674_clean.sai (*do alignment*)
    3. bwa samse <db.fasta> c1674_clean.sai c1674_clean.fq > c1674_clean.sam (*convert to SAM*)



    • #3
      GenoMax,

      Thanks for your response. I omitted the indexing step. Here are the three bwa commands I executed for completeness:

      Code:
      bwa index -p droSim1bwaidx -a bwtsw droSim1.fa
      bwa bwasw -t 16 -f c1674_clean.sam droSim1bwaidx c1674_clean.fq
      bwa samse -f c1674_clean.sam droSim1bwaidx c1674_clean.sai c1674_clean.fq
      Also of note, I tried redirecting the output from "bwa bwasw" and "bwa samse" like you recommended but that also fails. Earlier posts on Seqanswers.com suggested that this type of error that I'm seeing might be attributed to the ">" redirection symbol, hence my use of the explicit "-f" option in my commands.

      Best,
      David



      • #4
        Have you tried putting the command in a file and then executing that file instead of typing the full command?

        I do that with LSF since the ">" poses a problem. I guess you are not using a queue manager since this appears to be a standalone server.

        If you want to PM me with download info for a small subset of your sequences, I can try to replicate this locally.



        • #5
          The files are temporarily available at http://compgen.bscb.cornell.edu/~jm8.../c1674.tar.bz2
          The problem reproduces fairly quickly with "bwa samse".

          This archive should include:
          1. c1674_clean.fq file
          2. c1674_clean.sai file
          3. BWA droSim1bwaidx* files
          4. BWA executable (for x86_64 Linux machines)
          5. droSim1.fa (if index needs to be rebuilt)


          Thanks,
          David
          Last edited by jaavedm; 07-01-2011, 09:16 AM.



          • #6
            Originally posted by jaavedm View Post
            The files are temporarily available at http://compgen.bscb.cornell.edu/~jm889/perm/c1674.tar.bz2
            The problem reproduces fairly quickly with "bwa samse".

            This archive should include the:
            1. c1674_clean.fq file
            2. c1674_clean.sai file
            3. BWA droSim1bwaidx* files
            4. BWA executable (for x86_64 Linux machines)
            5. droSim1.fa (if index needs to be rebuilt)


            Thanks,
            David
            This link requires credentials to complete the download. Can you send those via a PM?



            • #7
              Sorry. Can you try the following address instead:

              http://compgen.bscb.cornell.edu/~jm8.../c1674.tar.bz2



              • #8
                Originally posted by jaavedm View Post
                Sorry. Can you try the following address instead:

                http://compgen.bscb.cornell.edu/~jm8.../c1674.tar.bz2
                That worked. Thanks.



                • #9
                  Documenting the solution for anyone searching in the future:

                  BWA implements two separate alignment algorithms. One is for short reads and requires the "aln" and "samse"/"sampe" combination.

                  The other ("bwasw") is for long reads. Invoking bwa with "bwasw" writes the .sam file in one step, so no "samse" step is needed afterwards. It only works for single-end reads.
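
The two workflows described above can be sketched roughly as follows. This is only an illustration: the helper names `short_read_cmds` and `long_read_cmds` are made up for this example, the thread count and file names are borrowed from the earlier posts, and the functions print the bwa commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch only: prints the bwa commands for each workflow instead of running them.
# The function names are hypothetical; arguments are: index prefix, reads, output prefix.

# Short-read workflow: "aln" writes a .sai file, then "samse" converts it to SAM.
short_read_cmds() {
    local ref=$1 reads=$2 prefix=$3
    echo "bwa aln -t 16 ${ref} ${reads} > ${prefix}.sai"
    echo "bwa samse -f ${prefix}.sam ${ref} ${prefix}.sai ${reads}"
}

# Long-read workflow: "bwasw" writes SAM directly, so no samse step follows.
long_read_cmds() {
    local ref=$1 reads=$2 prefix=$3
    echo "bwa bwasw -t 16 -f ${prefix}.sam ${ref} ${reads}"
}

# For the Sanger reads in this thread, the long-read workflow would apply.
long_read_cmds droSim1bwaidx c1674_clean.fq c1674_clean
```

In the original commands, "bwasw" had already written c1674_clean.sam, so the "samse" call that followed was not part of that workflow.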



                  • #10
                    bwa bwasw output format is sam!

                    Hi GenoMax, thank you very much. I was stuck at this step (bwa bwasw) because this behavior (SAM output) is not documented on the bwa website.
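
One quick way to confirm that an output file is already SAM text is to look at its first line. The sketch below is a rough heuristic, not an official validator: `looks_like_sam` is a hypothetical helper name, and it only checks that the first line is either a header line starting with "@" or an alignment line with at least 11 tab-separated fields (the mandatory SAM columns).

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the first line of the file looks like SAM text.
# A SAM file starts with "@" header lines and/or alignment lines that carry
# at least 11 tab-separated mandatory fields.
looks_like_sam() {
    head -n 1 "$1" | awk -F'\t' '{ ok = (/^@/ || NF >= 11) } END { exit ok ? 0 : 1 }'
}

# Example usage (file name taken from the thread):
#   if looks_like_sam c1674_clean.sam; then echo "already SAM, skip samse"; fi
```

If the check passes on the file written by "bwa bwasw", there is nothing left for "bwa samse" to convert.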



                    • #11
                      This is a post on the minimum & maximum size a barcode can be:

                      http://beforeitsnews.com/business/2013/02/how-big-or-small-can-my-barcode-label-be-2487584/barcode-image
