
  • bwa paired read FR issue

    Hi,

    I have been trying to run the following bwa mem command:

    bwa mem -M -t 16 w_ind.fa SRR412532_1.fastq.gz SRR412532_2.fastq.gz | samtools view -Sb - | samtools sort - SRR412532.sorted && samtools index SRR412532.sorted.bam

    And for each iteration get the following:

    [M::process] read 1777778 sequences (160000020 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 2, 0, 0)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] skip orientation FR as there are not enough pairs
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [M::mem_process_seqs] Processed 1777778 reads in 75.678 CPU sec, 22.324 real sec

    This should be a paired-read dataset, and when I check the FASTQ files I see:

    gzcat SRR412532_1.fastq.gz | head
    @SRR412532.1 FCC02L0ACXX:3:1101:1130:1956/1
    NGAGCTTCAGGCCCAGGCCAAGGCTTACTTTGAGAAGACGCAGGAGCAGCTAACACCCCTGGTCAAGAAGGCCGGAACTGACCTGATCAA
    +
    #1=DDDDEHHHGGIGHI=FHGGGIG>BAFGHC>=BCGG>:?DG?CB;FECE>CC@G1=5,,;?;@ACA;>;2=;'820>@>>>C<>@@A>
    @SRR412532.2 FCC02L0ACXX:3:1101:1035:1967/1
    NACAAAAAGCAAAATGAATCTAGCTGTCCCTGTCCTGGCCGGTATTCCATCTTCTAGAACCTGTTTCCGTGTTTTTCCTGGAGTGTCTGC
    +
    #1:BDFFFF?FHHIGEHIB@AHIGICFDEGGIHGEEHIIGEG?FFEDHIGGGIIIG;CGGGEGGHGA3;.;.?B@=@AC>>55(5>5;A#
    @SRR412532.3 FCC02L0ACXX:3:1101:1464:1954/1
    NTGCTTTTCTGCCCTGGAAGTTGATGAGGCATATGTTCCCAAAGAGTTTAACGCTGAAACATTCACCTTCCATGCAGACTTATGCACACT

    gzcat SRR412532_2.fastq.gz | head
    @SRR412532.1 FCC02L0ACXX:3:1101:1130:1956/2
    GTTGGGAGAGGGCTCCAGACCTGGCCAGTGGGGGTTCTAGGGGCCAGCAGGGGAGGGAAGACAATGGTCTGGACGCCTCACTGGGTGGCA
    +
    @B@?DDDDAHFHHIFHIJIGIGCHHIGI3?FGII7;FE@GHIBHAEB?@E;>>8?@2'588ACCCDC(44?9@AB><B9(8@A#######
    @SRR412532.2 FCC02L0ACXX:3:1101:1035:1967/2
    GGGTTTAGGGACCTCCCTGGGTGGAACAATCTACACGGTTGAAGCACTAGCAGGGAGGTTCTAGTGGGATCACACGTTTATTAACATGCT
    +
    ?@@=BDFFGHDFFGGHIGIHCFEHHGGHGHGGGGGHIDBFHIJCFHCAGIFGHGIHEB?;BBC>;AC6A>?=CDCB?A<BDDCDDCA@CC
    @SRR412532.3 FCC02L0ACXX:3:1101:1464:1954/2
    GGCTTGTGTTTCACCAGCTCAGCAAGTGCAGATTGTTTCTTGACTTGTTTCTCAGCCTCAGGAAGTGTGCATAAGACTGCATGGAAGGTG

    I am not sure why there are not enough FR pairs, as the reads seem to be in the same order with matching headers... I am probably missing something obvious! Any ideas?

    All help much appreciated!!

    Thanks in advance

  • #2
    Are you using current versions of bwa/samtools?


    • #3
      bwa version 0.7.12 / samtools 1.2

      Have tried a couple of other SRA datasets with this bwa install and they map really nicely. Bit stumped as to what's different about this one...


      • #4
        Have you done a quick count on the two files to make sure that there is an equal number of reads in them? (grep for "^@SRR412532" and do a line count on both files).
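
A minimal sketch of that check in Python, in case it helps. It assumes well-formed 4-line FASTQ records and headers like the ones shown in the first post, where the read ID is the first whitespace-separated token. The helper names `read_ids` and `compare_pairs` are made up for this example and are not part of bwa or samtools. Counting every fourth line avoids miscounting quality lines that happen to start with "@", and comparing IDs record by record also shows whether the two files are in the same order.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID from each 4-line FASTQ record in a gzipped file."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header line of each record
                fields = line.split()
                token = fields[0] if fields else ""
                # Drop a trailing /1 or /2 if it is attached to the ID itself.
                if token.endswith(("/1", "/2")):
                    token = token[:-2]
                yield token


def compare_pairs(path_1, path_2):
    """Return (records in file 1, records in file 2, positions where IDs differ)."""
    count_1 = count_2 = mismatches = 0
    for id_1, id_2 in zip_longest(read_ids(path_1), read_ids(path_2)):
        count_1 += id_1 is not None
        count_2 += id_2 is not None
        if id_1 != id_2:
            mismatches += 1
    return count_1, count_2, mismatches
```

Calling `compare_pairs("SRR412532_1.fastq.gz", "SRR412532_2.fastq.gz")` should give two equal counts and zero mismatches if the files are properly paired and in the same order.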


        • #5
          Ooh no, I hadn't thought of that! I've just given it a try, and both files show the same count (26,073,197).


          • #6
            But it appears that bwa is only reading 1,777,778 sequences based on the log above so something is wrong with the files. Perhaps you can try to download them again?

            You can find the fastq files at EBI SRA: http://www.ebi.ac.uk/ena/data/view/SRR412532 if you don't want to deal with SRA.
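
A quick sanity check on the numbers in the log, as a sketch. All values are copied from the posts above. One assumption is labeled loudly here: the `[M::process]` line is taken to be printed once per batch, which the first post hints at by saying the block repeats "for each iteration". The thread does not confirm this. If it holds, 1,777,778 would be a batch size rather than the total number of reads bwa saw.

```python
# Values copied from the thread; nothing here is measured independently.
reads_in_batch = 1_777_778    # "[M::process] read 1777778 sequences"
bases_in_batch = 160_000_020  # "(160000020 bp)"
threads = 16                  # "-t 16" in the bwa mem command
reads_per_file = 26_073_197   # count reported in post #5 for each file

# Average read length implied by the log line.
read_length = bases_in_batch / reads_in_batch

# Bases per thread in one batch.
bases_per_thread = bases_in_batch / threads

# ASSUMPTION: the log line is per batch. Number of batches needed to
# cover both files under that assumption (ceiling division).
total_reads = 2 * reads_per_file
batches = -(-total_reads // reads_in_batch)

print(read_length, bases_per_thread, batches)
```

The implied read length is exactly 90 bp, which matches the reads shown in the first post, and one batch works out to roughly 10 Mbp per thread. Under the per-batch assumption, about 30 such log blocks would be expected for the full dataset.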


            • #7
              Have just downloaded the FASTQs from the EBI again (sorry, I should have mentioned I bypassed SRA) and still the same issue. Is it worth downloading them as SRAs with fastq-dump even though it will take longer?


              • #8
                Since files at EBI have this problem it may be worth a shot getting them using fastq-dump from SRA.

                There are instances where the original data as submitted may have a problem. If the files from SRA don't work either then you should contact SRA support and/or the original submitter to let them know.
                Last edited by GenoMax; 01-21-2016, 04:57 AM.
