  • all_your_base
    Member
    • Mar 2012
    • 40

    Problem with cmpfastq, can't process my .fastq /1 and /2 files

    Hi,

    I am having a problem with cmpfastq, even though I've been using it reliably for months.

    Normally, I can take my trimmed 1_1.fastq and 1_2.fastq, process them through cmpfastq, and get my .common.out and .unique.out files for downstream processing. However, a couple of data sets are giving me trouble: cmpfastq prints an error message for every line of the .fastq file and fails to generate the expected output files.

    Here is a sample of the output data:

    BEGIN cmpfastq3 on TpruniS3_1.trimmed TpruniS3_2.trimmed at Wed Oct 10 15:00:08 EDT 2012
    Could not match the sequence ID from the name: @M00649:2:000000000-A1721:1:1101:17085:1532/2
    Could not match the sequence ID from the name: TACTCCTACTGCGCAGCAATATTATTCTTTCGTTAGAGCTAAAAGGCAGAGTGGGAATCGAACCCACTTCGTTAGATTTGCAATC
    Could not match the sequence ID from the name: +
    Could not match the sequence ID from the name: 555??BBDDDDDDBDCCFFFFEFI;BEFHIIHFHFHH@@GHHIFHHHFEFH8CD@@BFD@EFHCEEHECFFHIIFHHDFGHIIHH
    Could not match the sequence ID from the name: @M00649:2:000000000-A1721:1:1101:16787:1535/2
    Could not match the sequence ID from the name: TAGACGTTTAAGTGACACCGAAAGAAGAAAGAGCTTTGTAGATGCTTAGCGCGGTCTACGAGCCTGGCGGATCAGAAAGCGGAAG
    Could not match the sequence ID from the name: +
    Could not match the sequence ID from the name: 5<?????DDDDDBDBFFFFFFHDACFHFHHB=CFDGHHHEDGGFGFGGHIHHC>EDEHHHHHHHB@?DHHCHHFFHHD=F;A@EE
    Could not match the sequence ID from the name: @M00649:2:000000000-A1721:1:1101:14795:1537/2
    Could not match the sequence ID from the name: AACGGAGCGAAGGATTTTAGCTTCACGAATTTCCCAAACTTGGCGAGGTCCTGTGTCGATTCCCGGACTTCCTTGGTCTTTGCGCC
    Could not match the sequence ID from the name: +
    Could not match the sequence ID from the name: 5<????@DDDDBDDBFFFFFFIIIHIIHHEHIHIIIFHHH/AFFCH++?EE?EFGGHHFF-CA-5CEEAGH,CCDF@DBGDFFCEE


    Does anyone have an idea?

    Thanks for the help!
  • Torst
    Senior Member
    • Apr 2008
    • 275

    #2
    Neverending Illumina format changes

    I don't really know anything about 'cmpfastq' but I've had a look at the source code:


    From what I can tell, it expects the ID line to match the pattern /^@(.*)#.*/
    that is, an @ followed by any characters, then a #, then anything else.

    Your IDs do not fit this pattern, because you don't have the #xxxxx part.

    Illumina used to append a suffix such as #AGCTCG to denote barcodes in multiplexed samples. These days it uses a different format, or doesn't print the barcode at all.

    To make it work with your data, change it to /^@(.*)(#.*)?/ or /^@(.*)/

    Good luck.
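
To see why the original pattern rejects the newer headers, here is a small Python sketch of the two patterns discussed above. It only mirrors the regular expressions quoted in this post; it is not cmpfastq's code. The helper name extract_id and the older-style header are made up for illustration, while the MiSeq header is taken from the error log in the first post.

```python
import re

# Pattern as quoted above: '@', some characters, then a mandatory '#...' part.
OLD_ID = re.compile(r"^@(.*)#.*")
# Relaxed variant suggested above: the '#...' part is no longer required.
NEW_ID = re.compile(r"^@(.*)")


def extract_id(pattern, header):
    """Return the captured read ID, or None when the header does not match."""
    m = pattern.match(header)
    return m.group(1) if m else None


# Illustrative older-style header with a '#barcode' part (not from this thread).
old_style = "@HWUSI-EAS100R:6:73:941:1973#ATCACG/1"
# Header copied from the error log in the first post.
miseq = "@M00649:2:000000000-A1721:1:1101:17085:1532/2"

print(extract_id(OLD_ID, old_style))  # ID up to the '#'
print(extract_id(OLD_ID, miseq))      # None: no '#' in the header
print(extract_id(NEW_ID, miseq))      # whole name after '@'
```

Note that with the relaxed pattern the captured ID still ends in /2. Whether the two mates then compare as equal depends on how the rest of the script treats that suffix, which is not shown in this thread.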

    • all_your_base
      Member
      • Mar 2012
      • 40

      #3
      Thank you very much for the reply. You have correctly identified the problem, and I can now resolve it to work with MiSeq reads. Thanks again for the insight!

      • safina
        Member
        • May 2014
        • 19

        #4
        Hello. I'm having the same problem. I tried changing the pattern to match my headers, but it wrote all my reads to the unique files, whereas the common files remain empty. Please help.

        • Brian Bushnell
          Super Moderator
          • Jan 2014
          • 2709

          #5
          What exactly are you trying to do? I have a program called "filterbyname" that can probably do it...

          • safina
            Member
            • May 2014
            • 19

            #6
            Pairing of fastq files(F/R)

            I'm trying to pair my fastq files after quality filtering and trimming them via FASTQC. My files look like this:

            ==> mexD1B_filt_trim_1.fastq <==
            @MexD1BSRR1562087.10.1/1
            GAGCTAGATCAGCACCATATATTACACGATGATCAGCTGTAACATTTACCTGCATCTGGTTCTTCATTCCTATCCGACCATCCTTGG
            +SRR1562087.10.1/1
            JJJJJJIIJJJJJJJJIJJJJJJJJJJJJJJJIJJJJJJJGIIJJJJIJJJJJJJJJIJJJJDHIHHHHHHHFDFFDDDDDDDDD>C
            @MexD1BSRR1562087.11.1/1
            AGGTTGACTATGGTCCAGGCCATGCCAGGAGAGCAACCGAAAACAGAGAGAACGGTAAGCCAGGAGAAGAACAGTATGAGTATATAG
            +SRR1562087.11.1/1
            IJJGHIJIIIFIBHHGAFHGGIHJIJGJEGIGGGHGIJJJJHHGFEFEDACEEDDBDBCCCDDDDDDBDDDCDDCADDDCCCDDDDD
            @MexD1BSRR1562087.15.1/1
            TAACATCCACAATCTCCTTCTACCCAAGAAGTCTGGAACTTCAGCATCAAAGGCTGGTGATGACGACAACTAATCCATTTACTGAAT



            ==> mexD1B_filt_trim_2.fastq <==
            @MexD1BSRR1562087.7.2/2
            CCTGTAGATATACGTACTGCCAAAGGGTAGATAGTTGCCCATCTCAGAAAACACAACTTCAACAGCCAAGATTAATATCCATGTGAT
            +SRR1562087.7.2/2
            IJJJGGJBHIJJGHHHIIHJJGJGJIIDFHIJIJJJGHJJJJJJJIJGIGH@FHJIJIHIIIHHH=BDFFAEECCEEFDEDDCDCA>
            @MexD1BSRR1562087.9.2/2
            GTAATCCAAATAAGGTATACTCACTCATCGGAGGATTTTGTGCTTCCCCTGTGAATTTCCACGCTAAGGATGGCTCCGGCTATAAAT
            +SRR1562087.9.2/2
            JIJIIJJJGGIIJIBC@FH@HHJGIJGCHGIEGIFHDFHJIJIJIHHIIIIJGGHHHHHCDDFDDDBDDDDDDDCDBDDBD@CDCEE
            @MexD1BSRR1562087.11.2/2
            GAAACACTGATTGGTTCACGTATCCAGGTGTATGGACCACCTATATACTCATACTGTTCTTCTCCTGGCTTACCGTTCTCTCTGTTT
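
One possible reason every read ends up in the unique files is that the mate names in the two files never compare as equal: the forward names end in .1/1 and the reverse names end in .2/2. The Python sketch below checks this on the headers shown above. The suffix rule in MATE_SUFFIX is an assumption based only on these sample headers, and pair_key is a hypothetical helper, not part of cmpfastq or BBMap.

```python
import re

# Assumed mate suffix for headers like the ones above: an optional '.1' or '.2'
# followed by '/1' or '/2' at the very end of the name.
MATE_SUFFIX = re.compile(r"(\.[12])?/[12]$")


def pair_key(header):
    """Strip the leading '@' and the assumed mate suffix from a FASTQ header."""
    name = header.strip()
    if name.startswith("@"):
        name = name[1:]
    return MATE_SUFFIX.sub("", name)


# Headers copied from the two file excerpts above.
forward = [
    "@MexD1BSRR1562087.10.1/1",
    "@MexD1BSRR1562087.11.1/1",
    "@MexD1BSRR1562087.15.1/1",
]
reverse = [
    "@MexD1BSRR1562087.7.2/2",
    "@MexD1BSRR1562087.9.2/2",
    "@MexD1BSRR1562087.11.2/2",
]

# Comparing the raw header lines finds no shared reads.
raw_common = set(forward) & set(reverse)
# Comparing the normalized keys finds the mates of read 11.
key_common = {pair_key(h) for h in forward} & {pair_key(h) for h in reverse}

print(raw_common)
print(key_common)
```

If the real files follow a different naming scheme, the suffix pattern would need to be adjusted accordingly.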

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              @safina: You should use a program called repair.sh, which is part of the BBMap package. Brian has an example posted here: http://seqanswers.com/forums/showpos...0&postcount=45

              Your command would look something like this:
              Code:
              $ repair.sh in1=mexD1B_filt_trim_1.fastq in2=mexD1B_filt_trim_2.fastq out1=mexD1B_filt_trim_1_fixed.fq out2=mexD1B_filt_trim_2_fixed.fq outsingle=single.fq
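
For small files, the general idea of re-pairing can be illustrated in plain Python: index one file by a normalized read name, then look up each read of the other file. This is a sketch of that technique only, not how repair.sh is implemented. The names pair_key, read_fastq and repair are hypothetical, the suffix rule is an assumption based on the headers in this thread, the toy reads are made up, and the whole reverse file is held in memory.

```python
import re
from itertools import islice

# Assumed mate suffix: optional '.1'/'.2' followed by '/1' or '/2' at the end.
MATE_SUFFIX = re.compile(r"(\.[12])?/[12]$")


def pair_key(header):
    """Normalize a FASTQ header so that both mates share the same key."""
    return MATE_SUFFIX.sub("", header.strip()[1:])


def read_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def repair(forward_lines, reverse_lines):
    """Return (pairs, singles), matching mates by pair_key."""
    reverse_by_key = {pair_key(rec[0]): rec for rec in read_fastq(reverse_lines)}
    pairs, singles = [], []
    for rec in read_fastq(forward_lines):
        mate = reverse_by_key.pop(pair_key(rec[0]), None)
        if mate is None:
            singles.append(rec)          # forward read without a mate
        else:
            pairs.append((rec, mate))    # kept in forward-file order
    singles.extend(reverse_by_key.values())  # reverse reads without a mate
    return pairs, singles


# Toy records (made up) standing in for the lines of two FASTQ files.
fwd = ["@s.10.1/1", "ACGT", "+", "IIII", "@s.11.1/1", "GGCC", "+", "IIII"]
rev = ["@s.9.2/2", "TTAA", "+", "IIII", "@s.11.2/2", "GGCC", "+", "IIII"]

pairs, singles = repair(fwd, rev)
print([(a[0], b[0]) for a, b in pairs])
print([rec[0] for rec in singles])
```

With real data you would pass open file handles instead of lists and write the pairs and singles back out as FASTQ; for large files a dedicated tool is the better choice.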
