
  • fastq with reads missing 3rd line

    Hi all,
    I have a set of fastq files. Some of the fastq files have reads which are missing the 3rd line (which begins with +).

    Code:
    @HWI-ST750:151:C1C6AACXX:5:2316:17997:100881 1:N:0:
    AGCGGTNCGCAATATTTTAGTAGCTCGTTACAGTCCGGTGCGTTTTTGGTTTTTTGAAAGTGCGTCTTCAGAGCGCTTTTGGTTTTCAAAAGCGCTCTGAAGTT+
    1:+B+0#2<DDDDIIIIIIFIIIIIIIIIIIIIEEIID?DDDBBADIII@DDDDD@@AAAAAAA??A?ADAAAA?????A>?8<>?>AAAAA8>>?><AAAA:4
    @HWI-ST750:151:C1C6AACXX:5:2316:19129:100793 1:N:0:
    AGCGCTNCTTGATATCAATCAACTGCTAGACAAATCCAATAGTAAATTGGGTAAACCAAATCTCGATATCGACAGCAAAGTATCACAATATGCCTATAACTACA+
    ;1=DD?#2ACDDDIDEEIIIEIEIEIEEIIIIDEIEIIIDEDDCEDDDEID?BBDIDIIIIEECDIDA@DD=A?D@DAAAA@DDDA>AAABE>AAAA>A>A>AA
    @HWI-ST750:151:C1C6AACXX:5:2316:19695:100854 1:N:0:
    AGCGCTNCACCGCGGTAAGCTTTAGCAGATCTCACTTTGTCTAGCGTTTGAACCATGTTTTCAAGGATATTGGCTCTAAGTTGTGGGTATTTTTCGATCACTTC+
    @<1DDD#2<DFDDGI@CEEGHIIIIIIIEGIIHCGHIIIHGGIGIIAFFHFHAHHIG?CCHFHEEBBC@CDCCCCACCCC5>CCBBB'>ACDECCBBDB7?CC>
    The sequence line also has a + at the end, so I guess the 3rd line has been concatenated onto the end of the 2nd line.
    Any thoughts on how to proceed with this kind of data? Are there any scripts to convert it into the proper format?

  • #2
    If this problem is consistent throughout the file

    Code:
    sed "s/+$/\\`echo -e '\n\r'`+/g" bad.fastq > good.fastq
    should do the trick.

    EDIT: I don't remember adding a newline with sed being this complicated, but I just tested on a Mac and this was required. It may be simpler on Linux, but I do not have a system here to test.
    Last edited by jiaco; 12-22-2012, 12:16 AM.


    • #3
      Thanks for the snippet, jiaco.
      I had tried this before, but it also breaks quality score lines that end with +.

      There are also reads with empty lines in between. My question is: what is the source of this kind of output? Is this some sort of sequencing error?

      Code:
      @@1D4A#2AFHHFIHIIIIIIIIIIIIIIIIIBHHIIIIIIIIIIIIIIIIIIIIIIIDEHIIIHFFHFEEBDEEECCCBBBCC?CB?CCCBBBBB@BBBBBBB/1
      
      @HWI-ST750:151:C1C6AACXX:5:2316:9996:50328/1
      GGCCCCNATACATTTACTGATTCATCCTCAGCGGACTCTGATATGACATCCACTAAAAAATATGTCAGACCACCACCAATGTTAACCTCACCTAATGACTTTCC+
      =71?A@#23CDCD@E@ED?FEFCEI<ECFEA>CDDD6?BDEEC9<DBEEIC<BEEIE3@8?;=>?BA>A:(;;@;=???3:>>D####################/1
      
      @HWI-ST750:151:C1C6AACXX:5:2316:9999:44022/1
      GGCCACNATCTCGATAATTATAAGATATCTTTAGCACAGGCAAATTGGAACGCAAGCGAAGTTTCGAAAAAGCTAGTAAATATTCAAACAGATGGGTCTATTTC+
      ???D;B#2ADDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIEIIIIIIIIIIID?CDEDDD@AAA?AAAADEAEEDDDEDBA?AAAAA?A>?ADDD3/1
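
Before attempting any repair, it may help to measure how widespread each symptom is. The following is a minimal Python sketch, not a validated FASTQ checker. The helper name `summarize_fastq_lines` is hypothetical, and its rules are assumptions based only on the examples in this thread: a "fused" line is one made up only of A, C, G, T, or N plus a trailing `+`, and a separator is counted only when the line is a bare `+`.

```python
from collections import Counter

# Assumption: sequence lines contain only these characters.
NUCLEOTIDES = set("ACGTN")


def summarize_fastq_lines(lines):
    """Count blank lines, bare '+' separator lines, and sequence lines
    that have a '+' fused onto their end."""
    counts = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        counts["total"] += 1
        if not line:
            counts["blank"] += 1
        elif line == "+":
            # A proper stand-alone separator line.
            counts["separator"] += 1
        elif line.endswith("+") and set(line[:-1]) <= NUCLEOTIDES:
            # Nucleotides only, then '+': separator fused to the sequence.
            counts["fused_plus"] += 1
    return counts
```

Passing an open file handle gives counts that show whether the fused `+` and the blank lines affect every read or only part of the file.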


      • #4
        You could expand the expression to match
        Code:
        /^[ACGT].*+$/
        to avoid quality lines, but I have no idea where you got the file, let alone how it got corrupted.
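
The same idea can be sketched in Python, which avoids the platform differences in how sed handles newlines. This is a minimal sketch under stated assumptions, not a tested repair tool. The function name `split_fused_plus` is hypothetical. It assumes that a line made up only of A, C, G, T, or N followed by one trailing `+` is a sequence line with the separator fused on (`N` is included because the example reads contain it), and that blank lines are debris that can be dropped.

```python
import re

# Assumption: a sequence line holds only A, C, G, T or N, with the stray
# '+' separator fused onto its end. Quality lines almost always contain
# other characters, so they are left alone even when they end in '+'.
FUSED_SEQ = re.compile(r"^([ACGTN]+)\+$")


def split_fused_plus(lines):
    """Yield lines with a fused trailing '+' moved onto its own line.

    Blank lines are skipped (assumed to be debris).
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        match = FUSED_SEQ.match(line)
        if match:
            yield match.group(1)  # the sequence without the '+'
            yield "+"             # the restored separator line
        else:
            yield line
```

To use it, pass an open handle on `bad.fastq` and write each yielded line plus a newline to `good.fastq`. It only handles the fused `+` and blank lines; any other damage in the file is left as is, and a quality line that happens to consist only of those five letters would be split by mistake.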

        EDIT: I just saw your new example, and there is an issue with this file. Maybe someone else has seen it before.
        But I would not try to fix this mess. You need to re-acquire the data.


        • #5
          Yes, the sequence files were given to me by our sequence provider, and I demultiplexed them. But after demultiplexing, this is the result. Maybe there is an issue with this. Anyway, I will contact them. Thanks for the suggestion.


          • #6
            What do the original files look like? What format?
            How did you demultiplex? With what program?

            Sven
