  • Splitting concatenated PE fastq to two files for respective reads

    I have a single fastq file in which read1 and read2 were split, concatenated and shuffled together. I need separate read1 and read2 files for bwa alignment. Does anyone know how to do this?
    I don't have the GERALD file; this processed fastq file came from a core facility, so I'm stuck at this point. Can anyone help?
    Last edited by JayM; 11-04-2010, 04:24 AM.

  • #2
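    You might be able to grep it out. Something like this (with GNU grep; --no-group-separator keeps grep from writing "--" lines between non-adjacent matches):

    grep -A 3 --no-group-separator pattern_that_is_only_in_read_1_sample_name combined_file.fq > read1.fq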



    • #3
      Assuming single.

      Code:
      $ cat ./single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[1234]/; @i = 0 if @i == 8' > one.fq && cat single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[5678]/; @i = 0 if @i == 8' > two.fq
      Also, you may want to check the fastx package. It should include that feature.
      -drd
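The one-liner above keeps lines 1-4 of every 8-line block as read 1 and lines 5-8 as read 2. A minimal Python sketch of the same idea in a single pass is below. It assumes 4-line FASTQ records that strictly alternate read 1, read 2; the function name `split_interleaved` is made up for illustration, and the file names in the usage note are the ones from the post.

```python
import io
from itertools import islice


def split_interleaved(src, out1, out2):
    """Write lines 1-4 of each 8-line block to out1 and lines 5-8 to out2.

    Assumes 4-line FASTQ records that strictly alternate read 1, read 2.
    Returns the number of read pairs written.
    """
    lines = iter(src)
    pairs = 0
    while True:
        block = list(islice(lines, 8))  # one read pair = 8 lines
        if not block:
            return pairs
        if len(block) != 8:
            raise ValueError("input does not hold a whole number of read pairs")
        out1.writelines(block[:4])
        out2.writelines(block[4:])
        pairs += 1


# Tiny in-memory demo with made-up reads.
demo = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r1/2\nTTTT\n+\nIIII\n")
read1, read2 = io.StringIO(), io.StringIO()
n_pairs = split_interleaved(demo, read1, read2)
print(n_pairs, "pair(s) split")
```

On real data the same function can be called with open file handles, for example `single.fq` opened for reading and `one.fq` and `two.fq` opened for writing. If the records are not strictly interleaved, this approach (like the one-liner) will put reads in the wrong file without complaining.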



      • #4
        Originally posted by drio View Post
        Assuming single.

        Code:
        $ cat ./single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[1234]/; @i = 0 if @i == 8' > one.fq && cat single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[5678]/; @i = 0 if @i == 8' > two.fq
        Also, you may want to check the fastx package. It should include that feature.
        I take it 'assuming single' here means a single [input] file containing both read1 and read2.



        • #5
          Originally posted by swbarnes2 View Post
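          You might be able to grep it out. Something like this (with GNU grep; --no-group-separator keeps grep from writing "--" lines between non-adjacent matches):

          grep -A 3 --no-group-separator pattern_that_is_only_in_read_1_sample_name combined_file.fq > read1.fq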
          But how do you grep for read1 and not read2 in a paired-end fastq, given that the whole name is identical except for one character at the end, and there are millions of such names in the file?
          I'm just wondering which pattern that could be.
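If the one differing character is the mate number at the end of the read name, that character can serve as the pattern. The sketch below is only an illustration under that assumption: it expects the first token of each header line to end in `/1` or `/2` (a common older Illumina convention, not confirmed for this file), and the function name `split_by_mate_suffix` is hypothetical. Reading whole 4-line records rather than grepping line by line avoids accidentally matching a quality line that happens to end in the same characters.

```python
import io
from itertools import islice


def split_by_mate_suffix(src, out1, out2):
    """Route each 4-line FASTQ record by the /1 or /2 suffix of its read name.

    Assumption: the first whitespace-delimited token of the header ends
    in "/1" or "/2". Returns (number of read-1 records, number of read-2 records).
    """
    lines = iter(src)
    n1 = n2 = 0
    while True:
        record = list(islice(lines, 4))  # header, sequence, "+", qualities
        if not record:
            return n1, n2
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record")
        name = record[0].split()[0]
        if name.endswith("/1"):
            out1.writelines(record)
            n1 += 1
        elif name.endswith("/2"):
            out2.writelines(record)
            n2 += 1
        else:
            raise ValueError("read name has no /1 or /2 suffix: " + name)


# Tiny in-memory demo with made-up reads, mates deliberately out of order.
demo = io.StringIO("@r1/2\nTTTT\n+\nIIII\n@r1/1\nACGT\n+\nIIII\n")
read1, read2 = io.StringIO(), io.StringIO()
counts = split_by_mate_suffix(demo, read1, read2)
print(counts)
```

This only separates the mates; it keeps records in their input order. bwa expects the n-th read in each file to be mates, so if the combined file really is shuffled, the two outputs would still need to be brought into matching order (for example by sorting both on read name) before alignment.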



          • #6
            Originally posted by drio View Post
            Assuming single.

            Code:
            $ cat ./single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[1234]/; @i = 0 if @i == 8' > one.fq && cat single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[5678]/; @i = 0 if @i == 8' > two.fq
            Also, you may want to check the fastx package. It should include that feature.
            Wow! Thanks, it worked, and a spot check of the respective reads seems to confirm a perfect split into read1 and read2.
