  • zszong@hotmail.com
    Member
    • Sep 2012
    • 17

    How to remove the newlines in a Pacific Biosciences FASTQ file

    Hi All,

    Hope someone could help me out here.
    I am trying to analyze a PacBio data set. Because the reads are long, each sequence and each quality string is wrapped across multiple lines of 51 characters. When I run the file through FastQC to check quality and statistics, it complains about the format because of the wrapped lines. How can I concatenate each sequence onto one line and each quality string onto another?

    Thank you very much!
  • Richard Finney
    Senior Member
    • Feb 2009
    • 701

    #2
    Please post an example of the data you need to fix.


    • zszong@hotmail.com
      Member
      • Sep 2012
      • 17

      #3
      Thanks, Richard!

      An example record is below. The sequence and the quality scores each span 5 lines. I am trying to concatenate each group separately.

      Original format:

      @chlamy1234
      ATGTGGGCCCAATTTATGTGGGCCCAATTTATGTGGGCC
      CAATTTATGTGGGCCCAATTTATGTGGGCCCAATTTATG
      GGGCCCAATTTATGTGGGCCCAATTTATGTGGGCCCAAT
      TTATGTGGGCCCAATTTATGTGGGCCCAATTTATGTGGG
      CCCAATTTATGTGGGCCCAATTTATGTGGGCCCAATTT
      +
      %^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%
      ^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&
      *$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^
      %^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%
      ^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^$)


      Format to be converted to:
      @chlamy1234
      ATGTGGGCCCAATTTATGTGGGCCCAATTTATGTGGGCCCAATTTATGTGGGCCCAATTATGTGGGCCCAATTTATGGGGCCCAATTTATGTGGGCCCAATTTATGTGGGCCCAATTTATGTGGGCCCAATTTATGTGGGCCCAATTTATGTGGGCCCAATTTATGTGGGCCCAATTTATGTGGGCCCAATTT
      +
      %^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^%^&*$%^&^$)


      • Richard Finney
        Senior Member
        • Feb 2009
        • 701

        #4
        awk '{p=(NR%12); printf "%s",$0 ; if ((p==1)||(p==6)||(p==7)||(p==0)) printf "\n"}' filename.fastq > newfilename.fastq


        • zszong@hotmail.com
          Member
          • Sep 2012
          • 17

          #5
          Thank you, Richard! As you can tell, I am new to this area. It works for this particular example, but I am dealing with about 1 million reads whose lengths vary: one read may span five lines (as in this example) and another 20 lines. I think there must be a better way to decide which lines need to be concatenated.

          Your help is greatly appreciated.

          Stuart
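
One way to decide which lines belong together without counting on a fixed record size: read sequence lines until the `+` separator, then read quality lines until the quality string is as long as the sequence. The Python sketch below illustrates that idea. The function names `unwrap_fastq` and `write_unwrapped` are made up for this example, and it assumes each record has a quality string exactly as long as its sequence. Counting by length matters because a quality line may itself start with `@` or `+`.

```python
import io


def unwrap_fastq(handle):
    """Yield (header, sequence, quality) tuples with wrapped lines joined.

    Assumes the quality string has the same length as the sequence.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        while True:
            line = handle.readline()
            if not line:
                raise ValueError("file ended before the '+' separator")
            line = line.rstrip("\r\n")
            if line.startswith("+"):
                break
            seq_parts.append(line)
        seq = "".join(seq_parts)

        # Quality lines run until their total length matches the sequence.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            line = handle.readline()
            if not line:
                raise ValueError("file ended inside a quality string")
            line = line.rstrip("\r\n")
            qual_parts.append(line)
            qual_len += len(line)
        if qual_len != len(seq):
            raise ValueError(f"quality longer than sequence in {header!r}")

        yield header, seq, "".join(qual_parts)


def write_unwrapped(in_handle, out_handle):
    """Write every record as exactly four lines."""
    for header, seq, qual in unwrap_fastq(in_handle):
        out_handle.write(f"{header}\n{seq}\n+\n{qual}\n")
```

To use it on real files, open the input and output with `open(...)` and pass the two handles to `write_unwrapped`. Records are handled one at a time, so memory use should stay small even with around a million reads.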


          • flxlex
            Moderator
            • Nov 2008
            • 412

            #6
            seqtk to the rescue: https://github.com/lh3/seqtk

            Code:
            seqtk seq -l 0 infile.fastq > outfile.fastq
            should do it...
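
Before handing the converted file to FastQC, a quick sanity check can confirm that every record now occupies exactly four lines. This is a minimal Python sketch; `count_four_line_records` is a hypothetical helper, not part of seqtk, and it assumes the sequence and quality lines of each record have equal length.

```python
import io


def count_four_line_records(handle):
    """Return the number of records, raising ValueError on a bad one."""
    count = 0
    while True:
        block = [handle.readline() for _ in range(4)]
        if not block[0]:
            return count  # clean end of file
        if not block[3]:
            raise ValueError(f"record {count + 1} is truncated")
        header, seq, plus, qual = (ln.rstrip("\r\n") for ln in block)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {count + 1} is not in four-line layout")
        if len(seq) != len(qual):
            raise ValueError(f"record {count + 1} has unequal lengths")
        count += 1
```

Calling it on a handle opened with `open("outfile.fastq")` should return the read count if the unwrapping worked, and raise on a file that is still wrapped.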


            • zszong@hotmail.com
              Member
              • Sep 2012
              • 17

              #7
              Thank you, flxlex. I will try it out and let you know if it works.


              • zszong@hotmail.com
                Member
                • Sep 2012
                • 17

                #8
                It worked perfectly. Thanks, flxlex.
