  • Merge multiple fq read files

    I have multiple Illumina HiSeq 2000 fastq.gz files for each individual, as follows:

    sample1 lane1 read 1_001
    sample1 lane1 read 1_002
    sample1 lane1 read 2_001
    sample1 lane1 read 2_002

    sample1 lane2 read 1_001
    sample1 lane2 read 1_002
    sample1 lane2 read 2_001
    sample1 lane2 read 2_002

    sample1 lane3 read 1_001
    sample1 lane3 read 1_002
    sample1 lane3 read 2_001
    sample1 lane3 read 2_002

    They are all for a single individual. What Linux command-line script would be best for merging all of these files into one final merged file? The plan is to do the analysis in Galaxy.

    Thank you all.
    Last edited by hosseinv; 01-27-2015, 10:18 PM. Reason: Typos

  • #2
    They are paired-end reads.
    Last edited by hosseinv; 01-27-2015, 10:20 PM. Reason: Wrong info



    • #3
      Galaxy can concatenate files, and it might actually be easier to upload each file to Galaxy one by one (gzip them first), since uploading a single large file is harder.

      If you do want to concatenate files at the command line, you can use the command 'cat', as in:

      cat fileA fileB fileC > combinedfileABC
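A minimal sketch of the same idea applied to gzipped FASTQ files: gzip files that are joined with cat still form a valid gzip file, so there is no need to decompress first. The file names and the tiny one-record inputs are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory; all file names below are hypothetical.
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny stand-in FASTQ chunks (one record each), gzipped separately.
printf '@read1\nACGT\n+\nIIII\n' | gzip > sample1_L001_R1_001.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip > sample1_L001_R1_002.fastq.gz

# Concatenate the compressed files directly, without gunzipping them first.
cat sample1_L001_R1_001.fastq.gz sample1_L001_R1_002.fastq.gz > sample1_L001_R1.fastq.gz

# The merged file decompresses to both records, in the order given to cat.
merged_lines=$(gzip -dc sample1_L001_R1.fastq.gz | wc -l | tr -d ' ')
first_header=$(gzip -dc sample1_L001_R1.fastq.gz | sed -n 1p)
echo "lines: $merged_lines, first record: $first_header"
```

The merged file contains the records in the same order as the arguments to cat, which is what matters for the ordering question that follows.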



      • #4
        Thanks, Peter, for your answer.
        What I want to know is the order in which to list the files in my cat command. If you look at my example files, you'll see I have sequences from one individual (sample 1) in 3 lanes, split into several files for both read 1 (001 and 002) and read 2 (001 and 002) in each lane.
        Also, I know how to merge two files in Galaxy, but I don't know how to merge multiple files.

        Thanks again.
        Last edited by hosseinv; 01-27-2015, 10:22 PM.



        • #5
          Not sure why you have duplicates of sample1 lane1 read 1_002, sample1 lane2 read 1_002, and sample1 lane3 read 1_002. Typo from the sequencing lab? If all you want to do is QC (via FastQC, for example), I don't think it matters what order they are in. Also, if you do want to run FastQC, you should assess each lane separately. If you want to do something else with them, the order of merging (or whether to merge them at all) depends on what that "something else" is.

          Also, you can merge multiple files in Galaxy: use "concatenate head-to-tail". I just use cat; it's quicker.
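One case where the order does matter is paired-end mapping: aligners that take separate read 1 and read 2 files generally pair records by position, so the two merged files need to be built in the same lane and chunk order. The sketch below assumes a hypothetical naming pattern (sample1_L00x_R1_00y.fastq.gz) and uses tiny made-up records; the real file names may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Create stand-in files: 3 lanes x 2 chunks x 2 reads, one record each.
for lane in L001 L002 L003; do
  for chunk in 001 002; do
    for read in R1 R2; do
      printf '@%s_%s/%s\nACGT\n+\nIIII\n' "$lane" "$chunk" "${read#R}" \
        | gzip > "sample1_${lane}_${read}_${chunk}.fastq.gz"
    done
  done
done

# Globs expand in sorted order, so both commands walk lanes and chunks identically.
cat sample1_L00?_R1_00?.fastq.gz > sample1_R1.fastq.gz
cat sample1_L00?_R2_00?.fastq.gz > sample1_R2.fastq.gz

# Sanity check: read names (minus the /1 or /2 suffix) should match line for line.
r1_names=$(gzip -dc sample1_R1.fastq.gz | awk 'NR % 4 == 1' | sed 's|/[12]$||')
r2_names=$(gzip -dc sample1_R2.fastq.gz | awk 'NR % 4 == 1' | sed 's|/[12]$||')
if [ "$r1_names" = "$r2_names" ]; then
  echo "pairs in sync"
else
  echo "pairs out of sync" >&2
fi
```

Using the same glob pattern for both reads, with only R1 swapped for R2, is a simple way to keep the two lists in step without typing each file name by hand.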
          Last edited by DFJ111; 08-16-2012, 07:20 PM.



          • #6
            Thanks DFJ111

            It's not a typo. What I want to do is map the sequences against the reference genes and find polymorphisms.



            • #7
              Which tool are you planning to use for the mapping, and does it require the paired reads in any specific order (e.g. interleaved in one file) or as separate files (forward and reverse reads)?



              • #8
                Hi maubp

                I will be using BWA for mapping, and I think I'll have to treat each set of reads individually; at some stage I can pool the BAM or SAM files together.
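A rough sketch of that per-lane approach, written as a script that only prints the commands so they can be reviewed before running. It assumes bwa mem and samtools 1.3 or later are installed, that the reference (the hypothetical ref.fa) has already been indexed with bwa index, and that each lane's chunks have been concatenated into one R1 and one R2 file with the hypothetical names shown.

```shell
#!/usr/bin/env bash
set -euo pipefail

sample=sample1
ref=ref.fa              # hypothetical reference, assumed indexed with `bwa index`
lanes="L001 L002 L003"

# Build (but do not run) one mapping command per lane, then a final merge command.
plan=$(
  bams=""
  for lane in $lanes; do
    bam="${sample}_${lane}.bam"
    # -R adds a read group so each lane stays identifiable after merging.
    printf 'bwa mem -R "@RG\\tID:%s\\tSM:%s" %s %s_%s_R1.fastq.gz %s_%s_R2.fastq.gz | samtools sort -o %s -\n' \
      "$lane" "$sample" "$ref" "$sample" "$lane" "$sample" "$lane" "$bam"
    bams="$bams $bam"
  done
  printf 'samtools merge %s.bam%s\n' "$sample" "$bams"
)
printf '%s\n' "$plan"
```

Saving this as a script and piping its output to bash would execute the plan. Mapping lane by lane keeps the read-group information per lane, while the final merge gives one BAM per individual for polymorphism calling.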

                Thanks for your attention.
