  • Sequence Alignment

    I have whole-genome data in separate read 1 and read 2 FASTQ files and want to align them. Just to be clear: should I align the two files separately, or should I first concatenate them and then align them together?
    Thanks,

  • #2
    Hi dear,
    What you have is paired-end data: the read 1 and read 2 FASTQ files hold the two ends of short fragments with a given insert size, most often 200-400 bp. Find out the insert size of your data. You can use bowtie2, MAQ or BWA to map the reads onto the reference genome; the choice of mapper depends on the read length and insert size. If you want to run bowtie2, you may consider the following commands:

    You will first need to create a bowtie2 index of your reference genome. Run bowtie2-build on the genome FASTA file:

    bowtie2-build Genome.fa Genome.build
    This will create six new files, which together make up the index that bowtie2 needs:

    Genome.build.1.bt2 Genome.build.4.bt2
    Genome.build.2.bt2 Genome.build.rev.1.bt2
    Genome.build.3.bt2 Genome.build.rev.2.bt2

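    We can now map the paired reads in FASTQ format against the Genome.build index:

    bowtie2 -q --phred64 -p 1 -I 100 -X 600 --fr -x Genome.build -1 read1.fastq -2 read2.fastq -S OUT.sam

    -q: input reads are in FASTQ format.
    --phred64: quality scores are Phred+64 encoded (older Illumina pipelines); leave it out for Phred+33 data, which is the bowtie2 default.
    -p: number of processors you want to use.
    -I: minimum insert size (fragment length) for a valid pair.
    -X: maximum insert size (fragment length) for a valid pair.
    --fr: the two reads of a pair point toward each other (forward/reverse), the usual paired-end orientation.
    -x: basename of the index built above.
    -1: read 1 FASTQ file
    -2: read 2 FASTQ file
    -S: write the alignments in SAM format to OUT.sam.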
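
The quality option passed to the mapper has to match how the FASTQ files encode their scores. As a rough sketch (the function names `min_qual_byte` and `guess_encoding` are made up for this example, and plain 4-line FASTQ records are assumed), one could look at the lowest quality character in a file: a value below ASCII 59 can only come from Phred+33 data, while a higher minimum merely suggests Phred+64.

```shell
# Print the smallest byte value among all quality characters in a FASTQ file.
# Assumes 4-line records, so every 4th line is a quality string.
min_qual_byte() {
    awk 'NR % 4 == 0' "$1" |
        tr -d '\n' |
        od -An -v -tu1 |
        tr -s ' ' '\n' |
        awk 'NF { v = $1 + 0; if (!seen || v < min) min = v; seen = 1 }
             END { if (seen) print min }'
}

# Heuristic guess: a minimum below 59 is only possible with Phred+33.
# A minimum of 59 or more is consistent with Phred+64 but does not prove it.
guess_encoding() {
    min=$(min_qual_byte "$1")
    if [ -z "$min" ]; then
        echo "unknown"
    elif [ "$min" -lt 59 ]; then
        echo "phred33"
    else
        echo "maybe-phred64"
    fi
}

# Usage (file name is a placeholder):
#   guess_encoding read1.fastq
```

This reads the whole file, so on whole-genome data it may be worth running it on the first few hundred thousand lines only.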

    Hope this will be helpful.

    Best wishes,
    Rahul
    Rahul Sharma,
    Ph.D
    Frankfurt am Main, Germany


    • #3
      Thanks Rahul,

      That means there is no need to combine read 1 with read 2 for alignment.
      Thanks,


      • #4
        Yes, there is no need to concatenate the read files; if you did, you would lose the pair information. Many of the reads would also fail to map, because the two reads of a pair are separated by the insert. Look at the following example:

        |=================================| (Reference genome)
        (1)-------> <-------- (2)
        (1)--------> <--------(2)
        (1)--------> <--------(2)

        === is the reference genome.
        (1)-----> is read 1; your read1.fastq file contains all the (1)-----> reads.
        <-------(2) is read 2; your read2.fastq file contains all the <------(2) reads.

        Please read up on these terms: paired-end data, mate pairs, insert size, read quality scores, read coverage. It will help you in your analysis.
        Best wishes,
        Rahul
        Rahul Sharma,
        Ph.D
        Frankfurt am Main, Germany


        • #5
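          The distance spanned by (1)------> <-------(2) is called the insert size. Note that the -I and -X limits in bowtie2 are measured over the whole fragment, both reads included.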
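
If the insert size of a library is not known, it can be estimated after mapping. The sketch below is only an illustration (the function name `mean_fragment_length` is invented here): it averages SAM column 9 (TLEN, the observed template length) over properly paired alignments, counting each pair once via the mate with the positive value.

```shell
# Print the mean observed fragment length from a SAM file.
# Uses column 2 (FLAG) and column 9 (TLEN) as defined in the SAM format.
mean_fragment_length() {
    awk -F '\t' '
        /^@/ { next }                        # skip header lines
        int($2 / 2) % 2 == 1 && $9 > 0 {     # FLAG bit 0x2 set: properly paired
            sum += $9; n++
        }
        END { if (n) printf "%.1f\n", sum / n }
    ' "$1"
}

# Usage (file name is a placeholder):
#   mean_fragment_length OUT.sam
```

Nothing is printed when the file contains no properly paired alignments.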
          Rahul Sharma,
          Ph.D
          Frankfurt am Main, Germany


          • #6
            Can we concatenate read 1 with read 1 and read 2 with read 2 from the same sample? I have a few samples that I ran twice because of low coverage.
            Thanks,


            • #7
              I suggest that you do not concatenate anything. Read pairs come from random fragments, so joining one read 1 to another read 1 would be useless: the joined read would no longer be mappable and you would lose valuable information.


              • #8
                Then how should I align them? I have around 30 FASTQ files for one sample, generated by CASAVA.
                Thanks,


                • #9
                  Oh sorry, my mistake; from your last post I thought you wanted to physically join the reads themselves.

                  If you have around 30 FASTQ files, then yes, you can just concatenate them to create one FASTQ file for each read of the pair.

                  i.e.

                  Code:
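                  # Files are joined in sorted name order, so R1 and R2 stay in step if the names differ only in R1/R2.
                  # Delete any old all.reads.*.fastq first: those names also match the patterns.
                  cat *R1.fastq > all.reads.R1.fastq
                  cat *R2.fastq > all.reads.R2.fastq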
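
After concatenating, it may be worth confirming that the two files still list the same reads in the same order, since paired-end mappers rely on that. A minimal sketch, assuming 4-line FASTQ records and read names that differ at most by a trailing /1 or /2 or by a comment after the first space (`count_reads`, `read_ids` and `pairs_in_sync` are names invented for this example):

```shell
# Number of reads in a FASTQ file (4 lines per record).
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Read names only: first word of each header line, without a trailing /1 or /2.
read_ids() {
    awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }' "$1"
}

# Succeeds (exit status 0) when both files contain the same names in the same order.
pairs_in_sync() {
    a=$(mktemp)
    b=$(mktemp)
    read_ids "$1" > "$a"
    read_ids "$2" > "$b"
    cmp -s "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Usage:
#   pairs_in_sync all.reads.R1.fastq all.reads.R2.fastq && echo "pairs OK"
```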


                  • #10
                    Hi,
                    Yes, you can concatenate the files, but not the reads. Please check the IDs in all of your files before running "cat"; they should be unique.
                    You may also consider some preprocessing: trim adapters and primers, discard low-quality reads (together with their mates), and discard reads with more than 5-10% Ns. This can improve your analysis. Please check the quality of your reads with FastQC or the FASTX-Toolkit.
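
The ID check could be sketched as follows, assuming 4-line FASTQ records (the function name `duplicate_read_ids` is invented for this example). It prints every read name that occurs more than once across the files it is given; no output means the IDs are unique.

```shell
# Print read names (first word of each header line) that occur more than once
# across all FASTQ files given as arguments.
duplicate_read_ids() {
    awk 'FNR % 4 == 1 { print $1 }' "$@" | sort | uniq -d
}

# Usage, before concatenating (run it per read number):
#   duplicate_read_ids *R1.fastq
```

Sorting all read names of a whole-genome run takes time and temporary disk space, so this is meant as a one-off sanity check.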
                    Best wishes,
                    Rahul
                    Rahul Sharma,
                    Ph.D
                    Frankfurt am Main, Germany
