  • >90% aligned concordantly 0 times ChIP-seq Bowtie2

    Hi,

    I know this question has been raised many times here and in other forums, but I've tried everything suggested in previous posts and still can't figure out where the problem is. This is the Bowtie2 output summary (I checked the target genome, hg19, several times and it's correct):

    21404130 reads; of these:
    21404130 (100.00%) were paired; of these:

    21196512 (99.03%) aligned concordantly 0 times
    104527 (0.49%) aligned concordantly exactly 1 time
    103091 (0.48%) aligned concordantly >1 times
    ----
    21196512 pairs aligned 0 times concordantly or discordantly; of these: 42393024 mates make up the pairs; of these:

    906397 (2.14%) aligned 0 times
    28296440 (66.75%) aligned exactly 1 time
    13190187 (31.11%) aligned >1 times
    97.88% overall alignment rate
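The figures in the summary above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming the overall rate counts every mate that aligned in any way (both mates of each concordant or discordant pair, plus mates that aligned on their own); the function name `overall_alignment_rate` is made up for illustration and is not part of Bowtie2.

```python
# Counts copied from the Bowtie2 summary above.
total_pairs = 21404130
concordant_once = 104527
concordant_multi = 103091
non_concordant_pairs = 21196512
mates_unaligned = 906397
mates_once = 28296440
mates_multi = 13190187


def overall_alignment_rate(total_pairs, concordant_pairs, aligned_mates,
                           discordant_pairs=0):
    """Fraction of all mates that aligned in any way (assumed definition)."""
    aligned = 2 * (concordant_pairs + discordant_pairs) + aligned_mates
    return aligned / (2 * total_pairs)


concordant_pairs = concordant_once + concordant_multi
rate = overall_alignment_rate(total_pairs, concordant_pairs,
                              mates_once + mates_multi)

# The three mate categories should add up to two mates per non-concordant pair.
mates_total = mates_unaligned + mates_once + mates_multi
print("mate counts consistent:", mates_total == 2 * non_concordant_pairs)
print(f"concordant pairs: {concordant_pairs / total_pairs:.2%}")
print(f"overall alignment rate: {rate:.2%}")
```

Under that assumption the numbers are internally consistent: almost every mate aligns somewhere on its own, yet fewer than 1% of pairs align concordantly. That pattern points at the pairing of the two files rather than at read quality or the reference genome.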

    I have paired-end ChIP-seq data, 50 bp reads. These are my steps (in Galaxy):

    1) I groomed the fastq files to get fastqsanger (I checked if it's correct: Input FASTQ quality scores type --> Sanger & Illumina 1.8+)

    2) FastQC is ok for all samples, some adapter contamination.

    3) I trimmed with either TrimGalore or Trimmomatic (both give the same result), supplying the forward and reverse files for each sample. I checked that I didn't mix up the forward and reverse files (I ran Bowtie2 with file 1 first and file 2 second, and also with file 2 first and file 1 second, and got the same result...)

    4) I mapped the trimmed files to the hg19 genome using Bowtie2 with the option --fr and got the result shown above. No matter what I try, I always get the same result with all of my samples. All of them have around 32% of reads with adapters, which I trimmed. I have no idea what to do with this.

    Any suggestions or ideas about where the mistake is? Am I missing something?

    Thank you very much in advance.

    Gema

  • #2
    It appears that your reads are not mapping concordantly. One possible reason is that when you did the trimming you may not have used both files (R1/R2) at the same time, which could have led to reads being out of order in the two files. You should always trim paired-end data files together.
    Last edited by GenoMax; 07-13-2017, 07:44 AM.

    • #3
      Hi GenoMax, thanks for your reply.

      Both files were trimmed together with the paired-end library option... that's why I don't understand the reason for the discordant reads....

      • #4
        Have you examined the BAM files using a genome browser (e.g. IGV) to see if the two reads are mapping within the expected distance? (It appears that perhaps they are not, if the concordance message is to be believed.)

        • #5
          This is an example of what the reads look like (image hosted on Imgur).

          • #6
            Any idea??

            • #7
              It is difficult to tell from that image since, unlike in IGV, the two reads that belong to a fragment are not identifiable (IGV can display them as pairs). Default values for -I and -X are 0 and 500. Did you change either of them?

              If you have the read files available locally, can you check the insert sizes using the BBMap program (it is pure Java and will run on PC/Mac)? See this answer to get the commands. You can add reads=1000000 to the command to use 1 million reads (instead of the full dataset).
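As an alternative to the BBMap route suggested above (and not a substitute for its actual commands, which are in the linked answer), the fragment lengths can also be read from an existing alignment. This is a minimal sketch, assuming a plain-text SAM file as written by Bowtie2; the helper names `fragment_lengths` and `fraction_within` are made up for illustration. It uses the standard SAM FLAG bits and the TLEN column to ask how many pairs fall inside the -I/-X window.

```python
def fragment_lengths(sam_lines):
    """Yield |TLEN| once per pair, taken from the first mate of each pair."""
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        paired = flag & 0x1
        first_in_pair = flag & 0x40
        unmapped = flag & 0x4 or flag & 0x8   # this read or its mate unmapped
        # TLEN is 0 when no fragment length is available,
        # e.g. the mates are on different chromosomes.
        if paired and first_in_pair and not unmapped and tlen != 0:
            yield abs(tlen)


def fraction_within(lengths, min_len=0, max_len=500):
    """Share of fragments whose length lies inside [min_len, max_len]."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(min_len <= n <= max_len for n in lengths) / len(lengths)


# Two synthetic pairs: one 250 bp fragment and one 1200 bp fragment.
example = [
    "@HD\tVN:1.0",
    "r1\t99\tchr1\t100\t42\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t42\t50M\t=\t100\t-250\t*\t*",
    "r2\t65\tchr1\t100\t42\t50M\t=\t1250\t1200\t*\t*",
]
print(fraction_within(fragment_lengths(example)))
```

For a real file, `fragment_lengths(open("aligned.sam"))` would be the equivalent call, where `aligned.sam` is a placeholder name. If most fragment lengths are far larger than 500, or missing altogether, the two files are unlikely to be mates of the same fragments.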

              • #8
                For -I I used 0; for -X I tried 500 and 1000, with no difference.

                • #9
                  This is an example of the first reads from TrimGalore output files 1 and 2 (as shown in Galaxy with the "eye" button):

                  File1:

                  @HWI-ST1018:141:H0HVBADXX:1:1101:1235:1969 1:N:0:CAGATC
                  CCCANGATCTGTTCCACAGGAGATAAGCAGATCTTACTCCAGAGACCACTG
                  +
                  BB<F#0<B<FBFFBFBBBFF<B<<FFFFIIBFFFBF<FBBB<FB<FFIFF7

                  @HWI-ST1018:141:H0HVBADXX:1:1101:1134:1970 1:N:0:CAGATC
                  TGGCNTATGAAGTTCAGTGTTCTTTGGCTTGTTAGTCAGAACTGTTGC
                  +
                  BBBF#0BFFFFFFIFFFIIIIIFIIIFFIIIIIIIIIIIIIIFIIIII

                  @HWI-ST1018:141:H0HVBADXX:1:1101:1153:1984 1:N:0:CAGATC
                  ACCTCTGTCTCCCAGGTTCAAGCGATTCTCCTGCCTCAGCCTCCCGAGT
                  +
                  BBBFFF<0B<B<FBFB<0<FFB0FBB70BBBB0BFF<<BB<BFBFBFBB

                  File2:

                  @HWI-ST1018:141:H0HVBADXX:2:1101:1184:1971 1:N:0:CAGATC
                  CTGATCAGAGGAGGAACATGACTAATCTATGGGCAGCCTACACTGAAGGC
                  +
                  BBBFFFFFFFFFFIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIII

                  @HWI-ST1018:141:H0HVBADXX:2:1101:1423:1966 1:N:0:CAGATC
                  AATGGGTAGGTAAATGGATGGCTGGGTGAATGGATGGGTGGGTGGATTGGC
                  +
                  BBBFFFFFFFFFFIIIIFFIIFIIIIFFFIIIIFFFIIBFFIBFFFIIII7

                  @HWI-ST1018:141:H0HVBADXX:2:1101:1596:1990 1:N:0:CAGATC
                  GATATCTTTTGTTTGTAGATATCTTTTCTAAGGCCCACATTCAGTGCAGAC
                  +
                  BBBFFBFFFFBBBBF<BFIIIIIIIIIIIIFIIFFFIIFIIIFF0B<F7FF

                  Now I realize... it seems that after TrimGalore, the reads are not sorted?? TrimGalore is supposed to sort the reads right?

                  • #10
                    Something does not appear to be right. Did you get two output files (R1 trimmed and R2 trimmed) from the two (R1/R2 original) that you used as input for the trimming? Your file 2 does not contain any reads with 2 as read identifier (see example below).

                    e.g. @HWI-ST1018:141:H0HVBADXX:1:1101:1235:1969 1:N:0:CAGATC in File 1
                    should have a corresponding
                    @HWI-ST1018:141:H0HVBADXX:1:1101:1235:1969 2:N:0:CAGATC in File 2 (note the read numbers, 1 and 2, immediately after the space).

                    Your example has no reads that show that property, if you are posting data from R1/R2 files for the same sample.

                    This is likely the reason why you are seeing discordant alignments.
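The pairing rule described above can be checked mechanically. The following is a minimal sketch, assuming uncompressed FASTQ files with Casava 1.8+ style headers (read name, a space, then `read:filter:control:index`); the function names are made up for illustration. It walks two files in parallel and reports the first record where the read names differ or the read numbers are not 1 and 2.

```python
from itertools import islice, zip_longest


def read_headers(path, limit=None):
    """Yield header lines (every 4th line) from an uncompressed FASTQ file."""
    with open(path) as handle:
        headers = islice(handle, 0, None, 4)
        for header in islice(headers, limit):
            yield header.rstrip("\n")


def split_header(header):
    """Return (read name, read number) from a Casava 1.8+ style header."""
    name, _, comment = header.partition(" ")
    return name.lstrip("@"), comment.split(":", 1)[0]


def first_mismatch(headers_1, headers_2):
    """Index of the first record that is not a proper R1/R2 pair, else None."""
    for index, (h1, h2) in enumerate(zip_longest(headers_1, headers_2)):
        if h1 is None or h2 is None:      # one file has more records
            return index
        name_1, number_1 = split_header(h1)
        name_2, number_2 = split_header(h2)
        if name_1 != name_2 or (number_1, number_2) != ("1", "2"):
            return index
    return None


# First headers of the two files posted earlier in this thread.
file_1 = ["@HWI-ST1018:141:H0HVBADXX:1:1101:1235:1969 1:N:0:CAGATC"]
file_2 = ["@HWI-ST1018:141:H0HVBADXX:2:1101:1184:1971 1:N:0:CAGATC"]
print(first_mismatch(file_1, file_2))
```

For files on disk, `first_mismatch(read_headers("file1.fastq", 100000), read_headers("file2.fastq", 100000))` would compare the first 100,000 records; the file names are placeholders. A result of `None` means every compared record paired up, while a mismatch at index 0, as with the headers from this thread, means the files disagree from the very first read.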

                    You can either re-trim the data using a paired-end-aware trimmer (you could use bbduk.sh from the BBMap suite) or use "repair.sh" from BBMap to re-pair the trimmed files. Unfortunately you would need to do this outside Galaxy.
                    Last edited by GenoMax; 07-13-2017, 10:19 AM.

                    • #11
                      But in file 2 there are no reads with a 2:N:0 tag; the number 2 is here (the fourth colon-separated field, right after H0HVBADXX):

                      @HWI-ST1018:141:H0HVBADXX:2:1101:1184:1971 1:N:0:CAGATC
                      CTGATCAGAGGAGGAACATGACTAATCTATGGGCAGCCTACACTGAAGGC
                      +
                      BBBFFFFFFFFFFIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIII

                      I already used a paired-end-aware trimmer, Trim Galore, and those examples are the beginning of the two output files. Is it possible that the reads from the original FASTQ files are in the same orientation instead of --fr?

                      • #12
                        The 2 that you highlight in your example above refers to the lane number where this sample ran on a flowcell (see this).

                        Do you actually have paired-end data? Paired-end data should have two files per sample, with names like Sample_R1.fq.gz and Sample_R2.fq.gz. Can you post the names of your original files?

                        The two examples you posted above seem to contain data from one sample (or two independent samples) that ran in lanes 1 and 2. They are not paired-end data files.
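The distinction between the lane number and the read number is easier to see with the header split into named fields. This is a minimal sketch, assuming the Casava 1.8+ header layout `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`; the `IlluminaHeader` type and `parse_header` function are made up for illustration.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    instrument: str
    run: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int        # 1 or 2: which mate of the pair this record is
    filtered: str    # "Y" if the read failed the chastity filter, else "N"
    control: str
    index: str


def parse_header(header):
    """Split a Casava 1.8+ style FASTQ header into its named fields."""
    location, comment = header.lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = location.split(":")
    read, filtered, control, index = comment.split(":")
    return IlluminaHeader(instrument, run, flowcell, int(lane), int(tile),
                          int(x), int(y), int(read), filtered, control, index)


# First header of "File2" as posted in this thread.
parsed = parse_header("@HWI-ST1018:141:H0HVBADXX:2:1101:1184:1971 1:N:0:CAGATC")
print("lane:", parsed.lane, "read:", parsed.read)
```

For that header the lane field is 2 and the read field is 1, so the record is read 1 of a fragment sequenced in lane 2, not the second mate of a pair.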
                        Last edited by GenoMax; 07-13-2017, 10:34 AM.

                        • #13
                          aha! ok thanks for the clarification!

                          an example of original fastq file names:

                          INPUT_1_130506_BH0HVBADXX_P382_108_index8_1.fastq
                          INPUT_2_130506_BH0HVBADXX_P382_108_index8_1.fastq

                          • #14
                            I don't think these are paired-end data. These appear to be single-end data files for two samples (INPUT_1 and INPUT_2) and should be aligned independently of each other.
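Aligning the two files independently means one single-end Bowtie2 run per file, passing the reads with -U rather than -1/-2. The following is a minimal sketch that only builds the command lines; the index prefix `hg19`, the thread count, and the output naming are assumptions, and `single_end_command` is a made-up helper rather than anything Galaxy or Bowtie2 provides.

```python
import shlex
from pathlib import Path


def single_end_command(index_prefix, fastq, threads=4):
    """Build a Bowtie2 single-end command line as an argument list."""
    # Output name is an assumption: same base name with a .sam extension.
    output = Path(fastq).with_suffix(".sam").name
    return [
        "bowtie2",
        "-p", str(threads),     # number of alignment threads
        "-x", index_prefix,     # basename of the Bowtie2 index
        "-U", str(fastq),       # unpaired (single-end) reads
        "-S", output,           # SAM output file
    ]


samples = [
    "INPUT_1_130506_BH0HVBADXX_P382_108_index8_1.fastq",
    "INPUT_2_130506_BH0HVBADXX_P382_108_index8_1.fastq",
]
commands = [single_end_command("hg19", fastq) for fastq in samples]
for command in commands:
    print(shlex.join(command))
```

Each printed line can be run in a shell, or each list passed to `subprocess.run(command, check=True)`. No pairing options such as --fr, -I, or -X are needed, since they only apply to paired-end input.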

                            • #15
                              Maybe that's the reason for the strange results.... I was confused by the numbers 1 and 2 in the sample names!! Thanks a lot for your help and time!!!
