  • #16
    Once you have finished the methylation extraction, you can simply take the C-context output files and concatenate them, e.g.

    cat CpG*.txt > CpG_context_merged.txt
    or
    zcat CpG*.txt.gz > CpG_context_merged.txt

    and use this merged output as input for bismark2bedGraph. The only thing that is somewhat inconvenient is that you don't get one nice mapping report / html file because the alignment process was split up into 3 processes...

    • #17
      Makes sense. I think it's up and running now.

      Thank you for the assistance!

      • #18
        Hi all,

        I have a little problem with the alignment of a paired-end library with Bismark/Bowtie2. We have obtained paired-end bisulfite sequencing FASTQ files. The files were quality-clipped at a Phred score of 26 and interlaced. When I align with the paired-end option of Bismark I get 0.02% aligned reads. I then aligned the forward and reverse libraries separately as single-end files. The alignment efficiency of the forward library is ~40%, but that of the reverse library is ~0.05%. This is reproducible with all our libraries. Am I getting something wrong? Has anybody observed the same behaviour?
        I would like to know whether we need to change the alignment parameters for the reverse library.

        Many thanks for your ideas!

        • #19
          Reverse reads, i.e. Read 2 of paired-end libraries, would align to the strands complementary to the original top or bottom strands (CTOT and CTOB), but these alignments are not carried out in the default mode (which only considers the OT and OB strands). If you want to align Read 2 files in single-end mode you need to specify --pbat to get what you want. Cheers, Felix
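To illustrate why Read 2 behaves differently, here is a minimal sketch of an in-silico bisulfite conversion. It assumes a directional library, a completely unmethylated fragment, and a Read 2 that spans the whole fragment; the example sequence and function names are made up for illustration and are not part of Bismark.

```python
# Toy model: Read 1 is the C->T converted original strand,
# Read 2 is the reverse complement of that converted strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def bisulfite_convert(sequence):
    """Convert every C to T (assumes no cytosine is methylated)."""
    return sequence.replace("C", "T")


fragment = "ACGTCCGA"                 # hypothetical original top strand
read1 = bisulfite_convert(fragment)   # looks C->T converted
read2 = reverse_complement(read1)     # looks G->A converted

print(read1)  # ATGTTTGA
print(read2)  # TCAAACAT
```

In this toy case Read 1 contains no C and Read 2 contains no G, so Read 2 only matches a reference once the complementary-strand conversion is considered, which is the situation --pbat is meant for.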

          • #20
            Hi Felix, can I ask you a question? I plan to start an RRBS sequencing experiment but I don't know whether I should choose single-end or paired-end sequencing. Could you advise me on which one is currently the better choice? Thank you in advance! Zhuofei

            • #21
              I don't think there is a best answer to this question, because paired-end data might give you (some) information that single-end data can't provide (due to the read length, namely on the other side of the MspI fragment). On the other hand, RRBS fragments are often quite short, so you are likely to get a lot of overlapping reads yielding redundant information which needs to be excluded later. We have tried to summarise the main facts about RRBS in this RRBS User Guide. From a processing point of view I would prefer single-end data (it's just much faster and there is less to worry about), but PE has its benefits... In a nutshell:

              Pros SE:
              - cheaper
              - trimming/mapping more straightforward and faster

              Pros PE:
              - possibly slightly higher mapping efficiency (not very much these days because reads are already fairly long)
              - might yield data (at the end of MspI fragments) that SE doesn't reach

              Cons PE:
              - mapping somewhat slower
              - overlapping reads (thus 2 reads != 2x data, this is taken care of in the methylation extractor)
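The "2 reads != 2x data" point can be made concrete with a little arithmetic. The sketch below assumes adapter-trimmed reads that start at opposite ends of the fragment and contain no indels; the function names are invented for this example.

```python
def unique_bases_covered(fragment_length, read_length):
    """Fragment positions covered by at least one read of the pair."""
    per_read = min(read_length, fragment_length)
    # Two reads can never cover more than the fragment itself.
    return min(fragment_length, 2 * per_read)


def overlapping_bases(fragment_length, read_length):
    """Fragment positions covered by both reads (redundant information)."""
    per_read = min(read_length, fragment_length)
    return 2 * per_read - unique_bases_covered(fragment_length, read_length)


for fragment_length in (30, 60, 150):
    covered = unique_bases_covered(fragment_length, 50)
    redundant = overlapping_bases(fragment_length, 50)
    print(fragment_length, covered, redundant)
```

For 50 bp reads, a 60 bp fragment yields 60 covered positions with 40 of them sequenced twice, whereas a 150 bp fragment yields 100 covered positions with no overlap.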

              • #22
                Thanks a lot for your quick and useful reply, Felix!

                • #23
                  A quick question!

                  Where do I put the '--score-min' option in the command?

                  I tried the command below but got an 'unknown option' error...

                  Code:
                  bismark -u 1000 --score-min L,0,-0.6 --non_directional --bowtie2 -p 8 ../mm10/ -1 MEF_R1_val_1.fq -2 MEF_R2_val_2.fq
                  Code:
                  Unknown option: score-min
                  Please respecify command line options
                  Thank you!

                  • #24
                    The option is called --score_min (with an underscore _). Apart from that you can put it anywhere in the command line.
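As background on what a value such as L,0,-0.6 means: Bowtie 2 reads a linear function L,a,b as a minimum alignment score of a + b * read length. The sketch below works this out for a 50 bp read. It assumes the Bowtie 2 default maximum mismatch penalty of 6 and ignores gaps, Ns and quality-dependent penalties, so the mismatch counts are rough estimates only; the function names are hypothetical.

```python
def min_alignment_score(read_length, intercept, slope):
    """Minimum score for a valid alignment under a linear 'L,a,b' function."""
    return intercept + slope * read_length


def tolerated_mismatches(read_length, intercept, slope, mismatch_penalty=6):
    """Rough number of high-quality mismatches that still pass the threshold."""
    budget = -min_alignment_score(read_length, intercept, slope)
    # The small epsilon guards against floating-point rounding at exact multiples.
    return max(0, int((budget + 1e-9) // mismatch_penalty))


# L,0,-0.2 (stricter) versus L,0,-0.6 (more relaxed) for a 50 bp read
for slope in (-0.2, -0.6):
    score = min_alignment_score(50, 0, slope)
    print(slope, score, tolerated_mismatches(50, 0, slope))
```

Under these assumptions the stricter function tolerates about one mismatch in a 50 bp read and the relaxed one about five.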

                    • #25
                      Hi Felix,

                      I'm mapping some 50bp PE BS-seq reads to hg38 using Bismark. The reads had some adapter contamination at the 3' end, suggesting read-through for some reads. So they were trimmed before mapping using stringent conditions:

                      java -jar trimmomatic-0.32.jar PE ... CROP:42 HEADCROP:2 ILLUMINACLIP:adapters/TruSeq3-PE-2.fa:0:40:1 LEADING:30 TRAILING:30 SLIDINGWINDOW:4:15 MINLEN:16

                      The trimming was successful according to FastQC. The adapter content was almost zero, and C and G content was close to zero throughout the whole read for read 1 and read 2, respectively.

                      I tried both Bowtie1 and Bowtie2 using the -n 1 option. Both gave a mapping efficiency as low as 8%. The mapping efficiency was 75% and 37%, respectively, when I mapped the read 1 reads in SE mode or the read 2 reads using --pbat. This was not because of the trimming, because the untrimmed reads also gave an 8% mapping efficiency when run in PE mode.

                      Any suggestion?
                      Last edited by chxu02; 02-15-2015, 02:21 PM.

                      • #26
                        Hmm, is there a chance that the two reads got out of sync during the trimming procedure? If you align just a small fraction of the untrimmed reads (e.g. -u 100000 would only try to align the first 100,000 reads), do you see a similarly low mapping efficiency?

                        Right sorry didn't read the last line properly... Do read 2 alignments also sport a rather high mapping percentage? Maybe there is something wrong with read 2 (technically) that would cause this? If you could zip up a few reads (say 100-500K) and send them to me by mail I could take a look myself. Cheers, Felix
                        Last edited by fkrueger; 02-15-2015, 12:28 PM.
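One way to test the "out of sync" idea raised at the start of this post is to compare read names record by record. This is a minimal sketch that assumes plain 4-line FASTQ records and read names that match up to an optional /1 or /2 suffix; the function names are invented for this example.

```python
from itertools import zip_longest


def read_name(header):
    """Read name without the comment field or a trailing /1 or /2."""
    tokens = header.split()
    name = tokens[0] if tokens else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_names(lines):
    """Yield the read name of every 4-line FASTQ record."""
    for line_number, line in enumerate(lines):
        if line_number % 4 == 0:
            yield read_name(line)


def first_out_of_sync(r1_lines, r2_lines):
    """Index of the first record whose names differ, or None if in sync."""
    pairs = zip_longest(read_names(r1_lines), read_names(r2_lines))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index
    return None
```

It could be called on two open file handles, e.g. `first_out_of_sync(open("R1.fq"), open("R2.fq"))`, where the file names are placeholders. A file that ends early is also reported as out of sync.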

                        • #27
                          Originally posted by fkrueger
                          Hmm, is there a chance that the two reads got out of sync during the trimming procedure? If you align just a small fraction of the untrimmed reads (e.g. -u 100000 would only try to align the first 100,000 reads), do you see a similarly low mapping efficiency?

                          Right sorry didn't read the last line properly... Do read 2 alignments also sport a rather high mapping percentage? Maybe there is something wrong with read 2 (technically) that would cause this? If you could zip up a few reads (say 100-500K) and send them to me by mail I could take a look myself. Cheers, Felix
                          Hi Felix, I have updated my original post with the mapping result for the read 2 reads. Please see my email for some of my reads.

                          • #28
                            Hi Youyou,

                            I have tried a number of different things with the 250K sequences you have sent and I am sorry but I can't exactly say what is going on. The things I have seen included:

                            - a smallish fraction of the reads still contain tags 1:Y: and 2:Y: ... these reads did not pass the Illumina quality filter and should be removed prior to aligning
                            - even though both R1 and R2 look fine (quality wise) with FastQC, R1 and R2 vary a lot in their mapping efficiency
                            - since there is a big discrepancy between R1 and R2 I don't think the problem has to do with the paired-end nature of the files
                            - on top of the problem with mapping efficiency the methylation values obtained from R1 (~51% in CpG context) and R2 (~17-26% in CpG context) vary wildly as well
                            - trimming 10bp off the 5' end of R2 increased its mapping efficiency to ~50% (with --bowtie2). Relaxing the score_min function to L,0,-0.6, which allows more mismatches, also increased the mapping efficiency noticeably
                            - trimming 10bp off the 3' end of R2 also increased the mapping efficiency but not to the same extent as the 5' end trimming (by the way I used Trim Galore for the trimming)
                            - I have tried to align R2 to several usual suspect contaminations (mouse, PhiX, E. coli, Arabidopsis, Lambda, M13) but couldn't find any significant results.

                            So far, it looks like R2 is acting up but for a not very obvious reason. Have there been any problems reported for the run in the sequencing facility? As it stands I would probably lean towards only aligning R1 in Single-End mode because R2 clearly reduces the mapping efficiency to next-to-nothing.
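For readers who want to reproduce the 5' trimming experiment from the list above without a dedicated trimmer, the operation itself is simple. This is only an illustrative sketch (the actual trimming here was done with Trim Galore); the record layout as a 4-tuple and the function name are assumptions of this example.

```python
def clip_five_prime(record, n_bases=10):
    """Remove the first n_bases from the sequence and quality strings.

    `record` is a (header, sequence, separator, quality) tuple without
    trailing newlines.
    """
    header, sequence, separator, quality = record
    return (header, sequence[n_bases:], separator, quality[n_bases:])


example = ("@read1", "ACGTACGTACGTAC", "+", "IIIIIIIIIIIIII")
print(clip_five_prime(example))
```

Reads that end up very short after clipping would normally be discarded, and in paired-end data their mates would have to be removed as well to keep the files in sync.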

                            Sorry if this wasn't very comprehensive but it is getting late now... Cheers, Felix

                            • #29
                              Hi Felix,
                              Thanks for your troubleshooting. I think the extraordinarily small insert size is the main reason for the mapping failure, so I'm now preparing a library with a longer insert size. One quick question: how can I filter those Y-tagged reads out of the read pairs without disrupting the synchrony? Is this usually supposed to be done by the sequencing facility? These reads are not going to be used in any case.

                              • #30
                                I only did a grep for 1:Y: and 2:Y:, and the two files seemed to contain the same number of Ys, so presumably the entire read pair is tagged. And yes, these reads are often filtered out by the sequencing facility; otherwise, if I were you, I would write a quick script to remove them from both files. Good luck!
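A quick script of that kind could look like the sketch below. It assumes uncompressed 4-line FASTQ records whose header comment carries the filter flag as its second colon-separated field (e.g. `1:Y:0:ACGT`), and it drops a pair if either mate is flagged so that the two files stay in sync. All function and file names are hypothetical.

```python
from itertools import islice, zip_longest


def fastq_records(lines):
    """Group an iterable of FASTQ lines into 4-line records."""
    iterator = iter(lines)
    while True:
        record = list(islice(iterator, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def failed_filter(header):
    """True if the header comment looks like '1:Y:...' or '2:Y:...'."""
    parts = header.split()
    if len(parts) < 2:
        return False  # no comment field, so no filter flag to inspect
    fields = parts[1].split(":")
    return len(fields) >= 2 and fields[1] == "Y"


def passing_pairs(r1_lines, r2_lines):
    """Yield (record1, record2) for pairs in which neither mate is flagged."""
    pairs = zip_longest(fastq_records(r1_lines), fastq_records(r2_lines))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if failed_filter(rec1[0]) or failed_filter(rec2[0]):
            continue
        yield rec1, rec2


def filter_files(r1_in, r2_in, r1_out, r2_out):
    """Write only the passing pairs to two new, still synchronised files."""
    with open(r1_in) as in1, open(r2_in) as in2, \
            open(r1_out, "w") as out1, open(r2_out, "w") as out2:
        for rec1, rec2 in passing_pairs(in1, in2):
            out1.writelines(rec1)
            out2.writelines(rec2)
```

For example, `filter_files("R1.fq", "R2.fq", "R1.pass.fq", "R2.pass.fq")` would write the filtered pairs to two new files; gzipped input would need `gzip.open(path, "rt")` instead of `open`.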
