
  • #76
    Hi Zeam,

    you are right that Bismark doesn't dynamically trim reads at the moment, so you would have to run the sequence file through an appropriate trimmer prior to alignments.

    Best,
    Felix



    • #77
      The bismark_to_SAM_v3.pl script appears to have a small bug where the negative strand alignments are off by 1 base. This can be rectified in the script by adding 1 to the "-" strand alignment start position.
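A minimal sketch of the correction described above, written in Python rather than the script's Perl. The function name and the "+"/"-" strand encoding are assumptions for illustration and are not taken from bismark_to_SAM_v3.pl.

```python
def corrected_start(start: int, strand: str) -> int:
    """Return the alignment start, shifted by +1 for '-' strand alignments.

    Hypothetical helper: it only illustrates the off-by-one fix,
    not the actual code in the conversion script.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    # '+' strand positions are left untouched; '-' strand positions get +1.
    return start + 1 if strand == "-" else start
```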



      • #78
        Hi Zee,

        Is this in the single-end or paired-end analysis?

        Cheers,
        Oliver



        • #79
          Hi Oliver,

          We picked it up in the paired-end analysis. However, we have seen the same bit of code in the single-end subroutine, so it will probably behave the same way there, because these are "GA" alignments.

          Z



          • #80
            Hi Z,

            I made the changes. Could you test whether it works properly now?

            Thanks heaps for picking it up.

            Cheers,
            Oliver
            Attached Files



            • #81
              Yep, it works fine now. It would be good if the script created a BAM file instead, by piping its output to "samtools view -t <chr.sizes> -bS -". Just a suggestion, because I like working with BAM files, which take less disk space.
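A rough sketch of that pipe, assuming samtools is installed and on the PATH. The function names, the `chr_sizes` argument and the output path are hypothetical; only the samtools flags are taken from the command quoted above.

```python
import subprocess
from typing import Iterable, List


def samtools_bam_command(chr_sizes: str) -> List[str]:
    """Build the argument list for 'samtools view -t <chr.sizes> -bS -'."""
    return ["samtools", "view", "-t", chr_sizes, "-bS", "-"]


def write_bam(sam_lines: Iterable[str], chr_sizes: str, bam_path: str) -> None:
    """Stream SAM text lines into samtools and write the BAM output to bam_path."""
    with open(bam_path, "wb") as bam:
        proc = subprocess.Popen(
            samtools_bam_command(chr_sizes),
            stdin=subprocess.PIPE,
            stdout=bam,
        )
        for line in sam_lines:
            proc.stdin.write(line.encode())
        proc.stdin.close()
        if proc.wait() != 0:
            raise RuntimeError("samtools view exited with a non-zero status")
```

Streaming the lines through a pipe means the uncompressed SAM text never has to be written to disk.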



              • #82
                Hi Z,

                That's not a bad idea at all. I'll look into that.

                Thanks heaps.

                Cheers,
                Oliver



                • #83
                  Hi Oliver,

                  Attached is a version we modified for our work with novoalign-novomethyl and bismark. We output an extra SAM tag ZB:Z:GA or ZB:Z:CT so that we're able to split these out later in our pipeline.
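As an illustration of how such a tag could be used downstream, here is a small sketch that groups SAM records by their ZB:Z value. It is not the attached script; the function names are hypothetical, and it assumes standard tab-separated SAM lines with optional tags after the 11 mandatory fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def zb_tag(sam_line: str) -> Optional[str]:
    """Return the value of the ZB:Z tag (e.g. 'GA' or 'CT'), or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM fields.
    for field in fields[11:]:
        if field.startswith("ZB:Z:"):
            return field[len("ZB:Z:"):]
    return None


def split_by_zb(sam_lines: Iterable[str]) -> Dict[Optional[str], List[str]]:
    """Group alignment lines by ZB tag value, skipping '@' header lines."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        groups[zb_tag(line)].append(line)
    return dict(groups)
```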
                  Attached Files



                  • #84
                    Hi Zee,

                    Would it be OK if I stuck your bismark2bam.pl script up on our website?

                    Felix



                    • #85
                      Sure thing, it's open to anybody who wants to use it.



                      • #86
                        Hey guys,
                        Recently I have been using Bismark to process my MethylC-seq data, but the mapping efficiency is not very good. I know the species I work on is transposon-rich (more than 60%). In most papers the mapping efficiency is greater than 70%, but for my methylome data it is about 40% for single-end reads and 68% for paired-end reads.

                        Has anyone encountered a similar mapping problem? Or can someone give me some suggestions about the mapping strategy? Half of my reads are single-end.



                        • #87
                          Hi Zeam,

                          could you give us a few more details about your actual experiment? A paired-end mapping efficiency of 68% for BS-Seq data sounds quite good to me, but 40% for SE is indeed a bit low.

                          What was the read length you used for the single end files, what were the mapping parameters and what did the QC of the FastQ files look like?

                          Also, Bismark should produce the stats:

                          Sequences with no alignments under any condition: 123
                          Sequences did not map uniquely: 73591

                          which should give you a feel for whether sequences just fail to align (too many errors, residual adapter sequence or the like) or whether they get rejected because they align in too many places (this could be indicative of a high repetitive element content). Another possibility could be that your genome of interest contains something like properly sequenced but unplaced scaffolds which share a high sequence similarity with other chromosomes. These might also result in a high number of sequences being rejected due to non-unique mapping.
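A small sketch of pulling those two counts out of a report and seeing which one dominates. The function names and dictionary keys are hypothetical, and the patterns only assume the two line labels quoted above.

```python
import re
from typing import Dict


def parse_rejection_counts(report_text: str) -> Dict[str, int]:
    """Extract the 'no alignment' and 'not unique' counts from report text."""
    patterns = {
        "no_alignment": r"Sequences with no alignments under any condition:\s*(\d+)",
        "not_unique": r"Sequences did not map uniquely:\s*(\d+)",
    }
    counts: Dict[str, int] = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, report_text)
        if match:
            counts[key] = int(match.group(1))
    return counts


def dominant_reason(counts: Dict[str, int]) -> str:
    """Return the key with the larger count."""
    return max(counts, key=counts.get)
```

With the example numbers above, non-unique mapping accounts for far more rejected sequences than failure to align.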

                          If you like you could send me the Bismark mapping report and the FastQC report (the zipped file) to take a look, maybe it tells us more.

                          Best,
                          Felix



                          • #88
                            Report from bismark mapping

                            Hi Felix,
                            The two attachments are the Bismark mapping reports for a PE lane and an SE lane respectively. The FastQC report will be emailed to you because of its file size.

                            The raw read length is 100 bp. After quality trimming at q 13, a small proportion of the reads were shorter than 100 bp.

                            Thanks for replying!

                            Best wishes,
                            Zeam
                            Attached Files



                            • #89
                              Hi Zeam,

                              thanks for the attachments. From a brief look at the mapping report, you seem to have 45 million alignments which got rejected because of ambiguous mapping. Such reads are rejected not merely because they also map somewhere else, but because they map at least twice with the same lowest number of mismatches.

                              To me this looks like you are using a newly assembled plant genome that contains either a lot of smaller scaffolds that could not be placed into the main genome or something like an unmapped_chromosome. The only solution for such a problem is to re-index your genome after first removing the very small scaffolds. We once had a similar problem with Chlamydomonas, and if I recall correctly, removing unplaceable scaffolds worked like a charm.
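As a sketch of the scaffold-removal step, the snippet below drops FASTA records shorter than a chosen length before the genome is re-indexed. The function names and the `min_length` cutoff are assumptions for illustration; a sensible cutoff depends on the assembly.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def drop_small_scaffolds(lines: Iterable[str], min_length: int) -> List[Tuple[str, str]]:
    """Keep only the records whose sequence is at least min_length bases long."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_length]
```

The retained records would then be written back to a FASTA file and the genome indexed again from that file.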

                              A quick word about your mapping parameters. 100 bp reads are really quite long for BS-Seq, and using -n 2 -l 28 (which are the default settings) tolerates quite a lot of errors. If I were you I would be much more stringent with the parameters, maybe even use something like -n 2 -l 70 or so, as sequencing errors can not only allow mismappings but will also lead to false methylation calls. Also, with reads this long you are likely to read into the adapter on the other side, so you might want to use an adapter trimmer on the reads as well.

                              Please let me know if I can be of more help,

                              Best,
                              Felix



                              • #90
                                I would like to announce that Bismark v0.5.0 has been released today.


                                The 3 main modifications are:

                                - paired-end alignments should now be performed correctly irrespective of the sequence ID format in the FastQ file. This hopefully means that the new format which will be output by the Illumina Casava version 1.8 will no longer cause Bismark to stop.

                                - the alignment output will now also include extra column(s) for sequence basecall quality scores (both for single and paired-end data). This should facilitate filtering on qualities later on if desired.

                                - fixed a bug with paired-end alignments where alignments to the CTOT strand were accidentally assigned to the CTOB strand and vice versa.

                                All associated files can be obtained from:



                                I hope the modifications do not break too many downstream analysis scripts ... If you spot any flaws please let me know.

                                Best,
                                Felix
