  • I'm trying to do the genome conversion with this line:

    bismark_genome_preparation --bowtie2 --verbose /gpfs_share/hlibyar/hlibyar/genome_files/ --path_to_bowtie /usr/local/apps/bowtie2/



    "...
    Step II - Genome bisulfite conversions - completed


    Bismark Genome Preparation - Step III: Launching the Bowtie 2 indexer
    Please be aware that this process can - depending on genome size - take several hours!

    Preparing indexing of CT converted genome in /gpfs_share/hlibyar/hlibyar/genome_files/Bisulfite_Genome/CT_conversion/
    Parent process: Starting to index C->T converted genome with the following command:

    /usr/local/apps/bowtie2/bowtie2-build -f genome_mfa.CT_conversion.fa BS_CT

    Can't exec "/usr/local/apps/bowtie2/bowtie2-build": No such file or directory at /usr/local/apps/bismark/v0.14.5/bismark_genome_preparation line 163, <IN> line 3352281.
    Preparing indexing of GA converted genome in /gpfs_share/hlibyar/hlibyar/genome_files/Bisulfite_Genome/GA_conversion/
    Child process: Starting to index G->A converted genome with the following command:

    /usr/local/apps/bowtie2/bowtie2-build -f genome_mfa.GA_conversion.fa BS_GA

    (starting in 10 seconds)
    Can't exec "/usr/local/apps/bowtie2/bowtie2-build": No such file or directory at /usr/local/apps/bismark/v0.14.5/bismark_genome_preparation line 178, <IN> line 3352281."

    Is there something wrong with my bowtie path?
    Thanks.

    Comment


    • Hmm, when you type:

      Code:
      /usr/local/apps/bowtie2/bowtie2-build
      on the command line you should see the bowtie2 indexing options. Does that happen? If not, you need to supply the exact path to where the executable is...
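A minimal sketch of the same check in Python, for anyone who wants to script it. The helper name `find_tool` is hypothetical; the directory is the one passed to `--path_to_bowtie` in the question. It only tests that a file called `bowtie2-build` exists in that directory and is executable, which is the condition the "Can't exec" message says is not met.

```python
import os


def find_tool(directory, tool="bowtie2-build"):
    """Return the full path to `tool` inside `directory` if it is an
    executable file, otherwise None. (Hypothetical helper, not part of Bismark.)"""
    candidate = os.path.join(directory, tool)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    # Directory taken from the --path_to_bowtie argument in the question.
    bowtie_dir = "/usr/local/apps/bowtie2/"
    found = find_tool(bowtie_dir)
    print(found or f"no executable bowtie2-build in {bowtie_dir}")
```

If this reports nothing found, the directory given to `--path_to_bowtie` is probably not the one that actually contains the executable.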

      Comment


      • Originally posted by fkrueger View Post
        Hmm, when you type:

        Code:
        /usr/local/apps/bowtie2/bowtie2-build
        on the command line you should see the bowtie2 indexing options. Does that happen? If not, you need to supply the exact path to where the executable is...
        I see, let me try

        Comment


        • Hi Felix,
          I'm having a special need. I indexed each alignment in my BAM file as an extra column. Is it feasible to also print this information into the .txt file when I run bismark_methylation_extractor? Basically I want to know which alignment each cytosine is from, and group the cytosines from the same alignment together.

          Cheers,
          Youyou

          Comment


          • Originally posted by chxu02 View Post
            Hi Felix,
            I'm having a special need. I indexed each alignment in my BAM file as an extra column. Is it feasible to also print this information into the .txt file when I run bismark_methylation_extractor? Basically I want to know which alignment each cytosine is from, and group the cytosines from the same alignment together.

            Cheers,
            Youyou
            Hi Youyou. The methylation extractor output still has the read ID printed in each line, so at this stage it is still possible to tell which C came from which read. If you proceed to the bedGraph stage or beyond this information will be lost unfortunately.
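A rough sketch of how the calls could be grouped by read ID, in Python. It assumes the extractor's context-specific output is tab-separated with five columns (read ID, methylation state, chromosome, position, methylation call) and starts with a one-line version header; check this against your own files before relying on it. The function name `group_calls_by_read` is hypothetical.

```python
from collections import defaultdict


def group_calls_by_read(lines):
    """Group methylation calls by read ID.

    Assumes each data line has five tab-separated fields:
    read ID, methylation state (+/-), chromosome, position, call letter.
    Lines with a different number of fields (e.g. a header) are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue
        read_id, _state, chrom, pos, call = fields
        groups[read_id].append((chrom, int(pos), call))
    return dict(groups)


if __name__ == "__main__":
    example = [
        "Bismark methylation extractor version v0.14.5\n",  # header, skipped
        "read_1\t+\tchr1\t100\tZ\n",
        "read_1\t-\tchr1\t120\tz\n",
        "read_2\t+\tchr2\t50\tZ\n",
    ]
    for read_id, calls in group_calls_by_read(example).items():
        print(read_id, calls)
```

Since the read ID is the grouping key, this only works on the extractor's per-read output, not on the bedGraph or coverage files.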

            Comment


            • Hi Felix,

              I have four lanes of data for each biological sample. Should I add them together before trimming and Bismark? Or is it better to do the trimming and Bismark run on the individual lane of reads? Thanks.

              Regards,
              BBM

              Comment


              • Originally posted by bbm View Post
                Hi Felix,

                I have four lanes of data for each biological sample. Should I add them together before trimming and Bismark? Or is it better to do the trimming and Bismark run on the individual lane of reads? Thanks.

                Regards,
                BBM
                For me personally, merging them before mapping is the preferred way because this enables you to do the deduplication on a single sample, and it is just more convenient overall. If time is of the essence you could also align them separately and merge them before the deduplication, but that is really a matter of taste. (Also, if you have paired-end sequences and use samtools merge for merging BAM files, you need to make sure to use samtools sort -n before trying to deduplicate, because samtools sort does not guarantee to keep mates together.)
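One possible way to do the "merge before mapping" step is plain concatenation of the per-lane FASTQ files, sketched here in Python. This is an assumption about how to merge, not something prescribed in the reply above; the function name `merge_lanes` and the file names in the usage comment are hypothetical. Gzip-compressed files can be concatenated byte for byte and still form a valid gzip stream, so the same function works for `.fastq.gz` and plain `.fastq` inputs, as long as the two kinds are not mixed.

```python
import shutil


def merge_lanes(lane_files, merged_path):
    """Concatenate per-lane FASTQ files into one file, in the order given.

    Works on raw bytes, so inputs may be all gzipped or all plain text.
    (Hypothetical helper; file names are up to the caller.)
    """
    with open(merged_path, "wb") as out:
        for path in lane_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Example call with hypothetical file names:
# merge_lanes(
#     ["sample_L001_R1.fastq.gz", "sample_L002_R1.fastq.gz"],
#     "sample_R1.fastq.gz",
# )
```

For paired-end data, the R1 and R2 files have to be merged with the lanes in the same order, otherwise the mates no longer line up between the two merged files.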

                Comment


                • Hi Felix,

                  I have recently started using Bismark. While going through the detailed log, I could not get the same percentages of methylated cytosines (before and after extraction) as the ones shown in the report. Here's my cytosine methylation report before extraction to illustrate:

                  Final Cytosine Methylation Report
                  =================================
                  Total number of C's analysed: 928829076

                  Total methylated C's in CpG context: 43871230
                  Total methylated C's in CHG context: 16835845
                  Total methylated C's in CHH context: 25330412
                  Total methylated C's in Unknown context: 58

                  Total unmethylated C's in CpG context: 106252080
                  Total unmethylated C's in CHG context: 136818631
                  Total unmethylated C's in CHH context: 599720878
                  Total unmethylated C's in Unknown context: 408

                  C methylated in CpG context: 29.2%
                  C methylated in CHG context: 11.0%
                  C methylated in CHH context: 4.1%
                  C methylated in Unknown context (CN or CHN): 12.4%

                  For me, Total methylated C's in CpG context = (43871230/928829076)*100 = 4.72% (in the report, I see 29.2%)

                  What am I missing??

                  Cheers,

                  Amira

                  Comment


                  • Hi Amira,

                    For C in CpG context you need to use only the Cs methylated and unmethylated in CpG context, and not all Cs found in total. So here it would be:

                    Total methylated C's in CpG context = (43871230/106252080)*100 = 29.2%.

                    Cheers, Felix

                    Comment


                    • Originally posted by fkrueger View Post
                      Hi Amira,

                      For C in CpG context you need to use only the Cs methylated and unmethylated in CpG context, and not all Cs found in total. So here it would be:

                      Total methylated C's in CpG context = (43871230/106252080)*100 = 29.2%.

                      Cheers, Felix
                      Thank you for the quick reply!

                      Okay, so it's relative to the context. In your illustration, I think you meant:

                      Total methylated C's in CpG context = (43871230/ (106252080+ 43871230) )*100 = 29.2%.

                      Thanks again,
                      Amira

                      Comment


                      • Thanks for spotting that, and yes you are absolutely right it needs to be methylated / (methylated + unmethylated) *100.
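As a quick check of methylated / (methylated + unmethylated) * 100, here is a short Python sketch that applies it to the counts from Amira's report. The function name `percent_methylated` is just for illustration; the numbers are copied from the report above.

```python
def percent_methylated(methylated, unmethylated):
    """Percentage of methylated Cs within one sequence context."""
    total = methylated + unmethylated
    if total == 0:
        return None  # no calls in this context, percentage undefined
    return 100.0 * methylated / total


# (methylated, unmethylated) counts per context, from the report above.
counts = {
    "CpG": (43871230, 106252080),
    "CHG": (16835845, 136818631),
    "CHH": (25330412, 599720878),
    "Unknown": (58, 408),
}

for context, (meth, unmeth) in counts.items():
    print(f"{context}: {percent_methylated(meth, unmeth):.1f}%")
# CpG: 29.2%
# CHG: 11.0%
# CHH: 4.1%
# Unknown: 12.4%
```

These match the four percentages printed in the report, which confirms that each value is computed relative to its own context and not to the total number of Cs analysed.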

                        Comment


                        • Originally posted by fkrueger View Post
                          Hi Dipro,

                          This was indeed a typo which will be fixed in the next release which is actually due out today or tomorrow (and will finally support parallel alignments – so stay tuned!).

                          A couple of things about the command you used:

                          bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder

                          'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'
                          Sorry if it is a stupid question, but did you change the ‘/path/to/file’ by a valid path of the file on your system?

                          -s: not necessary (will be determined automatically)
                          -o /requires/path/to/output/folder
                          --samtools_path /requires/path/to/samtools/executable
                          --counts: not necessary (used by default)
                          --remove_spaces: only use this if really necessary, will otherwise cost time and temporary space
                          --buffer_size: requires input, e.g. 10G
                          --genome_folder /requires/path/to/genome/folder

                          input file is required

                          If you still struggle can you just send me the onscreen-text via email? This would make spotting mistakes in the command much easier. Cheers, Felix
                          I also have a similar error message:
                          gzip: output_folder/input.bismark.cov.gz: No such file or directory
                          No last chromosome was defined, something must have gone wrong while reading the data in (e.g. specified wrong file path for a gzipped coverage file?). Please check your command!

                          However, I can see that the input_folder.bismark.cov.gz file exists in my output_folder/.

                          The command I used is as follows:
                          bismark_methylation_extractor -p --no_overlap --bedGraph --counts --buffer_size 10G --cytosine_report --CX --split_by_chromosome -o output_folder/ --genome_folder genome_bowtie1/ --multicore 6 input_folder/input_folder.sam

                          Interestingly, when I remove -o output_folder/ (i.e. write to the current directory), the script finishes properly. Has anyone had a similar experience? The version of Bismark is the latest one. Thanks for the help.

                          Comment


                          • Hi hsiehph,

                            I believe this problem has been fixed by now in this issue. You can get the latest development version of Bismark by cloning it from Github.

                            Comment


                            • Originally posted by fkrueger View Post
                              Hi hsiehph,

                              I believe this problem has been fixed by now in this issue. You can get the latest development version of Bismark by cloning it from Github.
                              Thanks Felix. I will test it with the latest development version.

                              Comment


                              • Hi Felix,

                                I am working on analyzing some paired-end non-directional RRBS libraries with Bismark and have run into a few points of confusion. I begin by trimming the paired-end files with Trim Galore (using the --rrbs and --non_directional options, with everything else at default) and then aligning with Bismark. What strikes me most in the alignment report is the strange ratio of mapped reads between OT, OB, CTOT, and CTOB:

                                Final Alignment report
                                ======================
                                Sequence pairs analysed in total: 12664432
                                Number of paired-end alignments with a unique best hit: 4529206
                                Mapping efficiency: 35.8%
                                Sequence pairs with no alignments under any condition: 5506764
                                Sequence pairs did not map uniquely: 2628462
                                Sequence pairs which were discarded because genomic sequence could not be extracted: 0

                                Number of sequence pairs with unique best (first) alignment came from the bowtie output:
                                CT/GA/CT: 608018 ((converted) top strand)
                                GA/CT/CT: 1626505 (complementary to (converted) top strand)
                                GA/CT/GA: 1673560 (complementary to (converted) bottom strand)
                                CT/GA/GA: 621123 ((converted) bottom strand)


                                Final Cytosine Methylation Report
                                =================================
                                Total number of C's analysed: 117899629

                                Total methylated C's in CpG context: 7232502
                                Total methylated C's in CHG context: 362290
                                Total methylated C's in CHH context: 1218612
                                Total methylated C's in Unknown context: 2


                                Total unmethylated C's in CpG context: 13074777
                                Total unmethylated C's in CHG context: 29946985
                                Total unmethylated C's in CHH context: 66064463
                                Total unmethylated C's in Unknown context: 3


                                C methylated in CpG context: 35.6%
                                C methylated in CHG context: 1.2%
                                C methylated in CHH context: 1.8%
                                C methylated in unknown context (CN or CHN): 40.0%


                                Also, the way the library was given to me is that each sample is split into 2 pairs (sample1_pair1_forward, sample1_pair1_reverse, sample1_pair2_forward, sample1_pair2_reverse). If I wanted to run the methylation extractor on the samples, would it be a problem if I simply gave it the concatenated output of the alignment pairs?

                                i.e
                                sample1_pair1_aligned + sample1_pair2_aligned > sample1_aligned
                                methylation extractor sample1_aligned ...other samples

                                Thanks

                                Comment
