  • Originally posted by shadow19c
    Is there a difference when using the --directional option for the methylation extractor?
    The methylation extractor does not care how the files were aligned and will therefore create output files for all four possible strands. Running 'rm *CTO[BT]*' will get rid of the (empty) complementary-strand files if you don't need them.
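If you would rather not delete complementary-strand files that do contain calls, a slightly more cautious variant of 'rm *CTO[BT]*' can be sketched in Python. It uses the same file name pattern but only removes files that look empty. It assumes that an "empty" output file contains at most one header line; the function names are hypothetical and not part of Bismark.

```python
import glob
import os


def has_at_most_lines(path, max_lines):
    """Return True if the file has no more than `max_lines` lines.

    Reading stops as soon as the limit is exceeded, so large files are
    not read in full.
    """
    with open(path) as handle:
        for count, _ in enumerate(handle, start=1):
            if count > max_lines:
                return False
    return True


def remove_empty_complementary_strand_files(directory=".", max_header_lines=1):
    """Delete CTOT/CTOB output files that contain no data lines.

    Matches the same pattern as the shell command 'rm *CTO[BT]*', but only
    removes files with at most `max_header_lines` lines (assumed to be a
    header). Returns the list of deleted paths.
    """
    removed = []
    for path in sorted(glob.glob(os.path.join(directory, "*CTO[BT]*"))):
        if os.path.isfile(path) and has_at_most_lines(path, max_header_lines):
            os.remove(path)
            removed.append(path)
    return removed
```

Checking the line count rather than the file size means a file holding only a header line is still treated as empty, while any file with actual methylation calls is kept.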



    • Hello, and thank you, fkrueger, for your answer.

      To summarise: to analyse my BS-seq data, I started with a FastQC analysis and then did the mapping with Bismark using the default paired-end parameters.
      My problem is with the next steps: the deduplication (I could not find the command line for it) and the downstream analysis (how should coverage be handled? Is it better to use horizontal or vertical coverage? And if vertical, how do you do that? Is there a method or script?)

      Thanks



      • If you want to deduplicate the alignment output, you can download a deduplication script here; run it with --help to see all options.
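The core idea behind alignment deduplication can be illustrated with a short Python sketch. This is not the code of the Bismark deduplication script; it assumes single-end reads in SAM format and treats reads that share chromosome, start position and strand as duplicates, keeping only the first one. The function name is hypothetical.

```python
def deduplicate_sam_lines(lines):
    """Keep only the first alignment seen at each (chromosome, position, strand).

    Simplified single-end sketch: header lines (starting with '@') are kept,
    and the strand is taken from SAM FLAG bit 0x10 (read reverse-complemented).
    """
    seen = set()
    kept = []
    for line in lines:
        if line.startswith("@"):
            kept.append(line)  # header lines are passed through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        strand = "-" if flag & 0x10 else "+"
        key = (fields[2], int(fields[3]), strand)  # RNAME, POS, strand
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```

A set-based approach like this keeps one key per distinct alignment position in memory, so memory use grows with the number of unique alignments. Paired-end data would need a key that also includes the mate's position.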

        As I said, we personally use SeqMonk for downstream analysis. SeqMonk is a mapped-read genome browser with extensive capabilities to visualize, quantitate and export data. For BS-Seq, we mainly first run a sliding-window read coverage analysis to exclude regions with excessively high read coverage (mostly caused by repetitive reads that are not part of the genome assembly), and then use the "Bisulfite methylation over Feature pipeline" to calculate percentage methylation values for different genomic features of interest. This pipeline allows you to filter on read coverage per position (vertical coverage) as well as on the number of events per feature (horizontal coverage). If you are interested in using SeqMonk, may I refer you to the Standard and Advanced course manuals, which explain a great deal of its functionality.
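The two kinds of coverage filter can be made concrete with a small Python sketch. This is not SeqMonk's implementation: the function name and thresholds are hypothetical, and the feature value is taken as the mean of per-position percentages, which is only one reasonable choice (pooling counts across the whole feature is another).

```python
def feature_methylation(calls, features, min_position_coverage=5, min_positions=3):
    """Percentage methylation per feature with vertical and horizontal filters.

    calls: dict mapping position -> (methylated_count, unmethylated_count)
    features: iterable of (name, start, end), inclusive coordinates
    min_position_coverage: vertical filter, minimum reads covering a position
    min_positions: horizontal filter, minimum passing positions per feature
    """
    results = {}
    for name, start, end in features:
        percentages = []
        for pos, (meth, unmeth) in calls.items():
            total = meth + unmeth
            # Vertical filter: only positions with enough reads contribute.
            if start <= pos <= end and total > 0 and total >= min_position_coverage:
                percentages.append(100.0 * meth / total)
        # Horizontal filter: only report features with enough informative positions.
        if len(percentages) >= min_positions:
            results[name] = sum(percentages) / len(percentages)
    return results
```

For example, with a minimum of 5 reads per position and 2 positions per feature, a feature whose only covered positions are 8/10 and 5/10 methylated gets a value of 65.0, while a feature with a single well-covered position is not reported at all.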



        • Hello,
          Thanks for your answer; I have now discovered the Bismark methylation extractor.
          Does it make a difference if I do the deduplication before running the methylation extractor?
          I would also like to understand better HOW I can analyse the files afterwards.
          Last edited by shadow19c; 10-18-2012, 07:10 AM.



          • The deduplication only works on the mapping output; thus, you can run the methylation extractor either on the raw mapping output (containing duplicates) or on the deduplicated output (obviously not containing duplicates).

            There are lots of ways of looking at and interpreting methylation data afterwards, and it very much depends on what you are confident with or familiar with and which biological questions you would like to answer. I already mentioned that we mainly use SeqMonk for our data analysis, but there are numerous tools specifically designed for analysing methylation data, such as methylKit.

            Christoph Bock very recently published a nice review on this topic in Nature Reviews Genetics ("Analysing and interpreting DNA methylation data"), which is probably a good starting point.



            • Thank you for your prompt answer.
              I tried deduplicate_bismark_alignment_output.pl, but it uses a lot of memory and brought the server down.

              It also wrote a lot of 0s to the output file, so I would like to know whether there is an error in the script.

              Thanks
              Last edited by shadow19c; 10-19-2012, 04:40 AM.



              • I will try the deduplication script on a small test dataset.



                • Could you email me some more details about your experiment at [email protected]?

                  It would be useful to know how many reads you have in total, whether the data are single-end or paired-end, the Bismark output format, the parameters you used and, if possible, the exact error message.

                  Felix



                  • OK, I have sent you the email.
                    Thanks



                    • Hi Felix,

                      genome_methylation_bismark2bedGraph_v4.pl breaks when the --split_by_chromosome argument is used and the chromosome name contains special characters. For example, I have aligned the test_data.fastq reads against a genome that contains the hg19 human reference genome as well as a contig for the unmethylated cI857 Sam7 lambda genome, which is often used as a spike-in control in BS-seq experiments.

                      The contig name of the lambda phage in the FASTA file is gi|215104|gb|J02459.1|LAMCG, and genome_methylation_bismark2bedGraph_v4.pl doesn't seem to like this. Specifically, I think it's the '|' characters in the contig name that aren't being properly escaped; I've attached the output below.

                      As the '|' character is not uncommon in FASTA contig names, would it be possible to fix this in the genome_methylation_bismark2bedGraph_v4.pl script?

                      Thanks,
                      Pete

                      binfbig1 514 % genome_methylation_bismark2bedGraph_v4.pl --counts --s CpG_context_test_data.fastq_bismark.txt > bloop
                      Now generating individual files for each chromosome (sorting very large files might fail otherwise...)
                      Finished writing out individual chromosome files
                      Collecting temporary chromosome file information...
                      processing the following input file(s):
                      chrchr1.meth_extractor.temp
                      chrchr10.meth_extractor.temp
                      chrchr11.meth_extractor.temp
                      chrchr12.meth_extractor.temp
                      chrchr13.meth_extractor.temp
                      chrchr14.meth_extractor.temp
                      chrchr15.meth_extractor.temp
                      chrchr16.meth_extractor.temp
                      chrchr17.meth_extractor.temp
                      chrchr18.meth_extractor.temp
                      chrchr19.meth_extractor.temp
                      chrchr2.meth_extractor.temp
                      chrchr20.meth_extractor.temp
                      chrchr21.meth_extractor.temp
                      chrchr22.meth_extractor.temp
                      chrchr3.meth_extractor.temp
                      chrchr4.meth_extractor.temp
                      chrchr5.meth_extractor.temp
                      chrchr6.meth_extractor.temp
                      chrchr7.meth_extractor.temp
                      chrchr8.meth_extractor.temp
                      chrchr9.meth_extractor.temp
                      chrchrM.meth_extractor.temp
                      chrchrUn_gl000220.meth_extractor.temp
                      chrchrX.meth_extractor.temp
                      chrchrY.meth_extractor.temp
                      chrgi|215104|gb|J02459.1|LAMCG.meth_extractor.temp

                      Sorting input file chrchr1.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr1.meth_extractor.temp

                      Sorting input file chrchr10.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr10.meth_extractor.temp

                      Sorting input file chrchr11.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr11.meth_extractor.temp

                      Sorting input file chrchr12.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr12.meth_extractor.temp

                      Sorting input file chrchr13.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr13.meth_extractor.temp

                      Sorting input file chrchr14.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr14.meth_extractor.temp

                      Sorting input file chrchr15.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr15.meth_extractor.temp

                      Sorting input file chrchr16.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr16.meth_extractor.temp

                      Sorting input file chrchr17.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr17.meth_extractor.temp

                      Sorting input file chrchr18.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr18.meth_extractor.temp

                      Sorting input file chrchr19.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr19.meth_extractor.temp

                      Sorting input file chrchr2.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr2.meth_extractor.temp

                      Sorting input file chrchr20.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr20.meth_extractor.temp

                      Sorting input file chrchr21.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr21.meth_extractor.temp

                      Sorting input file chrchr22.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr22.meth_extractor.temp

                      Sorting input file chrchr3.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr3.meth_extractor.temp

                      Sorting input file chrchr4.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr4.meth_extractor.temp

                      Sorting input file chrchr5.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr5.meth_extractor.temp

                      Sorting input file chrchr6.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr6.meth_extractor.temp

                      Sorting input file chrchr7.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr7.meth_extractor.temp

                      Sorting input file chrchr8.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr8.meth_extractor.temp

                      Sorting input file chrchr9.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchr9.meth_extractor.temp

                      Sorting input file chrchrM.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchrM.meth_extractor.temp

                      Sorting input file chrchrUn_gl000220.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchrUn_gl000220.meth_extractor.temp

                      Sorting input file chrchrX.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchrX.meth_extractor.temp

                      Sorting input file chrchrY.meth_extractor.temp by positions
                      Successfully deleted the temporary input file chrchrY.meth_extractor.temp

                      Sorting input file chrgi|215104|gb|J02459.1|LAMCG.meth_extractor.temp by positions
                      sort: open failed: chrgi: No such file or directory
                      sh: LAMCG.meth_extractor.temp: command not found
                      sh: gb: command not found
                      sh: J02459.1: command not found
                      sh: 215104: command not found
                      Died at /usr/local/bioinf/bin/genome_methylation_bismark2bedGraph_v4.pl line 162.



                      • Hi Pete,

                        The '|' characters in the file name do indeed seem to be interpreted by the shell as pipes, redirecting the output instead of merely forming part of a temporary file name. I have now amended the script to replace pipe characters with underscores (attached to this note); I hope it works.
                        As a side note, the same problem is likely to affect the latest version of the Bismark methylation extractor as well; I shall fix this and put the fixed version up on the Bismark project page when I return from annual leave.
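The failure mode and the fix can be illustrated in Python (the script itself is Perl, so this is only an analogy). Replacing '|' with '_' mirrors the amended script; quoting the file name with shlex.quote, or passing arguments as a list to subprocess.run without a shell, is an additional safeguard. The helper names are hypothetical, and the "chr<name>.meth_extractor.temp" naming is taken from the log above.

```python
import shlex


def temp_file_name(contig):
    """Per-chromosome temporary file name, following the naming seen in the log,
    with '|' replaced by '_' as in the amended script."""
    return "chr" + contig.replace("|", "_") + ".meth_extractor.temp"


def quoted_sort_command(path):
    """Build a shell command in which the file name is one quoted argument,
    so characters like '|' are not treated as shell operators."""
    return "sort " + shlex.quote(path)
```

Unquoted, a command such as sort chrgi|215104|gb|J02459.1|LAMCG.meth_extractor.temp is parsed by the shell as a pipeline of five commands, which matches the "command not found" errors in the log.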



                        • Originally posted by JCrooks
                          Bismark is an amazing tool for mapping and analyzing bisulfite-seq data. I have used it myself and it is very helpful in this regard. I would also recommend it to new users and researchers, because it could really help them a great deal.
                          Thanks for all the fish!



                          • No hurry, and thanks!



                            • Hello,
                              I would like to know how I can get a description of each position in the genome.
                              I have used BS-Seeker before, and its output contained:

                              1. Read ID (from the header columns in the Solexa seq/fastq/qseq/fasta file, or a serial number of the original input)

                              2. Number of mismatches between the genomic sequence and the BS read listed in columns 6 and 7. The bisulfite-converted sites between read Ts and genomic Cs are not included.

                              3. The strand the read may come from (+FW, +RC, -RC, -FW)

                              4. The coordinate of the mapped position: the first 2 digits indicate the chromosome, and the "+" or "-" indicates the mapped strand. The last 10 digits are the 0-based, 5'-end coordinate of the mapped genomic sequence on the Watson strand.

                              5. The genomic sequence of the mapped region plus 2 bp on either side.

                              6. BS read sequences from 5' to 3': if the reads are uniquely mapped as FW reads, the original reads are shown. If the reads are uniquely mapped as RC reads, their reverse complements are shown.

                              7. Summarised sequence of methylated sites: the methylated CG/CHG/CHH sites are marked as X/Y/Z (upper case), whereas the unmethylated CG/CHG/CHH sites are marked as x/y/z (lower case). This column is summarised directly from columns 6 and 7.

                              8. Index = 1 if three consecutive methylated non-CG sites appear; Index = 0 otherwise.

                              This is similar to the Bismark vanilla output.

                              I developed a script with which I can obtain the number of unmethylated or methylated reads on the current strand, as well as the total number of reads on that strand and the methylation level.

                              In short, I am looking for something like a coverage file that gives me the mean or median coverage of the covered cytosines, and the uncovered cytosines.

                              Any idea?
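Per-read methylation counts like those described can be derived from the summarised methylation string (column 7). A minimal Python sketch, assuming the X/Y/Z convention exactly as listed above and that any other characters in the string mark non-cytosine positions and can be ignored; the function name is hypothetical.

```python
# Context letters as described for BS-Seeker column 7 (upper case = methylated).
CONTEXTS = {"x": "CG", "y": "CHG", "z": "CHH"}


def count_methylation_calls(summary):
    """Count methylated and unmethylated calls per context in one read's summary string."""
    counts = {ctx: {"methylated": 0, "unmethylated": 0} for ctx in CONTEXTS.values()}
    for char in summary:
        context = CONTEXTS.get(char.lower())
        if context is None:
            continue  # not a cytosine call under the assumed convention
        state = "methylated" if char.isupper() else "unmethylated"
        counts[context][state] += 1
    return counts
```

Bismark's own methylation call strings use different letters (z/Z for CpG, x/X for CHG, h/H for CHH), so the mapping would need adjusting before applying this to Bismark output; note in particular that x means CG here but CHG in Bismark.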



                              • Hi Mohamed,

                                I am afraid I don't have a script that converts the Bismark output into BS-Seeker output so that you could use your pre-existing pipeline; however, pretty much all the points mentioned above are contained in the Bismark output, the methylation extractor output, the full cytosine context output, or a combination of them.

                                Specifically, the full genome cytosine report seems to be what you are looking for. The output can either be for all CpG positions or optionally for all genomic cytosines (all contexts). The genome-wide cytosine methylation output file (optional) is tab-delimited in the following format:

                                <chromosome> <position> <strand> <count methylated> <count non-methylated> <C-context> <trinucleotide context>

                                Please read the methylation extractor documentation or type 'bismark_methylation_extractor --help'.
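As a sketch of how the per-position counts and coverage summary asked about could be obtained from this report, here is a minimal Python example. It assumes exactly the seven tab-delimited columns listed above; the function name is hypothetical. For each covered cytosine it computes the coverage (methylated + non-methylated) and percentage methylation, and it summarises the number of covered and uncovered cytosines together with the mean and median coverage of the covered ones.

```python
import statistics


def summarise_cytosine_report(lines):
    """Summarise a genome-wide cytosine report.

    Each line: chromosome, position, strand, count methylated,
    count non-methylated, C-context, trinucleotide context (tab-delimited).
    Returns per-position records for covered cytosines and a coverage summary.
    """
    records = []
    uncovered = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, pos, strand, meth, unmeth, context, _trinucleotide = line.rstrip("\n").split("\t")
        meth, unmeth = int(meth), int(unmeth)
        coverage = meth + unmeth
        if coverage == 0:
            uncovered += 1
            continue
        records.append((chrom, int(pos), strand, coverage, 100.0 * meth / coverage, context))
    coverages = [record[3] for record in records]
    summary = {
        "covered": len(records),
        "uncovered": uncovered,
        "mean_coverage": statistics.mean(coverages) if coverages else 0.0,
        "median_coverage": statistics.median(coverages) if coverages else 0.0,
    }
    return records, summary
```

Keeping uncovered cytosines as a separate count, rather than as 0% methylated, avoids confusing "no data" with "unmethylated".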
