  • I didn't, because the manual says --gzip is for compressing the temporary files, not for decompressing the input fastq.gz files. And without --gzip, the run completed successfully.


    • Originally posted by chxu02
      Hi Felix,

      I'm reporting a bug in v0.14.0. When I ran bismark --multicore on fastq files in gz format, bismark failed at the end to merge all the separate files into one. The files were initially named *.fastq.gz_*, but at the end of the run bismark tried to merge files named *.fastq_* instead. Obviously it failed. Hope it helps.

      Youyou
      Hmm, if you didn't use --gzip, then I don't quite understand the error you are reporting. Running files ending in either .fastq or .fastq.gz works fine for me here. Would you mind emailing me the entire error message you are seeing?

      Attached is the latest development version of Bismark, which should also understand the option --gzip.

      • Sorry Felix, my bad. I checked my run history yesterday and found that I did use --gzip. But why did it happen, if the purpose of --gzip is just to compress the temporary conversion files?


        • Ah, good, that explains it. As I said a few posts ago, --gzip was a corner case that wasn't handled properly, so the merging was not supposed to go wrong... If you use the development version I attached in my last post, --gzip should be working now.


          • Hello Felix,

            Thank you for your response. I have installed samtools. I found another problem. On Biostars, I learned that I should generate two separate fastq files for paired-end reads, so I used fastq-dump --split-files to extract them from my SRA file. I got two fastq files and there seems to be no problem so far (previously I dumped everything into a single file and Bismark reported a duplicate ID error). The only problem is that I got only one BAM file from Bismark. Its name is the same as the first fastq file with a .bam extension, so I assumed there should also be a BAM file named after the second file, but there is only one, for the first fastq. Is this normal or wrong? I used bismark <options> -1 first_1.fq -2 second_2.fq, and the result is only first_1.bam.

            This is my final alignment report:

            Sequence pairs analysed in total: 29829521
            Number of paired-end alignments with a unique best hit: 4156425
            Mapping efficiency: 13.9%
            Sequence pairs with no alignments under any condition: 23649277
            Sequence pairs did not map uniquely: 2023819
            Sequence pairs which were discarded because genomic sequence could not be extracted: 0

            The mapping efficiency is really low. What do you think caused it?

            For the Bismark example data, I got this result:
            Final Alignment report
            ======================
            Sequences analysed in total: 10000
            Number of alignments with a unique best hit from the different alignments: 4732
            Mapping efficiency: 47.3%
            Sequences with no alignments under any condition: 4279
            Sequences did not map uniquely: 989
            Sequences which were discarded because genomic sequence could not be extracted: 0

            So I think my human genome reference is fine.
            Last edited by barbarian; 03-16-2015, 05:44 PM.

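A quick way to sanity-check alignment reports like the two above is to confirm that the outcome categories (unique best hit, no alignment, not unique) add up to the total, and to recompute the mapping efficiency as unique best hits divided by the total. The minimal Python sketch below does that with the numbers quoted in the post; the function name is hypothetical and not part of Bismark, and it assumes the "discarded" category (0 in both reports) can be ignored.

```python
def check_report(total, unique, no_aln, ambiguous):
    """Return mapping efficiency in percent after checking the categories sum to total.

    Assumption: every read (or read pair) falls into exactly one of the three
    categories; the 'discarded' line is 0 in both reports and is ignored here.
    """
    accounted = unique + no_aln + ambiguous
    if accounted != total:
        raise ValueError(f"categories sum to {accounted}, expected {total}")
    return 100.0 * unique / total


# Paired-end run from the post
pe = check_report(29829521, 4156425, 23649277, 2023819)
# Bismark example data
ex = check_report(10000, 4732, 4279, 989)
print(f"{pe:.1f}% {ex:.1f}%")  # 13.9% 47.3%
```

Both reports are internally consistent, so the low efficiency is a real property of the run rather than a reporting glitch.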

            • I just replied to you on Biostars, but producing one BAM file from paired-end reads is the expected result. The BAM records indicate which file each read came from.

              The low mapping efficiency is a separate question. There are a number of likely causes, the most common being fastq files that are out of sync. Try mapping fastq_1.fq by itself and see whether the mapping efficiency jumps up.

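In SAM/BAM, the mate of each record is stored in its FLAG field: per the SAM specification, bit 0x1 marks a paired template, 0x40 the first segment (read 1) and 0x80 the last segment (read 2). The small Python sketch below decodes this from a FLAG value; the helper name is hypothetical, for illustration only.

```python
def mate_of(flag: int) -> str:
    """Report which FASTQ file a SAM record's read came from, based on its FLAG."""
    if not flag & 0x1:   # 0x1: template has multiple segments (paired data)
        return "single-end"
    if flag & 0x40:      # 0x40: first segment in the template (read 1)
        return "read 1"
    if flag & 0x80:      # 0x80: last segment in the template (read 2)
        return "read 2"
    return "unknown"


# Common flags for properly paired reads, plus an unpaired record
for flag in (99, 147, 83, 163, 0):
    print(flag, mate_of(flag))
```

With samtools installed, `samtools view -c -f 64 file.bam` and `samtools view -c -f 128 file.bam` count the records that have these same bits set.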

              • If I map fastq_1.fq to itself, is there any biological meaning behind that? Will the result still represent the actual methylation state? Thank you.


                • "By itself", not "to itself": big difference. This is purely to diagnose the cause of the low mapping efficiency.


                  • Oh, do you mean using only the first file, not together with the second one?


                    • That's correct. You essentially act as though you have a single-end dataset. If the mapping efficiency jumps to a more reasonable level when doing that, then either the fastq files are out of sync or there's something weird with fastq_2.fq.

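The "out of sync" explanation can also be tested directly by walking both files in parallel and comparing read names record by record. Below is a minimal Python sketch. It assumes plain 4-line FASTQ records, and it assumes the two mates' names differ only in a trailing /1 or /2 or in the comment after the first space (as in fastq-dump --split-files output). All function names are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_names(lines):
    """Yield the normalised read name from each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # first line of each record is the @header
            name = line.rstrip("\n").split()[0]  # drop any comment after the first space
            if name.endswith(("/1", "/2")):      # drop an old-style mate suffix
                name = name[:-2]
            yield name


def first_mismatch(lines1, lines2):
    """Return (record_number, name1, name2) for the first out-of-sync pair, or None."""
    pairs = zip_longest(read_names(lines1), read_names(lines2))
    for n, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return n, a, b  # a or b is None when one file runs out of records first
    return None


def open_fastq(path):
    """Open a FASTQ file as text, decompressing it if the name ends in .gz."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def check_files(path1, path2):
    """Compare two paired FASTQ files record by record."""
    with open_fastq(path1) as f1, open_fastq(path2) as f2:
        return first_mismatch(f1, f2)
```

For example, `check_files("fastq_1.fq", "fastq_2.fq")` returns None when every pair lines up, or the position and names of the first mismatching pair.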

                      • OK, I will try now. I may have another question tomorrow once the result is out.


                        • Thanks, Devon, for jumping in. Here is a protocol worth reading to achieve good mapping results in most cases: http://www.epigenesys.eu/en/protcols...q-data-prot-57


                          • OK, this is strange. I tried another sample. The mapping efficiency with both files is 0.1%, and with only one file it's 13.5%. Before this step, I ran:
                            fastq-dump --split-files <sra file>
                            trim_galore --rrbs <fastq1>
                            trim_galore --rrbs <fastq2>
                            For both files:
                            bismark --bowtie2 <ref> -1 <fastq1> -2 <fastq2>
                            For 1 file:
                            bismark --bowtie2 <ref> <fastq1>

                            As for the reference, I'm sure I already built it for bowtie2, and I checked it with the Bismark example data; the result is similar to the documentation. I'm now trying the next sample to see whether the fault lies with the sample or with my commands. Any suggestions? By the way, I downloaded the sample from NCBI GEO. Here is the link: http://www.ncbi.nlm.nih.gov/geo/quer...i?acc=GSE61150
                            The sample I checked is the first one: http://www.ncbi.nlm.nih.gov/geo/quer...acc=GSM1498453

                            Thank you for your help.

                            Addendum:
                            I also checked both files again with FastQC after trimming. The results are mixed: roughly half of the modules pass. The modules that fail are per tile sequence quality, per base sequence content, sequence duplication levels and Kmer content.
                            Last edited by barbarian; 03-17-2015, 06:06 PM.


                            • For paired-end files you need to run Trim Galore in paired-end mode like this:

                              trim_galore --rrbs --paired <fastq1> <fastq2>

                              If you run it twice in single-end mode, it will break the sequence-by-sequence order of the files, which then results in very low mapping efficiency.
                              I am in a meeting all day but can take a look at the file in question myself tonight or tomorrow.

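The reason single-end trimming breaks pairing is that each file drops its own too-short reads, so the two files lose different records and every later pair is misaligned. Paired-end mode instead keeps or removes both mates together. The toy Python sketch below shows the effect. The 20 bp cutoff, the read data and the function names are illustrative assumptions, not Trim Galore's code.

```python
MIN_LEN = 20  # illustrative length cutoff (assumption)


def trim_single(reads):
    """Single-end style: filter one file on its own."""
    return [(name, seq) for name, seq in reads if len(seq) >= MIN_LEN]


def trim_paired(reads1, reads2):
    """Paired style: keep a pair only if both mates pass the cutoff."""
    kept = [(r1, r2) for r1, r2 in zip(reads1, reads2)
            if len(r1[1]) >= MIN_LEN and len(r2[1]) >= MIN_LEN]
    return [r1 for r1, _ in kept], [r2 for _, r2 in kept]


def in_sync(reads1, reads2):
    """True if both files have the same read names in the same order."""
    return len(reads1) == len(reads2) and all(
        a[0] == b[0] for a, b in zip(reads1, reads2))


r1 = [("p1", "A" * 30), ("p2", "A" * 10), ("p3", "A" * 30)]
r2 = [("p1", "T" * 30), ("p2", "T" * 30), ("p3", "T" * 5)]

print(in_sync(trim_single(r1), trim_single(r2)))  # False: p3 is now paired with p2
print(in_sync(*trim_paired(r1, r2)))              # True: only p1 survives, in both files
```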

                              • Thank you for your reply. I realized this just this afternoon. Now I'm waiting for the result. I may have another question tomorrow, because it usually won't finish today. Good luck with your meeting.
