Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • gavin.oliver
    Senior Member
    • Jan 2010
    • 110

    Using Bfast to align paired end Illumina reads

    Hi,

    I am attempting to align paired end Illumina reads to the human genome with Bfast.

    The problem is that most aligners expect paired end reads in 2 fastq files while Bfast expects 1 file. Having searched this and other forums, the most commonly suggested/accepted solution appears to be conversion of the 2 files to 1 file using the ill2fastq script supplied with Bfast.

    I have now converted the files and run the alignment however there appear to be problems with the results. I have aligned the same sequences with Stampy and BWA (separately) and I get completely different results. The output of samtools flagstat is shown below.

    Does anyone have any suggestions on how best to utilise Bfast under such circumstances?

    BWA
    200000 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    191942 + 0 mapped (95.97%:nan%)
    200000 + 0 paired in sequencing
    100000 + 0 read1
    100000 + 0 read2
    190714 + 0 properly paired (95.36%:nan%)
    191364 + 0 with itself and mate mapped
    578 + 0 singletons (0.29%:nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    Stampy
    200000 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    199376 + 0 mapped (99.69%:nan%)
    200000 + 0 paired in sequencing
    100000 + 0 read1
    100000 + 0 read2
    197774 + 0 properly paired (98.89%:nan%)
    198768 + 0 with itself and mate mapped
    608 + 0 singletons (0.30%:nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    BFAST
    125396 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    123238 + 0 mapped (98.28%:nan%)
    50792 + 0 paired in sequencing
    25396 + 0 read1
    25396 + 0 read2
    0 + 0 properly paired (0.00%:nan%)
    49516 + 0 with itself and mate mapped
    326 + 0 singletons (0.64%:nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    If you have two files, then you need to create one file that is interleaved so that the second read of a pair is directly after the first. Then specify the pairing orientation on the command line.

    Comment

    • gavin.oliver
      Senior Member
      • Jan 2010
      • 110

      #3
      Thanks Nils - I'll write something to interleave the two files.

      Comment

      • gavin.oliver
        Senior Member
        • Jan 2010
        • 110

        #4
        Nils,

        I interleaved the two files and have run them using Bfast. However, whenever I reach the postprocess stage, I am not getting successful pairing of the reads regardless of what I specify on the command line with the -P/-S or -Y options.

        Here is a small selection of the reads:

        Code:
        @chr17:95861:F:237/1
        GTTGGCGAACACATCCATGTGCCGGGAGGATGGTGCACCCCAACTCCACAAGGACCCTTCCAGACCGCGGCCGCTCCAGCTCTCAAAGCC
        +
        @<@?DDDD>FF:DAF9FFFCAGF<F3AAFD>2ACEF?CFC@?;FB:?@?;D@>86';EE;AE376?########################
        GCCCCAGACTCCACAGGTTAAGGGCTCGCATCTCTTGAACAGGGATCTTGATTGCCCCGCGACCTACTGACAATCTGAATTCTGGGCGCT
        +
        +:+=+ADDFCFBF3C:BE+@@;1):C?###############################################################
        @chr17:42735:F:247/1
        TTAAAACTGGATATCACCCAGTGTTGGCAAGGTACAGGAAAATGGGAACTATCATATACCACAGGGGCTGGAAGAGCATAAACTGGTTTA
        +
        CCCFFFFFHHHHHJJJJJJJJIJJJJJJJJJIJGIJJJJJJJIJJJJIJJJJJJIJJJJJIJIJIJJJIJIIJJJJIIJJIJHGFHHFFF
        @chr17:42735:F:247/2
        TTAAGTGACTTCATTTTTAATTACTATATGGGATTCTATCTTTCCAGTGTATCATGATTTATTTGACCTATTGCTGAATGTTGGAGGTTT
        +
        @CCFFFDDHHHHGJJJJJHIJJJJIGJJJJCHJJJIHIJIJJJJJIIIIJIIIGIIJJJHHHHIIGEE@HC=?BE>;>CEEC>A=ACC;A
        @chr17:85755:R:-161/1
        AAAATAATAAACCAGTCATTAGAACCATAAACCTGTACTGTTTTTGACAATGTAATGCACTGCCCTGTAAAGCACTACAATAAAGACGTT
        +
        1:1+4??=DCFC>+A)++3+3CACG4??:*::?DCDEAGB1?DB?<?B<B@DEHFE=)8B8=77=@((6).=CC7=;@>D@3)7;>B###
        @chr17:85755:R:-161/2
        AATTACATGTCATATGAATGAATACTTGACGTCAGCAGGACTGCGTTTTGGTGGTGAACTTGGTTCTAGGTAGAAACAAAGAATGGAGAA
        +
        @@@DAD:DFDFFDF9A4CB<:A?:<FC)ACF6?<CFFF;GABAB93BDGCE<DB?*8=@=B@=FFGIF<1?@>?BB@;AC>;(5-;;>A>
        @chr17:76132:F:140/1
        AATGTAAAACACCATAAAAATTAATCTTAAGGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGA
        +
        CCCFFFFFHHHHHJJJJIJJIJJJJJJJJJJJJJJJJJJJJIJIJJJJIJJJJGIJJIJHIGJJJHGIJJJJJJJIHGIIFIHHHHHHFF
        @chr17:76132:F:140/2
        ACGCCATTCTCCTGCCTCGGCCCCCCAAGTAGCTGGGACTACAGGCGTCCACCACCACGCCCGGCTAATTTTTTGTATTTTTAGTAGAGA
        +
        @C@FFFFFFHHHGJJJJJJJJJJJJJJIJJJJEHIIJJJJJJJJJIJDGHGGIIGHIIIJJJJJJJJGIIJJJIHGHHHHFFFFFDEEEE
        @chr17:72239:F:208/1
        CAGACTGGCATGCAATGGTGCGATCTCGGCTCACTGCAACCTCTGCCTCCCAAGTTCAAGTGATTCTCCTCCCTCAGCCTCCCAGGTAGT
        +
        CCCFFFFFHGHHHJJJJJJIJJGIGEHGIIJJHIJJJJJJJJJJJGJJJJIJIJJIIJIJEGCGHIIIHHDHFHHHFFFFFFFDCEEECC
        @chr17:72239:F:208/2
        AGAAAAAGTAAGAGACACCTATAGATCAGAAGACACTTGGGGCTGGGCATGGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCAA
        +
        @CCFFDDFHHHHHJJJFHIJJGJIJJJJJEGGIIJJJJJGIIJJJJGEIIFHGHIFHIIJJJJJEIJIIJIIGHHFFEEHFFFFFFEFCC
        @chr17:124386:R:-215/1
        AAGGGAAAAGACCCAAAGGGTTGGAAGCAATATGTGAAAAAATACAGAATTTATATTGTCTAATTACAAAAAGCAACTTCTAGAACCTTT
        +
        CC@FFFFFHHHHGJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJIJHJJIJHIJJJJJJJJJJJIIIJJIIGIJIJIJGJJJIHHHHHHFF
        @chr17:124386:R:-215/2
        AGCCAGGCATGGTGGTGCATGCGTGTAATCCCAGCTACTCGGGAGCTGAGGCAGGAGAATCGCTTGAACCCACGAGTCAGAGGTTGCGGT
        +
        CCCFFFFFGHHHHJJJJJJHIJJJJJJIJIIIJJJIJJJJGIJIJJJJJJIJJIJJJJJJIIJJJIJJJHHHHHHHHFFFFFFFDFEEED
        @chr17:43455:F:112/1
        TTTTTGAGACAGGGTCTCACCATATCACCCAGGATGGAGTGCAGTGGCACCATCATGGCTCACCACGGCCTCAACTTGCTGGGATCAAGC
        +
        CCCFFFFFHHHHHJJIJGIJJJIJGIJJJJJJJJJJIIIJJJJIIJJJJJIJFGHGIJJJJEIIGHHHGHFFFFFEEEEEEEDDDDDDDC
        @chr17:43455:F:112/2
        ACGCTACTGCACTCTGTTCTAGGCAACCCCTGTCTGGGAAAAAAAAAAAAATTAGTGAGGCTTAGTGGTGCACACCTGTAGTCTCAGCTA
        +
        CCCFFFFFHHHHHJJJJIEIJJJJHGIJJIIEHGIIJJIIJ@FHIJJJIJGHIJIHHBGHGI=FIEDEGGIJHEHG=>AE?EHCEFDFE;
        @chr17:89085:R:-274/1
        TTTAAAGCACTACCCAGAGTTATTCAAAGCCAGGCAGGAGAACTGCAGGCATCAGAATGCCCTGGGGACGGGTCCAAAATGCAGAATCCT
        +
        @@BFFFFDHDHHHFGIJIGJJGJIJIHIEGJJJ)?@FHHJJI;FDHIIHH=>DFEDECEEECDDECCD55<@B>CDCBDD@CDCCDACCC
        @chr17:89085:R:-274/2
        GCTTGGTAAGGAATATTAGGTGAACACAGGTGCCTATGTGGGTTCCACCGTTTTCCATAAATGTTAATTTTCGGCACTGAGGACGGACAG
        +
        CCCFFFFFHHHHGIJJJJJIIIIJIJHF>CCE@3;(5>;?B?3:,5,3?((+3@ADCD::8:::4>@CD#####################
        Is there anything obviously problematic to be seen?

        Comment

        • nilshomer
          Nils Homer
          • Nov 2008
          • 1283

          #5
          That doesn't look like a valid FASTQ file (where's the read name for the second read). Also, the read names should be the same for multi-end reads. Please read the manual on the topic of the proper input format in the manual accompanying BFAST. There should be an example paired end read FASTQ (etc.).

          Comment

          • gavin.oliver
            Senior Member
            • Jan 2010
            • 110

            #6
            Nils, RE the second read name - it must have been a copy/paste error as the file actually does contain it:

            Code:
            @chr17:95861:F:237/1
            GTTGGCGAACACATCCATGTGCCGGGAGGATGGTGCACCCCAACTCCACAAGGACCCTTCCAGACCGCGGCCGCTCCAGCTCTCAAAGCC
            +
            @<@?DDDD>FF:DAF9FFFCAGF<F3AAFD>2ACEF?CFC@?;FB:?@?;D@>86';EE;AE376?########################
            @chr17:95861:F:237/2
            GCCCCAGACTCCACAGGTTAAGGGCTCGCATCTCTTGAACAGGGATCTTGATTGCCCCGCGACCTACTGACAATCTGAATTCTGGGCGCT
            +
            +:+=+ADDFCFBF3C:BE+@@;1):C?###############################################################
            I'll have a look at the FASTQ format in the Bfast manual and reattempt. I hadn't scrutinised the format too deeply as the same sequences have already been successfully aligned with a few other programs.

            EDIT: I have now rerun and the alignments look good. The problem seems to have been the presence of the /1 and /2 suffixes in the read names.
            Last edited by gavin.oliver; 01-09-2012, 01:56 AM.

            Comment

            • nilshomer
              Nils Homer
              • Nov 2008
              • 1283

              #7
              Great, I am glad it is working for you!

              Comment

              • david.tamborero
                Member
                • Feb 2011
                • 60

                #8
                Hi,

                I have actually the same problem. I have each pair end of the reads in separate files, but when they look as the following:

                --- pair_1 file:

                Code:
                @HWUSI-EAS1692_0001:1:1:1050:4451#0/1
                CAGATTCACANTCCTGAATATCATGTTTTCTTTCCAAGGNATGACATAACGTCTTGGGATCATCCCTTGCTTTAATGAAAATCGTGGCAAATGAA
                +HWUSI-EAS1692_0001:1:1:1050:4451#0/1
                Ybaac][T^YB[ZZ[SKVZT`bcYbccaccaaa_cZZ[ZB[Z[T_c`cYcc\bcccc^T\a`TcccbL\ac\^a\Ybb`^bY]bb_BBBBB
                -- pair_2 file:

                Code:
                @HWUSI-EAS1692_0001:1:1:1050:4451#0/2
                CATGATAATGCACTCCATCTCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCACTAAAAAGCGGACCTTGGTGTGAAAACATAACACACAC
                +HWUSI-EAS1692_0001:1:1:1050:4451#0/2
                M_M^ZM\YL]U^L\^VQJIU\a__\``c\cW_aaaaa_R[_\_`W][__BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                I use the ill2fastq.pl script contained in the bfast package to merge them and everything works fine. However, when the files are like:

                -- pair_1 file:

                Code:
                @ILLUMINA-A16956_0001:4:1:1099:4197#0/1
                CTGTTTTCAAAATAATATNCTTCTTGATTCCTTAGCTCTGTGCCTCAGAC
                +
                CCACCCACCCCCCC@<BA#@;:<<<CA=CCA=BBB@@C@ACCCCBCCCBC
                -- pair_2 file:

                @ILLUMINA-A16956_0001:4:1:1099:4197#0/2
                NNTATTTTCTAAAGTGAGGCGGCANGNANATNNTNGTAGGTANCNNNGNN
                +
                ###########################################################
                which seems more 'fastq - like' (with Phred+33 quality and '+' in the third line), then I have to merge them by my own script? I mean, I can't not merge them by using some argument of the bfast commands used for alignement?

                Thank you very much for your feedback,

                best
                David

                Comment

                • nilshomer
                  Nils Homer
                  • Nov 2008
                  • 1283

                  #9
                  The read names should be identical; they are not.

                  Comment

                  • swbarnes2
                    Senior Member
                    • May 2008
                    • 910

                    #10
                    The de novo assembler velvet also comes with a little perl script to interleveave two separate fastqs into one.

                    Comment

                    • david.tamborero
                      Member
                      • Feb 2011
                      • 60

                      #11
                      Nils, sorry for that, but I would say that the reads name are the same. In the first example, they are 'illumina-like'. In the second example, they are 'fastq-like'.

                      So my guess is that in the first case they must be interleaved + converted, I did it by using the ill2fastq script contained in the bfast package, and everything went fine.

                      In the second example, they have to be interleaved (but not converted, since the quality is already +33).

                      So, sorry if I am missing something obvious, but my questions are the following:

                      - I noticed that the ill2fastq script also complements + reverts the sequence of the second pair. Is it necessary?

                      - is there any way to input the two fastq files to the bfast command(s), without having to interleave them before?

                      thanks,
                      david

                      Comment

                      • nilshomer
                        Nils Homer
                        • Nov 2008
                        • 1283

                        #12
                        No, BFAST requires the reads to be interleaved with the read names exactly the same. Supporting platform specific vendors input formats becomes difficult (look at all of these conversion scripts!), and so a generic format is better.

                        I haven't looked at ILMN data in a while, so I am not sure if the ill2fastq scripts needs updating. I would be immensely grateful for a patch.

                        Comment

                        • david.tamborero
                          Member
                          • Feb 2011
                          • 60

                          #13
                          Yes, you're right.
                          Last question: when interleaving, the second end must be reversed and complemented?

                          I am trying to figure out whether bfast requires all the pairs to be listed in order of 5' to 3' and on the same strand. I am using bfast+bwa-0.7.0a.

                          Comment

                          • nilshomer
                            Nils Homer
                            • Nov 2008
                            • 1283

                            #14
                            Prior versions required that, but now you can specify their orientation during the "postprocess" step.

                            Comment

                            • david.tamborero
                              Member
                              • Feb 2011
                              • 60

                              #15
                              ok, I will download the last version to simplify the interleave step.

                              Many thanks for your help, Nils.

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                by SEQadmin2


                                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                                Here are nine questions we think about, in roughly the order they matter, before...
                                06-18-2026, 07:11 AM
                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, 06-17-2026, 06:09 AM
                              0 responses
                              25 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              42 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              48 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              49 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...