SEQanswers

Old 04-04-2016, 10:01 AM   #1
fh331
Member
 
Location: UK

Join Date: Apr 2016
Posts: 19
All ChIP Seq Reads failing to Align to reference genome

Hello NGS community,
I'm new to NGS analysis. I have some ChIP-seq data for a transcription factor. The sequencing facility provided the data as CRAM files. I converted them to BAM and tried to do the downstream analysis, but I ran into problems with every conversion. So I decided to go from CRAM to FASTQ, redo the alignment, and then do the analysis. I tried the following command line, but all of my reads fail to align. Is it because the parameters are too stringent, or is there something I am missing?

fazal@fazal-Precision-T1700:/media/fazal/backup/BCL11A/FastQ_Files$ bowtie -m 1 -S --chunkmbs 10000 /media/fazal/backup/BCL11A/Bowtie_Indices/human_g1k_v37.fasta -1 18418_2#1_1.fastq -2 18418_2#1_2.fastq > /media/fazal/backup/BCL11A/Sam_from_FastQ/18418_2#1.sam
# reads processed: 40095362
# reads with at least one reported alignment: 736 (0.00%)
# reads that failed to align: 40094535 (100.00%)
# reads with alignments suppressed due to -m: 91 (0.00%)
Reported 736 paired-end alignments to 1 output stream(s)


Any help will be highly appreciated.

Thank you very much!
Fazal
Old 04-04-2016, 11:24 AM   #2
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 198
Default

Could be a number of issues. Did you run your data through FastQC or another QC program? Did you remove adapters/quality trim?

PS: You probably shouldn't have #s in your file names...
Old 04-05-2016, 01:46 AM   #3
fh331
Member
 
Location: UK

Join Date: Apr 2016
Posts: 19
Default

Hi Fanli,
Thanks for the reply.
I did run FastQC. It doesn't look like there's any adapter content. The only modules FastQC flags are per-sequence GC content, which gets a warning (!), and k-mer content, which fails (x).
Old 04-05-2016, 07:42 AM   #4
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 198
Default

Hmm, that's a bit odd. Maybe you can try using BLAT or BLAST with a random subsample of your reads to see if they hit anything?

Can you post your FastQC output here?
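For the random subsample, a minimal Python sketch along these lines might do. It assumes plain four-line FASTQ records; the function names (`read_fastq`, `sample_reads`, `to_fasta`) are made up for illustration, and the FASTA output is only meant for pasting into BLAT or BLAST.

```python
import random

def read_fastq(lines):
    """Yield (name, sequence, quality) from FASTQ lines, assuming 4 lines per record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual

def sample_reads(records, k, seed=0):
    """Reservoir-sample k records in one pass, so the whole file never sits in memory."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample

def to_fasta(records):
    """Format (name, sequence, quality) records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in records)
```

With a file handle `fh` opened on one of the FASTQ files, `print(to_fasta(sample_reads(read_fastq(fh), 20)))` would give 20 reads picked from across the whole file rather than just the first few.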
Old 04-05-2016, 09:02 AM   #5
fh331
Member
 
Location: UK

Join Date: Apr 2016
Posts: 19
Default

Hi Fanli,
I have attached the FastQC output for both of the paired-end files. I ran BLAT on the first sequence from each paired-end file (1/1, 1/2), and they hit different chromosomes. I don't know if that's normal.
Attached Files
File Type: pdf 18418_21_1_fastqc.pdf (1.04 MB, 10 views)
File Type: pdf 18418_21_2_fastqc.pdf (1.05 MB, 3 views)

Last edited by fh331; 04-05-2016 at 09:05 AM.
Old 04-05-2016, 10:01 AM   #6
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324
Default

Try with --trim3 50 in case the 3' ends of your reads are of low quality or contain Ns. Bowtie1 seems like an odd choice for PE75 reads, but it should give you some alignments...
Old 04-05-2016, 01:11 PM   #7
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 198
Default

Quote:
Originally Posted by fh331 View Post
Hi Fanli,
I ran BLAT on the first sequence from each paired-end file (1/1, 1/2), and they hit different chromosomes. I don't know if that's normal.
But they hit the human reference?

Also, your quality scores look...remarkably even. Can anyone else chime in? Is that an encoding error or have you guys seen sequencing like that before?
Old 04-05-2016, 01:31 PM   #8
fh331
Member
 
Location: UK

Join Date: Apr 2016
Posts: 19
Default

Hi Fanli,
The sequences do hit the human reference genome, GRCh37, which is what I am trying to align to. I don't know what's going on.
Old 04-05-2016, 01:39 PM   #9
fh331
Member
 
Location: UK

Join Date: Apr 2016
Posts: 19
Post

Quote:
Originally Posted by Chipper View Post
Try with --trim3 50 in case the 3' ends of your reads are of low quality or contain Ns. Bowtie1 seems like an odd choice for PE75 reads, but it should give you some alignments...
Hi Chipper,
Thanks for the reply. Would you recommend using bowtie2 instead of bowtie? In that case, if I am not wrong, I would need to index the reference genome with bowtie2, right?
Old 04-06-2016, 12:07 AM   #10
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324
Default

I don't remember exactly how many mismatches bowtie1 tolerates. It should work, but you may have to change some settings if you have lots of mismatches at the end, hence my suggestion to try mapping with only part of the sequence (-3 50 gives the first 25 bases, -3 25 -5 25 the middle part, etc.).

Are these standard ChIP-seq samples, or could there be some kind of inline barcodes that make the reads unmappable? Maybe you could post a few reads.
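To make the trimming arithmetic concrete, here is a small Python sketch of which bases of a 75 bp read would be left after -5/-3 trimming. The `trimmed` helper and the synthetic read are illustrative only; this mimics the slicing, it is not bowtie's code.

```python
def trimmed(read, trim5=0, trim3=0):
    """Return what is left of a read after dropping trim5 bases
    from the 5' (left) end and trim3 bases from the 3' (right) end."""
    return read[trim5:len(read) - trim3]

# Synthetic 75 bp read, standing in for one PE75 mate.
read = "ACGTA" * 15

first_25 = trimmed(read, trim3=50)             # like -3 50
middle_25 = trimmed(read, trim5=25, trim3=25)  # like -3 25 -5 25

print(len(read), len(first_25), len(middle_25))
```

If the first 25 bases map but the full read doesn't, that would point at a problem toward the 3' end.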
Old 04-06-2016, 01:15 AM   #11
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 175
Default

Are you sure the reads in file 1 and file 2 are in the same order? Just print the first 5 reads of each file...

You can also try to map only one file (not considered as paired-end then) to see the percentage of mapped reads...
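If you'd rather check more than five reads, something like this Python sketch could compare the read names in the two files. It assumes four-line FASTQ records and names ending in /1 and /2; the helper names are hypothetical, and it does not notice if one file is longer than the other.

```python
from itertools import islice

def read_names(lines):
    """Yield read names from FASTQ lines, without the leading '@'
    and without a trailing /1 or /2 mate suffix."""
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():
            name = line.strip()[1:]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name

def first_mismatch(lines1, lines2, limit=None):
    """Return the 0-based index of the first record whose names differ
    between the two files, or None if all compared records agree."""
    pairs = zip(read_names(lines1), read_names(lines2))
    for idx, (a, b) in enumerate(islice(pairs, limit)):
        if a != b:
            return idx
    return None
```

Passing the two open file handles, e.g. `first_mismatch(fh1, fh2, limit=100000)`, returns `None` when the first 100000 records are in the same order.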
Old 04-06-2016, 01:47 AM   #12
fh331
Member
 
Location: UK

Join Date: Apr 2016
Posts: 19
Default

I got CRAM files from the sequencing facility, and I thought that CRAM contains aligned data, so I assume they had removed the barcodes. I went back and converted the CRAM to FASTQ because I had some trouble analysing the data as it was. Below are the first 10 reads from the two PE files:

file 1/1:

@HS32_18418:2:2307:11553:47098#1/1
CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
+
BBBBBBF/FFFFFFFFFFFFFFFFFFFFFFFFFFFBBFFFB</<FFFF//FFFFFFFBBF<FFFFFFBF/B<FF/
@HS32_18418:2:1201:20716:93279#1/1
AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAAC
+
/<FFF</<FBF<</FFFBF<FFFFBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB
@HS32_18418:2:1102:8324:84406#1/1
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCCTA
+
BBBB</<<FFFFFBBF<<BFFBB/F/<<<<F/FFFFFBBB<F//B/F/<FBFFB//</BFFBB/</////</7/<
@HS32_18418:2:2304:9612:31489#1/1
AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
+
/<FFF/<<FFF<//FFFB<<FFFFB/FFFBB<FFFFF<FFFFBFFFFFFFFFFFFFFFFFFFFFFFFBFBBBBBB
@HS32_18418:2:2109:18196:73431#1/1
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBB/FFF<F<BFFFF/BFF/B
@HS32_18418:2:2304:19412:56725#1/1
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAGCCCTAGCCCTAGCCCTAGCCCTA
+
FFFFFFFFFFFFFF/<FFFBF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB
@HS32_18418:2:1109:17555:23909#1/1
CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBFFF/F/FBB/
@HS32_18418:2:1312:9262:20826#1/1
TAAAACCCTAACCCTAAAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCC
+
</</<FFF/FFFFBF</BFFFFB<<FFFB<FFFFFFFFFFF</FFFFFFFFB<<FFFFF<FFFFFFFFFFBBBBB
@HS32_18418:2:1312:15929:23212#1/1
AACCCTAACCCTAACCCTAACCCTAACCCTTACCCTAACCCTAACCCAACCCTAACCCTAACCCTCACCCTCACC
+
</FFFFFBFFFFB<FFFFBBFFFBF<FFBFB/FFFFFBFFFFF/FFF<FFFFFFFFFFFFFFFFFFFFFFBBBBB
@HS32_18418:2:2309:4779:65172#1/1
CCTAACCCTCACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTCTAACCCTAACC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFF<F<//F/F/FFF<//<BF<F///FB


file 1/2:

@HS32_18418:2:2307:11553:47098#1/2
TAACCCTACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTTACCCTAACCCTAACCCAAC
+
</FBB<///F<BF/B/F<FB<<FB<B<FFF/BFFBFBBB<FFB<FFFFBFFFFFF<FFFF/FFBFFF<FFBBBBB
@HS32_18418:2:1201:20716:93279#1/2
ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFF//FFB
@HS32_18418:2:1102:8324:84406#1/2
ACCCTAATCCTATCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCC
+
//FB<///<<///FB//<<</<//F<FF/F<<//<</B/<FF</<FFFFFFFF</F</F</<F<BFF/FFB<<</
@HS32_18418:2:2304:9612:31489#1/2
ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCTAACCC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFBF<FFF///<<//<//<
@HS32_18418:2:2109:18196:73431#1/2
CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCAACCCTAACCCTAACCCTAACCCTAA
+
F<B<<FF</</FFFF</FFFF<<FBFFB<FBB/B<FFFB<FFFFFBFFFFFFFFFFFFFFFFFFFFFFFFBBBBB
@HS32_18418:2:2304:19412:56725#1/2
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FF//</FF/<FFFFF/FFFFFFF
@HS32_18418:2:1109:17555:23909#1/2
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTTACCCTAACCCTACCCTAACCCTAACCCTAACCCTAAC
+
/</FFFBF/FFBB<<FFFFF<FFFFF/FFFBF/FFF<</FFBFF<FFB<FFFFFFFFFFFFFFFFFFFFFBBBBB
@HS32_18418:2:1312:9262:20826#1/2
CCTAACCCTAACCCTAACCCTAACCCTAACCCTTAACCCTCACCCTCACCCTCCCCCTCACCCTAACCCTAACCC
+
BBBBBFBFFFFFFFFFFFFFFFFFFFFFBBF/<<F/</</BBF/<<BFFBF/</<FBFFFBFFFF<FFBF//BFF
@HS32_18418:2:1312:15929:23212#1/2
CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFFF<FFFBFFF<F/
@HS32_18418:2:2309:4779:65172#1/2
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAAACCCTAAACC
+
FFB<</BFF<//FFF/</FFFFB<FFBFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB


It looks dodgy to me, as all the reads seem to be repetitive sequences.
Old 04-06-2016, 03:23 AM   #14
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 175
Default

Looking at your reads, it seems something went wrong when you extracted them from the CRAM files. They look nearly identical, just shifted by 3-4 bp...
Old 04-06-2016, 03:50 AM   #15
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 502
Default

These sequences (CCCTAA) are from the telomere repeat. CRAMs are sorted files so, depending on the reference used for alignment, it's not unusual to see all of these reads at the beginning of the file.

What IS unusual is that both reads are from the same strand, when one should be the reverse complement of the other (e.g., TTAGGG). Not sure how that happened, but it's likely to be the reason why realignment failed.
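A quick way to check this on more reads is a Python sketch like the one below. The function names are made up for illustration; the two example reads are the first mates posted above.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def telomere_strand(seq):
    """Say which telomere repeat dominates a read:
    'C' for CCCTAA, 'G' for TTAGGG, 'unclear' for a tie."""
    c_rich = seq.count("CCCTAA")
    g_rich = seq.count("TTAGGG")
    if c_rich > g_rich:
        return "C"
    if g_rich > c_rich:
        return "G"
    return "unclear"

# First pair from the posted files (mate /1 and mate /2).
mate1 = "CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT"
mate2 = "TAACCCTACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTTACCCTAACCCTAACCCAAC"

print(reverse_complement("CCCTAA"))                   # TTAGGG
print(telomere_strand(mate1), telomere_strand(mate2)) # C C
```

In a normally oriented pair I would expect one mate to come out as "C" and the other as "G".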
Old 04-06-2016, 05:46 AM   #16
fh331
Member
 
Location: UK

Join Date: Apr 2016
Posts: 19
Default

Quote:
Originally Posted by SylvainL View Post
Looking at your reads, it seems something went wrong when you extracted them from the CRAM files. They look nearly identical, just shifted by 3-4 bp...
I peeked into the original CRAM file using the following command, and the output matches the FASTQ files. So I don't know whether the CRAM-to-FASTQ conversion caused the problem:

fazal@fazal-Precision-T1700:/media/fazal/backup/BCL11A/Cram$ java -jar /opt/cramtools-3.0.jar fastq -I 18418_2#1.cram | head -20
@HS32_18418:2:2307:11553:47098#1/1
CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
+
BBBBBBF/FFFFFFFFFFFFFFFFFFFFFFFFFFFBBFFFB</<FFFF//FFFFFFFBBF<FFFFFFBF/B<FF/
@HS32_18418:2:2307:11553:47098#1/2
TAACCCTACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTTACCCTAACCCTAACCCAAC
+
</FBB<///F<BF/B/F<FB<<FB<B<FFF/BFFBFBBB<FFB<FFFFBFFFFFF<FFFF/FFBFFF<FFBBBBB
@HS32_18418:2:1201:20716:93279#1/2
ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFF//FFB
@HS32_18418:2:1201:20716:93279#1/1
AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAAC
+
/<FFF</<FBF<</FFFBF<FFFFBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB
@HS32_18418:2:1102:8324:84406#1/1
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCCTA
+
BBBB</<<FFFFFBBF<<BFFBB/F/<<<<F/FFFFFBBB<F//B/F/<FBFFB//</BFFBB/</////</7/<
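One thing visible in this output is that the mates come out interleaved and not always in /1 then /2 order. I don't know exactly how the two per-mate files were produced from it, so purely as a sketch: this is one way an interleaved stream could be re-paired by name in Python. `pair_mates` is a hypothetical helper, it assumes names end in /1 or /2, it holds unpaired reads in memory until their mate turns up, and it silently drops reads whose mate never appears.

```python
def pair_mates(records):
    """Take (name, seq, qual) records in any order and yield
    (mate1_record, mate2_record) tuples once both mates have been seen."""
    pending = {}  # base read name -> record still waiting for its mate
    for rec in records:
        name = rec[0]
        base, mate = name[:-2], name[-2:]
        if mate not in ("/1", "/2"):
            raise ValueError(f"read name without /1 or /2 suffix: {name}")
        other = pending.pop(base, None)
        if other is None:
            pending[base] = rec
        elif mate == "/1":
            yield rec, other
        else:
            yield other, rec
```

Writing the first element of each yielded tuple to one file and the second to another would keep the two files in the same read order.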
Old 04-06-2016, 06:15 AM   #17
fh331
Member
 
Location: UK

Join Date: Apr 2016
Posts: 19
Default

Quote:
Originally Posted by HESmith View Post
These sequences (CCCTAA) are from the telomere repeat. CRAMs are sorted files so, depending on the reference used for alignment, it's not unusual to see all of these reads at the beginning of the file.

What IS unusual is that both reads are from the same strand, when one should be the reverse complement of the other (e.g., TTAGGG). Not sure how that happened, but it's likely to be the reason why realignment failed.
Hi HESmith,
Bear with me for the silly and naive question I am about to ask: in PE ChIP-seq, should the two reads of a pair be completely reverse complementary to each other, or is there just a region in which they are reverse complementary?
Old 04-06-2016, 06:24 AM   #18
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,856
Default

Quote:
Originally Posted by fh331 View Post
Hi HESmith,
Bear with me for the silly and naive question I am about to ask: in PE ChIP-seq, should the two reads of a pair be completely reverse complementary to each other, or is there just a region in which they are reverse complementary?
They will only be completely rev-comp of each other, IF the insert size=number of F/R sequencing cycles (not likely for every fragment in your library).
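As a toy illustration in Python (an idealised model with no adapters or sequencing errors; `paired_reads` and the example fragment are made up): read 1 is the start of the fragment, and read 2 is the start of the opposite strand.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def paired_reads(fragment, cycles):
    """Idealised paired-end reads from one fragment: read 1 is the first
    `cycles` bases, read 2 is the first `cycles` bases of the opposite strand."""
    read1 = fragment[:cycles]
    read2 = reverse_complement(fragment)[:cycles]
    return read1, read2

fragment = "ACGTTGCAAGGCTTAACCGGATCGATTGCA"  # 30 bp toy insert

# Insert longer than the read length: mates come from opposite ends
# of the fragment and do not overlap at all.
r1, r2 = paired_reads(fragment, 10)
print(r2 == reverse_complement(r1))  # False

# Insert exactly as long as the read: mates cover the same bases.
s1, s2 = paired_reads(fragment[:10], 10)
print(s2 == reverse_complement(s1))  # True
```

Either way, read 2 is always on the opposite strand from read 1, which is why a CCCTAA mate should normally be paired with a TTAGGG mate.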
Old 04-06-2016, 09:36 AM   #19
fh331
Member
 
Location: UK

Join Date: Apr 2016
Posts: 19
Default

Quote:
Originally Posted by SylvainL View Post
Are you sure the reads in file 1 and file 2 are in the same order? Just print the first 5 reads of each file...

You can also try to map only one file (not considered as paired-end then) to see the percentage of mapped reads...
Hi SylvainL,
I tried aligning a single file from the pair. Here's the output report; it worked fine, I think:

fazal@fazal-Precision-T1700:/media/fazal/backup/BCL11A/FastQ_Files$ bowtie -m 1 -S /media/fazal/backup/BCL11A/Bowtie_Indices/Bowtie1_Index/human_g1k_v37.fasta 18418_2-1_1.fastq > /media/fazal/backup/BCL11A/Sam_from_FastQ/18418_2-1_1.sam
# reads processed: 40095362
# reads with at least one reported alignment: 33362937 (83.21%)
# reads that failed to align: 2434332 (6.07%)
# reads with alignments suppressed due to -m: 4298093 (10.72%)
Reported 33362937 alignments to 1 output stream(s)

Now I don't know what's going wrong when aligning the two PE files together.

I am kind of getting frustrated
Old 04-06-2016, 09:46 AM   #20
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 198
Default

Like @HESmith said, you may have a problem with the orientation of your read pairs.

The default orientation for valid alignments in bowtie1 is --fr. If your read pairs got converted somehow to ff, then none of the alignments would be valid.
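If that turns out to be the case, bowtie1 also accepts --ff in place of the default --fr. Another option would be to flip one mate file back, roughly as in this Python sketch. It only makes sense if every pair really is in ff orientation, which I haven't verified for this data; re-extracting the reads from the CRAM in their original orientation is probably the safer route. `flip_fastq` is a hypothetical helper and assumes four-line FASTQ records.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def flip_fastq(lines):
    """Yield FASTQ lines with each sequence reverse-complemented and each
    quality string reversed, so qualities still line up with their bases."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip stray blank lines
        seq = next(it).strip()
        plus = next(it).strip()
        qual = next(it).strip()
        yield header.strip() + "\n"
        yield reverse_complement(seq) + "\n"
        yield plus + "\n"
        yield qual[::-1] + "\n"
```

With `src` opened on the mate 2 file and `dst` opened for writing, `dst.writelines(flip_fastq(src))` would produce the flipped copy. Aligning a few thousand pairs first would show whether the orientation was really the problem.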