OP-HELICOS-CAGE-Filtering-v1.0

From FANTOM5_SSTAR

Protocol: OP-HELICOS-CAGE-Filtering-v1.0

Author: Katayama, Shintaro

Created: July 8, 2010

Updated: July 16, 2010

Parameters:


Description:

The filterSMS program in the helisphere-0.14.a015 package is used to remove artifactual reads and reads that are too short or too long. All raw reads are filtered with the options listed below, according to the following criteria: (1) read length is 20 to 70 nt; (2) AT content <= 90% and fraction of CT/TA/AG/GC dinucleotides <= 80% (the dinucleotides weighted 1 in the BAO row of the DINUC file); (3) the longest prefix consisting of < 75% T; and (4) no similarity to the base-addition-order (BAO) sequence or to the oligonucleotides added during the wet experiments and the sequencing. For the last rule, any read with an alignment score >= 3.5 is removed, where the score is (5m-4e)/l, m is the number of matches, e is the number of errors of any type, and l is the read length.
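
As a worked illustration of the last rule, the normalized alignment score can be computed as shown here. This is a minimal sketch in Python, not filterSMS code: the function names `alignment_score` and `is_removed` are hypothetical, and the constants are taken from the description above.

```python
MATCH_REWARD = 5      # contribution of each matching base
ERROR_PENALTY = 4     # cost of each error of any type
MIN_SCORE = 3.5       # reads scoring at or above this value are removed


def alignment_score(matches: int, errors: int, read_length: int) -> float:
    """Return (5m - 4e) / l for one read-versus-reference alignment."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return (MATCH_REWARD * matches - ERROR_PENALTY * errors) / read_length


def is_removed(matches: int, errors: int, read_length: int) -> bool:
    """True when the read is similar enough to a reference to be discarded."""
    return alignment_score(matches, errors, read_length) >= MIN_SCORE


# A 30-nt read with 27 matches and 3 errors: (5*27 - 4*3) / 30
print(alignment_score(27, 3, 30), is_removed(27, 3, 30))
# A 30-nt read with 24 matches and 6 errors: (5*24 - 4*6) / 30
print(alignment_score(24, 6, 30), is_removed(24, 6, 30))
```

Under this formula a perfect full-length match scores 5.0. If every read position is either a match or an error (an assumption, m + e = l), the score is 5 - 9e/l, so the 3.5 cutoff corresponds to an error fraction of at most 1/6.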

-- FilterSMS option begin ------
 --minlen 20 \
 --maxlen 70 \
 --quality 10 \
 --dinuc ${DINUC} \
 --trim_hp T/H/1/0.75 \
 --align ${BAOCNT} \
 --minscore 3.5 \
 --percent_error 30 \
 --config_file ${CONF}
-- FilterSMS option end ------
-- DINUC file begin ------
Filter  AA      AC      AG      AT      CA      CC      CG      CT      GA      GC      GG      GT      TA      TC      TG      TT      Thresh
BAO     0       0       1       0       0       0       0       1       0       1       0       0       1       0       0       0       0.80
AT      1       0.5     0.5     1       0.5     0       0       0.5     0.5     0       0       0.5     1       0.5     0.5     1       0.90
-- DINUC file end ------
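
One plausible reading of the DINUC table is that each row assigns a weight to every dinucleotide, and that a read is rejected when the mean weight over its overlapping dinucleotides exceeds the row's threshold. The Python sketch below implements that interpretation only. Whether filterSMS computes the statistic exactly this way is an assumption, and `dinuc_fraction` and `passes_dinuc` are hypothetical names.

```python
from itertools import product

# Column order of the DINUC file: AA, AC, AG, AT, CA, ..., TT
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

# Rows copied from the DINUC file: (weights in column order, threshold)
FILTERS = {
    "BAO": ([0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0], 0.80),
    "AT": ([1, 0.5, 0.5, 1, 0.5, 0, 0, 0.5, 0.5, 0, 0, 0.5, 1, 0.5, 0.5, 1], 0.90),
}


def dinuc_fraction(read: str, weights: list) -> float:
    """Mean weight over all overlapping dinucleotides of the read."""
    table = dict(zip(DINUCS, weights))
    pairs = [read[i:i + 2] for i in range(len(read) - 1)]
    if not pairs:
        return 0.0
    # Dinucleotides with N or other symbols get weight 0 (assumption)
    return sum(table.get(p, 0.0) for p in pairs) / len(pairs)


def passes_dinuc(read: str) -> bool:
    """True when the read stays at or below every filter threshold."""
    read = read.upper()
    return all(dinuc_fraction(read, w) <= t for w, t in FILTERS.values())


print(passes_dinuc("CTAGCTAGCTAGCTAGCTAGCTAG"))  # pure base-addition-order repeat
print(passes_dinuc("ACGTTGCAAGGCTTACCGATGCAT"))  # mixed-composition read
```

With the AT row, each dinucleotide is weighted by the share of A or T among its two bases, so the mean approximates the AT content of the read.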
-- BAOCNT file begin ------
>BASE_ADDTION_ORDER_REFERENCE
CTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAG
>dge_spike_low
AGATGCATCAATGGCGACACTGAGAGGCTGATGAGCCAATGCCTTCAAGAGACTCTTCTC
ATCATTAGTAGGTACGTCTTGGTGTCCATTAATGGTTACT
>dge_spike_medium
CTGACGGTCCACAGAAGTTTGAGCCTGACTCTTGAGTGTTGTGAGACCGTTGCAGCAGAG
GGTTGGGACCGGTCCGCCCTGAGTCACGTAGGATAAGCAA
>dge_spike_high
TCTCCTGCGTTTCCACTCTCAAGCTCTCCAGCACTCATCATGATTGGGTTGATACCCATC
TTGGCCATGACAAGCTCACACTGGAAGGATTTACCTTGAC
>dge_tailing_A
CAGGGCAGAGGATGGATGCAAGGATAAGTGGA
>dge_tailing_B
GACACTCACTTCTTACGACTCAGCGATGATGG
>dge_tailing_C
TTAGCCAACCGCGGACAGCTACATGGACTTCT
-- BAOCNT file end ------
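
The spike-in records in the BAOCNT file are wrapped over two lines each, so a script that inspects the file has to join the continuation lines before measuring or aligning the sequences. A minimal sketch in Python follows, using a hypothetical helper `read_fasta` and an excerpt of the file above.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into {record name: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new record
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Sequence line: append to the current record
            records[name] += line
    return records


EXCERPT = """\
>dge_spike_low
AGATGCATCAATGGCGACACTGAGAGGCTGATGAGCCAATGCCTTCAAGAGACTCTTCTC
ATCATTAGTAGGTACGTCTTGGTGTCCATTAATGGTTACT
>dge_tailing_A
CAGGGCAGAGGATGGATGCAAGGATAAGTGGA
"""

for name, seq in read_fasta(EXCERPT).items():
    print(name, len(seq))
```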
-- CONF file begin ------
LocalGlobalOption GL
HomoPolymerOptionInReference 0
TagNumReads 1
HomoPolymerOptionInTag 0
-- CONF file end ------
-- SCORE file begin ------
ReferenceNonHomoPolymerGap -4
ReferenceHomoPolymerGap -1
TagNonHomoPolymerGap -4
TagHomoPolymerGap -1
NucleotideMatch 5
NucleotideMismatch -4
NucleotideToNMatch -2
-- SCORE file end ------