This directory contains information about whether Fantom and public cDNA
sequences represent full-length RNAs or artefactual truncations.

f3_mm5_full-length : 41025 out of 102801 Fantom cDNAs that have experimental
support for their 5' and 3' ends (see below), and are not immediately upstream
of A-rich sequences in the genome (> 10 As within 20 nt).

gb_mm5_full-length : 26818 out of 56006 public non-RIKEN cDNAs that have
experimental support for their 5' and 3' ends (see below), and are not
immediately upstream of A-rich sequences in the genome (> 10 As within 20 nt).

f3_mm5_tails : Genomic sequences from -10 to +30 relative to the 3' ends of
Fantom1+2+3 sequences. When they are A-rich, it suggests that the Fantom
sequence may be derived from internal priming of genomic DNA, pre-mRNA, or
mature RNA.

gb_mm5_tails : Genomic sequences from -10 to +30 relative to the 3' ends of
public non-RIKEN cDNAs. Note that non-RIKEN cDNAs often include a poly-A tail,
but Fantom cDNAs rarely do.

f3_mm5_5end_support.txt :
f3_mm5_3end_support.txt :
gb_mm5_5end_support.txt :
gb_mm5_3end_support.txt :
Presence or absence of various kinds of experimental support for the 5' and 3'
ends of the cDNAs. cDNAs were not considered unless they mapped unambiguously
to the genome, and had at most 5 nt of sequence unaligned at the end being
considered (5' or 3'), not counting poly-A tails for non-RIKEN sequences.

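The A-rich test used for the full-length sets above (> 10 As within 20 nt) can
be sketched in Python as follows. This is only an illustration, not the script
that produced these files. The function names are hypothetical, and the choice
of which 20 nt to examine is an assumption: the sketch scans every 20-nt window
of whatever sequence it is given, for example the genomic sequence downstream
of a cDNA's 3' end.

```python
def max_a_count(seq, window=20):
    """Return the largest number of As in any `window`-nt stretch of seq."""
    seq = seq.upper()
    if len(seq) <= window:
        return seq.count("A")
    # Count As in the first window, then slide it one nt at a time.
    count = seq[:window].count("A")
    best = count
    for i in range(window, len(seq)):
        count += (seq[i] == "A") - (seq[i - window] == "A")
        best = max(best, count)
    return best


def is_a_rich(seq, window=20, threshold=10):
    """True if some window holds more than `threshold` As (assumed reading
    of the '> 10 As within 20 nt' rule)."""
    return max_a_count(seq, window) > threshold
```

A cDNA whose downstream genomic sequence makes is_a_rich return True would be
a candidate for internal priming under this reading of the rule.
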
Criteria for 5' ends:

2 CAGE tag starts within +-15 nt
3 CAGE tag starts within +-60 nt
4 CAGE tag starts within +-100 nt
1 GSC ditag start within +-0 nt
2 GSC ditag starts within +-50 nt
1 GIS ditag start within +-15 nt
1 RIKEN 5'EST start within +-3 nt
2 RIKEN 5'EST starts within +-100 nt
1 non-RIKEN 5'EST start within +-2 nt
2 non-RIKEN 5'EST starts within +-100 nt
1 other Fantom clone start within +-25 nt
1 non-RIKEN RNA start within +-50 nt

Criteria for 3' ends:

1 GSC ditag end within +-0 nt
2 GSC ditag ends within +-50 nt
1 GIS ditag end within +-15 nt
1 RIKEN 3'EST end within +-2 nt
2 RIKEN 3'EST ends within +-100 nt
1 non-RIKEN 3'EST end within +-7 nt
2 non-RIKEN 3'EST ends within +-100 nt
1 other Fantom clone end within +-25 nt
1 non-RIKEN RNA end within +-50 nt

Each of these criteria has less than a 1 in 1000 chance of occurring if the
sites are randomly scattered across the genome. Sequences from the same clone
(e.g. ESTs) were not counted as independent evidence. Both orientations were
considered for GSC ditags.

Many thanks to Par Engstrom for help with ESTs, and Shintaro Katayama and
Akira Hasegawa for help with tags.

Contact: Martin Frith