= comprehensive 5'/3'-end boundary set

This directory contains the following data:
* comprehensive 5'-end clusters (=TCs)
* comprehensive 3'-end clusters
* comprehensive 5'/3'-end pairs

== files

=== 5'-end clusters

* end5_clusters.txt
  Comprehensive 5'-end clusters
  This file is based on tss>TC>tc.txt on the file exchange server,
  with a reliability value added to each cluster
  ('Reliable' if supported by two or more pieces of evidence, 'Unknown' if by only one)

  5'-ends are defined by the following transcript sequences:
  * CAGE tags
  * 5'end of GIS ditags (<2.5 Mbp)
  * 5'end of GSC ditags (<2.5 Mbp)
  * RIKEN 5'ESTs
  * RIKEN mRNA (FANTOM3 103k full-length cDNA set)

  Tab-delimited text with the following columns:
  1)  TC_id
  2)  chr_no
  3)  strand (+/-)
  4)  genome region start
  5)  genome region end (start < end)
  6)  reliability (1=reliable/0=unknown)
  7)  the number of CAGE tags
  8)  the number of GIS ditags
  9)  the number of GSC ditags
  10) the number of RIKEN 5'ESTs
  11) the number of RIKEN mRNAs (FANTOM3 103k set)
  12) the number of LongSAGE tags (not used)
  13) the number of DBTSS entries (not used)
  14) representative position ID (CAGE:ctss_id)
  15) representative genome position
  16) the number of tags at this representative genome position
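
A minimal Python sketch for loading end5_clusters.txt, assuming each data row has exactly the 16 tab-separated columns listed above, that coordinates and counts are integers, and that a header line, if present, can be recognised by a non-numeric column 4. The class, field, and function names are hypothetical. `evidence_count()` encodes only one possible reading of the reliability rule (the total number of supporting sequences over columns 7-11, the sources actually used); the README does not say whether "evidence" means individual sequences or distinct sources, so it is not a definition of column 6.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


def _count(field: str) -> int:
    """Parse a count column; '' or '-' (assumed placeholders) become 0."""
    return int(field) if field not in ("", "-") else 0


@dataclass
class End5Cluster:
    """One row of end5_clusters.txt (hypothetical names for columns 1-16)."""
    tc_id: str
    chrom: str
    strand: str
    start: int
    end: int
    reliable: bool      # column 6: 1 = reliable, 0 = unknown
    n_cage: int
    n_gis: int
    n_gsc: int
    n_riken_5est: int
    n_riken_mrna: int
    n_longsage: int     # column 12, "not used"
    n_dbtss: int        # column 13, "not used"
    rep_id: str         # column 14, CAGE ctss_id
    rep_pos: int
    rep_tags: int

    def evidence_count(self) -> int:
        """Assumed reading: total supporting sequences over columns 7-11."""
        return (self.n_cage + self.n_gis + self.n_gsc
                + self.n_riken_5est + self.n_riken_mrna)


def read_end5_clusters(lines: Iterable[str]) -> Iterator[End5Cluster]:
    """Yield one End5Cluster per data row; accepts an open file or a list of lines."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        f = line.rstrip("\r\n").split("\t")
        if len(f) > 3 and not f[3].isdigit():
            continue  # assumed header line
        if len(f) != 16:
            raise ValueError(f"expected 16 columns, got {len(f)}: {line!r}")
        yield End5Cluster(
            tc_id=f[0], chrom=f[1], strand=f[2],
            start=int(f[3]), end=int(f[4]),
            reliable=(f[5] == "1"),
            n_cage=_count(f[6]), n_gis=_count(f[7]), n_gsc=_count(f[8]),
            n_riken_5est=_count(f[9]), n_riken_mrna=_count(f[10]),
            n_longsage=_count(f[11]), n_dbtss=_count(f[12]),
            rep_id=f[13], rep_pos=int(f[14]), rep_tags=_count(f[15]),
        )
```

For example, `[c for c in read_end5_clusters(open("end5_clusters.txt")) if c.reliable]` keeps only the clusters flagged as reliable in column 6.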

* end5_clusters.gff
  GFF file of the above

=== 3'-end clusters

* end3_clusters.txt
  Comprehensive 3'-end clusters, each with a reliability value
  ('Reliable' if supported by two or more pieces of evidence, 'Unknown' if by only one)

  3'-ends are defined by the following transcript sequences:
  * 3'end of GIS ditags (<2.5 Mbp)
  * 3'end of GSC ditags (<2.5 Mbp)
  * public (non-RIKEN) mRNAs
  * RIKEN mRNAs (FANTOM3 103k full-length cDNA set)
  * RIKEN 3'ESTs
  * public (non-RIKEN) 3'ESTs

  Tab-delimited text with the following columns:
  1)  3'end_id
  2)  chr_no
  3)  strand (+/-)
  4)  genome region start
  5)  genome region end (start < end)
  6)  reliability (1=reliable/0=unknown)
  7)  the number of GIS ditags
  8)  the number of GSC ditags
  9)  the number of RIKEN mRNAs (FANTOM3 103k set)
  10) the number of public (non-RIKEN) mRNAs
  11) the number of RIKEN 3'ESTs
  12) the number of public (non-RIKEN) 3'ESTs
  13) representative position ID (CAGE:ctss_id)
  14) representative genome position
  15) the number of tags at this representative genome position
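
To view the 3'-end clusters in a genome browser, they can be converted to BED6. This is a sketch under stated assumptions: the README does not say whether columns 4-5 are 0- or 1-based, so the function assumes 1-based inclusive coordinates by default (BED itself is 0-based, half-open), and the BED score used here, the sum of the evidence counts in columns 7-12 capped at 1000, is an arbitrary illustrative choice. The function name is hypothetical.

```python
from typing import Iterable, Iterator


def end3_to_bed6(lines: Iterable[str], one_based: bool = True) -> Iterator[str]:
    """Convert end3_clusters.txt rows (15 columns) to BED6 lines.

    Assumption: when one_based is True, columns 4-5 are 1-based and
    inclusive; BED uses 0-based, half-open intervals.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        f = line.rstrip("\r\n").split("\t")
        if len(f) > 3 and not f[3].isdigit():
            continue  # assumed header line
        if len(f) != 15:
            raise ValueError(f"expected 15 columns, got {len(f)}: {line!r}")
        end_id, chrom, strand = f[0], f[1], f[2]
        start, end = int(f[3]), int(f[4])
        bed_start = start - 1 if one_based else start
        # Columns 7-12 hold the evidence counts (GIS, GSC, RIKEN mRNA,
        # public mRNA, RIKEN 3'EST, public 3'EST); BED scores are 0-1000.
        score = min(sum(int(x) for x in f[6:12]), 1000)
        yield "\t".join([chrom, str(bed_start), str(end), end_id, str(score), strand])
```

If the coordinates turn out to be 0-based already, pass `one_based=False` so the start is written unchanged.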

* end3_clusters.gff
  GFF file of the above

=== 5'/3'-end pairs

* pair53_clusters.txt
  Comprehensive 5'/3'-end pair clusters

  Tab-delimited text with the following columns:
  1)  5'/3'-end pair ID
  2)  chr_no
  3)  strand (+/-)
  4)  genome region start
  5)  genome region end (start < end)
  6)  reliability (1=reliable/0=unknown)
  7)  the number of RIKEN mRNAs (FANTOM3 103k set)
  8)  the number of public (non-RIKEN) mRNAs
  9)  the number of RIKEN 5'/3'EST pairs
  10) the number of GIS ditags
  11) the number of GSC ditags
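
A small sketch for reading pair53_clusters.txt into dictionaries keyed by hypothetical short names for the 11 columns above, plus a helper that lists which evidence types (columns 7-11) support a given pair. It assumes integer values in columns 4-11 and skips a header line if one is present; neither detail is stated in this README.

```python
from typing import Dict, Iterable, Iterator, List, Union

# Hypothetical short names for columns 1-11 of pair53_clusters.txt.
PAIR53_COLUMNS = [
    "pair_id", "chrom", "strand", "start", "end", "reliable",
    "riken_mrna", "public_mrna", "riken_est_pairs", "gis_ditags", "gsc_ditags",
]
EVIDENCE_COLUMNS = PAIR53_COLUMNS[6:]  # columns 7-11

Row = Dict[str, Union[str, int]]


def read_pair53(lines: Iterable[str]) -> Iterator[Row]:
    """Yield one dict per data row, with columns 4-11 converted to int."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        f = line.rstrip("\r\n").split("\t")
        if len(f) > 3 and not f[3].isdigit():
            continue  # assumed header line
        if len(f) != len(PAIR53_COLUMNS):
            raise ValueError(f"expected {len(PAIR53_COLUMNS)} columns, got {len(f)}")
        row: Row = dict(zip(PAIR53_COLUMNS, f))
        for name in PAIR53_COLUMNS[3:]:
            row[name] = int(row[name])
        yield row


def evidence_types(row: Row) -> List[str]:
    """Names of the evidence columns (7-11) with a non-zero count."""
    return [name for name in EVIDENCE_COLUMNS if row[name] > 0]
```

For instance, reliable pairs backed by at least one RIKEN full-length cDNA could be selected with `[r for r in read_pair53(fh) if r["reliable"] == 1 and r["riken_mrna"] > 0]`.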

* pair53_clusters.gff
  GFF file of the above 5'/3'-end pair clusters
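
The three .gff files can be read with a generic parser for the standard tab-separated GFF columns. This README does not say which GFF version is used, so the sketch keeps column 9 (attributes) as raw text, because GFF2 and GFF3 encode it differently, and skips comment lines and any UCSC-style `track`/`browser` lines (an assumption about what the files may contain). The names are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class GffRecord(NamedTuple):
    seqid: str
    source: str
    feature: str
    start: int
    end: int
    score: str       # kept as text because "." means "no score"
    strand: str
    frame: str
    attributes: str  # raw column 9; GFF2 and GFF3 use different syntax


def read_gff(lines: Iterable[str]) -> Iterator[GffRecord]:
    """Yield one GffRecord per feature line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # comments, directives and assumed browser/track lines
        f = line.split("\t")
        if len(f) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(f)}: {line!r}")
        attributes = f[8] if len(f) > 8 else ""  # tolerate a missing column 9
        yield GffRecord(f[0], f[1], f[2], int(f[3]), int(f[4]),
                        f[5], f[6], f[7], attributes)
```

Usage: `features = list(read_gff(open("end5_clusters.gff")))`; the same call works for end3_clusters.gff and pair53_clusters.gff.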

== contributors

* 5'-end clusters (TC)
  Shintaro Katayama and Kenji Nakano

* 3'-end clusters
  Mark Crowe, Christine Wells and Sean Grimmond

* 5'/3'-EST pairs
  Martin Frith

* 5'/3'-end pairs
  Takeya Kasukawa

== note

* "Reliability" is not the same as "completeness".