This protocol adapts Cap Analysis of Gene Expression (CAGE) for Illumina patterned flow cell sequencers (e.g., NextSeq 1000/2000, NovaSeq) using unique dual indexes (UDI) and paired‑end reads. It updates earlier Illumina CAGE protocols by introducing dual‑index adapter ligations and second‑strand synthesis to ensure robust demultiplexing and to prevent adapter‑dimer competition on patterned flow cells.

Background

CAGE captures the 5′ ends of capped RNAs to profile transcription start sites (TSSs) at single‑base resolution[1][2][3][4][5]. This implementation keeps the cap‑trapper chemistry and single‑stranded cDNA library construction, and adds UDI adapters plus a double‑strand conversion and size‑selection step to support patterned flow cells and paired‑end sequencing[6].

Workflow Overview

Figure 1: Workflow overview of direct cDNA CAGE.[6]

Materials

  • Total RNA (3–5 µg per library; A260/230 > 1.8; A260/280 > 1.8; RIN > 7)
  • Reverse transcriptase (SuperScript III or equivalent)
  • Random N6+TCT primer
  • Agencourt RNA Clean XP and AMPure XP magnetic beads
  • Sodium periodate (NaIO₄), biotin (long‑arm) hydrazide
  • RNase I, RNase H
  • Streptavidin magnetic beads
  • UDI adapters (5′ adapter with i5, 3′ adapter with i7) compatible with Illumina paired‑end chemistry
  • LongAmp Taq (or equivalent) for second‑strand conversion
  • Exonuclease I
  • SPRIselect beads and standard buffers
  • QA/QC reagents (KAPA qPCR, fragment analyzer)
    Notes: Select UDI sets validated for patterned flow cells. Maintain RNase/DNase‑free conditions throughout.

Procedure

Bench‑top

A. RNA Preparation & QC (∼1 day)

  1. Extract total RNA using your preferred method (e.g., Maxwell, column or phenol/chloroform).
  2. Quantify and assess integrity (Nanodrop/Qubit; BioAnalyzer/TapeStation).
  3. Proceed only if A260/230 > 1.8, A260/280 > 1.8, and RIN > 7.
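
The acceptance criteria in step A.3 can be applied to a batch of samples with a small script. The following is a minimal sketch, not part of the published protocol: the `RnaQc` record, the `passes_qc` helper, and the sample values are all hypothetical, and only the three thresholds come from the text above.

```python
from dataclasses import dataclass


@dataclass
class RnaQc:
    """QC readings for one RNA sample (field names are illustrative)."""
    sample: str
    a260_230: float
    a260_280: float
    rin: float


def passes_qc(qc: RnaQc, min_ratio: float = 1.8, min_rin: float = 7.0) -> bool:
    """True only if A260/230, A260/280 and RIN all strictly exceed the thresholds."""
    return qc.a260_230 > min_ratio and qc.a260_280 > min_ratio and qc.rin > min_rin


# Hypothetical measurements, for illustration only.
samples = [
    RnaQc("sample_A", 2.05, 1.98, 8.4),  # passes all three criteria
    RnaQc("sample_B", 1.60, 1.95, 9.1),  # fails A260/230
    RnaQc("sample_C", 2.10, 2.00, 7.0),  # fails: RIN must be greater than 7
]

qc_results = {s.sample: passes_qc(s) for s in samples}
for name, ok in qc_results.items():
    print(f"{name}: {'proceed' if ok else 'do not proceed'}")
```

The comparison is strict because the protocol states the criteria as "greater than"; a sample sitting exactly on a threshold is rejected in this sketch.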

B. First-Strand cDNA Synthesis

  1. Prime 3–5 µg RNA with Random N6+TCT primer; synthesize cDNA (SuperScript III).
  2. Clean up to obtain RNA/ss-cDNA hybrids.

C. Cap Trapping

  1. Oxidize RNA cap with NaIO₄; biotinylate with long-arm biotin hydrazide.
  2. Digest single-stranded RNA with RNase I.
  3. Capture biotinylated RNA/cDNA hybrids on streptavidin beads; wash thoroughly.
  4. Treat with RNase H and RNase I; recover ss-cDNA.

D. UDI Adapter Ligations

  1. Ligate 5′ adapter harboring a unique i5 index to ss-cDNA; wash to remove unbound adapters.
  2. Ligate 3′ adapter harboring a unique i7 index; wash to remove unbound adapters.
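
Because each library must carry an i5 and an i7 index that no other library in the pool shares, it can help to check the index assignment before starting the ligations. The sketch below assumes a simple in-memory sample sheet of `(sample, i7, i5)` tuples; the function name and the index sequences are placeholders, not sequences from any real UDI set.

```python
from collections import Counter


def find_udi_conflicts(sheet):
    """Return {sample: [reasons]} for samples whose i7 or i5 is reused in the pool.

    `sheet` is a list of (sample, i7, i5) tuples. An empty result means every
    i7 and every i5 appears exactly once, i.e. the set is unique dual indexed.
    """
    normalized = [(s, i7.upper(), i5.upper()) for s, i7, i5 in sheet]
    i7_counts = Counter(i7 for _, i7, _ in normalized)
    i5_counts = Counter(i5 for _, _, i5 in normalized)

    conflicts = {}
    for sample, i7, i5 in normalized:
        reasons = []
        if i7_counts[i7] > 1:
            reasons.append(f"i7 {i7} reused")
        if i5_counts[i5] > 1:
            reasons.append(f"i5 {i5} reused")
        if reasons:
            conflicts[sample] = reasons
    return conflicts


# Placeholder 8-nt sequences for illustration only.
unique_sheet = [
    ("lib01", "AAGGTTCC", "CCTTGGAA"),
    ("lib02", "GGAACCTT", "TTCCAAGG"),
]
combinatorial_sheet = [
    ("lib01", "AAGGTTCC", "CCTTGGAA"),
    ("lib02", "GGAACCTT", "CCTTGGAA"),  # same i5 as lib01: not a UDI design
]

print(find_udi_conflicts(unique_sheet))
print(find_udi_conflicts(combinatorial_sheet))
```

The check only tests uniqueness within the pool. It does not verify that the sequences belong to a set validated for patterned flow cells.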

E. ds-Conversion & Cleanup

  1. Perform second-strand synthesis (polymerase-mediated) to overcome adapter-dimer competition on patterned flow cells.
  2. Digest residual primers/ss-DNA with Exonuclease I.
  3. Wash and perform SPRIselect size selection to remove adapter dimers and enrich library fragments.

F. Library QC

  1. Quantify libraries (e.g., KAPA qPCR) and assess size distribution (fragment analyzer).
  2. Pool 9–12 libraries per run as a starting point; adjust by platform/read length and target depth.
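
If the libraries are pooled in equal molar amounts (an assumption; the protocol does not prescribe a pooling ratio), the volume to take from each library follows directly from its qPCR concentration, since 1 nM equals 1 fmol/µL. The sketch below uses hypothetical library names, concentrations, and a hypothetical target of 10 fmol per library.

```python
def equimolar_volumes(conc_nm, fmol_per_library):
    """Volume (µL) of each library that contributes `fmol_per_library` fmol.

    `conc_nm` maps library name -> concentration in nM (1 nM == 1 fmol/µL).
    """
    volumes = {}
    for name, conc in conc_nm.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive, got {conc}")
        volumes[name] = fmol_per_library / conc
    return volumes


def pool_concentration_nm(volumes, fmol_per_library):
    """Concentration (nM) of the combined pool before any dilution."""
    total_fmol = fmol_per_library * len(volumes)
    return total_fmol / sum(volumes.values())


# Hypothetical qPCR results in nM.
concentrations = {"lib01": 4.0, "lib02": 2.5, "lib03": 8.0}
volumes = equimolar_volumes(concentrations, fmol_per_library=10.0)

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} µL")
print(f"pool: {pool_concentration_nm(volumes, 10.0):.2f} nM")
```

Volumes below the reliable pipetting range indicate that the library should be diluted first and the calculation repeated with the diluted concentration.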

Sequencing (patterned flow cells)

  • Platform example: NextSeq 2000 (P2 flow cell).
  • Mandatory read configuration: paired‑end, e.g., 2×50 bp or 2×100 bp.
  • Use unique dual indexes (UDI) (both i5 and i7 unique per sample) to minimize index hopping and ensure strict demultiplexing.
  • Save demultiplexed FASTQs (R1/R2; I1 = i7, I2 = i5).
    Typical yield: 350–500 M reads/run for a pool of 9–12 libraries (∼30–45 M reads/library), depending on kit and flow cell.
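
The relationship between run yield, pool size, and per-library depth is simple division, and it is useful to run the numbers before deciding how many libraries to pool. The sketch below assumes reads are distributed evenly across libraries and ignores reads lost to undetermined indexes or spike-ins; the function names and the example figures are illustrative.

```python
def reads_per_library(run_yield_m, n_libraries):
    """Expected reads per library (millions), assuming an even split."""
    if n_libraries < 1:
        raise ValueError("n_libraries must be at least 1")
    return run_yield_m / n_libraries


def max_libraries(run_yield_m, target_depth_m):
    """Largest pool size that still reaches `target_depth_m` reads per library."""
    if target_depth_m <= 0:
        raise ValueError("target_depth_m must be positive")
    return int(run_yield_m // target_depth_m)


# Example: a run yielding 400 M reads.
print(reads_per_library(400, 10))   # 40.0 M reads per library
print(max_libraries(400, 35))       # 11 libraries at >= 35 M reads each
```

Real pools are rarely perfectly balanced, so leaving some headroom below the computed maximum is a reasonable precaution.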

Data Processing[7]

Use the dscage‑pe2 pipeline[7] (Docker/Singularity available) for paired‑end direct cDNA CAGE. The pipeline performs quality control, mapping (hg38/mm10 by default; other genomes can be added if necessary*), and CTSS calling suitable for TSS/enhancer analyses.

  • Inputs: demultiplexed paired‑end FASTQs (R1/R2)
  • Outputs: mapped reads, CTSSs, QC reports, and files prepared for downstream CAGE analysis/visualization.
    *Extend to other genomes by adding corresponding annotations to the pipeline configuration.
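
Since the pipeline expects matched R1/R2 files, it can be worth confirming that every library has both mates before launching a run. The sketch below is independent of dscage‑pe2 and does not call it. It assumes the common Illumina naming pattern `<prefix>_R1_001.fastq.gz` / `<prefix>_R2_001.fastq.gz`, which may differ from the naming used at your site, and the file names in the example are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming convention: <prefix>_R1_001.fastq.gz and <prefix>_R2_001.fastq.gz.
# Index reads (I1/I2) and other files do not match and are ignored.
FASTQ_RE = re.compile(r"^(?P<prefix>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def pair_fastqs(filenames):
    """Group file names into (R1, R2) pairs keyed by their shared prefix.

    Returns (pairs, unpaired) where `pairs` maps prefix -> (R1, R2) and
    `unpaired` is a sorted list of prefixes with only one mate present.
    """
    by_prefix = defaultdict(dict)
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match:
            by_prefix[match["prefix"]]["R" + match["read"]] = name

    pairs = {
        prefix: (reads["R1"], reads["R2"])
        for prefix, reads in by_prefix.items()
        if "R1" in reads and "R2" in reads
    }
    unpaired = sorted(p for p, reads in by_prefix.items() if len(reads) < 2)
    return pairs, unpaired


# Hypothetical file listing.
files = [
    "libA_S1_R1_001.fastq.gz",
    "libA_S1_R2_001.fastq.gz",
    "libA_S1_I1_001.fastq.gz",   # index read, ignored
    "libB_S2_R1_001.fastq.gz",   # R2 missing
]
pairs, unpaired = pair_fastqs(files)
print(pairs)
print(unpaired)
```

To check a real directory, pass the names from `os.listdir(path)` to `pair_fastqs` and resolve any entries reported in `unpaired` before running the pipeline.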

Notes & Troubleshooting

  • Index hopping: Always use unique i5+i7 combinations (UDI). Non‑unique dual indexes are not recommended on patterned flow cells.
  • Low input: If less than 3 µg RNA is available, consider LQ‑ssCAGE[8], adapted to retain the dual indexes and ds‑conversion step from this protocol, before sequencing on patterned flow cells.
  • Pooling: For NextSeq 2000 P2, avoid pooling more than 12 libraries, to maintain depth per library; larger pools call for a higher‑output kit/flow cell (e.g., P3).

Compatibility

  • Sequencers: NextSeq 1000/2000, NovaSeq series (patterned flow cells).
  • Indexing: UDI plates/sets (96×96 commonly used) with matched i7/i5 pairs.
  • Read mode: Paired‑end only (pipeline assumes PE).

Safety

 Follow institutional guidelines for handling hazardous chemicals (e.g., sodium periodate) and RNase‑free techniques.

References

  1. ^ Kodzius, R. et al. CAGE: cap analysis of gene expression. Nat Methods 3, 211-222 (2006), doi: 10.1038/nmeth0306-211
  2. ^ Takahashi, H. et al. 5′ end–centered expression profiling using cap-analysis gene expression and next-generation sequencing. Nat Protoc. 7, 542-561 (2012), doi: 10.1038/nprot.2012.005
  3. ^ Kanamori‑Katayama, M. et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res. 21(7), 1150-1159 (2011), doi: 10.1101/gr.115469.110
  4. ^ FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507(7493), 462-470 (2014), doi: 10.1038/nature13182
  5. ^ Murata, M. et al. Detecting expressed genes using CAGE. Methods Mol Biol. 1164, 67-85 (2014), doi: 10.1007/978-1-4939-0805-9_7
  6. ^ Delobel, D. et al. Protocol for direct cDNA cap analysis of gene expression for paired-end patterned flow cell sequencing. STAR Protocols. (2025), doi: 10.1016/j.xpro.2024.103594
  7. ^ GitHub: dscage-pe2
  8. ^ Takahashi, H. et al. Low Quantity Single Strand CAGE (LQ-ssCAGE) Maps Regulatory Enhancers and Promoters. Methods Mol Biol. 2351, 67-90 (2021), doi: 10.1007/978-1-0716-1597-3_4