Probing Gene Expression One Molecule at a Time

Gene expression, a process central to all life, is stochastic because most genes exist in only one or two copies per cell. Although the central dogma of molecular biology is firmly established, stochastic protein production had not been visualized in real time in an individual cell at the single-molecule level because of insufficient sensitivity. We report the first direct observation of single protein molecules as they are generated, one at a time, in a single live E. coli cell, providing a quantitative description of gene expression.

We used Venus, a fast-maturing fluorescent protein developed by Mya, as a gene expression reporter. We demonstrated a general strategy for detecting single fluorophores in live cells: detection by localization (Fig. 1) [1]. The key to achieving single-molecule detection was to immobilize the fluorescent protein reporter on the cell membrane by constructing a chimeric reporter, tsr-venus (Fig. 2A), which contains a membrane localization sequence. Single protein molecules are normally difficult to detect in the cytoplasm: fast diffusion spreads their fluorescence over the entire cell during the image acquisition time, where it is overwhelmed by strong cellular autofluorescence. Molecules on the cell membrane, however, diffuse much more slowly and can therefore be detected individually with our sensitive microscope. Using this approach, we recorded movies of growing E. coli cells to follow expression from the lac operon in the repressed state in real time (Fig. 2B).

Fig. 1. Single-molecule detection of a fluorescent fusion protein, Tsr-Venus, in live E. coli cells. (A) Fluorescence and (B) DIC images of two E. coli cells expressing Tsr-Venus. Two single fusion protein molecules were detected as diffraction-limited fluorescent spots in the left cell. (C) Line cross section of the fluorescence signal along the long axes of the two E. coli cells (a.u., arbitrary units). (D) Fluorescence time trace of a single Tsr-Venus molecule in an E. coli cell, showing abrupt photobleaching. (E) Detection by localization.

Transcription by RNA polymerase is initiated upon a stochastic dissociation event of the repressor from the operator region of the DNA, generating one mRNA molecule. Multiple ribosomes bound to the mRNA then synthesize a burst of fusion protein molecules, so protein production fluctuates in time. We observed that, under repressed conditions, protein molecules are produced in bursts, with each burst originating from a single stochastically transcribed mRNA molecule, and that the protein copy numbers per burst follow an exponential distribution. These observations had previously been predicted only theoretically; the exponential distribution is accounted for by the competition between mRNA degradation by nucleases and translation by ribosomes.
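The competition between translation and degradation can be illustrated with a minimal simulation sketch. It assumes that, for each mRNA, the next event is either one round of translation or degradation, chosen with probabilities set by two rate constants; the names k_translate and k_degrade and their values are hypothetical and are not taken from the experiments. Under this assumption the number of proteins per mRNA is geometrically distributed, the discrete counterpart of an exponential distribution, with a mean of k_translate / k_degrade.

```python
import random


def burst_size(k_translate, k_degrade, rng):
    """Count proteins made from one mRNA before it is degraded.

    Each event is a translation with probability
    k_translate / (k_translate + k_degrade); otherwise the mRNA is degraded.
    """
    p_translate = k_translate / (k_translate + k_degrade)
    n_proteins = 0
    while rng.random() < p_translate:
        n_proteins += 1
    return n_proteins


def simulate_bursts(n_bursts, k_translate, k_degrade, seed=0):
    """Simulate the protein output of n_bursts independent mRNA molecules."""
    rng = random.Random(seed)
    return [burst_size(k_translate, k_degrade, rng) for _ in range(n_bursts)]


# Hypothetical rates: on average 4 translations per degradation event.
sizes = simulate_bursts(20000, k_translate=4.0, k_degrade=1.0)
mean_size = sum(sizes) / len(sizes)        # expected to be close to 4
frac_zero = sizes.count(0) / len(sizes)    # expected to be close to 1/5
```

In this sketch the mean burst size plays the role of the average number of molecules produced per mRNA; bursts of zero proteins would not be visible in an experiment.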

Fig. 2. (A) Scheme of live-cell observations of gene expression. Transcription of one mRNA by an RNA polymerase results from an infrequent and transient dissociation event of the repressor from DNA. Multiple copies of protein molecules are translated from the mRNA by ribosomes. Upon being assembled into the E. coli inner membrane, Tsr-Venus protein molecules can be detected individually by a fluorescence microscope. (B) A DIC/fluorescence overlay image of E. coli expressing Tsr-Venus from the lac operon. Single Tsr-Venus molecules are shown as yellow dots. (C) Stochastic protein production bursts in different cell lineages. Each burst is due to a single mRNA molecule.

We simultaneously developed a different method, using β-galactosidase as a reporter [2], to probe gene expression in living cells with single-protein-molecule sensitivity. When a cell trapped in a microfluidic device is treated with a fluorogenic substrate, a single copy of β-galactosidase generates many copies of a fluorescent product, and this enzymatic amplification of the signal provides single-molecule sensitivity. This technique has been applied to probe gene expression in E. coli as well as in individual budding yeast and mouse embryonic stem cells. Again, protein production occurs in bursts with exponentially distributed protein copy numbers.

Fig. 3. (A) Schematic diagram of the two-layer microfluidic chamber used for the enzymatic assay. Cells are trapped inside a chamber with a volume of 100 pL. (B) Enzymatic reaction: hydrolysis of the synthetic substrate FDG by β-gal yields a fluorescent product, fluorescein. (C) Real-time detection of β-gal production in a living cell. Discrete jumps in β-gal number are due to burst-like production of proteins. (D) Histogram of the copy number of β-gal molecules per burst. The black line is a single-exponential fit to the histogram.

For each gene, the dynamics of the central dogma can be described by two parameters: the burst frequency, a, which is the number of bursts per cell cycle, and the burst size, b, which is the average number of molecules produced per burst. Both a and b can be determined from single-cell time traces, such as those in Fig. 3. Under steady-state conditions, temporal fluctuations of gene expression in each cell lineage lead to variation in copy number across an isogenic population of cells. A typical copy-number distribution, which is often asymmetrical, is shown in Fig. 3. A rigorous mathematical relationship between fluctuations in expression and the distribution of protein copy number in a population of cells has been lacking in the literature. A log-normal function has often been used as a convenient phenomenological fit, but it offers no physical insight.
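As a small worked example of these definitions, the sketch below estimates a and b from a list of burst sizes read off a time trace. The function name, the input format, and the numbers are hypothetical and serve only to illustrate the arithmetic; they are not measured data.

```python
def burst_parameters(burst_sizes, n_cell_cycles):
    """Estimate burst frequency a and mean burst size b.

    burst_sizes   -- molecules produced in each observed burst
    n_cell_cycles -- total observation time, in units of cell cycles
    """
    if n_cell_cycles <= 0:
        raise ValueError("n_cell_cycles must be positive")
    if not burst_sizes:
        return 0.0, 0.0
    a = len(burst_sizes) / n_cell_cycles      # bursts per cell cycle
    b = sum(burst_sizes) / len(burst_sizes)   # mean molecules per burst
    return a, b


# Hypothetical trace: six bursts observed over five cell cycles.
example_bursts = [3, 1, 6, 2, 8, 4]
a, b = burst_parameters(example_bursts, n_cell_cycles=5.0)
# a = 6 / 5 = 1.2 bursts per cell cycle; b = 24 / 6 = 4.0 molecules per burst
```

The product a * b then gives the average number of molecules produced per cell cycle for this hypothetical trace.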

We proved [3] that, under steady-state conditions with uncorrelated and exponentially distributed bursts, the protein copy-number distribution, p(x), can be approximated as a gamma distribution (Fig. 4), which has two parameters, a and b, as defined earlier. This allows the intrinsic cellular parameters a and b to be extracted by fitting a gamma distribution to the measured p(x). At low expression levels, the values of a and b determined in this way are consistent with those derived from the single-cell time traces.
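For a gamma distribution with shape a and scale b, p(x) = x^(a-1) exp(-x/b) / (b^a Γ(a)), with mean ab and variance ab^2. The sketch below evaluates this density and recovers a and b from a sample of copy numbers. It uses a method-of-moments estimate (b = variance/mean, a = mean^2/variance) as a simple stand-in for a full curve fit, and the synthetic sample and parameter values are assumptions chosen for illustration only.

```python
import math
import random


def gamma_pdf(x, a, b):
    """Gamma density with shape a (burst frequency) and scale b (burst size)."""
    if x <= 0:
        return 0.0
    log_p = (a - 1) * math.log(x) - x / b - a * math.log(b) - math.lgamma(a)
    return math.exp(log_p)


def fit_moments(counts):
    """Method-of-moments estimate of (a, b) from copy numbers per cell."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    b = var / mean          # Fano factor gives the burst size
    a = mean * mean / var   # inverse squared noise gives the burst frequency
    return a, b


# Synthetic population drawn from a gamma distribution with a = 2, b = 5.
rng = random.Random(1)
samples = [rng.gammavariate(2.0, 5.0) for _ in range(50000)]
a_fit, b_fit = fit_moments(samples)   # expected to be close to (2, 5)
```

The moment estimates are convenient starting values; a least-squares or maximum-likelihood fit of gamma_pdf to a measured histogram would be an alternative.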

Fig. 4. Protein molecules are produced in bursts (red), in addition to existing molecules (blue), and diluted upon cell division, leading to a steady-state distribution, p(x), of protein copy number, x, in a cell population with identical genomes. p(x) is a gamma distribution with two adjustable parameters, a and b, the transcription burst frequency and the burst size, respectively.

Our single-molecule experiments have provided quantitative descriptions of gene expression in a live cell.


[1] Yu, Ji; Xiao, Jie; Ren, Xiaojia; Lao, Kaiqin; Xie, X. Sunney; “Probing gene expression in live cells, one protein molecule at a time,” Science, 311, 1600-1603 (2006).
[2] Cai, Long; Friedman, Nir; Xie, X. Sunney; “Stochastic protein expression in individual cells at the single molecule level,” Nature, 440, 358-362 (2006).
[3] Friedman, Nir; Cai, Long; Xie, X. Sunney; “Linking stochastic dynamics to population distribution: An analytical framework of gene expression,” Phys. Rev. Lett., 97, 168302 (2006).