VeloPotential Tutorial

Import Packages

import scvelo as scv
import scanpy as sc
import matplotlib.pyplot as plt
import velopotential as vp

Load the Data

The analysis is based on a well-characterized mouse pancreatic endocrinogenesis dataset, which can be directly downloaded via a scVelo function.

adata = scv.datasets.pancreas()
print(adata)

AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
    var: 'highly_variable_genes'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
    obsm: 'X_pca', 'X_umap'
    layers: 'spliced', 'unspliced'
    obsp: 'distances', 'connectivities'

Generate the initial RNA velocity field

We first process the data using the standard scVelo workflow, and then generate the initial RNA velocity field using the scVelo dynamical model.

Note: Our tool VeloPotential requires precomputed velocities as input. You can use any tool to calculate the velocities; here, we use scVelo as an example.

scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
scv.tl.recover_dynamics(adata)
scv.tl.velocity(adata, mode='dynamical')

Filtered out 20801 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Extracted 2000 highly variable genes.
Logarithmized X.
computing neighbors
    finished (0:00:10) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:00) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
recovering dynamics (using 1/16 cores)

    finished (0:11:04) --> added 
    'fit_pars', fitted parameters for splicing dynamics (adata.var)
computing velocities
    finished (0:00:03) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)

We project the original scVelo velocities onto the UMAP embedding. The plot_velocity_projection function serves as a wrapper that encapsulates calls to scVelo’s velocity_graph and velocity_embedding_stream functions.

vp.pl.plot_velocity_projection(adata,color="clusters",figsize=(5,3.5),xkey="Ms")

computing velocity graph
finished.

_images/1f8316674034a1116a2f220bf9825f54169f7928c6a0362f55d581e3a971072b.png

computing velocity embedding
finished.

Construct the global potential landscape and the corresponding potential-driven velocity field.

We first construct the global potential landscape and its associated potential-driven velocity field by training the VeloPotential model using the construct_potential function. By default, the inferred global potential landscape is stored in adata.obs[‘potential’], and the corresponding potential-driven velocity field is stored in adata.layers[‘velocity_pred’].

vp.tl.construct_potential(adata)
print(adata)

epoch   1: loss=0.459280
epoch  51: loss=0.091485
epoch 101: loss=0.084943
epoch 151: loss=0.081621
epoch 201: loss=0.080599
epoch 251: loss=0.079732
epoch 301: loss=0.078333
epoch 351: loss=0.077839
Early stopping at epoch 365
AnnData object with n_obs × n_vars = 3696 × 2000
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'potential'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'log1p', 'recover_dynamics', 'velocity_params', 'velocity_graph', 'velocity_graph_neg'
    obsm: 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'loss'
    layers: 'spliced', 'unspliced', 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'velocity', 'velocity_u', 'velocity_pred'
    obsp: 'distances', 'connectivities'

We then project the potential-driven velocities onto the UMAP embedding.

vp.pl.plot_velocity_projection(adata,color="clusters",figsize=(5,3.5),vkey="velocity_pred",xkey="Ms")

computing velocity graph
finished.

_images/c26f8aeb6f21c96ba2759b9b937d829df56166d724424d65e53547420de97e37.png

computing velocity embedding
finished.

We present a heatmap of the global potential landscape.

fig, ax = plt.subplots(figsize=(5, 3.5))
sc.pl.umap(adata,color="potential",title="Potential",frameon=False,ax=ax)

_images/21927b7c4d855475728c93a3a038e3b3845012e83500a995be7fe1c9d5d1eb9e.png

We compute a “potential consistency score” to robustly diagnose velocity reliability at single-cell resolution.

adata.obs['Potential consistency score'] = vp.tl.cal_cosine_similarity(adata.layers['velocity'], adata.layers['velocity_pred'])
fig, ax = plt.subplots(figsize=(5, 3.5))
sc.pl.umap(adata,color="Potential consistency score",title="Potential consistency score",frameon=False,ax=ax,cmap="RdBu")

_images/3d4a7a99e34fe79106c7b9c468ef9996b6b73ec7e7542a384e3e1281b6b42f36.png

Finally, we examine the quantitative relationship between scVelo’s latent time and our inferred potential by plotting a scatter plot of the two measures.

scv.tl.latent_time(adata)
scv.pl.scatter(adata, x='latent_time', y='potential', color='clusters',title=" ",legend_loc="upper right",fontsize=12,figsize=(4,4),xlabel="scVelo latent time",ylabel="Potential",show=False)
ax = plt.gca()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False) 
plt.show()

computing terminal states
    identified 2 regions of root cells and 1 region of end points .
    finished (0:00:00) --> added
    'root_cells', root cells of Markov diffusion process (adata.obs)
    'end_points', end points of Markov diffusion process (adata.obs)
computing latent time using root_cells as prior
    finished (0:00:00) --> added 
    'latent_time', shared time (adata.obs)

_images/0b96bb8190cadbcf2ab14a22057bd6082ce471d4099ea108b0d547495670ce80.png