| Title: | Penalized Fast Causal Inference for High-Dimensional Structure Learning |
|---|---|
| Description: | Implements Penalized Fast Causal Inference (PFCI), a two-stage causal structure learning procedure for high-dimensional settings with potential latent variables and selection bias. In the first stage, neighborhood selection via the Lasso constructs a sparse undirected skeleton. In the second stage, the Fast Causal Inference (FCI) algorithm orients edges on this reduced graph, producing a Partial Ancestral Graph (PAG) that accounts for latent confounders. The method is consistent under sparsity assumptions and substantially faster than standard FCI and RFCI in high dimensions. See Pal, Ghosh, and Yang (2025) <doi:10.48550/arXiv.2507.00173> for the underlying theory. |
| Authors: | Samhita Pal [aut] (ORCID: <https://orcid.org/0009-0001-4930-916X>), Dhrubajyoti Ghosh [aut, cre] (ORCID: <https://orcid.org/0000-0002-3360-3786>), Shu Yang [aut] (ORCID: <https://orcid.org/0000-0001-7703-707X>) |
| Maintainer: | Dhrubajyoti Ghosh <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1 |
| Built: | 2026-06-03 09:42:32 UTC |
| Source: | https://github.com/djghosh1123/pfci |
Designed for the 3-line workflow:
sim <- simulate_with_latent(...)
fit <- pfci_fit(sim$X, ...)
met <- metrics_with_latent(sim, fit)

metrics_with_latent(sim, fit)
sim |
Output from simulate_with_latent(). |
fit |
Output from pfci_fit() (must contain $amat and $time$total). |
Returns only: SHD, F1_total, MCC, Time.
A named list with SHD, F1_total, MCC, Time.
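The following base R sketch shows one common way to compute skeleton-level versions of these metrics from two adjacency matrices. It is an illustration only: `skeleton_metrics` is a hypothetical helper, and the definitions used here (SHD as the number of differing undirected pairs, F1 and MCC over unordered pairs) are assumptions that may differ from what metrics_with_latent() computes internally.

```r
# Hypothetical helper, not part of the package: skeleton-level metrics
# from two symmetric 0/1 (or logical) adjacency matrices.
skeleton_metrics <- function(truth, est) {
  ut <- upper.tri(truth)          # count each unordered pair once
  tr <- truth[ut] != 0
  es <- est[ut] != 0
  tp <- sum(tr & es)
  fp <- sum(!tr & es)
  fn <- sum(tr & !es)
  tn <- sum(!tr & !es)
  f1 <- if (2 * tp + fp + fn == 0) NA_real_ else 2 * tp / (2 * tp + fp + fn)
  den <- sqrt(as.numeric(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  mcc <- if (den == 0) NA_real_ else (tp * tn - fp * fn) / den
  list(SHD = fp + fn, F1_total = f1, MCC = mcc)
}

# Toy truth skeleton: 1 - 2 - 3 - 4
truth <- matrix(0, 4, 4)
truth[1, 2] <- truth[2, 3] <- truth[3, 4] <- 1
truth <- truth + t(truth)

# Toy estimate: misses 3 - 4 and adds a spurious 1 - 4
est <- matrix(0, 4, 4)
est[1, 2] <- est[2, 3] <- est[1, 4] <- 1
est <- est + t(est)

met <- skeleton_metrics(truth, est)
```

Here two pairs are recovered, one is missed, and one is spurious, so `met$SHD` is 2 and `met$F1_total` is 2/3.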
simulate_with_latent, pfci_fit
sim <- simulate_with_latent(p_obs = 30, gamma = 0.05, n = 100, seed_graph = 1)
fit <- pfci_fit(sim$X, alpha = 0.05)
met <- metrics_with_latent(sim, fit)
print(met)
Runs a two-stage procedure:
(1) Graphical lasso screening to obtain a sparse undirected super-skeleton.
(2) FCI on the restricted search space, using fixedGaps and a gated CI test.
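As a rough illustration of how a screening step can restrict the FCI search space, the sketch below turns the nonzero pattern of a precision matrix into a fixedGaps-style logical matrix. The precision matrix is written down by hand (a chain graph) in place of a real graphical lasso estimate, and the variable names and the zero threshold are assumptions made for this example, not the package's internals.

```r
# A known sparse precision matrix (chain 1 - 2 - 3 - 4) standing in for a
# graphical lasso estimate; in practice this would be estimated from data.
p <- 4
Theta <- diag(2, p)
for (i in 1:(p - 1)) Theta[i, i + 1] <- Theta[i + 1, i] <- -0.8

# Screening adjacency: nonzero off-diagonal entries of the precision matrix.
# The 1e-8 threshold is an assumption for this sketch.
adj <- abs(Theta) > 1e-8
diag(adj) <- FALSE

# fixedGaps-style matrix: TRUE means the pair is treated as non-adjacent,
# so it is the complement of the screened super-skeleton.
fixed_gaps <- !adj
diag(fixed_gaps) <- FALSE
```

With this construction, FCI only needs to test pairs that survived screening, which is where the speed-up over running FCI on the complete graph comes from.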
pfci_fit(
  X,
  alpha = 0.05,
  rho = NULL,
  approx = TRUE,
  skel.method = "stable",
  doPdsep = FALSE,
  labels = NULL
)
X |
Numeric matrix or data.frame of dimension n x p. |
alpha |
Significance level for conditional independence tests in FCI. |
rho |
Graphical lasso penalty. If NULL, uses a default depending on n. |
approx |
Passed to glasso::glasso. |
skel.method |
Skeleton method for pcalg::fci (default "stable"). |
doPdsep |
Logical; passed to pcalg::fci. Default FALSE. |
labels |
Optional variable names (length p). If NULL uses colnames or X1..Xp. |
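The fallback rule described for labels can be sketched in a few lines of base R. `resolve_labels` is a hypothetical helper written for this illustration; the package may implement the same rule differently.

```r
# Hypothetical helper: use labels if given, else column names, else X1..Xp.
resolve_labels <- function(X, labels = NULL) {
  p <- ncol(X)
  if (is.null(labels)) {
    labels <- colnames(X)
    if (is.null(labels)) labels <- paste0("X", seq_len(p))
  }
  stopifnot(length(labels) == p)   # labels must have length p
  as.character(labels)
}

lab_default <- resolve_labels(matrix(0, 2, 3))
lab_named   <- resolve_labels(data.frame(a = 1:2, b = 3:4))
```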
An object of class pfci_fit, a list containing:
Adjacency matrix of the estimated PAG (integer codes: 0=none, 1=circle, 2=arrowhead, 3=tail).
The raw fci output object from pcalg.
Logical adjacency matrix from the glasso screening step.
Logical matrix of fixed gaps passed to FCI.
The glasso penalty used.
The significance level used.
A list with glasso, fci, and total runtimes in seconds.
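To make the integer mark codes concrete, the sketch below lists the edges of a small hand-built PAG adjacency matrix. It assumes the pcalg convention that amat[i, j] is the mark at the j end of the edge between i and j (so amat[i, j] = 2 with amat[j, i] = 3 means i --> j). `pag_edges` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: print PAG edges, assuming amat[i, j] is the mark
# at the j-end of edge i - j (0 = none, 1 = circle, 2 = arrowhead, 3 = tail).
pag_edges <- function(amat, labels = colnames(amat)) {
  if (is.null(labels)) labels <- paste0("X", seq_len(ncol(amat)))
  left  <- c("o", "<", "-")   # symbol drawn at the i-end for codes 1, 2, 3
  right <- c("o", ">", "-")   # symbol drawn at the j-end for codes 1, 2, 3
  out <- character(0)
  for (i in seq_len(nrow(amat) - 1)) {
    for (j in (i + 1):ncol(amat)) {
      if (amat[i, j] != 0) {
        out <- c(out, paste0(labels[i], " ", left[amat[j, i]], "-",
                             right[amat[i, j]], " ", labels[j]))
      }
    }
  }
  out
}

nm <- c("A", "B", "C")
amat <- matrix(0L, 3, 3, dimnames = list(nm, nm))
amat["A", "B"] <- 2L; amat["B", "A"] <- 3L   # A --> B
amat["B", "C"] <- 2L; amat["C", "B"] <- 1L   # B o-> C
edges <- pag_edges(amat)
```

For this matrix, `edges` contains "A --> B" and "B o-> C".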
Pal, S., Ghosh, D., and Yang, S. (2025). Penalized FCI for Causal Structure Learning in a Sparse DAG for Biomarker Discovery in Parkinson's Disease. Annals of Applied Statistics. doi:10.48550/arXiv.2507.00173
pfci_metrics, plot_pag, simulate_pfci_toy
sim <- simulate_pfci_toy(p = 30, n = 100, edge_prob = 0.05, seed = 1)
fit <- pfci_fit(sim$X, alpha = 0.05)
print(fit)
Designed for the 3-line workflow:
sim <- simulate_pfci_toy(...)
fit <- pfci_fit(sim$X, ...)
met <- pfci_metrics(sim, fit)

pfci_metrics(sim, fit, compute_marks = FALSE)
sim |
Output from simulate_pfci_toy(). |
fit |
Output from pfci_fun()/pfci_fit() with at least $amat and $time$total. |
compute_marks |
Logical. If TRUE, also computes mark-level F1 when truth amat is present. |
Default metrics compare estimated PAG adjacency (skeleton) to the generating DAG skeleton.
If compute_marks=TRUE and sim$truth$amat exists, it also reports mark-level F1s:
F1_dir (->)
F1_oDir (o->)
F1_bidir (<->)
F1_circ (o-o)
F1_arrow (arrowheads)
F1_tail (tails)
A named list of metrics.
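A mark-level F1 can be thought of as an F1 score over the positions of one mark code in the two adjacency matrices. The sketch below does this for arrowheads and tails on a toy example. `mark_f1` is a hypothetical helper, and comparing raw matrix positions per code is an assumption about how such a score might be defined, not a description of the package's exact computation.

```r
# Hypothetical helper: F1 over the positions where a given mark code occurs
# (e.g. code = 2 for arrowheads, code = 3 for tails).
mark_f1 <- function(truth_amat, est_amat, code) {
  t_mark <- truth_amat == code
  e_mark <- est_amat == code
  tp <- sum(t_mark & e_mark)
  fp <- sum(!t_mark & e_mark)
  fn <- sum(t_mark & !e_mark)
  if (2 * tp + fp + fn == 0) return(NA_real_)
  2 * tp / (2 * tp + fp + fn)
}

# Truth: 1 --> 2 --> 3
truth_amat <- matrix(0L, 3, 3)
truth_amat[1, 2] <- 2L; truth_amat[2, 1] <- 3L
truth_amat[2, 3] <- 2L; truth_amat[3, 2] <- 3L

# Estimate: 1 --> 2, but 2 o-o 3 left unoriented
est_amat <- matrix(0L, 3, 3)
est_amat[1, 2] <- 2L; est_amat[2, 1] <- 3L
est_amat[2, 3] <- 1L; est_amat[3, 2] <- 1L

f1_arrow <- mark_f1(truth_amat, est_amat, code = 2L)
f1_tail  <- mark_f1(truth_amat, est_amat, code = 3L)
```

In this toy case one of the two true arrowheads is recovered with no false positives, so `f1_arrow` is 2/3, and the same holds for tails.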
sim <- simulate_pfci_toy(p = 30, n = 100, edge_prob = 0.05, seed = 1)
fit <- pfci_fit(sim$X, alpha = 0.05)
met <- pfci_metrics(sim, fit)
print(met)
Plots the Partial Ancestral Graph (PAG) estimated by pfci_fit
using the pcalg plot method. Requires Rgraphviz to be installed.
plot_pag(fit, ...)
fit |
A pfci_fit object, as returned by pfci_fit(). |
... |
Additional arguments passed to the pcalg plot method. |
Invisibly returns NULL. Called for its side effect of
producing a graph plot.
sim <- simulate_pfci_toy(p = 20, n = 100, edge_prob = 0.05, seed = 1)
fit <- pfci_fit(sim$X, alpha = 0.05)
plot_pag(fit)
Workflow:
sim <- simulate_pfci_toy(...)
fit <- pfci_fit(sim$X, ...)
met <- pfci_metrics(sim, fit)

simulate_pfci_toy(
  p = NULL,
  sparsity = NULL,
  n = 100,
  edge_prob = 0.02,
  errDist = c("normal", "t4", "mixt3"),
  seed = 1L,
  p_obs = NULL,
  gamma = 0.1
)
p |
Number of observed variables (preferred). |
sparsity |
Number of nodes eligible for edges (<= p). Default p. |
n |
Sample size. |
edge_prob |
Edge probability among eligible nodes. |
errDist |
Error distribution for pcalg::rmvDAG ("normal","t4","mixt3"). |
seed |
Random seed. |
p_obs |
(legacy) alias for p. |
gamma |
(legacy) ignored (kept only for backward compatibility). |
This simulator:
generates a topologically ordered DAG (edges only i -> j for i < j)
simulates data via pcalg::rmvDAG with requested errDist
returns truth skeleton (undirected) and an "amat-style" truth from dag2cpdag
Note: the returned truth amat (sim$truth$amat) is derived from the CPDAG of the generating DAG, so it contains directed and o-o circle edges but no latent-induced o-> or <-> edges.
Backward compatibility: the legacy arguments p_obs (alias for p) and gamma (ignored) are still accepted so that old vignettes keep working.
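The first step of the simulator, a DAG whose edges only run from lower to higher index, can be sketched in base R as below. This is a minimal illustration under assumed settings (independent Bernoulli edges among all nodes, with p and edge_prob chosen for the example); it does not reproduce the sparsity argument or the call to pcalg::rmvDAG.

```r
set.seed(1)
p <- 6
edge_prob <- 0.4

# A[i, j] = 1 means i -> j. Filling only the strict upper triangle
# guarantees i < j for every edge, so the graph is acyclic.
A <- matrix(0L, p, p)
ut <- upper.tri(A)
A[ut] <- rbinom(sum(ut), size = 1, prob = edge_prob)

# Undirected truth skeleton: i and j are adjacent if either direction exists.
skel <- (A + t(A)) > 0
```

Because the node order 1, ..., p is already a topological order, no separate acyclicity check is needed.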
A list with elements X, truth (true_dag, adj_mat, skel, amat), and meta.
sim <- simulate_pfci_toy(p = 30, n = 100, edge_prob = 0.05, seed = 1)
str(sim$truth)
This simulator follows a latent-variable SEM scheme with an oracle truth graph:
(1) Build a DAG over (observed + latent) nodes with:
  - observed -> observed edges only for i < j (acyclic)
  - latent -> observed edges (Poisson out-degree)
(2) Simulate data from a linear SEM with the chosen error distribution.
(3) Construct the "truth" by running FCI on the oracle correlation matrix of the observed nodes, using a very large virtual sample size and truth_alpha, with truth_mmax (passed as m.max) controlling speed (e.g., m.max = 2).
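The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the edge weights are drawn as N(0, w_sd^2), the noise is standard normal only, the Poisson out-degree is capped at p_obs, and the final oracle FCI run (which needs pcalg) is omitted. All variable names are chosen for this example.

```r
set.seed(1)
p_obs <- 8; gamma <- 0.25; n <- 200
p_lat <- max(1, round(gamma * p_obs))      # number of latent nodes
p_all <- p_obs + p_lat
obs <- seq_len(p_obs)
lat <- p_obs + seq_len(p_lat)

# Step 1: DAG over observed + latent nodes; A[i, j] = 1 means i -> j.
A <- matrix(0, p_all, p_all)
A_oo <- matrix(0, p_obs, p_obs)
ut <- upper.tri(A_oo)
A_oo[ut] <- rbinom(sum(ut), 1, 0.2)        # observed -> observed, i < j only
A[obs, obs] <- A_oo
for (l in lat) {                           # latent -> observed edges
  k <- min(rpois(1, 3), p_obs)             # Poisson out-degree (capped)
  A[l, sample(p_obs, k)] <- 1
}
W <- A * matrix(rnorm(p_all^2, sd = 0.8), p_all, p_all)   # edge weights

# Step 2: linear SEM X = X W + E, so X = E (I - W)^{-1}.
M <- solve(diag(p_all) - W)
E <- matrix(rnorm(n * p_all), n, p_all)    # normal noise (assumption)
X <- (E %*% M)[, obs]                      # keep observed columns only

# Step 3 (input only): oracle covariance of all nodes under unit noise,
# then the correlation matrix of the observed block.
Sigma <- crossprod(M)                      # t(M) %*% M
R_obs <- cov2cor(Sigma[obs, obs])
```

In the scheme described above, `R_obs` is what the oracle FCI run would receive together with a large virtual sample size. Marginalizing over the latent rows is what induces the extra dependencies among observed variables that a PAG is designed to represent.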
simulate_with_latent(
  p_obs = 100,
  gamma = 0.05,
  n = 100,
  edge_prob_obs = 0.02,
  latent_out_deg = 3,
  w_sd = 0.8,
  errDist = c("normal", "t4", "mixt3"),
  noise_sd = 1,
  mix = 0.05,
  seed_graph = 1,
  seed_data = 2,
  truth_alpha = 0.9999,
  truth_mmax = 2,
  truth_verbose = FALSE
)
p_obs |
Number of observed variables. |
gamma |
Latent ratio; p_lat = max(1, round(gamma * p_obs)). |
n |
Sample size. |
edge_prob_obs |
Edge probability among observed nodes (i<j only). |
latent_out_deg |
Mean outgoing degree for each latent to observed (Poisson). |
w_sd |
SD of nonzero edge weights. |
errDist |
Error distribution for SEM noise: "normal", "t4", "mixt3". |
noise_sd |
Noise SD multiplier. |
mix |
Mixing proportion for "mixt3" heavy tail component. |
seed_graph |
Seed controlling graph + weights. |
seed_data |
Seed controlling data noise draws. |
truth_alpha |
Alpha for oracle-truth FCI (typical: 0.9999). |
truth_mmax |
Maximum conditioning set size in oracle FCI (speed knob; e.g., 2). |
truth_verbose |
Logical; verbose output from oracle FCI. |
The returned truth is the skeleton implied by the oracle-FCI PAG (not marks).
A list with elements: X, truth (skel + amat), meta, sem (A,W,indices).
sim <- simulate_with_latent(p_obs = 30, gamma = 0.05, n = 100, seed_graph = 1)
str(sim$truth)