Layout Comparison

How does BitZoom's layout compare to established graph layout algorithms?
Full report with methodology, all configurations, and limitations.

Why compare?

Graph layout is a solved problem in many contexts. Force-directed algorithms like ForceAtlas2 have decades of refinement. Dimensionality reduction methods like UMAP and t-SNE produce excellent 2D embeddings. BitZoom takes a different approach: it positions nodes by property similarity first, uses topology as a secondary signal, and produces a hierarchical zoom structure from stored coordinates.

The natural question is what you gain and what you give up. This comparison measures both topology preservation (do connected nodes end up nearby?) and property-similarity preservation (do nodes with similar attributes end up nearby?). Topology metrics favor force-directed methods; property metrics evaluate BitZoom's core design goal.

Which method should I use?

| Method | Best when | Tradeoff |
|---|---|---|
| BitZoom | Property similarity matters more than connectivity. Interactive exploration, zoom levels, millisecond speed. | Weaker on sparse chain topologies. |
| ForceAtlas2 | Topology IS the signal. Social networks, connectivity analysis, chain-like graphs. | Slower (minutes). No property awareness. |
| UMAP | Dimensionality reduction with interpretable embeddings. High-dimensional feature data. | Seconds, not milliseconds. Seed-dependent. |
| t-SNE | Revealing fine-grained local clusters. Best local topology preservation. | Requires perplexity tuning. Slow. |

What we measured

Both neighborhood metrics reported below, TopoNbrP (topology) and PropNbrP (property similarity), use k=10 neighbors as a balance between local structure fidelity and global layout quality.
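The metric definitions are not reproduced in this section, so the following is only a sketch of one common form of a k-nearest-neighbor preservation score. It assumes the score is the mean fraction of each node's k nearest layout neighbors that also appear in a reference neighbor set (graph neighbors for a topology score, the most property-similar nodes for a property score). The names `knn_layout` and `neighbor_preservation` and the normalization are assumptions, not BitZoom's evaluation code.

```python
import math

def knn_layout(positions, k):
    """For each node, return the set of its k nearest nodes in the 2D layout."""
    result = {}
    for u, (ux, uy) in positions.items():
        dists = sorted(
            (math.hypot(ux - vx, uy - vy), v)
            for v, (vx, vy) in positions.items()
            if v != u
        )
        result[u] = {v for _, v in dists[:k]}
    return result

def neighbor_preservation(positions, reference, k=10):
    """Mean fraction of layout neighbors that are also reference neighbors.

    positions: node -> (x, y) layout coordinates
    reference: node -> set of "true" neighbors (assumed definition)
    """
    scores = []
    for u, nbrs in knn_layout(positions, k).items():
        if nbrs:  # skip nodes with no other node to compare against
            scores.append(len(nbrs & reference.get(u, set())) / len(nbrs))
    return sum(scores) / len(scores) if scores else 0.0

# Toy path graph a-b-c-d laid out on a line: every nearest layout
# neighbor is also a graph neighbor, so the score is 1.0.
positions = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (10, 0)}
graph_nbrs = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(neighbor_preservation(positions, graph_nbrs, k=1))  # 1.0
```

The brute-force neighbor search is quadratic in the number of nodes, which is acceptable for a sketch at a few thousand nodes but would normally be replaced by a spatial index.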

The short version

On topology metrics, ForceAtlas2 and t-SNE lead. On property-similarity, the answer depends on whether properties correlate with graph connectivity. When they don't (MITRE, Synth Packages), BitZoom with property weights leads by 2.6-6x. When they do (BitZoom Source, where call edges track file structure), topology-based methods capture property structure incidentally and score higher.

| Dataset | Nodes | Props | BitZoom | PropNbrP | TopoNbrP | Takeaway |
|---|---|---|---|---|---|---|
| MITRE | 4,736 | rich | α=0 wt | 2.57x | 0.37x | Properties ≠ topology; BitZoom leads |
| Synth Pkg | 1,868 | rich | α=0 wt | 6.02x | 0.31x | Properties ≠ topology; BitZoom leads |
| BZ Source | 433 | rich | α=0.5 wt gauss | 0.95x | 0.28x | Properties ≈ topology; near FA2 with gaussian |
| Email-EU | 1,005 | none | α=1.0 | 0.87x | 0.87x | Edge-only; topology comparable |
| Facebook | 4,039 | none | α=1.0 | 0.79x | 0.74x | Dense; smoothing works well |
| Power Grid | 4,941 | none | α=0.75 | 0.68x | 0.01x | Sparse chains; smoothing limited |

Ratios are BitZoom / ForceAtlas2. Higher = better for both NbrP columns.
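As a quick illustration of how to read these ratios, the snippet below divides the rounded PropNbrP scores from the per-dataset tables. The results land close to the summary figures but not exactly on them, presumably because the summary was computed from unrounded scores; that explanation is an assumption.

```python
# Rounded PropNbrP values copied from the per-dataset tables: (BitZoom, ForceAtlas2).
prop_nbrp = {
    "MITRE": (0.034, 0.013),      # BitZoom α=0 wt
    "Synth Pkg": (0.049, 0.008),  # BitZoom α=0 wt
    "BZ Source": (0.198, 0.208),  # BitZoom α=0.5 wt gauss
}

def ratio(bitzoom, fa2):
    """BitZoom / ForceAtlas2; values above 1 mean BitZoom scores higher."""
    return bitzoom / fa2

ratios = {name: round(ratio(b, f), 2) for name, (b, f) in prop_nbrp.items()}

for name, value in ratios.items():
    print(f"{name}: {value}x")
```

With the rounded inputs this gives roughly 2.6x, 6.1x, and 0.95x, against 2.57x, 6.02x, and 0.95x in the summary.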

Dataset by dataset

Email-EU: a dense communication network

1,005 researchers at a European institution, 16.7K emails, 42 departments as ground truth. Edge-only (no node properties beyond auto-generated tokens).

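| Layout | Time | EdgeLen | TopoNbrP | PropNbrP | Silhouette | Note |
|---|---|---|---|---|---|---|
| BitZoom α=0 | 2ms | 0.466 | 0.006 | 0.007 | -0.47 | Near-random layout |
| BitZoom α=1.0 | 1ms | 0.221 | 0.056 | 0.007 | -0.29 | Best BitZoom for topology |
| ForceAtlas2 | 54s | 0.008 | 0.064 | 0.008 | -0.40 | Shortest edges |
| UMAP | 11s | 0.182 | 0.107 | 0.010 | +0.01 | Only positive silhouette |
| t-SNE | 3s | 0.154 | 0.109 | 0.009 | -0.12 | Highest TopoNbrP |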

PropNbrP is uniformly low (0.007-0.010) across all methods. Without real node properties, auto-generated tokens provide little differentiation. UMAP and t-SNE recover department structure best: they lead on TopoNbrP (0.107 and 0.109), and UMAP has the only positive silhouette (+0.01).

Facebook: dense ego networks

4,039 users, 88K friendship edges. Dense community structure.

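| Layout | Time | EdgeLen | TopoNbrP | PropNbrP | Note |
|---|---|---|---|---|---|
| BitZoom α=1.0 | 6ms | 0.063 | 0.110 | 0.003 | 74% of FA2's TopoNbrP |
| ForceAtlas2 | 172s | 0.011 | 0.150 | 0.003 | Shortest edges |
| t-SNE | 15s | 0.071 | 0.176 | 0.003 | Highest TopoNbrP |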

Dense ego-network structure responds well to topology smoothing. BitZoom at α=1.0 reaches 74% of ForceAtlas2's TopoNbrP. PropNbrP is uniformly low (edge-only dataset).

Power Grid: sparse chains

4,941 substations, 6.6K transmission lines. Average degree 2.7, diameter ~46.

| Layout | Time | EdgeLen | TopoNbrP | PropNbrP | Note |
|---|---|---|---|---|---|
| BitZoom α=0.75 | 8ms | 0.271 | 0.003 | 0.002 | Best BitZoom; α=1.0 is worse |
| ForceAtlas2 | 152s | 0.005 | 0.197 | 0.002 | Traces chains via global forces |
| t-SNE | 26s | 0.178 | 0.041 | 0.002 | Limited by sparse adjacency |

ForceAtlas2 dominates. Its global optimization traces long chains (diameter ~46) that 5-pass local smoothing cannot reach. α=0.75 outperforms α=1.0 because pure topology with few passes oversmooths hubs while leaving chains unresolved.
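The α blend and the 5-pass smoothing are not specified in this section. As a rough illustration only, the sketch below assumes each pass moves a node to a weighted mix of its property-derived position and the mean position of its graph neighbors. The function `smooth_layout`, its parameters, and the update rule are hypothetical, not BitZoom's implementation. Under this assumed rule, a node's final position depends only on nodes within `passes` hops, which is one way to see why a few local passes cannot straighten chains in a graph with diameter ~46.

```python
def smooth_layout(prop_pos, adjacency, alpha, passes=5):
    """Blend property positions with neighbor averages (assumed update rule).

    prop_pos:  node -> (x, y) position derived from properties alone
    adjacency: node -> list of neighbor nodes
    alpha:     0 keeps property positions, 1 uses neighbor means only
    """
    pos = dict(prop_pos)
    for _ in range(passes):
        nxt = {}
        for u, (px, py) in prop_pos.items():
            nbrs = adjacency.get(u, [])
            if not nbrs:
                nxt[u] = pos[u]  # isolated nodes stay where they are
                continue
            mx = sum(pos[v][0] for v in nbrs) / len(nbrs)
            my = sum(pos[v][1] for v in nbrs) / len(nbrs)
            nxt[u] = ((1 - alpha) * px + alpha * mx,
                      (1 - alpha) * py + alpha * my)
        pos = nxt
    return pos

# Two connected nodes that properties place far apart:
# a nonzero alpha pulls them toward each other.
prop_pos = {"a": (0.0, 0.0), "b": (2.0, 0.0)}
adjacency = {"a": ["b"], "b": ["a"]}
print(smooth_layout(prop_pos, adjacency, alpha=0.5))
```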

MITRE ATT&CK: the property-similarity test

4,736 nodes (techniques, tactics, software, mitigations) with rich properties: platforms, kill chain phases, aliases. This is the dataset that tests BitZoom's core claim.

| Layout | Time | EdgeLen | TopoNbrP | PropNbrP | Note |
|---|---|---|---|---|---|
| BitZoom α=0 | 10ms | 0.461 | 0.001 | 0.007 | No property weights |
| BitZoom α=0 wt | 8ms | 0.534 | 0.002 | 0.034 | Property weights: best PropNbrP |
| BitZoom α=0.5 wt | 8ms | 0.482 | 0.002 | 0.034 | Adding topology barely changes PropNbrP |
| ForceAtlas2 | 178s | 0.203 | 0.004 | 0.013 | Shorter edges; lower PropNbrP |
| t-SNE | 23s | 0.292 | 0.004 | 0.026 | Second-highest PropNbrP |

BitZoom with property weights (group=5, platforms=6, killchain=4) scores 2.6x higher than ForceAtlas2 on PropNbrP (0.034 vs 0.013). Without property weights, BitZoom's PropNbrP drops to 0.007; the weights are what provide the signal. All methods score low on TopoNbrP because graph neighbors are often semantically different node types. t-SNE scores second on PropNbrP (0.026), likely because adjacency partially correlates with property similarity.

Synth Packages: designed group structure

1,868 synthetic packages, 4K co-reference edges. Properties: group, downloads, license, version, depcount. Edges are co-reference links, not property-based.

| Layout | Time | EdgeLen | TopoNbrP | PropNbrP | Note |
|---|---|---|---|---|---|
| BitZoom α=0 | 2ms | 0.470 | 0.003 | 0.010 | No property weights |
| BitZoom α=0 wt | 2ms | 0.387 | 0.006 | 0.049 | Property weights: best PropNbrP (6x FA2) |
| BitZoom α=0.5 wt | 3ms | 0.306 | 0.011 | 0.039 | Topology trades PropNbrP for TopoNbrP |
| ForceAtlas2 | 151s | 0.030 | 0.018 | 0.008 | Shortest edges; low PropNbrP |
| t-SNE | 9s | 0.218 | 0.001 | 0.012 | Low on both metrics |

BitZoom with weights scores 6x higher than ForceAtlas2 on PropNbrP (0.049 vs 0.008). Graph connectivity and property similarity diverge: edges are co-reference links, not property-based. Increasing α from 0 to 0.5 trades PropNbrP for TopoNbrP, directly demonstrating the α tradeoff. Gaussian quantization shows negligible difference here (±2%), suggesting the post-blend distribution is approximately uniform.

BitZoom Source: when topology tracks properties

433 functions/methods/classes from this project's source code, 940 call edges. Properties: kind, file, lines, bytes, age.

| Layout | Time | EdgeLen | TopoNbrP | PropNbrP | Note |
|---|---|---|---|---|---|
| BitZoom α=0 wt | 1ms | 0.420 | 0.019 | 0.172 | Property weights, rank quant |
| BitZoom α=0.5 wt | 2ms | 0.267 | 0.032 | 0.179 | Rank quant; topology helps |
| BitZoom α=0.5 wt gauss | 1ms | 0.301 | 0.035 | 0.198 | Gaussian quant: +11% PropNbrP |
| ForceAtlas2 | 12s | 0.031 | 0.127 | 0.208 | High PropNbrP via adjacency |
| UMAP | 13s | 0.115 | 0.082 | 0.244 | Highest PropNbrP |
| t-SNE | 3s | 0.278 | 0.048 | 0.204 | High PropNbrP via adjacency |

Topology-based methods score higher on PropNbrP than BitZoom with rank quantization (0.20-0.24 vs 0.17-0.18). Functions that call each other tend to share file and kind, so adjacency captures property structure incidentally. Switching to Gaussian quantization improves BitZoom's PropNbrP by 11% (0.179 → 0.198), reaching 95% of ForceAtlas2. Gaussian quantization preserves density structure rather than forcing uniform occupancy, which helps when similar nodes form tight clusters.

What each method is good at

| Aspect | ForceAtlas2 | UMAP / t-SNE | BitZoom |
|---|---|---|---|
| Edge length | Best (optimizes for this) | Moderate | Improves with α |
| Topology preservation | Strong; global forces | Best overall (t-SNE) | Comparable on dense graphs |
| Property grouping | Incidental; depends on adjacency | Moderate (via adjacency) | Best when props ≠ topology |
| Sparse / chain graphs | Strong (global forces) | Limited | Limited (local smoothing) |
| Speed | Minutes (O(n log n)/iter) | Seconds | Milliseconds (O(n)) |
| Hierarchical zoom | No | No | 14 levels from 4 bytes/node |
| Determinism | Seed-dependent | Seed-dependent | Fully deterministic |

Rank vs Gaussian quantization

BitZoom supports two quantization modes. Rank quantization sorts nodes by position and assigns grid cells uniformly. Gaussian quantization uses fixed CDF boundaries, preserving density structure: tight clusters stay tight, sparse regions stay spread out.
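To make the difference concrete, here is a minimal one-dimensional sketch of the two modes. It assumes rank quantization maps sorted rank to a cell index, and that "fixed CDF boundaries" means cells are cut at equal steps of a normal CDF with fixed parameters. The function names and the `mu` and `sigma` defaults are assumptions, not BitZoom's actual code.

```python
import math

def rank_quantize(values, cells):
    """Assign cells by sorted rank, so each cell gets about the same number of nodes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order):
        out[i] = rank * cells // len(values)
    return out

def gaussian_quantize(values, cells, mu=0.0, sigma=1.0):
    """Assign cells through a fixed normal CDF, so boundaries do not move with the data."""
    out = []
    for v in values:
        cdf = 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
        out.append(min(cells - 1, int(cdf * cells)))  # clamp cdf == 1.0 into the last cell
    return out

# Three nearly identical values plus one outlier.
values = [0.01, 0.02, 0.03, 3.0]
print(rank_quantize(values, 4))      # [0, 1, 2, 3]: the tight cluster is spread out
print(gaussian_quantize(values, 4))  # [2, 2, 2, 3]: the tight cluster shares a cell
```

In this toy case rank quantization forces uniform occupancy and separates three near-duplicates, while the CDF-based version keeps them together.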

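| Dataset | Config | Rank PropNbrP | Gauss PropNbrP | Change |
|---|---|---|---|---|
| BZ Source | α=0 wt | 0.172 | 0.179 | +4% |
| BZ Source | α=0.5 wt | 0.179 | 0.198 | +11%; closes gap to FA2 (0.208) |
| Synth Pkg | α=0 wt | 0.049 | 0.050 | +1% |
| Synth Pkg | α=0.5 wt | 0.039 | 0.038 | -2% |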

Gaussian quantization helps when the post-blend distribution has meaningful density variation (BZ Source: functions cluster by file). When the distribution is approximately uniform (Synth Packages), the two modes produce similar results. The effect is dataset-dependent: across these four configurations the worst case is a 2% drop, and the best case (+11%) closes most of the gap to topology-based methods by preserving cluster tightness.

Caveats

PropNbrP uses the same similarity BitZoom optimizes. Token-set Jaccard is both the ground-truth similarity and the signal BitZoom's MinHash approximates. This is partially circular. A fully independent property-similarity metric (e.g., domain-expert labels) would be stronger evidence.
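The relationship between the two quantities can be illustrated with a generic MinHash sketch: the fraction of matching signature slots estimates the Jaccard similarity of the underlying token sets. This is the textbook technique, not BitZoom's code. The salted BLAKE2 hash family, the signature length, and the example tokens are all assumptions made for illustration.

```python
import hashlib

def jaccard(a, b):
    """Exact Jaccard similarity of two token sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def _hash(token, seed):
    """64-bit hash of a token, salted by seed to simulate independent hash functions."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(tokens, num_hashes=128):
    """Per hash function, keep the minimum hash over the (non-empty) token set."""
    tokens = set(tokens)
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]

def minhash_similarity(sig_a, sig_b):
    """Fraction of matching slots; an estimate of the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical token sets for two nodes.
a = {"platform:windows", "platform:linux", "phase:execution"}
b = {"platform:windows", "platform:macos", "phase:execution"}
print(jaccard(a, b))  # 0.5
print(minhash_similarity(minhash_signature(a), minhash_signature(b)))
```

Because the estimate converges to the exact Jaccard value as the signature grows, a layout built from MinHash signatures is being scored against the quantity it approximates, which is the circularity this caveat describes.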

Edge-only datasets show no property differentiation. Three of six datasets have no real node properties. PropNbrP is uniformly low across all methods on these graphs.

When adjacency correlates with properties, topology-based methods win PropNbrP too. On BitZoom Source, ForceAtlas2 and UMAP outscore BitZoom on PropNbrP because call-graph edges track file/kind similarity. BitZoom's advantage is specific to datasets where property structure differs from graph connectivity.

Hierarchical zoom is not measured. BitZoom derives 14 aggregation levels from 4 stored bytes per node. No other method produces a zoom hierarchy. This capability is not captured by any metric above.
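How the levels are derived from the stored bytes is not spelled out here. One way to picture it, assuming the 4 bytes hold two 16-bit grid coordinates and that zoom level L keeps the top L bits of each, is a simple bit shift. The layout of the bytes, the helper names, and the level numbering are hypothetical, and the sketch does not explain why exactly 14 levels are exposed.

```python
def cell_at_level(gx, gy, level, bits=16):
    """Return the (x, y) grid cell containing a node at a given zoom level.

    gx, gy: stored integer coordinates in [0, 2**bits)
    level:  number of leading bits kept per axis (1 = coarsest)
    """
    if not 1 <= level <= bits:
        raise ValueError("level must be between 1 and bits")
    shift = bits - level
    return (gx >> shift, gy >> shift)

def aggregate(nodes, level, bits=16):
    """Count nodes per cell at one zoom level, without storing anything extra."""
    counts = {}
    for gx, gy in nodes:
        cell = cell_at_level(gx, gy, level, bits)
        counts[cell] = counts.get(cell, 0) + 1
    return counts

nodes = [(0x8000, 0x7FFF), (0x8001, 0x7FFE), (0x0000, 0x0000)]
print(aggregate(nodes, level=2))  # {(2, 1): 2, (0, 0): 1}
```

Under these assumptions every level is recomputed from the same stored coordinates, so coarser views cost no additional storage per node.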