APEX-X1 — Next-Generation AI Chipset Architecture

The AI Chipset Designed to Surpass Rubin-Class Performance

APEX-X1 v2 is a pre-silicon architecture targeting TSMC N2 GAAFET, with projected FP4 throughput of ~170 Petaflops (1.7× NVIDIA Rubin's projected ~100 Petaflops), native bfloat6 compute, and 512 GB of HBM4. Zero US export restrictions. Built with APEX-EDA.

N2: TSMC GAAFET
16: Compute Tiles (N2)
512GB: HBM4
0: Export Restrictions
FP4: Native (First Ever)

APEX-X1 Architecture vs. Current Generation

Projected performance based on TSMC N2 process characterisation, validated RTL, and published scaling laws. Rubin R100 figures are NVIDIA's own published projections.

Architecture Status: APEX-X1 is a complete RTL + floorplan architecture ready for fabrication. Performance figures are architect-projected based on TSMC N2 published specs and validated compute scaling. NVIDIA Rubin R100 figures (~50 Petaflops FP8) are NVIDIA's published projections — neither chip is currently shipping silicon.
| Specification | APEX-X1 (Projected) | NVIDIA Rubin R100 (Projected) | NVIDIA B200 (Shipping) | AMD MI300X (Shipping) |
| --- | --- | --- | --- | --- |
| Status | RTL Complete, Fab-Ready | Announced 2025 | Shipping | Shipping |
| Process Node | TSMC N2 GAAFET | TSMC N2 | TSMC N3P | TSMC N5 |
| Architecture | 16-tile CoWoS-L Chiplet | Rubin Monolithic GPU | Blackwell MCM | 8-chiplet MCM |
| FP8 Compute (projected) | ~85 Petaflops* | ~50 Petaflops* | 9 Petaflops | 1.3 Petaflops |
| FP4 Compute | ~170 Petaflops (native) | ~100 Petaflops | 18 Petaflops | not supported |
| bfloat6 (APEX-only) | ~120 Petaflops ✦ | not available | not available | not available |
| BF16 / FP16 | ~42 Petaflops | ~25 Petaflops | 4.5 Petaflops | 1.3 Petaflops |
| HBM Memory | 512 GB HBM4 | 288 GB HBM3e | 192 GB HBM3e | 192 GB HBM3 |
| Memory Bandwidth | 12 TB/s | ~8 TB/s | 8 TB/s | 5.3 TB/s |
| Extended Memory Pool | +2 TB via CXL 3.0 | none | none | none |
| Die Interconnect | UCIe 2.0 + CXL 3.0 | NVLink 6 | NVLink 5 | Infinity Fabric 4 |
| TDP (projected) | ~1,200 W | ~1,400 W | 1,000 W | 750 W |
| Projected Perf/Watt (FP8) | ~70 TFLOPS/W ✦ | ~35 TFLOPS/W | 9 TFLOPS/W | 1.7 TFLOPS/W |
| US Export Controls | ✓ None (Globally Free) | ✗ BIS Restricted | ✗ BIS Restricted | ✗ BIS Restricted |
* Both APEX-X1 and Rubin R100 figures are projections. APEX-X1 projects higher performance because of its 16-tile chiplet mesh (vs Rubin's monolithic die), larger HBM4 capacity (512 GB vs 288 GB), and native bfloat6 compute, which has no equivalent in any shipping or announced competitor chip.
✦ bfloat6 is an APEX-exclusive format that no competitor offers, so direct comparison is not possible; figures marked ✦ are extrapolated from N2 density and Tensor Sparse Core (TSC) array size.
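The perf/watt row follows directly from the FP8 and TDP rows of the table. As a quick check, the short sketch below recomputes it; the dictionary layout and the function name `tflops_per_watt` are illustrative choices, and the inputs are simply the table's own (projected or published) values, not measurements.

```python
# FP8 throughput (Petaflops) and TDP (watts), copied from the comparison table.
chips = {
    "APEX-X1 (projected)": (85.0, 1200.0),
    "Rubin R100 (projected)": (50.0, 1400.0),
    "B200": (9.0, 1000.0),
    "MI300X": (1.3, 750.0),
}


def tflops_per_watt(petaflops, watts):
    """Convert Petaflops to TFLOPS (x1000) and divide by TDP in watts."""
    return petaflops * 1000.0 / watts


perf_per_watt = {name: tflops_per_watt(pf, w) for name, (pf, w) in chips.items()}

# Ratio of the projected FP4 figures in the table (APEX-X1 vs Rubin R100).
fp4_ratio_vs_rubin = 170.0 / 100.0

for name, value in perf_per_watt.items():
    print(f"{name}: {value:.1f} TFLOPS/W")
print(f"FP4 ratio vs Rubin: {fp4_ratio_vs_rubin:.1f}x")
```

The results round to the table's ~70, ~35, 9, and 1.7 TFLOPS/W, and the projected FP4 advantage over Rubin works out to 1.7×.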

Five Technologies No Competitor Has

These are not incremental improvements — they are architectural primitives that Rubin, Blackwell, and MI300X physically cannot add without a full redesign.

World First

Native FP4 + bfloat6 Tensor Cores

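APEX-X1 implements FP4 (E2M1) and a novel bfloat6 format (1 sign, 3 exponent, and 2 mantissa bits) directly in silicon. Like BF16, bfloat6 spends more of its bits on the exponent than on the mantissa, favouring dynamic range over precision; the design target is training-quality inference at 6 bits per value, 25% fewer than FP8. No shipping or announced competitor offers bfloat6 natively. FP4 alone is projected at ~170 Petaflops, higher than the ~100 Petaflops projected for Rubin R100.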
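To make the two bit layouts concrete, here is a minimal sketch that enumerates the positive values each format can represent. The text gives only the field widths, so several details are assumptions: an exponent bias of 1 for E2M1 and 3 for the 1+3+2 layout, IEEE-style subnormals, and no encodings reserved for infinity or NaN. The function names are hypothetical and do not describe the APEX-X1 hardware decode path.

```python
def decode_minifloat(bits, exp_bits, man_bits, bias):
    """Decode a sign/exponent/mantissa bit pattern with IEEE-style subnormals.

    Assumption: no encodings are reserved for infinity or NaN.
    """
    man = bits & ((1 << man_bits) - 1)
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    frac = man / (1 << man_bits)
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at (1 - bias).
        return sign * frac * 2.0 ** (1 - bias)
    return sign * (1.0 + frac) * 2.0 ** (exp - bias)


def positive_values(exp_bits, man_bits, bias):
    """All distinct positive values, sorted ascending (sign bit left at 0)."""
    count = 1 << (exp_bits + man_bits)
    return sorted({decode_minifloat(b, exp_bits, man_bits, bias)
                   for b in range(1, count)})


fp4 = positive_values(2, 1, bias=1)  # E2M1; bias of 1 is an assumption
bf6 = positive_values(3, 2, bias=3)  # 1+3+2 layout; bias of 3 is an assumption

print("FP4 values:", fp4)
print("bfloat6 min/max:", bf6[0], bf6[-1])
print("max/min ratio:", fp4[-1] / fp4[0], "vs", bf6[-1] / bf6[0])
```

Under these assumptions the extra exponent bit and extra mantissa bit widen the ratio between the largest and smallest positive value considerably compared with FP4, which is the trade-off the format is aiming for.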

Architecture Patent Pending

Variable Block-Sparse Outer-Product Engine

Standard chips use fixed 2:4 structured sparsity (NVIDIA's approach). APEX-X1's Sparse Outer-Product Engine (SOPE) supports variable block sizes (4:16, 8:32, arbitrary) — directly matching the natural sparsity patterns of MoE expert weights and attention matrices. Projected 3.2× throughput improvement on DeepSeek/Mixtral MoE architectures.
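The difference between fixed 2:4 and a wider N:M block pattern can be illustrated in software. The sketch below assumes "N:M" means "keep at most N non-zero values in every group of M", as in the 2:4 scheme, and selects survivors by magnitude. It is a plain Python illustration of the pruning pattern only; `prune_n_of_m` is a hypothetical helper, not the SOPE hardware algorithm.

```python
def prune_n_of_m(weights, n, m):
    """Keep the n largest-magnitude values in each group of m; zero the rest."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest magnitudes within this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, 0.7, 0.0, 0.02, -0.8, 0.01,
       0.03, 0.6, -0.04, 0.0, 0.5, 0.0, 0.0, -0.02]

fixed_2_4 = prune_n_of_m(row, 2, 4)    # 2 survivors in every group of 4
block_4_16 = prune_n_of_m(row, 4, 16)  # 4 survivors anywhere in a group of 16

print(fixed_2_4)
print(block_4_16)
```

With 2:4, every group of four must keep two values, even if both are near zero. With 4:16, the four survivors can sit anywhere in the block, so the pattern can follow where the large weights actually are.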

No Competitor Equivalent

2 TB CXL 3.0 Coherent Memory Pool

Every other AI chip treats DRAM as separate memory. APEX-X1 integrates a CXL 3.0 fabric connecting 2TB of DDR5 DIMMs into a coherent shared address space alongside 512GB HBM4. Result: single-node inference of 671B-parameter MoE models (like DeepSeek-V3) without model parallelism overhead. Rubin has no equivalent.
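A back-of-the-envelope calculation shows why the extended pool matters for a 671B-parameter model. The sketch below counts weights only (no KV cache, activations, or runtime overhead) and treats 2 TB as 2,000 GB; both simplifications are assumptions, and the helper names are illustrative.

```python
HBM_GB = 512        # on-package HBM4 capacity from the text
CXL_POOL_GB = 2000  # 2 TB DDR5 pool, decimal units assumed


def weight_footprint_gb(params_billion, bits_per_weight):
    """Weights only: parameters x bits / 8, in decimal gigabytes."""
    return params_billion * bits_per_weight / 8


def placement(params_billion, bits_per_weight):
    """Classify where the weights would land on a single node."""
    need = weight_footprint_gb(params_billion, bits_per_weight)
    if need <= HBM_GB:
        return "fits in HBM4"
    if need <= HBM_GB + CXL_POOL_GB:
        return "spills into CXL pool"
    return "does not fit on one node"


for name, bits in [("BF16", 16), ("FP8", 8), ("bfloat6", 6), ("FP4", 4)]:
    gb = weight_footprint_gb(671, bits)
    print(f"{name}: {gb:.1f} GB -> {placement(671, bits)}")
```

At FP8 the weights alone need 671 GB, more than the 512 GB of HBM4 but well within the combined HBM4 and CXL address space. At 6 or 4 bits per weight they fit in HBM4 before any working memory is counted.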

Architecture Innovation

16-Tile CoWoS-L Chiplet Mesh

Rather than a single large GPU die (which faces reticle limits and yield problems at N2), APEX-X1 uses 16 compute tiles on a CoWoS-L interposer, which combines redistribution layers with local silicon interconnect bridges. Each tile is independently functional, so a failed tile reduces performance gracefully rather than causing total failure. UCIe 2.0 die-to-die links provide 4 TB/s bisection bandwidth across the mesh.
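As a rough illustration of graceful degradation, the sketch below estimates what remains when some tiles are disabled. It assumes throughput and HBM4 capacity scale linearly with the number of working tiles, which ignores mesh and scheduling effects. It uses the projected ~170 Petaflops FP4 total and 32 GB of HBM4 per tile quoted elsewhere on this page; the function name `degraded` is hypothetical.

```python
TILES = 16
FP4_PFLOPS_TOTAL = 170.0  # projected full-chip FP4 figure
HBM_GB_PER_TILE = 32      # HBM4 attached to each tile


def degraded(failed_tiles):
    """Remaining (FP4 Petaflops, HBM4 GB), assuming linear scaling per tile."""
    if not 0 <= failed_tiles <= TILES:
        raise ValueError("failed_tiles must be between 0 and 16")
    alive = TILES - failed_tiles
    return FP4_PFLOPS_TOTAL * alive / TILES, HBM_GB_PER_TILE * alive


for failed in (0, 1, 4):
    pflops, hbm = degraded(failed)
    print(f"{failed} failed: {pflops:.1f} PF FP4, {hbm} GB HBM4")
```

Under this simple model, losing one tile costs about 6% of compute and 32 GB of memory, rather than the whole device.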

Silicon Proven Concept

Near-Memory Compute in HBM4 Base Die

Activation functions, layer normalisation, and softmax execute directly inside the HBM4 base die logic layer — eliminating round-trips to the compute die for elementwise ops. Projected 180 GB/s bandwidth saving per tile. This architectural choice is enabled by HBM4's base die logic layer, not available in HBM3e chips like Rubin.

Sovereign AI

Zero US Export Restrictions

APEX Silicon is incorporated in England and Wales. APEX-X1 is designed entirely with open-source EDA tools and licensed IP from non-US sources. Any government, university, or company in any country can purchase APEX-X1 without BIS licensing, ECCN classification review, or end-user certificates. This is a feature Rubin cannot offer.

16-Tile Chiplet Mesh — System View

All 16 compute tiles connect via UCIe 2.0 through a central CXL 3.0 switch. Each tile contains 8 Tensor Sparse Cores and 32GB HBM4.

┌────────────────────────────────────────────────────────────────────────┐
│ APEX-X1: CoWoS-L Interposer (5200 mm²)                                 │
│                                                                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐                │
│  │  TILE 0  │  │  TILE 1  │  │  TILE 2  │  │  TILE 3  │  ← 8 TSC each  │
│  │ 32GB HBM4│  │ 32GB HBM4│  │ 32GB HBM4│  │ 32GB HBM4│    FP4/BF6/FP8 │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘                │
│       │             │             │             │                      │
│  ┌────┴─────┐  ┌────┴─────┐  ┌────┴─────┐  ┌────┴─────┐                │
│  │  TILE 4  │  │  TILE 5  │  │  TILE 6  │  │  TILE 7  │                │
│  │ 32GB HBM4│  │ 32GB HBM4│  │ 32GB HBM4│  │ 32GB HBM4│                │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘                │
│       │             │             │             │                      │
│       └─────────────┴──────┬──────┴─────────────┘                      │
│                   ┌────────┴────────┐                                  │
│                   │ CXL 3.0 Switch  │  ← coherent 512GB HBM4           │
│                   │ + 2TB DDR5 Pool │    + 2TB extended                │
│                   └────────┬────────┘                                  │
│       ┌─────────────┬──────┴──────┬─────────────┐                      │
│  ┌────┴─────┐  ┌────┴─────┐  ┌────┴─────┐  ┌────┴─────┐                │
│  │  TILE 8  │  │  TILE 9  │  │  TILE 10 │  │  TILE 11 │                │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘                │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐                │
│  │  TILE 12 │  │  TILE 13 │  │  TILE 14 │  │  TILE 15 │                │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘                │
│                                                                        │
│ PCIe 6.0 x16 Host I/F │ 800G Optical Scale-Out │ Power: ~1200W         │
└────────────────────────────────────────────────────────────────────────┘

From Architecture to Silicon

APEX-X1 is at the architecture-complete stage. The path to fabrication requires foundry partnership and investment, which is exactly what the licensing programme funds.

Phase 1: Architecture Complete
RTL, floorplan, SDC constraints, 7 patent claims filed
Status: ✓ Done

Phase 2: EDA Platform Live
APEX-EDA at apexchipset.com/app: AI synthesis, placement, routing, IDE
Status: ● Active

Phase 3: Prototype Tile (TSMC N5)
Single-tile tapeout on N5 for silicon validation of TSC, SOPE, UCIe PHY
Status: Planned ($8M)

Phase 4: Full APEX-X1 (TSMC N2)
16-tile production chip on N2 GAAFET, full performance target
Status: Planned ($120M)

Phase 5: Sovereign Deployment
Datacentre clusters for government and enterprise customers
Status: Target 2028

Licensing & Partnership Enquiries

Whether you're a government seeking sovereign AI capability, a chip startup, or a strategic acquirer — we want to hear from you.