TMCRA TokenGraph-LLM Stage C
TMCRA TokenGraph-LLM is an experimental graph-native autoregressive language model prototype. It is not a Transformer wrapper and does not call an external LLM at inference time. Text is generated from token-level graph encoding, learned edge gates, graph message passing, and a dynamic graph causal decoder.
This Hugging Face repository hosts model artifacts. Full source code, training scripts, graph builders, and documentation are published in the GitHub repository:
https://github.com/reshuibuduo/TMCRA-TokenGraph-LLM
Current Model
Current default checkpoint:
- release line: v0.2.0-stagec
- package: tmcra_tokengraph_stagec_model_package_20260606.zip
- checkpoint inside package: checkpoint/token_graph_dynamic_decoder_v3.pt
- parameters: 114,615,372
- shape: dim=512, graph_layers=8, decoder_layers=10
- embeddings: untied
- precision during training: bf16
- effective training samples: about 1.03M
- training steps: 62,000
- SHA256: cc23285628eaed47c20009b6be6b5eb0600ded57ac2e09519370d97158fecd33
Legacy v0.1 files may still be present in this repository for historical comparison. The Stage C package is the current recommended artifact.
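A downloaded artifact can be checked against the published SHA256 digest before loading it. The snippet below is a minimal sketch using only the Python standard library. The helper names `sha256_of` and `matches_expected` are illustrative, and this card does not state whether the digest covers the zip package or the `.pt` checkpoint inside it, so compare against `SHA256SUMS.txt` to confirm which file it refers to.

```python
import hashlib

# Digest copied from the checkpoint list above. Whether it covers the zip
# package or the .pt file is an assumption to confirm via SHA256SUMS.txt.
EXPECTED_SHA256 = "cc23285628eaed47c20009b6be6b5eb0600ded57ac2e09519370d97158fecd33"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Streaming keeps memory use flat even for large checkpoints.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True when the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_expected("tmcra_tokengraph_stagec_model_package_20260606.zip")` on the downloaded file returns `True` only if the bytes match the published digest.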
Package Contents
The Stage C zip contains:
checkpoint/token_graph_dynamic_decoder_v3.pt
dataset_metadata/tokenizer.json
dataset_metadata/manifest.json
training_summary_stagec_public.json
docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW.md
docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW_ZH.md
docs/STAGEC_DETAILED_BENCHMARK_SMOKE_20260606.md
MODEL_CARD.md
PACKAGE_MANIFEST.md
SHA256SUMS.txt
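To confirm that a downloaded package has the layout listed above, the archive's member names can be compared with the expected paths. This is a sketch, not part of the released tooling: `missing_members` is a hypothetical helper, and it assumes the files sit at the archive root without a wrapping top-level folder.

```python
import zipfile

# Paths copied from the package contents list above.
EXPECTED_MEMBERS = [
    "checkpoint/token_graph_dynamic_decoder_v3.pt",
    "dataset_metadata/tokenizer.json",
    "dataset_metadata/manifest.json",
    "training_summary_stagec_public.json",
    "docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW.md",
    "docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW_ZH.md",
    "docs/STAGEC_DETAILED_BENCHMARK_SMOKE_20260606.md",
    "MODEL_CARD.md",
    "PACKAGE_MANIFEST.md",
    "SHA256SUMS.txt",
]


def missing_members(zip_path, expected=EXPECTED_MEMBERS):
    """Return the expected paths that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        present = set(archive.namelist())
    # Assumes members are stored at the archive root (no wrapper directory).
    return [name for name in expected if name not in present]
```

An empty list means every expected file was found. If the archive does wrap its contents in a folder, strip that prefix from the member names before comparing.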
Full-Chain Training Code
The GitHub source repository now includes the full-chain Stage C training path:
- open-corpus schema2 conversion scripts;
- optional semantic teacher annotation through OpenAI-compatible or local Hugging Face models;
- token-level reasoning graph builders;
- simple_plus_causal_targetgraph mode;
- Stage C training and checkpoint continuation;
- graph ablation and token attribution evaluation.
Start from:
docs/FULL_CHAIN_TRAINING.md
docs/FULL_CHAIN_TRAINING_ZH.md
scripts/run_stagec_full_chain_template.sh
scripts/run_stagec_sharded_training_template.sh
How Next-Token Generation Works
Stage C predicts the next token through a graph-native causal path:
flowchart TD
A["Text / prompt / source segments / target text"] --> B["Tokenizer"]
B --> C["Token Graph Builder"]
C --> D["Token nodes"]
C --> E["Typed candidate edges"]
D --> G["TokenGraphEncoderV3"]
E --> G
G --> H["Encoded context graph states"]
H --> I["Dynamic Token Graph Decoder"]
I --> J["Generated token node"]
J --> I
I --> K["Next-token distribution"]
schema2 text
-> token graph nodes and typed candidate edges
-> learned edge-gated graph propagation
-> dynamic generated-token graph nodes
-> prefix-edge + context-edge gated decoding
-> vocabulary logits
The graph builder proposes token nodes and typed candidate edges. The model then learns edge gates, propagates messages through the token graph, scores context nodes, and decodes each generated token as a dynamic graph node. The decoder combines a learned prefix message from previous generated-token nodes with a learned context message from encoded graph nodes, then maps the updated graph-decoder state to next-token logits. This keeps the main objective as next-token prediction while making generation depend on typed graph structure rather than Transformer self-attention.
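The edge-gated propagation step described above can be illustrated with a toy example. The sketch below is not the TokenGraphEncoderV3 implementation. It assumes a sigmoid gate computed from a per-edge-type bias plus a scaled dot product of the endpoint states, followed by a residual update. The edge-type names (`adjacent`, `coref`) and the function `propagate` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def propagate(states, edges, type_bias):
    """One round of edge-gated message passing over typed candidate edges.

    states:    list of node vectors (lists of floats), one per token node
    edges:     list of (src, dst, edge_type) candidate edges
    type_bias: dict edge_type -> float, standing in for learned gate weights
    """
    dim = len(states[0])
    incoming = [[0.0] * dim for _ in states]
    for src, dst, edge_type in edges:
        # Assumed gate: edge-type bias plus scaled similarity of the endpoints.
        similarity = sum(a * b for a, b in zip(states[src], states[dst]))
        gate = sigmoid(type_bias[edge_type] + similarity / math.sqrt(dim))
        for k in range(dim):
            incoming[dst][k] += gate * states[src][k]
    # Residual update: nodes without incoming edges keep their state.
    return [[h + m for h, m in zip(state, msg)]
            for state, msg in zip(states, incoming)]


# Three token nodes and two hypothetical typed edges.
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
edges = [(0, 1, "adjacent"), (0, 2, "coref")]
updated = propagate(states, edges, {"adjacent": 0.0, "coref": -50.0})
```

In this toy setup a strongly negative bias closes the `coref` gate almost completely, so node 2 is left nearly unchanged while node 1 receives half of node 0's state. Cost is linear in the number of candidate edges, which is the property the graph encoder relies on.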
Single-step decoding:
flowchart LR
A["Encoded context graph<br/>N nodes"] --> D["Context gate"]
B["Generated prefix nodes<br/>window W"] --> C["Prefix gate"]
C --> E["Generated-token node t"]
D --> E
E --> F["Graph decoder update"]
F --> G["Vocabulary logits"]
G --> H["next token"]
H --> I["Append as graph node"]
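The single-step loop in the diagram can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the released decoder: both gates are modeled as softmax-weighted sums over dot-product scores, the graph decoder update is a plain residual sum, and decoding is greedy. All function names are hypothetical. Separate input and output embedding tables are used because the checkpoint lists its embeddings as untied.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_message(query, nodes):
    """Softmax-weighted sum of node states (assumed gate form; needs >= 1 node)."""
    weights = softmax([dot(query, node) for node in nodes])
    return [sum(w * node[k] for w, node in zip(weights, nodes))
            for k in range(len(query))]


def decode_step(context_nodes, generated_nodes, output_embeddings, window):
    """Combine prefix and context messages, then score the vocabulary."""
    if window < 1:
        raise ValueError("window must be at least 1")
    query = generated_nodes[-1]
    prefix_msg = gated_message(query, generated_nodes[-window:])  # last W nodes
    context_msg = gated_message(query, context_nodes)             # all N nodes
    state = [q + p + c for q, p, c in zip(query, prefix_msg, context_msg)]
    logits = [dot(state, row) for row in output_embeddings]
    return state, logits


def generate(context_nodes, start_node, input_embeddings,
             output_embeddings, steps, window):
    """Greedy decoding: each chosen token is appended as a new graph node."""
    generated = [start_node]
    tokens = []
    for _ in range(steps):
        _, logits = decode_step(context_nodes, generated,
                                output_embeddings, window)
        token = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(token)
        generated.append(input_embeddings[token])
    return tokens
```

Each step touches at most `window` prefix nodes but every context node, which matches the two decoder terms in the complexity section.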
Complexity Growth
For sequence length n, dense Transformer self-attention grows roughly as:
O(n^2 * d)
Stage C replaces sequence-wide all-token attention with graph candidate edges and dynamic graph decoding:
Graph encoder: O(L_g * (N + E) * d)
Dynamic prefix path: O(L_d * T * W * d)
Context tunnel path: O(L_d * T * N * d)
where N is the number of context graph nodes, E is the number of candidate edges, T is the generated length, W is the bounded generated-prefix window, d is the hidden dimension, and L_g and L_d are the numbers of graph-encoder and decoder layers. The current context tunnel still scans all N encoded context nodes at every decoding step, so generation is not constant-time. The accurate claim is narrower: Stage C replaces dense sequence-wide self-attention with sparse typed graph propagation plus explicit context tunneling.
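Plugging numbers into the three terms shows which one dominates. The sketch below evaluates the formulas above with constant factors dropped, so the results are unitless growth estimates rather than FLOP counts. `d`, `L_g`, and `L_d` are taken from the checkpoint shape (dim=512, graph_layers=8, decoder_layers=10). The values chosen for `N`, `E`, `T`, and `W` are hypothetical and are not measurements from the released model.

```python
def stage_c_cost(N, E, T, W, d, L_g, L_d):
    """Evaluate the three Stage C growth terms (constants dropped)."""
    encoder = L_g * (N + E) * d   # graph encoder: O(L_g * (N + E) * d)
    prefix = L_d * T * W * d      # dynamic prefix path: O(L_d * T * W * d)
    context = L_d * T * N * d     # context tunnel path: O(L_d * T * N * d)
    return {
        "encoder": encoder,
        "prefix": prefix,
        "context": context,
        "total": encoder + prefix + context,
    }


# d, L_g, L_d follow the checkpoint shape; N, E, T, W are made-up examples.
costs = stage_c_cost(N=256, E=1024, T=64, W=16, d=512, L_g=8, L_d=10)
dominant = max(("encoder", "prefix", "context"), key=costs.get)
```

With these example sizes the context tunnel term is the largest, and it exceeds the prefix term by a factor of N / W. That is the term to watch as context graphs grow.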
Current Capability Boundary
Stage C can generate early-stage English story-style continuations and shows measurable dependence on typed graph edges. It is not a reliable production LLM. Current weak areas include exact factual QA, numeric reasoning, robust instruction following, grammar stability, multilingual generation, and long-range concept binding.
Smoke results:
| evaluation | result |
|---|---|
| Stage C normal total loss | 6.512117 |
| Stage C normal LM loss | 4.641285 |
| no_edges total loss | 8.310654 |
| shuffle_edges total loss | 7.702783 |
| TinyStories avg words | 73.88 |
| BLiMP smoke | 59%-64% |
These are smoke tests, not leaderboard claims.
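The edge-dependence claim can be read directly off the table by comparing each ablation's total loss with the normal run. The short sketch below does that arithmetic on the reported numbers. The helper name `ablation_gaps` is illustrative, and the code makes no assumption about the loss units.

```python
# Total losses copied from the smoke results table above.
NORMAL_TOTAL_LOSS = 6.512117
ABLATION_TOTAL_LOSS = {
    "no_edges": 8.310654,
    "shuffle_edges": 7.702783,
}


def ablation_gaps(normal, ablations):
    """Absolute and relative loss increase of each ablation over the normal run."""
    return {
        name: {
            "absolute": loss - normal,
            "relative": (loss - normal) / normal,
        }
        for name, loss in ablations.items()
    }


gaps = ablation_gaps(NORMAL_TOTAL_LOSS, ABLATION_TOTAL_LOSS)
for name, gap in gaps.items():
    print(f"{name}: +{gap['absolute']:.6f} total loss ({gap['relative']:.1%})")
```

Removing edges raises the total loss by about 1.80 and shuffling them by about 1.19, so in this smoke run both the presence of edges and their correct placement matter.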
License
MIT.