Graph Neural Networks in Practice: Molecules, Fake News, and Traffic
Most ML projects treat data as rows (tables) or sequences (time series). But in many real systems the signal lives in relationships: atoms bonded to atoms, users connected by information flow, sensors linked by road topology. This repo is a three-part portfolio showing how the same core idea, message passing on graphs, solves problems across chemistry, social media, and transportation.
The shared core idea: message passing
A GNN repeatedly updates each node by combining information from its neighbors:
- Start with node features (atom descriptors, user/content embeddings, sensor measurements)
- For each layer, aggregate neighbor messages (mean/sum/attention)
- Update the node representation
- Produce predictions using a task-specific readout:
  - graph classification: pool node states → one label per graph
  - node regression/forecasting: predict per node (often per timestep)
Across all three projects, nodes/edges/features change, but the algorithmic skeleton stays the same.
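
The aggregate-and-update loop described above can be sketched in plain Python. This is a minimal illustration, not the code used in any of the three projects: the function name `mean_message_passing`, the toy path graph, and the update rule (averaging a node's own state with the mean of its neighbors) are all assumptions chosen for readability. The `(sources, targets)` pair mirrors the `edge_index` layout referenced in the summary table.

```python
from collections import defaultdict


def mean_message_passing(x, edge_index):
    """One round of mean-aggregation message passing (illustrative only).

    x: list of per-node feature vectors (lists of floats).
    edge_index: (sources, targets) pair of equal-length lists; each
        (s, t) pair sends a message from node s to node t.
    """
    sources, targets = edge_index

    # Collect the incoming messages for every target node.
    inbox = defaultdict(list)
    for s, t in zip(sources, targets):
        inbox[t].append(x[s])

    updated = []
    for node, own in enumerate(x):
        msgs = inbox.get(node)
        if not msgs:
            # Isolated node: nothing to aggregate, keep its features.
            updated.append(list(own))
            continue
        # Aggregate: element-wise mean of the neighbor messages.
        agg = [sum(m[d] for m in msgs) / len(msgs) for d in range(len(own))]
        # Update (assumed rule): average the node's state with the aggregate.
        updated.append([(o + a) / 2 for o, a in zip(own, agg)])
    return updated


# Toy path graph 0 - 1 - 2, stored as directed edges in both directions.
edge_index = ([0, 1, 1, 2], [1, 0, 2, 1])
x = [[1.0], [0.0], [0.0]]  # only node 0 starts with a signal

h1 = mean_message_passing(x, edge_index)   # after one layer
h2 = mean_message_passing(h1, edge_index)  # after two layers
```

After one round, node 2 is still at zero because it is two hops from node 0; after the second round it picks up part of the signal. Real layers such as attention-based convolutions replace the fixed mean and update rule with learned functions, but the data flow is the same.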
Cross-project summary
| Project | Goal | Nodes (x) | Edges (edge_index) + attrs | Targets (y) | Model |
|---|---|---|---|---|---|
| 1) HIV Molecules | Predict whether a molecule is HIV active vs inactive from its SMILES-derived molecular graph (captures chemical topology + bond attributes). | Atoms (DeepChem featurizer) | Bonds + edge_attr | Binary label | TransformerConv + periodic TopKPooling |
| 2) Fake News (UPFD) | Classify fake vs real by learning diffusion patterns in the propagation graph, combined with the root story’s content via root injection. | Root post + users | Propagation links | Binary label | GATConv ×3 + root injection |
| 3) Traffic (METR-LA) | Forecast future traffic speed for each sensor using recent history, leveraging spatial dependencies across the road network and temporal dynamics over time windows. | Sensors | Road proximity/weights | Multi-step future speeds per node | A3TGCN (torch_geometric_temporal) |
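
For the traffic row, "recent history" in and "multi-step future speeds per node" out amounts to slicing the sensor time series into sliding windows. The sketch below shows one way to do that in plain Python. It is not the repo's data loader: `make_windows`, the `history` and `horizon` parameters, and the speed values are hypothetical and chosen only to illustrate the shape of the inputs and targets.

```python
def make_windows(speeds, history, horizon):
    """Split a [time][sensor] series into (input, target) pairs.

    Each input holds `history` consecutive timesteps; each target holds
    the `horizon` timesteps that immediately follow it.
    """
    samples = []
    last_start = len(speeds) - history - horizon
    for start in range(last_start + 1):
        past = speeds[start:start + history]
        future = speeds[start + history:start + history + horizon]
        samples.append((past, future))
    return samples


# Made-up readings: 6 timesteps x 2 sensors.
speeds = [[60, 30], [58, 32], [55, 35], [50, 40], [52, 38], [57, 33]]

# Use 3 past steps to predict the next 2 steps for every sensor.
samples = make_windows(speeds, history=3, horizon=2)
```

Each sample pairs a `history x sensors` input with a `horizon x sensors` target, so every sensor (node) gets a multi-step forecast. A series shorter than `history + horizon` yields no samples.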
Why GNNs matter
GNNs matter because many high-value problems aren’t independent rows of data: they are systems of interacting entities where the signal is in who/what connects to what.
A GNN explicitly models those connections and learns via message passing: each node updates its representation using information from neighbors, so predictions reflect local and multi-hop context (structure + attributes).
This gives you a principled way to:
- handle variable-size inputs (graphs of different shapes)
- incorporate edge attributes (bond types, distances, interactions)
- generalize beyond fixed feature templates
When relationships drive outcomes, GNNs often outperform tabular/sequence-only approaches because they don’t discard topology.
Real-world applications
GNNs show up wherever influence or constraints propagate through a network:
- Drug discovery: molecules as atom–bond graphs for property/toxicity prediction
- Fraud and risk: transaction/account graphs to detect collusive rings
- Recommendation/search: user–item graphs for ranking and cold-start
- Social moderation: propagation graphs for misinformation detection
- Infrastructure: traffic, power grids, logistics as spatio-temporal graphs
- Security: computer/network graphs for anomaly detection
The practical pattern is consistent: define nodes/edges/features, run message passing, then apply the right readout for the task (graph classification, node classification, link prediction, forecasting).
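
The readout step is where the tasks diverge. As a rough sketch, assuming message passing has already produced a list of node embeddings `h`, the three helper functions below (hypothetical names, plain Python, fixed rather than learned weights) show how the same embeddings can feed a graph-level, node-level, or link-level prediction.

```python
def graph_readout(h):
    """Graph-level: mean-pool node embeddings into a single vector."""
    dim = len(h[0])
    return [sum(node[d] for node in h) / len(h) for d in range(dim)]


def node_readout(h, weights):
    """Node-level: one scalar per node via a shared linear map."""
    return [sum(w * v for w, v in zip(weights, node)) for node in h]


def link_readout(h, u, v):
    """Link-level: dot-product score for the candidate edge (u, v)."""
    return sum(a * b for a, b in zip(h[u], h[v]))


# Toy node embeddings for a 4-node graph (values are illustrative).
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]

graph_vec = graph_readout(h)                # input to a graph classifier
node_scores = node_readout(h, [2.0, -1.0])  # per-node regression outputs
link_score = link_readout(h, 0, 2)          # score for a possible 0-2 edge
```

In a trained model the pooled vector or per-node scores would pass through learned layers and a loss suited to the task; the point here is only that the readout changes while the message-passing stage before it does not.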
