Graph Neural Networks in Practice: Molecules, Fake News, and Traffic

Most ML projects treat data as rows (tables) or sequences (time series). But in many real systems the signal lives in relationships: atoms bonded to atoms, users connected by information flow, sensors linked by road topology. This repo is a three-part portfolio showing how one core idea, message passing on graphs, applies to problems in chemistry, social media, and transportation.

GitHub

The shared core idea: message passing

A GNN repeatedly updates each node by combining information from its neighbors:

Start with node features (atom descriptors, user/content embeddings, sensor measurements)
For each layer, aggregate messages from each node's neighbors (mean, sum, or attention)
Update each node's representation by combining its previous state with the aggregated message
Produce predictions using a task-specific readout:
- graph classification: pool node states → one label per graph
- node regression/forecasting: predict per node (often per timestep)
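The first three steps above can be sketched without any framework. This is a minimal illustration, not the repo's code: it assumes plain Python lists for node features, a directed edge list, and a fixed "average own state with the aggregated message" update in place of the learned weight matrices that real layers such as GATConv or TransformerConv use. All function names are hypothetical.

```python
# Minimal, framework-free sketch of message passing (hypothetical helpers).
# Assumption: the update averages a node's state with its aggregated message;
# real GNN layers apply learned weights and nonlinearities instead.

def aggregate(messages, dim, how="mean"):
    """Combine neighbor feature vectors with mean or sum (zeros if no neighbors)."""
    if not messages:
        return [0.0] * dim
    summed = [sum(vals) for vals in zip(*messages)]
    if how == "sum":
        return summed
    return [s / len(messages) for s in summed]

def message_passing_layer(x, edges, how="mean"):
    """One layer: every node collects messages from its in-neighbors, then updates."""
    dim = len(x[0])
    inbox = {i: [] for i in range(len(x))}
    for src, dst in edges:
        inbox[dst].append(x[src])  # message = the sender's current state
    new_x = []
    for i, h in enumerate(x):
        m = aggregate(inbox[i], dim, how)
        # update step: average own state with the aggregated message
        new_x.append([(a + b) / 2 for a, b in zip(h, m)])
    return new_x

# Tiny undirected path graph 0-1-2, stored with both edge directions.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
h = x
for _ in range(2):  # two stacked layers
    h = message_passing_layer(h, edges)
```

Passing `how="sum"` swaps the aggregator; an attention-based layer would instead weight each neighbor's message by a learned score before combining. The resulting `h` is what a task-specific readout would consume.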

Across all three projects, nodes/edges/features change, but the algorithmic skeleton stays the same.

Cross-project summary

Project	Goal	Nodes (`x`)	Edges (`edge_index`) + attrs	Targets (`y`)	Model
1) HIV Molecules	Predict whether a molecule is HIV active vs inactive from its SMILES-derived molecular graph (captures chemical topology + bond attributes).	Atoms (DeepChem featurizer)	Bonds + `edge_attr`	Binary label	TransformerConv + periodic TopKPooling
2) Fake News (UPFD)	Classify fake vs real by learning diffusion patterns in the propagation graph, combined with the root story’s content via root injection.	Root post + users	Propagation links	Binary label	GATConv ×3 + root injection
3) Traffic (METR-LA)	Forecast future traffic speed for each sensor using recent history, leveraging spatial dependencies across the road network and temporal dynamics over time windows.	Sensors	Road proximity/weights	Multi-step future speeds per node	A3TGCN (`torch_geometric_temporal`)
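The column names in the table follow PyTorch Geometric's conventions: `x` is the node feature matrix, `edge_index` stores edges in COO form (a 2 × num_edges layout of source and target indices), `edge_attr` holds one feature row per edge, and undirected bonds are stored once in each direction. The stdlib sketch below builds those fields for a hypothetical toy molecule; the one-hot atom vocabulary and bond-order feature are illustrative assumptions, not the DeepChem featurizer the HIV project uses.

```python
# Sketch: PyG-style graph fields for a toy molecule, without PyG itself.
# Hypothetical features: one-hot element type per atom, bond order per edge.

ATOM_TYPES = ["C", "N", "O", "H"]  # hypothetical vocabulary

def one_hot(symbol):
    return [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]

def to_graph_fields(atoms, bonds, label):
    """atoms: element symbols; bonds: (i, j, bond_order) tuples; label: 0 or 1."""
    x = [one_hot(a) for a in atoms]  # one feature row per atom
    src, dst, edge_attr = [], [], []
    for i, j, order in bonds:
        # an undirected bond becomes two directed edges with the same attribute
        src += [i, j]
        dst += [j, i]
        edge_attr += [[float(order)], [float(order)]]
    edge_index = [src, dst]  # COO layout: row 0 = sources, row 1 = targets
    return {"x": x, "edge_index": edge_index, "edge_attr": edge_attr, "y": [label]}

# Water-like toy graph: O bonded to two H atoms, labelled inactive (0).
water = to_graph_fields(["O", "H", "H"], [(0, 1, 1), (0, 2, 1)], label=0)
```

With PyTorch Geometric installed, these lists could be converted with `torch.tensor(...)` (using `dtype=torch.long` for `edge_index`) and passed to `torch_geometric.data.Data(x=..., edge_index=..., edge_attr=..., y=...)`.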

Why GNNs matter

GNNs matter because many high-value problems are not collections of independent rows. They are systems of interacting entities, and much of the signal lies in which entities connect to which.

A GNN explicitly models those connections and learns via message passing: each node updates its representation using information from neighbors, so predictions reflect local and multi-hop context (structure + attributes).
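The multi-hop part comes from stacking layers: each message-passing layer moves information across one edge, so after k layers a node's representation can depend on every node within k hops. A small sketch of that receptive field on a path graph (the helper name is hypothetical):

```python
# Sketch: nodes that can influence `node` after k message-passing layers.
# Each layer expands the reachable set by one hop along the edges.

def receptive_field(adj, node, k):
    """Return the set of nodes within k hops of `node`, including itself."""
    reached = {node}
    frontier = {node}
    for _ in range(k):
        frontier = {nbr for n in frontier for nbr in adj[n]} - reached
        reached |= frontier
    return reached

# Path graph 0-1-2-3-4 as an adjacency list.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
for k in range(4):
    print(k, sorted(receptive_field(adj, 0, k)))
```

Running it prints `0 [0]`, `1 [0, 1]`, `2 [0, 1, 2]`, `3 [0, 1, 2, 3]`: node 0 only "sees" node 3 once three layers are stacked, which is one reason layer depth is a real design choice.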

This gives you a principled way to:

handle variable-size inputs (graphs of different shapes)
incorporate edge attributes (bond types, distances, interactions)
generalize beyond fixed feature templates
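Variable-size inputs can still be trained in mini-batches. One common approach, and the idea behind PyTorch Geometric's batching, is to stack several graphs into one large disconnected graph, shift each graph's edge indices by the number of nodes before it, and keep a `batch` vector recording which graph each node belongs to. The sketch below illustrates that idea in plain Python; it is not PyG's implementation, and the helper names are hypothetical.

```python
# Sketch: batching graphs of different sizes as one disconnected graph.

def collate(graphs):
    """graphs: dicts with 'x' (node rows) and 'edge_index' ([src, dst] lists)."""
    x, src, dst, batch = [], [], [], []
    offset = 0
    for g_id, g in enumerate(graphs):
        n = len(g["x"])
        x.extend(g["x"])
        src.extend(s + offset for s in g["edge_index"][0])  # shift node ids
        dst.extend(d + offset for d in g["edge_index"][1])
        batch.extend([g_id] * n)  # node -> graph assignment
        offset += n
    return {"x": x, "edge_index": [src, dst], "batch": batch}

def mean_pool_by_graph(x, batch, num_graphs):
    """Graph readout: average node rows separately for each graph id."""
    dim = len(x[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for row, g in zip(x, batch):
        counts[g] += 1
        for d in range(dim):
            sums[g][d] += row[d]
    return [[s / c for s in sums[g]] for g, c in enumerate(counts)]

g1 = {"x": [[1.0], [3.0]], "edge_index": [[0, 1], [1, 0]]}                    # 2 nodes
g2 = {"x": [[2.0], [4.0], [6.0]], "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]]}  # 3 nodes
b = collate([g1, g2])
pooled = mean_pool_by_graph(b["x"], b["batch"], num_graphs=2)
```

Because the graphs share no edges, message passing over the combined graph never mixes them, and the `batch` vector lets the readout produce one vector per original graph.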

When relationships drive outcomes, GNNs often outperform tabular/sequence-only approaches because they don’t discard topology.

Real-world applications

GNNs show up wherever influence or constraints propagate through a network:

Drug discovery: molecules as atom–bond graphs for property/toxicity prediction
Fraud and risk: transaction/account graphs to detect collusive rings
Recommendation/search: user–item graphs for ranking and cold-start
Social moderation: propagation graphs for misinformation detection
Infrastructure: traffic, power grids, logistics as spatio-temporal graphs
Security: computer/network graphs for anomaly detection
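For infrastructure forecasting, including the METR-LA project in the table, the temporal side of the spatio-temporal graph is usually prepared by slicing the sensor history into past/future windows, while the graph supplies the spatial side. A stdlib sketch under stated assumptions: readings arrive as one row per timestep with one value per sensor, and the window lengths are illustrative. METR-LA records speeds at 5-minute intervals, and a common setup uses 12 input and 12 output steps, but treat those lengths as an assumption rather than the repo's configuration.

```python
# Sketch: turning a (timesteps x sensors) speed history into supervised samples.
# Each sample pairs t_in past rows (model input) with t_out future rows (target).

def make_windows(series, t_in, t_out):
    """Return a list of (past, future) pairs using a stride-1 sliding window."""
    samples = []
    for start in range(len(series) - t_in - t_out + 1):
        past = series[start:start + t_in]
        future = series[start + t_in:start + t_in + t_out]
        samples.append((past, future))
    return samples

# Toy history: 6 timesteps for 2 sensors (values are made up).
series = [[float(t), float(10 + t)] for t in range(6)]
samples = make_windows(series, t_in=3, t_out=2)
```

Here the 6 timesteps yield 2 samples; a temporal GNN would then run message passing over the sensor graph for each input window and predict the target rows per node.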

The practical pattern is consistent: define nodes/edges/features, run message passing, then apply the right readout for the task (graph classification, node classification, link prediction, forecasting).
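The readout is the only piece that changes between those tasks. A sketch of three readouts over the same node embeddings, assuming a hypothetical weight vector `w` and bias `b` in place of learned parameters (a real model would learn them, for example as a linear layer):

```python
import math

# Sketch: task-specific readouts over shared node embeddings h.
# w and b are hypothetical stand-ins for learned parameters.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def graph_readout(h, w, b):
    """Graph classification: mean-pool all nodes, then one probability per graph."""
    pooled = [sum(col) / len(h) for col in zip(*h)]
    return sigmoid(dot(pooled, w) + b)

def node_readout(h, w, b):
    """Node regression/forecasting: one output per node."""
    return [dot(row, w) + b for row in h]

def link_readout(h, i, j):
    """Link prediction: probability-like score for a candidate edge (i, j)."""
    return sigmoid(dot(h[i], h[j]))

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings after message passing
w, b = [0.5, -0.5], 0.0
```

With these toy values, `graph_readout(h, w, b)` returns `0.5` and `node_readout(h, w, b)` returns `[0.5, -0.5, 0.0]`. Multi-step forecasting would simply emit several outputs per node instead of one.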

Leo ooooo