
Graph Neural Networks in Practice: Molecules, Fake News, and Traffic

Most ML projects treat data as rows (tables) or sequences (time series). But in many real systems the signal lives in relationships: atoms bonded to atoms, users connected by information flow, sensors linked by road topology. This repo is a three-part portfolio showing how the same core idea—message passing on graphs—solves problems across chemistry, social media, and transportation.

The shared core idea: message passing

A GNN repeatedly updates each node by combining information from its neighbors:

  1. Start with node features (atom descriptors, user/content embeddings, sensor measurements)
  2. For each layer, aggregate neighbor messages (mean/sum/attention)
  3. Update the node representation
  4. Produce predictions using a task-specific readout:
    • graph classification: pool node states → one label per graph
    • node regression/forecasting: predict per node (often per timestep)

Across all three projects, nodes/edges/features change, but the algorithmic skeleton stays the same.
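
The four steps above can be sketched in plain Python. This is a minimal illustration, not the code used in the three projects: it assumes mean aggregation, a fixed blending weight in place of learned parameters, and a made-up four-node graph. The function names (mean_aggregate, update, message_passing, mean_pool_readout) are hypothetical.

```python
# Minimal message-passing sketch in plain Python (no learned weights).
# Assumptions: undirected toy graph, mean aggregation, fixed self/neighbor blend.

def mean_aggregate(features, neighbors):
    """Step 2: for each node, average the feature vectors of its neighbors."""
    dim = len(next(iter(features.values())))
    aggregated = {}
    for node in features:
        nbrs = neighbors.get(node, [])
        if not nbrs:
            # Isolated node: no incoming messages.
            aggregated[node] = [0.0] * dim
            continue
        aggregated[node] = [
            sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)
        ]
    return aggregated


def update(features, aggregated, self_weight=0.5):
    """Step 3: blend each node's own state with its aggregated neighbor message."""
    return {
        node: [
            self_weight * own + (1.0 - self_weight) * msg
            for own, msg in zip(features[node], aggregated[node])
        ]
        for node in features
    }


def message_passing(features, neighbors, num_layers=2):
    """Repeat steps 2 and 3 for a fixed number of layers."""
    for _ in range(num_layers):
        features = update(features, mean_aggregate(features, neighbors))
    return features


def mean_pool_readout(features):
    """Step 4 (graph-level): mean-pool node states into one vector per graph."""
    dim = len(next(iter(features.values())))
    return [
        sum(vec[d] for vec in features.values()) / len(features) for d in range(dim)
    ]


# Step 1: toy node features on a triangle with a tail (edges 0-1, 1-2, 2-0, 2-3).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.0, 0.0]}

node_states = message_passing(features, neighbors, num_layers=2)
graph_vector = mean_pool_readout(node_states)
```

A real GNN layer would replace the fixed blend in update with learned weight matrices and a nonlinearity, and a layer such as GATConv would replace the plain mean with attention-weighted aggregation. The loop structure is the part that carries over.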


Cross-project summary

Project | Goal | Nodes (x) | Edges (edge_index) + attrs | Targets (y) | Model
--- | --- | --- | --- | --- | ---
1) HIV Molecules | Predict whether a molecule is HIV active vs inactive from its SMILES-derived molecular graph (captures chemical topology + bond attributes). | Atoms (DeepChem featurizer) | Bonds + edge_attr | Binary label | TransformerConv + periodic TopKPooling
2) Fake News (UPFD) | Classify fake vs real by learning diffusion patterns in the propagation graph, combined with the root story’s content via root injection. | Root post + users | Propagation links | Binary label | GATConv ×3 + root injection
3) Traffic (METR-LA) | Forecast future traffic speed for each sensor using recent history, leveraging spatial dependencies across the road network and temporal dynamics over time windows. | Sensors | Road proximity/weights | Multi-step future speeds per node | A3TGCN (torch_geometric_temporal)
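
The column names x, edge_index, edge_attr, and y appear to follow the PyTorch Geometric convention, where edge_index is a [2, num_edges] list of source and target node indices and an undirected edge is usually stored once in each direction. The sketch below shows that layout with plain Python lists. The helper build_graph and every feature value in it are hypothetical and are not taken from the datasets above.

```python
# Toy illustration of the x / edge_index / edge_attr / y layout.
# All values are made up for illustration; they are not from the real datasets.

def build_graph(node_features, undirected_edges, edge_features, label):
    """Pack a graph into COO-style lists, storing each undirected edge both ways."""
    sources, targets, edge_attr = [], [], []
    for (u, v), attr in zip(undirected_edges, edge_features):
        sources += [u, v]
        targets += [v, u]
        edge_attr += [attr, attr]  # same attributes for both directions
    return {
        "x": node_features,                # one feature row per node
        "edge_index": [sources, targets],  # shape [2, num_directed_edges]
        "edge_attr": edge_attr,            # one feature row per directed edge
        "y": label,                        # graph-level target
    }


# Three-atom chain C-C-O with a hypothetical single feature: atomic number.
graph = build_graph(
    node_features=[[6], [6], [8]],
    undirected_edges=[(0, 1), (1, 2)],
    edge_features=[[1.0], [1.0]],  # hypothetical bond-order feature
    label=0,
)
```

The same container fits all three projects. Only the meaning of the rows changes: atoms and bonds, accounts and propagation links, or sensors and road-proximity weights.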

Why GNNs matter

GNNs matter because many high-value problems are not independent rows of data: they are systems of interacting entities, and the signal lies in what connects to what.

A GNN explicitly models those connections and learns via message passing: each node updates its representation using information from neighbors, so predictions reflect local and multi-hop context (structure + attributes).

This gives you a principled way to:

  • handle variable-size inputs (graphs of different shapes)
  • incorporate edge attributes (bond types, distances, interactions)
  • generalize beyond fixed feature templates

When relationships drive outcomes, GNNs often outperform tabular/sequence-only approaches because they don’t discard topology.
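
To make the multi-hop point concrete, the sketch below places information on one end of a five-node path graph and tracks how far it spreads after k rounds of neighbor aggregation. It is an illustration only, assuming plain sum aggregation with no learned weights. The names propagate and reached are hypothetical.

```python
def propagate(signal, neighbors, num_layers):
    """Each layer adds the sum of neighbor values to every node's own value."""
    for _ in range(num_layers):
        signal = {
            node: value + sum(signal[n] for n in neighbors[node])
            for node, value in signal.items()
        }
    return signal


def reached(signal):
    """Nodes whose state has been influenced by the starting node."""
    return sorted(node for node, value in signal.items() if value != 0)


# Path graph 0 - 1 - 2 - 3 - 4, with information placed only on node 0.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
start = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}

for k in range(4):
    print(k, reached(propagate(start, neighbors, k)))
# 0 [0]
# 1 [0, 1]
# 2 [0, 1, 2]
# 3 [0, 1, 2, 3]
```

Each extra layer widens a node's view by one hop, so the number of layers sets how much of the graph a prediction can draw on. The same function runs unchanged on a graph of any size, which is the variable-size property listed above.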


Real-world applications

GNNs show up wherever influence or constraints propagate through a network:

  • Drug discovery: molecules as atom–bond graphs for property/toxicity prediction
  • Fraud and risk: transaction/account graphs to detect collusive rings
  • Recommendation/search: user–item graphs for ranking and cold-start
  • Social moderation: propagation graphs for misinformation detection
  • Infrastructure: traffic, power grids, logistics as spatio-temporal graphs
  • Security: computer/network graphs for anomaly detection

The practical pattern is consistent: define nodes/edges/features, run message passing, then apply the right readout for the task (graph classification, node classification, link prediction, forecasting).
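
The readouts named above differ only in what they do with the node states once message passing has finished. The sketch below contrasts three of them in plain Python. It assumes mean pooling for the graph-level case and a dot-product score for link prediction, which are common choices but not necessarily the ones used in these projects. The node states and weights are hypothetical stand-ins for learned values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def graph_readout(node_states, weights):
    """Graph classification: mean-pool node states, then score the pooled vector."""
    dim = len(weights)
    pooled = [sum(s[d] for s in node_states) / len(node_states) for d in range(dim)]
    return sigmoid(dot(pooled, weights))


def node_readout(node_states, weights):
    """Node classification/regression: score every node independently."""
    return [dot(s, weights) for s in node_states]


def link_readout(node_states, u, v):
    """Link prediction: score a candidate edge from its two endpoint states."""
    return sigmoid(dot(node_states[u], node_states[v]))


# Hypothetical node states, as they might look after message passing.
node_states = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
weights = [1.0, -1.0]  # hypothetical readout weights, normally learned

graph_prob = graph_readout(node_states, weights)  # one value per graph
node_scores = node_readout(node_states, weights)  # one value per node
link_prob = link_readout(node_states, 0, 2)       # one value per node pair
```

Forecasting, as in the traffic project, is the node-level case with one output per node per future timestep.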
