Mini-batching is indeed tricky, as you need information from the complete local neighborhood of every node in a mini-batch. Say you select N nodes for a mini-batch: for a neural net with k layers you would then also have to provide all nodes in their k-th order neighborhood (if you want an exact procedure). I'd recommend doing some subsampling in this case, although this is not trivial. We're currently looking into this. Otherwise, full-batch gradient descent tends to work very well on most datasets. Datasets of up to ~1 million nodes should fit into memory, and the training updates should still be quite fast.
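To make the k-th order neighborhood requirement concrete, here's a minimal sketch (my own illustration, not part of any particular library) that collects all nodes needed to compute exact k-layer outputs for a mini-batch, with an optional per-node neighbor cap as a crude form of the subsampling mentioned above. The function name `khop_nodes` and the dict-of-sets adjacency format are assumptions for the example:

```python
import random

def khop_nodes(adj, batch, k, max_neighbors=None, seed=0):
    """Return the set of nodes required for exact k-layer outputs on `batch`.

    adj: dict mapping node -> set of neighbor nodes
    batch: iterable of seed nodes in the mini-batch
    max_neighbors: if set, keep at most this many neighbors per node
                   (a rough subsampling heuristic; the result is then
                   approximate, not exact)
    """
    rng = random.Random(seed)
    needed = set(batch)
    frontier = set(batch)
    for _ in range(k):  # expand one hop per layer
        nxt = set()
        for v in frontier:
            nbrs = list(adj.get(v, ()))
            if max_neighbors is not None and len(nbrs) > max_neighbors:
                nbrs = rng.sample(nbrs, max_neighbors)
            nxt.update(nbrs)
        frontier = nxt - needed  # only expand newly discovered nodes
        needed |= nxt
    return needed

# On a path graph 0-1-2-3, a 2-layer net on batch {0} needs nodes {0, 1, 2}:
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(khop_nodes(adj, {0}, 2))
```

On dense graphs the `needed` set can blow up to nearly the whole graph after a few hops, which is exactly why subsampling (or just full-batch training) becomes attractive.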