Data might not fit in GPU memory (including activations and gradients), in which case one uses mini-batches; it might also not fit in RAM, in which case one uses
fit_generator. At least, the latter is the hypothesis I would like to validate here.
Is it true that Keras applies a producer-consumer strategy: first loading the elements yielded by the generator into RAM until the queue (of size
max_queue_size) is filled, and then refilling it whenever batches are popped to train the network? The documentation mentions that this is useful for doing data augmentation on the CPU while the GPU trains. Is the use case where this producer-consumer parallelism is used to load the data from disk into RAM, because it does not fit in RAM at once, also valid? My dataset consists of 100k CT-scans, which obviously do not fit in RAM.
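To make my hypothesis concrete, here is a minimal sketch in plain Python of the producer-consumer scheme I have in mind. This is only an illustration of the pattern I am asking about, not Keras's actual implementation; the `maxsize=10` plays the role of `max_queue_size`, and the `time.sleep` stands in for disk I/O.

```python
import queue
import threading
import time

def batch_generator(n_batches):
    # Stand-in for a Keras generator that reads batches from disk.
    for i in range(n_batches):
        time.sleep(0.01)  # simulate disk I/O per batch
        yield i           # stand-in for an (X, y) batch

def producer(gen, q):
    # Producer thread: pulls from the generator, blocks while queue is full.
    for batch in gen:
        q.put(batch)
    q.put(None)  # sentinel: no more batches

q = queue.Queue(maxsize=10)  # plays the role of max_queue_size
t = threading.Thread(target=producer, args=(batch_generator(25), q))
t.start()

consumed = []
while True:
    batch = q.get()  # the "training" side pops batches as it needs them
    if batch is None:
        break
    consumed.append(batch)
t.join()
print(len(consumed))  # → 25
```

If Keras works this way, then as long as the producer keeps the queue non-empty, the consumer (the training loop) never waits on disk, and RAM usage is bounded by the queue size rather than the dataset size.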
Is fit_generator meant only to parallelize data pre-processing and training, or can it also sensibly be used to parallelize data loading (from disk to RAM) and training? Or would the latter be like using a hammer to drive a screw into the wall?
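For reference, this is the kind of generator I have in mind: it loads only one batch of scans into RAM at a time. The paths, labels, and `load_fn` below are dummy stand-ins so the sketch is self-contained; in my real code `load_fn` would be an actual CT-scan reader (e.g. `np.load` or a DICOM library).

```python
import numpy as np

def scan_batch_generator(paths, labels, batch_size, load_fn):
    """Yield (X, y) batches forever, holding only one batch in RAM.

    load_fn(path) -> ndarray is a stand-in for the real scan reader.
    """
    n = len(paths)
    while True:  # Keras expects generators to loop indefinitely
        for start in range(0, n - batch_size + 1, batch_size):
            batch_paths = paths[start:start + batch_size]
            X = np.stack([load_fn(p) for p in batch_paths])
            y = np.asarray(labels[start:start + batch_size])
            yield X, y

# Dummy stand-ins (hypothetical file names, zero arrays instead of scans):
fake_paths = [f"scan_{i}.npy" for i in range(8)]
fake_labels = list(range(8))
load_fn = lambda p: np.zeros((4, 4))  # real code: np.load(p) or similar

gen = scan_batch_generator(fake_paths, fake_labels, batch_size=4,
                           load_fn=load_fn)
X, y = next(gen)
print(X.shape, y.shape)  # → (4, 4, 4) (4,)
```

One would then pass this to something like `model.fit_generator(gen, steps_per_epoch=len(fake_paths) // 4, max_queue_size=10)`. My question is whether that is a sensible use of the mechanism when the generator's cost is disk I/O rather than augmentation.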