Depending on the type of image, a simple solution could be using OpenCV and some clever heuristics.
For a deep learning approach, I would start by looking into the literature on semantic segmentation. Here is a blog post I just found which gives an intro: [1]
With a state-of-the-art model (e.g. DeepLabV3) and a good dataset of foreground/background segmentations, the results may already be good enough to use.
The next step would be to look into the literature on image matting (e.g. deep image matting [2]). Instead of classifying each pixel as foreground or background, matting regresses the foreground colour and transparency of each pixel.
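To illustrate why transparency matters, matting is usually described with the compositing equation I = alpha * F + (1 - alpha) * B, where F is the foreground colour, B the background colour and alpha the per-pixel opacity. The sketch below uses made-up pixel values and hypothetical helper names (`composite`, `hard_mask`) to compare a soft alpha with a hard segmentation decision for a single pixel.

```python
def composite(foreground, alpha, background):
    """Blend one RGB pixel: I = alpha * F + (1 - alpha) * B."""
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


def hard_mask(alpha, threshold=0.5):
    """Segmentation-style decision: a pixel is fully in or fully out."""
    return 1.0 if alpha >= threshold else 0.0


# Made-up example: a pixel that a thin strand of hair covers by 30%.
hair = (40, 30, 20)
new_background = (255, 255, 255)
alpha = 0.3

soft = composite(hair, alpha, new_background)             # partial blend
hard = composite(hair, hard_mask(alpha), new_background)  # hair is lost

print(soft)
print(hard)
```

With the hard mask the strand disappears entirely at this pixel, while the soft alpha keeps a faint trace of it, which is why matting tends to look better around hair and other fine edges.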
___
[1] https://divamgupta.com/image-segmentation/2019/06/06/deep-le...
[2] https://arxiv.org/abs/1703.03872