Except for the obvious image copy without attribution, it doesn't seem to be. At first blush it looks to be a specific application of the previous paper's technique(s).
There is no citation, and they seem to have changed the wording of the paper to avoid a one-to-one copy of the text. E.g., they replace 'label distribution model' with 'lesion distribution model'. Their chapter 'Learning Image-Text Mapping Model' basically describes the method of the first paper with minor but bad changes, and it is not really self-contained. They claim they simply forgot to cite. To me, it doesn't look like a forgotten citation. I wanted to hear different opinions.