I was working on a paper on that. I found similar results, but I was comparing modulation indexes (for example, P25 versus 5 kHz was a 1 dB difference).
I haven't explored the effect of modulation indexes.
TSB-88 uses Delivered Audio Quality (e.g., DAQ 3.4) to describe what was called Circuit Merit in the "analog past." NIST and the vendors went to a lot of effort to catalog how the different modulations perform at different signal levels in subjective listening tests. TSB-88 is pretty much the bible, and whatever inaccuracies it may have, it has become an industry standard. That is very good, because before this document was published, the major radio vendors, consultants, and customers all had different opinions on how to design and test a land mobile radio system.
This is all worked out from TSB-88-B; version D, if I recall, has the same table with more modulation types.
For 12.5 kHz P25 versus 25 kHz FM, you pick up a 3.6 dB advantage in the "inferred noise floor" (1) and a 2.3 dB advantage in the DAQ/CPC (Channel Performance Criterion). Compared with 12.5 kHz analog FM, P25 gains another 3 dB.
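A quick way to see the combined effect. This is a sketch that assumes the two advantages simply add in dB (which follows if the required signal level is the inferred noise floor plus the CPC), and it reads "another 3 dB" as stacking on top of the 25 kHz FM comparison; that reading is my interpretation, not something stated in TSB-88.

```python
# Advantages quoted above for 12.5 kHz P25 relative to 25 kHz analog FM (dB).
NOISE_FLOOR_ADVANTAGE_DB = 3.6  # "inferred noise floor" difference
CPC_ADVANTAGE_DB = 2.3          # DAQ/CPC difference
# Assumed extra advantage when the reference is 12.5 kHz analog FM instead (dB).
NARROWBAND_FM_EXTRA_DB = 3.0


def total_advantage_db(*parts_db: float) -> float:
    """Sum dB advantages (decibels add when the underlying power ratios multiply)."""
    return sum(parts_db)


def db_to_power_ratio(db: float) -> float:
    """Convert a dB figure to a linear power ratio."""
    return 10 ** (db / 10)


vs_25k_fm = total_advantage_db(NOISE_FLOOR_ADVANTAGE_DB, CPC_ADVANTAGE_DB)
vs_12k5_fm = total_advantage_db(vs_25k_fm, NARROWBAND_FM_EXTRA_DB)

print(f"P25 vs 25 kHz FM:   {vs_25k_fm:.1f} dB "
      f"(x{db_to_power_ratio(vs_25k_fm):.2f} power)")
print(f"P25 vs 12.5 kHz FM: {vs_12k5_fm:.1f} dB "
      f"(x{db_to_power_ratio(vs_12k5_fm):.2f} power)")
```

Working in dB keeps the bookkeeping to addition; the power-ratio column is only there to show what a few dB means in linear terms.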
(1) "Inferred noise floor" is a term Bernie Olsen coined when I asked for clarification of the first draft of TSB-88, in which receiver noise figure was the sensitivity metric. LMR receiver specifications never publish a noise figure, so Bernie's solution was to use the 12 dB SINAD CPC as a benchmark above the noise floor: subtract the 12 dB SINAD CPC value from the published 12 dB SINAD sensitivity to get the noise floor, then add the CPC value for the appropriate modulation.
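The two-step method in the footnote can be sketched as follows. This assumes the "12 dB SINAD CPC" is the carrier-to-noise ratio (dB) that produces 12 dB SINAD for the given modulation; the `Receiver` class, function names, and every numeric value below are hypothetical placeholders for illustration, not TSB-88 table values.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    """Hypothetical receiver spec record.

    LMR data sheets publish 12 dB SINAD sensitivity, not noise figure,
    so that is the only receiver number this sketch takes as input.
    """
    name: str
    sinad_12db_sensitivity_dbm: float  # "guaranteed" published sensitivity


def inferred_noise_floor_dbm(rx: Receiver, cpc_12db_sinad_db: float) -> float:
    """Back out the noise floor: rated sensitivity minus the C/N (dB)
    assumed to correspond to 12 dB SINAD for this modulation."""
    return rx.sinad_12db_sensitivity_dbm - cpc_12db_sinad_db


def required_signal_dbm(rx: Receiver, cpc_12db_sinad_db: float,
                        cpc_target_db: float) -> float:
    """Add the CPC (C/N, dB) for the target DAQ on top of the inferred floor."""
    return inferred_noise_floor_dbm(rx, cpc_12db_sinad_db) + cpc_target_db


# PLACEHOLDER numbers for illustration only; they are NOT TSB-88 values.
rx = Receiver(name="example radio", sinad_12db_sensitivity_dbm=-119.0)
CPC_12DB_SINAD_DB = 4.0   # hypothetical C/N at 12 dB SINAD
CPC_DAQ_3_4_DB = 17.0     # hypothetical C/N for DAQ 3.4

floor = inferred_noise_floor_dbm(rx, CPC_12DB_SINAD_DB)
needed = required_signal_dbm(rx, CPC_12DB_SINAD_DB, CPC_DAQ_3_4_DB)
print(f"{rx.name}: inferred noise floor {floor:.1f} dBm, "
      f"DAQ 3.4 needs {needed:.1f} dBm")
```

Running this for two modulations and subtracting their required signal levels gives the net advantage. Because the result is floor plus CPC, that difference splits into a noise-floor term and a CPC term, which is the shape of the 3.6 dB and 2.3 dB figures quoted for P25 versus 25 kHz FM.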
The receiver values I use are the "guaranteed" specs for common radios. In practice, receiver sensitivity is normally better than spec, but YMMV.
I haven't checked whether Phase 2 modulation affects any of this; hopefully not, because Phase 2 systems are being built with the same RF layout as Phase 1.