Hi Rui,
The approach you propose is indeed the correct one.
For Kerberos to work, you need to install the Kerberos client package inside the Docker image and mount the krb5.conf configuration file.
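As a minimal sketch (image name, distro, and paths are illustrative, adjust to your setup): on a Debian-based image you would install the client with `RUN apt-get update && apt-get install -y krb5-user` in the Dockerfile, then mount the cluster's Kerberos configuration at run time:

```shell
# Mount the host's krb5.conf read-only into the container
# ("my-dss-image" is a placeholder name).
docker run \
  -v /etc/krb5.conf:/etc/krb5.conf:ro \
  my-dss-image
```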
For Hadoop, you will need to install the CDH client packages and mount the various configuration directories (/etc/{hadoop,hive,spark,...}/conf). Beware that depending on the way your CDH cluster is set up, you may have a number of symlink indirections in there.
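For instance (an illustrative sketch, not a definitive recipe): mount each client configuration directory read-only, and remember that if those paths are symlinks (typically through /etc/alternatives into the Cloudera parcel directories), you must also mount the resolved targets, since a symlink mounted alone will dangle inside the container.

```shell
# Mount the CDH client configuration directories read-only.
# If these are symlinks on the host, mount their resolved
# targets as well (e.g. under /etc/alternatives and /opt/cloudera).
docker run \
  -v /etc/hadoop/conf:/etc/hadoop/conf:ro \
  -v /etc/hive/conf:/etc/hive/conf:ro \
  -v /etc/spark/conf:/etc/spark/conf:ro \
  my-dss-image
```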
Another difficulty related to Spark is that the Spark workers (running on the cluster nodes) need to be able to connect back to the Spark driver (running on the DSS host), which imposes extra constraints on the way the container network is configured.
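One common way to satisfy this constraint (a sketch under the assumption that the DSS host's hostname is resolvable from the cluster nodes; the hostname and port values below are illustrative) is to run the container with host networking and pin the driver ports so they can be opened in the firewall predictably:

```shell
# Host networking lets Spark executors connect back to the driver
# without port mapping.
docker run --network host my-dss-image

# Corresponding spark-defaults.conf fragment (illustrative values):
#   spark.driver.host        dss-host.example.com
#   spark.driver.port        40000
#   spark.blockManager.port  40001
```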
All in all it is a workable setup, though you need some understanding of the internals of Hadoop to configure it correctly. We have already done it a few times but do not have readily-exportable materials for it. Do not hesitate to come back to us if you run into difficulties.
Regards
Patrice Bertin
Dataiku