Hi,

In "whole data" mode, some statistics are indeed not available: they would require heavy computation, so we try to limit the statistics to those that can be computed in one or two passes over the data.
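
To illustrate what "one pass" means, here is a minimal Python sketch. It is not the actual implementation, and the names `one_pass_stats` and `OnePassStats` are hypothetical. It reads each value once and keeps only a small running state: count, min, max, mean and sample variance (Welford's algorithm), plus a set for an exact distinct count. An exact median has no such small running state. It needs all values kept and sorted, or a selection over the full data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class OnePassStats:
    count: int
    distinct: int
    minimum: float
    maximum: float
    mean: float
    variance: float  # sample variance (n - 1 denominator)


def one_pass_stats(values: Iterable[float]) -> OnePassStats:
    """Hypothetical helper: every statistic is updated once per value."""
    count = 0
    mean = 0.0
    m2 = 0.0  # Welford: running sum of squared deviations from the mean
    seen = set()  # exact distinct count; memory grows with the number of distinct values
    lo = float("inf")
    hi = float("-inf")
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        seen.add(x)
        lo = min(lo, x)
        hi = max(hi, x)
    if count == 0:
        raise ValueError("no values")
    variance = m2 / (count - 1) if count > 1 else 0.0
    return OnePassStats(count, len(seen), lo, hi, mean, variance)
```

For example, `one_pass_stats([1, 2, 2, 3, 4])` gives a count of 5, 4 distinct values, a mean of 2.4 and a sample variance of 1.3.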

The count of distinct values is not approximated, but the median, P25 and P75 are computed as approximate percentiles. The implementation then depends on where the dataset is stored: it is delegated to the database for SQL datasets and to Hive or Impala for HDFS datasets; otherwise, it is computed with t-digests using 100 bins.
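
For readers curious how a t-digest yields approximate percentiles, here is a simplified merging t-digest in Python. It is only an illustration and not the code used by DSS or by any database. It assumes that "100 bins" corresponds to the compression parameter δ = 100. In this sketch, δ limits the digest to roughly 100 centroids. The class name `MergingDigest`, the k1 scale function and `buffer_size` are all assumptions. Points are buffered, then merged with the existing centroids in sorted order. Centroids stay small in the tails and grow larger near the median. Percentiles are read off by interpolating between centroid centers.

```python
import math
from typing import List, Tuple


class MergingDigest:
    """Hypothetical, simplified merging t-digest (k1 scale function)."""

    def __init__(self, delta: float = 100.0, buffer_size: int = 500):
        self.delta = delta  # assumption: the "100 bins" of the post
        self.buffer_size = buffer_size
        self.centroids: List[Tuple[float, float]] = []  # (mean, weight), sorted by mean
        self.buffer: List[float] = []
        self.total = 0.0
        self.min = math.inf
        self.max = -math.inf

    def _k(self, q: float) -> float:
        # k1 scale function: steep near q=0 and q=1, so tail centroids stay small
        q = min(max(q, 0.0), 1.0)
        return self.delta / (2 * math.pi) * math.asin(2 * q - 1)

    def _q_limit(self, q0: float) -> float:
        # Largest q such that k(q) - k(q0) <= 1 (one unit of k per centroid)
        k_next = self._k(q0) + 1
        if k_next >= self.delta / 4:  # k(1) == delta / 4
            return 1.0
        return (math.sin(k_next * 2 * math.pi / self.delta) + 1) / 2

    def add(self, x: float) -> None:
        self.buffer.append(x)
        self.total += 1
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        if len(self.buffer) >= self.buffer_size:
            self._compress()

    def _compress(self) -> None:
        if not self.buffer:
            return
        items = sorted(self.centroids + [(x, 1.0) for x in self.buffer])
        self.buffer = []
        merged: List[Tuple[float, float]] = []
        cur_mean, cur_w = items[0]
        q0 = 0.0
        limit = self._q_limit(q0)
        for mean, w in items[1:]:
            if q0 + (cur_w + w) / self.total <= limit:
                cur_w += w  # absorb into the current centroid (weighted mean)
                cur_mean += (mean - cur_mean) * w / cur_w
            else:
                merged.append((cur_mean, cur_w))
                q0 += cur_w / self.total
                limit = self._q_limit(q0)
                cur_mean, cur_w = mean, w
        merged.append((cur_mean, cur_w))
        self.centroids = merged

    def quantile(self, q: float) -> float:
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be in [0, 1]")
        self._compress()
        if self.total == 0:
            raise ValueError("empty digest")
        target = q * self.total
        # Interpolate between (cumulative weight at centroid center, centroid mean),
        # anchored at (0, min) and (total, max).
        prev_pos, prev_val = 0.0, self.min
        cum = 0.0
        for mean, w in self.centroids:
            pos = cum + w / 2
            if target <= pos:
                if pos == prev_pos:
                    return mean
                frac = (target - prev_pos) / (pos - prev_pos)
                return prev_val + frac * (mean - prev_val)
            prev_pos, prev_val = pos, mean
            cum += w
        if self.total == prev_pos:
            return self.max
        frac = (target - prev_pos) / (self.total - prev_pos)
        return prev_val + frac * (self.max - prev_val)
```

Usage: call `add` once per value, then `quantile(0.25)`, `quantile(0.5)` and `quantile(0.75)` for P25, the median and P75. Memory stays bounded by the number of centroids, not the number of rows. The price is that the returned percentiles are approximate.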