The partitioning column does not display in dataiku

boumezrag · ‎01-24-2018

Hi everybody,

I am running into a real problem: when I import a table from Hive, the partitioning column is not displayed in Dataiku.

Any help, please?

fchataigner2 · ‎01-24-2018

Hi,

Partitioning columns in Hive are logical columns that expose the path from the table root directory on HDFS to the files containing the data for a given partition value. In DSS, when you retrieve data from the dataset corresponding to the Hive table, you pass a list of values for the partitioning columns, and the data is filtered on these values. You can see the list of existing values of the partitioning columns in the Status tab, or in the Sampling panel on the left of your Explore tab.
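To make the "logical column" idea concrete, here is a minimal Python sketch of how partition values can be read back from a file path, assuming the usual Hive layout of one `column=value` directory per partition level. The table root, the column names (`dt`, `country`) and the file name are hypothetical, and the sketch ignores the escaping Hive applies to special characters in values.

```python
from typing import Dict


def partition_values_from_path(table_root: str, file_path: str) -> Dict[str, str]:
    """Extract partition values from the `column=value` directories of a path."""
    root = table_root.rstrip("/") + "/"
    if not file_path.startswith(root):
        raise ValueError("file is not located under the table root")
    relative = file_path[len(root):]
    values: Dict[str, str] = {}
    # Every directory between the table root and the data file is one
    # partition level; the last path component is the data file itself.
    for component in relative.split("/")[:-1]:
        column, sep, value = component.partition("=")
        if not sep:
            raise ValueError(f"not a partition directory: {component!r}")
        values[column] = value
    return values


# Hypothetical table root and data file, for illustration only.
example = partition_values_from_path(
    "/user/hive/warehouse/sales",
    "/user/hive/warehouse/sales/dt=2018-01-24/country=FR/000000_0",
)
print(example)  # {'dt': '2018-01-24', 'country': 'FR'}
```

The values never appear inside the data files themselves, which is why the column does not show up next to the regular columns.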

Regards,

boumezrag · ‎01-24-2018

Thank you for your answer.
To be honest, I didn't understand. In the Status tab we can see the list of all columns, but not the partitioning one.
My question is: I have a table with 13 columns (including the partitioning column), but I can see only 12. How can I display all 13 columns?

Thanks in advance.

fchataigner2 · ‎01-24-2018

Since you imported a partitioned Hive table as a DSS dataset, you should have a partitioning scheme defined in the dataset's Partitioning tab (under its Settings), with the missing column as a dimension.
In the Status tab, you can choose to display as Partition table, which shows a table with the partition identifiers as row identifiers. A partition identifier is a '|'-separated list of the values of the partitioning columns.
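As a small illustration of that identifier format, the following Python sketch builds and splits a '|'-separated identifier. The function names and the sample values are hypothetical, and the sketch assumes the values are listed in the same order as the dimensions of the partitioning scheme.

```python
from typing import List


def make_partition_id(values: List[str]) -> str:
    """Join one value per partitioning dimension into a single identifier."""
    return "|".join(values)


def split_partition_id(partition_id: str) -> List[str]:
    """Recover the per-dimension values from an identifier."""
    return partition_id.split("|")


# Hypothetical two-dimension scheme: a date dimension and a country dimension.
pid = make_partition_id(["2018-01-24", "FR"])
print(pid)                      # 2018-01-24|FR
print(split_partition_id(pid))  # ['2018-01-24', 'FR']
```

With a single partitioning column, the identifier is simply the value itself.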

boumezrag · ‎01-25-2018

I attached a screenshot.
When you said "and the display will be a table with the partition identifiers as row identifiers", is this what my screenshot shows?

boumezrag · ‎01-25-2018

This is my screenshot.

fchataigner2 · ‎01-25-2018

The values for the partition column can indeed be seen on the left.

boumezrag · ‎01-25-2018

So there is no way to display this column with the others? Sorry for asking so many questions.

fchataigner2 · ‎01-25-2018

This is not possible at the moment. But:
- You can specify which values of this partition column you want when you browse a dataset or build a dataset.
- You can always access the column and its data via HiveServer2 (i.e. in a SQL notebook, or in a Hive recipe when you set the engine in the Advanced tab to HiveServer2).
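For the second option, the partitioning column can be referenced in HiveQL like any other column. The Python sketch below only builds the query text that you could paste into a SQL notebook; the helper name, the table name `sales` and the column name `dt` are hypothetical, and the validation is deliberately strict (plain identifiers only, no quotes or backslashes in the value) rather than a general escaping routine.

```python
import re

# Plain identifiers only: letters, digits and underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_partition_query(table: str, partition_column: str,
                          value: str, limit: int = 100) -> str:
    """Build a HiveQL query that filters on one partitioning column."""
    for name in (table, partition_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    if "'" in value or "\\" in value:
        raise ValueError("value contains characters this sketch does not escape")
    # SELECT * on a partitioned Hive table returns the partitioning
    # columns along with the regular columns.
    return (
        f"SELECT * FROM {table} "
        f"WHERE {partition_column} = '{value}' LIMIT {int(limit)}"
    )


print(build_partition_query("sales", "dt", "2018-01-24"))
# SELECT * FROM sales WHERE dt = '2018-01-24' LIMIT 100
```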

boumezrag · ‎01-25-2018

Thank you so much,
I get it now 😉

Labels

Hive

Partitioning