Hive Partitioned Table

ubethke · 12-11-2015

I have imported a partitioned Hive table through the command-line utility. The partitioning pattern is recognised and I can list the partitions. However, the column that the table is partitioned on does not show up in the dataset's schema. Any tips?

thanks

uli

Clément_Stenac · 12-11-2015

Hi Uli, this is actually expected: DSS has two partitioning models, "files-based" and "column-based". In files-based partitioning, the general rule is that the partitioning dimensions don't appear in the data files.

That's the case here: the data files don't actually contain the partitioning dimension. The DSS schema represents the physical schema of the data, so the partitioning dimensions don't appear there either.
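As a rough illustration of the files-based model, assume the table uses Hive's default `key=value` directory naming (for example `.../day=2015-12-11/part-00000`). Under that assumption, the partition value can be recovered from a file's path but not from its contents. The function name and table path below are hypothetical; this is a sketch, not a DSS or Hive API.

```python
from pathlib import PurePosixPath


def partition_values(file_path: str) -> dict:
    """Return the Hive-style partition dimensions encoded in a file path.

    Assumes `key=value` directory names; escaped characters in values
    (Hive percent-encodes some of them) are not decoded here.
    """
    values = {}
    # Only directory segments can carry partition values, so skip the file name.
    for segment in PurePosixPath(file_path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = value
    return values


# Hypothetical path: "day" and "country" exist only in the directory names.
print(partition_values("/user/hive/warehouse/events/day=2015-12-11/country=FR/part-00000"))
# → {'day': '2015-12-11', 'country': 'FR'}
```

Reading `part-00000` itself would only yield the data columns, which is why a schema describing the physical data has no `day` or `country` column.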

Hive's behavior is something of a hybrid: the partitioning columns are not fully treated as part of the schema, but Hive automatically exposes them as virtual columns.
This behavior is not without problems: for example, a "create table as select *" makes the partitioning dimension "appear" as a regular column of the new table.
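To see why the partitioning column "appears" on the Hive side but not in the physical schema, here is a toy model in plain Python. The table, its columns and the helper name are all made up for illustration. Each data file holds only the data columns, and a Hive-like `SELECT *` appends the partition value taken from the directory name to every row. Materializing that result with a create-table-as-select would turn `day` into an ordinary stored column.

```python
def select_star(files, data_columns, partition_keys):
    """Toy model of a Hive-like SELECT * over a partitioned table.

    `files` maps a partition directory such as "day=2015-12-11" to the
    records physically stored there, which contain data columns only.
    Partition values are appended after the data columns.
    """
    header = list(data_columns) + list(partition_keys)
    rows = []
    for directory, records in files.items():
        # Parse "k1=v1/k2=v2" into {"k1": "v1", "k2": "v2"}.
        values = dict(seg.split("=", 1) for seg in directory.split("/"))
        for record in records:
            rows.append(tuple(record) + tuple(values[k] for k in partition_keys))
    return header, rows


files = {
    "day=2015-12-10": [("u1", 3), ("u2", 5)],
    "day=2015-12-11": [("u1", 7)],
}
header, rows = select_star(files, ["user", "clicks"], ["day"])
print(header)  # → ['user', 'clicks', 'day']
print(rows)    # → [('u1', 3, '2015-12-10'), ('u2', 5, '2015-12-10'), ('u1', 7, '2015-12-11')]
```

The physical schema in this model is just `["user", "clicks"]`; the extra `day` column exists only in the query result, until something like a create-table-as-select writes it out.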

DSS does not do that: the partitioning dimensions don't appear when you explore a dataset. At the moment, unfortunately, it's not possible to know which partition a record belongs to. We're thinking about ways to improve this.

Labels

Hadoop

Troubleshooting