Hi again 🙂
I'm trying to play with the Geozip processor and with Elasticsearch / Kibana. My sample contains two columns, "zipcode" and "country", and the Geozip processor generates a third column with a geo-point value (that processor is perfect!). I also added another string column which is the concatenation of latitude and longitude, separated by a comma (this is a valid geo_point format in ES). Example:
42800;POINT (4.6657 45.5598);4.6657,45.5598;France;
69480;POINT (4.7028 45.9126);4.7028,45.9126;France;
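As an aside, here is a small Python sketch of how that concatenated column can be derived from the Geozip output (the function name is just illustrative). Note that WKT `POINT` lists longitude first, while the ES geo_point string format is `lat,lon`, so the sketch swaps the two values:

```python
import re

def wkt_point_to_es_string(wkt):
    """Convert a WKT 'POINT (lon lat)' to Elasticsearch's 'lat,lon' string.

    WKT lists longitude first, but the geo_point string format expects
    latitude first, so the two coordinates are swapped here.
    """
    m = re.match(r"POINT \(([-\d.]+) ([-\d.]+)\)", wkt)
    if not m:
        raise ValueError("not a WKT point: %r" % wkt)
    lon, lat = m.groups()
    return "%s,%s" % (lat, lon)
```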
I synchronise this data to Elasticsearch. Unfortunately, the automatically inferred schema does not detect geo_point types:
{
  "test" : {
    "mappings" : {
      "test" : {
        "properties" : {
          "Country" : {
            "type" : "string",
            "fields" : {
              "Country_facet" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          },
          "PostCodeZip" : {
            "type" : "long",
            "store" : true
          },
          "geo" : {
            "type" : "string",
            "fields" : {
              "geo_facet" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          },
          "geopoint" : {
            "type" : "string",
            "fields" : {
              "geopoint_facet" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          }
        }
      }
    }
  }
}
So I tried to delete the ES index and apply my own schema with the correct field types, through a mapping:
curl -XPUT ../test/test/_mapping -d '{
  "test" : {
    "mappings" : {
      "test" : {
        "properties" : {
          "Country" : {
            "type" : "string",
            "fields" : {
              "Country_facet" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          },
          "PostCodeZip" : {
            "type" : "long",
            "store" : true
          },
          "geo" : {
            "type" : "string",
            "fields" : {
              "geo_facet" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          },
          "geopoint" : {
            "type" : "geo_point"
          }
        }
      }
    }
  }
}
'
This mapping is erased by DSS before the data is uploaded, and a new schema (with the wrong types) is auto-inferred.
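For reference, here is the mapping I am aiming for, as a Python dict (easier to keep the braces balanced than in raw JSON). I believe that for this ES version the body of a `PUT .../_mapping` should contain only the type name and its `properties`, not the index name or a `mappings` wrapper; this sketch only builds and prints the payload, it does not call ES:

```python
import json

# The mapping I want on the "test" type -- field names match my dataset.
# Body shape for PUT /<index>/<type>/_mapping: { "<type>": { "properties": ... } }
mapping = {
    "test": {
        "properties": {
            "Country": {
                "type": "string",
                "fields": {
                    "Country_facet": {"type": "string", "index": "not_analyzed"}
                },
            },
            "PostCodeZip": {"type": "long", "store": True},
            "geo": {
                "type": "string",
                "fields": {
                    "geo_facet": {"type": "string", "index": "not_analyzed"}
                },
            },
            "geopoint": {"type": "geo_point"},
        }
    }
}

print(json.dumps(mapping, indent=2))
```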
I found the DSS configuration file for this sync recipe and edited it to change the type from "string" to "geo_point" (a valid ES type), like this:
File $DSS_FOLDER/projects/<project_name>/datasets/<sync_name>.json
.....
{
"name": "geopoint",
"type": "geo_point",
"maxLength": -1
},
.....
This generates the following error:
[12:12:09] [ERROR] [dku.flow.jobrunner] running sync_test_NP - Activity unexpectedly failed
java.lang.IllegalArgumentException: in running sync_test_NP: Type not found: geo_point
at com.dataiku.dip.utils.ErrorContext.iae(ErrorContext.java:82)
at com.dataiku.dip.datasets.Type.forName(Type.java:97)
at com.dataiku.dip.coremodel.SchemaColumn.getType(SchemaColumn.java:82)
at com.dataiku.dip.datasets.elasticsearch.ElasticSearchUtils.getElasticSearchType(ElasticSearchUtils.java:55)
at com.dataiku.dip.datasets.elasticsearch.ElasticSearchUtils.getMappingDefinition(ElasticSearchUtils.java:125)
at com.dataiku.dip.datasets.elasticsearch.ElasticSearchOutput$ElasticSearchOutputWriter.init(ElasticSearchOutput.java:147)
at com.dataiku.dip.dataflow.exec.stream.ToDatasetStreamSplitRunner.init(ToDatasetStreamSplitRunner.java:55)
at com.dataiku.dip.dataflow.exec.sync.FSToAny.init(FSToAny.java:67)
at com.dataiku.dip.dataflow.exec.SyncRecipeRunner.init(SyncRecipeRunner.java:110)
at com.dataiku.dip.dataflow.jobrunner.ExecutionRunnablesBuilder.getRunnables(ExecutionRunnablesBuilder.java:49)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner.runActivity(ActivityRunner.java:383)
at com.dataiku.dip.dataflow.jobrunner.JobRunner.runActivity(JobRunner.java:102)
at com.dataiku.dip.dataflow.jobrunner.JobRunner.access$700(JobRunner.java:27)
at com.dataiku.dip.dataflow.jobrunner.JobRunner$ActivityExecutorThread.run(JobRunner.java:263)
Any ideas?