Get shared projects using Dataiku API

osk · ‎04-16-2019

Hi there,

I am looking for a way to use the Dataiku API to get the keys and names of the datasets (database tables) that are shared into my project.

I tried the following:


import dataiku
client = dataiku.api_client()
project = client.get_project('PROJECT_NAME')
datasets = project.list_datasets()

Using datasets[index_of_database]['params']['table'], I get the table name of a dataset.
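
A side note on the lookup above: not every dataset type stores a 'table' entry under 'params' (file-based datasets typically do not), so indexing with ['params']['table'] can raise a KeyError. Below is a minimal sketch that skips such datasets. It assumes the items returned by list_datasets() behave like dicts with 'name' and 'params' keys; the helper name table_names and the sample data are hypothetical and only there for illustration.

```python
def table_names(datasets):
    """Map dataset name -> table name for datasets that declare a table.

    `datasets` is assumed to be a list of dict-like items shaped like
    the ones returned by project.list_datasets().
    """
    names = {}
    for ds in datasets:
        # .get() avoids a KeyError for datasets without a 'table' param
        table = ds.get("params", {}).get("table")
        if table is not None:
            names[ds["name"]] = table
    return names


# Hand-written sample data for illustration (not real API output)
sample = [
    {"name": "orders", "params": {"connection": "my_db", "table": "ORDERS"}},
    {"name": "uploads", "params": {"path": "/uploads"}},
]
print(table_names(sample))  # {'orders': 'ORDERS'}
```

With a real project, the call would be table_names(project.list_datasets()), under the same assumptions.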

However, the result of this API call does not include datasets that are shared into my project from other projects.

The background is that I want to find dependencies between projects (e.g. if a dataset from project A is shared into project B, then project A needs to be built first).

I am looking forward to your help.

Best,

Oliver

UserBird · ‎04-16-2019

Hi, this code snippet can help you get the list of shared datasets + their connections.


import dataiku

client = dataiku.api_client()
for project_key in client.list_project_keys():
    print("*** EXPOSED FROM PROJECT %s ***" % (project_key))
    p = client.get_project(project_key)
    # Objects this project exposes (shares) to other projects
    for exposed_object in p.get_settings().get_raw()["exposedObjects"]["objects"]:
        # Connection of the exposed dataset, read from its definition
        connection = p.get_dataset(exposed_object["localName"]).get_definition().get('params').get('connection')
        print("    Object id=%s type=%s db=%s is exposed to projects:" % (exposed_object["localName"], exposed_object["type"], connection))
        # Each rule names one project the object is exposed to
        for rule in exposed_object["rules"]:
            print("      %s" % rule["targetProject"])
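
If you would rather collect the result than print it, the same traversal can be written as a small function over the raw settings dict. This is only a sketch: the "exposedObjects" / "objects" / "rules" / "targetProject" keys are the ones used in the snippet above, while the function name exposed_datasets, the assumption that datasets carry the type string "DATASET", and the sample data are mine.

```python
def exposed_datasets(raw_settings):
    """Map each exposed dataset name to the sorted list of target projects.

    `raw_settings` is assumed to look like p.get_settings().get_raw().
    """
    exposed = {}
    objects = raw_settings.get("exposedObjects", {}).get("objects", [])
    for obj in objects:
        # Assumption: exposed datasets have type "DATASET"; other
        # exposed object types (models, folders, ...) are skipped
        if obj.get("type") != "DATASET":
            continue
        targets = sorted({rule["targetProject"] for rule in obj.get("rules", [])})
        exposed[obj["localName"]] = targets
    return exposed


# Hand-written sample data for illustration (not real API output)
sample_settings = {
    "exposedObjects": {
        "objects": [
            {"type": "DATASET", "localName": "orders",
             "rules": [{"targetProject": "PRJ_C"}, {"targetProject": "PRJ_B"}]},
            {"type": "SAVED_MODEL", "localName": "m1",
             "rules": [{"targetProject": "PRJ_B"}]},
        ]
    }
}
print(exposed_datasets(sample_settings))  # {'orders': ['PRJ_B', 'PRJ_C']}
```

Skipping non-dataset objects also avoids asking for a dataset definition of something that is not a dataset.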

Cheers,

osk · ‎04-16-2019

Thanks a lot, Du. Very helpful!

tomas · ‎04-17-2019

If you want to check whether a shared (exposed) dataset is actually used downstream (i.e. it is an input of a recipe in another project), you can use something like this:


import re

def get_shared_datasets(client, project_key=None, direction='from'):
    # Returns the shared datasets that are actually used by a recipe
    # in another project.
    #  1. direction = 'from': datasets exported (shared) from the given project(s)
    #   and used elsewhere. For example, if DS1 is exported from PRJA to PRJB,
    #   it is reported only if PRJB has a recipe reading PRJA.DS1.
    #  2. direction = 'to': datasets imported into the given project(s)
    #   and used there. For example, if DS1 is imported from PRJB into PRJA,
    #   it is reported only if PRJA has a recipe reading PRJB.DS1.
    # project_key can be a <str> or a <list> of <str>.
    # If project_key is None, returns the used shared datasets of every project.
    # Result is a dict with structure:
    # {u'PROJECT_KEY_A':
    #       {u'dataset_A': [u'CHILD_PROJECT_A'],
    #        u'dataset_B': [u'CHILD_PROJECT_A',u'CHILD_PROJECT_B'],
    #         ... },
    #  u'PROJECT_KEY_B':
    #       { .. }
    # }
    # client = dataiku.api_client()
    projects = []
    if isinstance(project_key, str):
        projects = [project_key]
    if isinstance(project_key, list):
        projects = project_key
    # Cross-project input refs look like PROJECT_KEY.dataset_name
    patt = re.compile(r'\w+\.\w+')
    shared_datasets = {}
    for project in client.list_projects():
        consumer_key = project['projectKey']
        prj = client.get_project(consumer_key)
        for r in prj.list_recipes():
            # Only the 'main' input role is inspected
            items = r.get('inputs', {}).get('main', {}).get('items', [])
            for inp in items:
                if not patt.match(inp['ref']):
                    continue  # local dataset, not a PROJECT_KEY.dataset ref
                source_key, ds_name = inp['ref'].split('.', 1)
                if project_key is None or (source_key in projects and direction == 'from') or\
                        (consumer_key in projects and direction == 'to'):
                    consumers = shared_datasets.setdefault(source_key, {}).setdefault(ds_name, [])
                    if consumer_key not in consumers:
                        consumers.append(consumer_key)
    return shared_datasets
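
To get from this result to the original goal (knowing which project has to be built first), one option is a topological sort over the "source project -> consuming project" edges. Below is a minimal sketch using Kahn's algorithm. It only assumes the dict structure documented in the function's comments; the function name build_order and the sample data are hypothetical.

```python
from collections import deque


def build_order(shared_datasets):
    """Return project keys ordered so that sources come before consumers.

    `shared_datasets` is assumed to have the structure
    {source_project: {dataset_name: [consumer_project, ...]}}.
    """
    consumers = {}  # source project -> set of consuming projects
    indegree = {}   # project -> number of distinct projects it depends on
    for source, datasets in shared_datasets.items():
        consumers.setdefault(source, set())
        indegree.setdefault(source, 0)
        for children in datasets.values():
            for child in children:
                indegree.setdefault(child, 0)
                # Count each source -> child edge once, ignore self-references
                if child != source and child not in consumers[source]:
                    consumers[source].add(child)
                    indegree[child] += 1

    # Start with projects that depend on nothing; sorted for a stable order
    ready = deque(sorted(p for p, d in indegree.items() if d == 0))
    order = []
    while ready:
        project = ready.popleft()
        order.append(project)
        for child in sorted(consumers.get(project, ())):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(indegree):
        raise ValueError("circular sharing between projects")
    return order


# Hand-written sample in the documented result structure
shared = {
    "PRJA": {"ds1": ["PRJB"], "ds2": ["PRJB", "PRJC"]},
    "PRJB": {"ds3": ["PRJC"]},
}
print(build_order(shared))  # ['PRJA', 'PRJB', 'PRJC']
```

In practice the input would be get_shared_datasets(client), which covers every project. Projects that share in a circle cannot be ordered, so the sketch raises a ValueError in that case instead of guessing.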

Labels

API

Projects
