[12:44:41] [INFO] [dku] running compute_knowledge_bank_1_NP - ----------------------------------------
[12:44:41] [INFO] [dku] running compute_knowledge_bank_1_NP - DSS startup: jek version:12.6.0
[12:44:41] [INFO] [dku] running compute_knowledge_bank_1_NP - DSS home: C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\dss_home
[12:44:41] [INFO] [dku] running compute_knowledge_bank_1_NP - OS: Windows 10 10.0 amd64 - Java: Temurin 1.8.0_322
[12:44:41] [INFO] [dku.flow.jobrunner] running compute_knowledge_bank_1_NP - Allocated a slot for this activity!
[12:44:41] [INFO] [dku.flow.jobrunner] running compute_knowledge_bank_1_NP - Run activity
[12:44:41] [INFO] [dku.flow.activity] running compute_knowledge_bank_1_NP - Executing default pre-activity lifecycle hook
[12:44:41] [INFO] [dku.flow.activity] running compute_knowledge_bank_1_NP - Checking if sources are ready
[12:44:41] [INFO] [dku.flow.activity] running compute_knowledge_bank_1_NP - Will check readiness of TUT_LLM_QAWITHRAGAPPROACH.dataiku_knowledge_base p=NP
[12:44:41] [INFO] [dku.datasets.file] running compute_knowledge_bank_1_NP - Building Filesystem handler config: {"path":"C:\\Users\\admin\\AppData\\Local\\Dataiku\\DataScienceStudio\\dss_home\\uploads\\TUT_LLM_QAWITHRAGAPPROACH\\datasets\\dataiku_knowledge_base","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[12:44:41] [DEBUG] [dku.datasets.fsbased] running compute_knowledge_bank_1_NP - getReadiness: will enumerate partition
[12:44:41] [INFO] [dku.datasets.ftplike] running compute_knowledge_bank_1_NP - Enumerating Filesystem dataset prefix=
[12:44:41] [DEBUG] [dku.datasets.fsbased] running compute_knowledge_bank_1_NP - Building FS provider for dataset handler: TUT_LLM_QAWITHRAGAPPROACH.dataiku_knowledge_base
[12:44:41] [DEBUG] [dku.datasets.fsbased] running compute_knowledge_bank_1_NP - FS Provider built
[12:44:41] [DEBUG] [dku.fs.local] running compute_knowledge_bank_1_NP - Enumerating local filesystem prefix=/
[12:44:41] [DEBUG] [dku.fs.local] running compute_knowledge_bank_1_NP - Enumeration done nb_paths=1 size=4220603
[12:44:41] [DEBUG] [dku.datasets.fsbased] running compute_knowledge_bank_1_NP - getReadiness: enumerated partition, found 1 paths, computing hash
[12:44:41] [INFO] [dku.flow.activity] running compute_knowledge_bank_1_NP - Checked source readiness TUT_LLM_QAWITHRAGAPPROACH.dataiku_knowledge_base -> true
[12:44:41] [DEBUG] [dku.flow.activity] running compute_knowledge_bank_1_NP - Computing hashes to propagate BEFORE activity
[12:44:41] [DEBUG] [dku.flow.activity] running compute_knowledge_bank_1_NP - Recorded 1 hashes before activity run
[12:44:41] [DEBUG] [dku.flow.activity] running compute_knowledge_bank_1_NP - Building recipe runner of type
[12:44:41] [DEBUG] [dku.flow.activity] running compute_knowledge_bank_1_NP - Recipe runner built, will use 1 thread(s)
[12:44:41] [DEBUG] [dku.flow.activity] running compute_knowledge_bank_1_NP - Preparing execution thread: com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner@18b9f6a3
[12:44:41] [DEBUG] [dku.flow.activity] running compute_knowledge_bank_1_NP - Starting execution thread: Thread[Thread-18,5,main]
[12:44:41] [DEBUG] [dku.flow.activity] running compute_knowledge_bank_1_NP - Execution threads started, waiting for activity end
[12:44:41] [INFO] [dku.flow.activity] - Run thread for activity compute_knowledge_bank_1_NP starting
[12:44:41] [INFO] [dku.recipes.nlp.rag_embedding] - RAG Embededing recipe runner started
[12:44:41] [INFO] [dku.venv.selector] - Select code env lang=PYTHON projectSelection={"mode":"INHERIT","preventOverride":false} globalDefault=null
[12:44:41] [INFO] [dku.recipes.nlp.rag_embedding] - Run embedding in code env testrag
[12:44:41] [INFO] [dku.ml.distributed.service] - New worker pool created: pool-320ycnumgrxxhepi
[12:44:41] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"C:\\Users\\admin\\AppData\\Local\\Dataiku\\DataScienceStudio\\dss_home\\uploads\\TUT_LLM_QAWITHRAGAPPROACH\\datasets\\dataiku_knowledge_base","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[12:44:41] [DEBUG] [dku.datasets.fsbased] - Building FS provider for dataset handler: TUT_LLM_QAWITHRAGAPPROACH.dataiku_knowledge_base
[12:44:41] [DEBUG] [dku.datasets.fsbased] - FS Provider built
[12:44:41] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"path":"C:\\Users\\admin\\AppData\\Local\\Dataiku\\DataScienceStudio\\dss_home\\uploads\\TUT_LLM_QAWITHRAGAPPROACH\\datasets\\dataiku_knowledge_base","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[12:44:41] [DEBUG] [dku.datasets.fsbased] - Building FS provider for dataset handler: TUT_LLM_QAWITHRAGAPPROACH.dataiku_knowledge_base
[12:44:41] [DEBUG] [dku.datasets.fsbased] - FS Provider built
[12:44:41] [INFO] [dku.code.projectLibs] - EXTERNAL LIBS FROM TUT_LLM_QAWITHRAGAPPROACH is {"gitReferences":{},"pythonPath":["python"],"rsrcPath":["R"],"importLibrariesFromProjects":[]}
[12:44:41] [DEBUG] [dku.code.projectLibs] - Impersonation or zipped config are enabled, copying project TUT_LLM_QAWITHRAGAPPROACH lib chunk /projects/TUT_LLM_QAWITHRAGAPPROACH/lib/python to C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\dss_home\jobs\TUT_LLM_QAWITHRAGAPPROACH\Build_knowledge_bank__NP__2024-04-22T19-44-40.767\compute_knowledge_bank_1_NP\rag-embedding-recipe\pyrunosiNIadfnal7\project-python-libs\TUT_LLM_QAWITHRAGAPPROACH\python
[12:44:41] [INFO] [dku.code.projectLibs] - chunkFolder is /projects/TUT_LLM_QAWITHRAGAPPROACH/lib/R
[12:44:41] [DEBUG] [dku.code.projectLibs] - Impersonation or zipped config are enabled, copying project lib chunk /projects/TUT_LLM_QAWITHRAGAPPROACH/lib/R to C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\dss_home\jobs\TUT_LLM_QAWITHRAGAPPROACH\Build_knowledge_bank__NP__2024-04-22T19-44-40.767\compute_knowledge_bank_1_NP\rag-embedding-recipe\pyrunosiNIadfnal7\project-r-src\TUT_LLM_QAWITHRAGAPPROACH\R
[12:44:41] [INFO] [dku.recipes.code.base] - Writing dku-exec-env for local execution in C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\dss_home\jobs\TUT_LLM_QAWITHRAGAPPROACH\Build_knowledge_bank__NP__2024-04-22T19-44-40.767\compute_knowledge_bank_1_NP\rag-embedding-recipe\pyrunosiNIadfnal7\remote-run-env-def.json
[12:44:42] [INFO] [dku.code.envs.resolution] - Executing Python activity in env: testrag
[12:44:42] [INFO] [dku.flow.abstract.python] - Execute activity command: ["C:\\Users\\admin\\AppData\\Local\\Dataiku\\DataScienceStudio\\dss_home\\code-envs\\python\\testrag\\Scripts\\python.exe","-u","-m","dataiku.llm.rag.rag_embedding_recipe","C:\\Users\\admin\\AppData\\Local\\Dataiku\\DataScienceStudio\\dss_home\\knowledge-banks\\TUT_LLM_QAWITHRAGAPPROACH\\GWIv7Hem","TUT_LLM_QAWITHRAGAPPROACH.dataiku_knowledge_base"]
[12:44:42] [INFO] [dku.flow.abstract.python] - Attached worker pool to com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner$1 recipe runner: pool-320ycnumgrxxhepi
[12:44:42] [INFO] [dku.security.process] - Starting process (regular)
[12:44:42] [INFO] [dku.security.process] - Process started with pid=13532
[12:44:42] [DEBUG] [dku.resourceusage] - Reporting start of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"TUT_LLM_QAWITHRAGAPPROACH","jobId":"Build_knowledge_bank__NP__2024-04-22T19-44-40.767","activityId":"compute_knowledge_bank_1_NP","activityType":"recipe","recipeType":"nlp_llm_rag_embedding","recipeName":"compute_knowledge_bank_1"},"type":"LOCAL_PROCESS","id":"rWzeIAr38DXNDJon","startTime":1713815082387,"localProcess":{"cpuCurrent":0.0,"cpuAverageOverPast60Seconds":0.0}}
[12:44:42] [DEBUG] [dku.resource] - Process stats for pid 13532: {"pid":13532,"commandName":"C:\\Users\\admin\\AppData\\Local\\Dataiku\\DataScienceStudio\\dss_home\\code-envs\\python\\testrag\\Scripts\\python.exe","cpuCurrent":0.0,"cpuAverageOverPast60Seconds":0.0,"vmRSSTotalMBS":0}
[12:44:50] [INFO] [dku.utils] - C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\dss_home\code-envs\python\testrag\lib\site-packages\langchain\vectorstores\__init__.py:35: LangChainDeprecationWarning: Importing vector stores from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:
[12:44:50] [INFO] [dku.utils] - `from langchain_community.vectorstores import FAISS`.
[12:44:50] [INFO] [dku.utils] - To install langchain-community run `pip install -U langchain-community`.
[12:44:50] [INFO] [dku.utils] - warnings.warn(
[12:44:50] [INFO] [dku.utils] - C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\dss_home\code-envs\python\testrag\lib\site-packages\langchain\vectorstores\__init__.py:35: LangChainDeprecationWarning: Importing vector stores from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:
[12:44:50] [INFO] [dku.utils] - `from langchain_community.vectorstores import Pinecone`.
[12:44:50] [INFO] [dku.utils] - To install langchain-community run `pip install -U langchain-community`.
[12:44:50] [INFO] [dku.utils] - warnings.warn(
[12:44:50] [INFO] [dku.utils] - C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\dss_home\code-envs\python\testrag\lib\site-packages\langchain\vectorstores\__init__.py:35: LangChainDeprecationWarning: Importing vector stores from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:
[12:44:50] [INFO] [dku.utils] - `from langchain_community.vectorstores import Chroma`.
[12:44:50] [INFO] [dku.utils] - To install langchain-community run `pip install -U langchain-community`.
[12:44:50] [INFO] [dku.utils] - warnings.warn(
[12:44:52] [INFO] [dku.utils] - 2024-04-22 12:44:52,083 INFO Loading dataset records
[12:44:53] [INFO] [dku.utils] - 2024-04-22 12:44:53,552 INFO Loaded 944 records from datasets
[12:44:53] [INFO] [dku.utils] - 2024-04-22 12:44:53,554 INFO Performing splitting
[12:44:54] [INFO] [dku.utils] - 2024-04-22 12:44:54,254 INFO After splitting, have 10314 records to embed
[12:44:54] [INFO] [dku.utils] - 2024-04-22 12:44:54,256 INFO Performing embedding and indexing
[12:44:54] [INFO] [dku.utils] - 2024-04-22 12:44:54,265 INFO Performing embedding of 10314 texts
[12:44:58] [INFO] [dku.utils] - *************** Recipe code failed **************
[12:44:58] [INFO] [dku.utils] - Begin Python stack
[12:44:58] [INFO] [dku.utils] - Traceback (most recent call last):
[12:44:58] [INFO] [dku.utils] -   File "C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\kits\dataiku-dss-12.6.0-win\python\dataikuapi\dssclient.py", line 1465, in _perform_http
[12:44:58] [INFO] [dku.utils] -     http_res.raise_for_status()
[12:44:58] [INFO] [dku.utils] -   File "C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\dss_home\code-envs\python\testrag\lib\site-packages\requests\models.py", line 1021, in raise_for_status
[12:44:58] [INFO] [dku.utils] -     raise HTTPError(http_error_msg, response=self)
[12:44:58] [INFO] [dku.utils] - requests.exceptions.HTTPError: 500 Server Error: Server Error for url: http://127.0.0.1:11201/dip/publicapi/projects/TUT_LLM_QAWITHRAGAPPROACH/llms/embeddings
[12:44:58] [INFO] [dku.utils] - During handling of the above exception, another exception occurred:
[12:44:58] [INFO] [dku.utils] - Traceback (most recent call last):
[12:44:58] [INFO] [dku.utils] -   File "C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\kits\dataiku-dss-12.6.0-win\python\dataiku\llm\rag\rag_embedding_recipe.py", line 111, in
[12:44:58] [INFO] [dku.utils] -     main(run_folder, input_dataset_name)
[12:44:58] [INFO] [dku.utils] -   File "C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\kits\dataiku-dss-12.6.0-win\python\dataiku\llm\rag\rag_embedding_recipe.py", line 53, in main
[12:44:58] [INFO] [dku.utils] -     vectorstore = FAISS.from_documents(documents, embeddings)
[12:44:58] [INFO] [dku.utils] -   File "C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\dss_home\code-envs\python\testrag\lib\site-packages\langchain_core\vectorstores.py", line 528, in from_documents
[12:44:58] [INFO] [dku.utils] -     return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
[12:44:58] [INFO] [dku.utils] -   File "C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\dss_home\code-envs\python\testrag\lib\site-packages\langchain_community\vectorstores\faiss.py", line 965, in from_texts
[12:44:58] [INFO] [dku.utils] -     embeddings = embedding.embed_documents(texts)
[12:44:58] [INFO] [dku.utils] -   File "C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\kits\dataiku-dss-12.6.0-win\python\dataiku\langchain\dku_embeddings.py", line 85, in embed_documents
[12:44:58] [INFO] [dku.utils] -     resp = query.execute()
[12:44:58] [INFO] [dku.utils] -   File "C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\kits\dataiku-dss-12.6.0-win\python\dataikuapi\dss\llm.py", line 129, in execute
[12:44:58] [INFO] [dku.utils] -     ret = self.llm.client._perform_json("POST", "/projects/%s/llms/embeddings" % (self.llm.project_key), body=self.eq)
[12:44:58] [INFO] [dku.utils] -   File "C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\kits\dataiku-dss-12.6.0-win\python\dataikuapi\dssclient.py", line 1481, in _perform_json
[12:44:58] [INFO] [dku.utils] -     return self._perform_http(method, path, params=params, body=body, files=files, stream=False, raw_body=raw_body).json()
[12:44:58] [INFO] [dku.utils] -   File "C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\kits\dataiku-dss-12.6.0-win\python\dataikuapi\dssclient.py", line 1472, in _perform_http
[12:44:58] [INFO] [dku.utils] -     raise DataikuException("%s: %s" % (ex.get("errorType", "Unknown error"), ex.get("detailedMessage", ex.get("message", "No message"))))
[12:44:58] [INFO] [dku.utils] - dataikuapi.utils.DataikuException: com.dataiku.dip.io.SocketBlockLink$SecretKernelTimeoutException: Subprocess failed to connect, it probably crashed at startup. Check the logs., caused by: SocketException: Socket operation on nonsocket: configureBlocking
[12:44:58] [INFO] [dku.utils] - End Python stack
[12:44:58] [DEBUG] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"TUT_LLM_QAWITHRAGAPPROACH","jobId":"Build_knowledge_bank__NP__2024-04-22T19-44-40.767","activityId":"compute_knowledge_bank_1_NP","activityType":"recipe","recipeType":"nlp_llm_rag_embedding","recipeName":"compute_knowledge_bank_1"},"type":"LOCAL_PROCESS","id":"rWzeIAr38DXNDJon","startTime":1713815082387,"localProcess":{"pid":13532,"commandName":"C:\\Users\\admin\\AppData\\Local\\Dataiku\\DataScienceStudio\\dss_home\\code-envs\\python\\testrag\\Scripts\\python.exe","cpuCurrent":0.0,"cpuAverageOverPast60Seconds":0.0,"vmRSSTotalMBS":0}}
[12:44:58] [INFO] [dip.exec.resultHandler] - Error file found, trying to throw it: C:\Users\admin\AppData\Local\Dataiku\DataScienceStudio\dss_home\jobs\TUT_LLM_QAWITHRAGAPPROACH\Build_knowledge_bank__NP__2024-04-22T19-44-40.767\compute_knowledge_bank_1_NP\rag-embedding-recipe\pyrunosiNIadfnal7\error.json
[12:44:58] [INFO] [dip.exec.resultHandler] - Raw error is{"errorType":"\u003cclass \u0027dataikuapi.utils.DataikuException\u0027\u003e","message":"com.dataiku.dip.io.SocketBlockLink$SecretKernelTimeoutException: Subprocess failed to connect, it probably crashed at startup. Check the logs., caused by: SocketException: Socket operation on nonsocket: configureBlocking","detailedMessage":"\u003cclass \u0027dataikuapi.utils.DataikuException\u0027\u003e: com.dataiku.dip.io.SocketBlockLink$SecretKernelTimeoutException: Subprocess failed to connect, it probably crashed at startup. Check the logs., caused by: SocketException: Socket operation on nonsocket: configureBlocking","stackTrace":[]}
[12:44:58] [INFO] [dip.exec.resultHandler] - After enrichment of error file, error is: {"errorType":"\u003cclass \u0027dataikuapi.utils.DataikuException\u0027\u003e","message":"Error in Python process: com.dataiku.dip.io.SocketBlockLink$SecretKernelTimeoutException: Subprocess failed to connect, it probably crashed at startup. Check the logs., caused by: SocketException: Socket operation on nonsocket: configureBlocking","detailedMessage":"Error in Python process: \u003cclass \u0027dataikuapi.utils.DataikuException\u0027\u003e: com.dataiku.dip.io.SocketBlockLink$SecretKernelTimeoutException: Subprocess failed to connect, it probably crashed at startup. Check the logs., caused by: SocketException: Socket operation on nonsocket: configureBlocking","stackTrace":[]}
[12:44:58] [INFO] [dku.ml.distributed.pool] - Closing worker pool pool-320ycnumgrxxhepi
[12:44:58] [INFO] [dku.ml.distributed.service] - Unregistered worker pool: pool-320ycnumgrxxhepi
[12:44:58] [INFO] [dku.flow.activity] - Run thread failed for activity compute_knowledge_bank_1_NP
com.dataiku.common.server.APIError$SerializedErrorException: Error in Python process: : com.dataiku.dip.io.SocketBlockLink$SecretKernelTimeoutException: Subprocess failed to connect, it probably crashed at startup. Check the logs., caused by: SocketException: Socket operation on nonsocket: configureBlocking
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.throwFromErrorFileIfPossible(JobExecutionResultHandler.java:108)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.throwFromErrorFileOrLogs(JobExecutionResultHandler.java:39)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.throwFromErrorFileOrLogs(JobExecutionResultHandler.java:34)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:26)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:70)
    at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:106)
    at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner$1.run(RAGEmbeddingRecipeRunner.java:124)
    at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner.run(RAGEmbeddingRecipeRunner.java:104)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:391)
[12:44:59] [INFO] [dku.flow.activity] running compute_knowledge_bank_1_NP - activity is finished
[12:44:59] [ERROR] [dku.flow.activity] running compute_knowledge_bank_1_NP - Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Error in Python process: : com.dataiku.dip.io.SocketBlockLink$SecretKernelTimeoutException: Subprocess failed to connect, it probably crashed at startup. Check the logs., caused by: SocketException: Socket operation on nonsocket: configureBlocking
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.throwFromErrorFileIfPossible(JobExecutionResultHandler.java:108)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.throwFromErrorFileOrLogs(JobExecutionResultHandler.java:39)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.throwFromErrorFileOrLogs(JobExecutionResultHandler.java:34)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:26)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:70)
    at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:106)
    at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner$1.run(RAGEmbeddingRecipeRunner.java:124)
    at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner.run(RAGEmbeddingRecipeRunner.java:104)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:391)
[12:44:59] [INFO] [dku.flow.activity] running compute_knowledge_bank_1_NP - Executing default post-activity lifecycle hook
[12:44:59] [INFO] [dku.flow.activity] running compute_knowledge_bank_1_NP - Done post-activity tasks