I have a task to perform web scraping in a Dataiku notebook, and for that purpose, I need to utilize ChromeDriver. However, I'm unsure about the process of installing ChromeDriver and integrating it into a Dataiku notebook. Is there a method to invoke ChromeDriver within a Dataiku notebook?
Hi @Ramya ,
You would need your system administrator to install the following:
1) Download and install ChromeDriver
wget https://chromedriver.storage.googleapis.com/$(curl -sS https://chromedriver.storage.googleapis.com/LATEST_RELEASE)/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/local/bin/
2) Download and install Chrome (its major version needs to match the ChromeDriver version)
sudo wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum localinstall google-chrome-stable_current_x86_64.rpm
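Once both installs are done, a quick check from the notebook can confirm that the binaries are visible to the Python process. This is only a minimal sketch: the helper `find_binaries` is hypothetical, and the binary names `chromedriver` and `google-chrome-stable` are assumptions based on the commands above.

```python
import shutil


def find_binaries(names=("chromedriver", "google-chrome-stable")):
    """Map each binary name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Print where each binary was found (names are assumptions, adjust as needed)
for name, path in find_binaries().items():
    print(f"{name}: {path or 'NOT FOUND'}")
```

If either entry prints NOT FOUND, the notebook's user probably does not have that directory on its PATH, and the full path would have to be passed to Selenium explicitly.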
Then add selenium to a code environment and use it in the notebook like this:
import time

import dataiku
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Written against the Selenium 4 API (Service, By, options=)
output_dataset = dataiku.Dataset("fitness2")

# Download files to /tmp without prompting
prefs = {"download.default_directory": "/tmp", "download.prompt_for_download": False}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
chrome_options.add_experimental_option("prefs", prefs)

driver = webdriver.Chrome(service=Service("/usr/local/bin/chromedriver"), options=chrome_options)
try:
    driver.get("https://www.browserstack.com/test-on-the-right-mobile-devices")
    # Dismiss the cookie banner, then click the CSV download icon
    driver.find_element(By.ID, "accept-cookie-notification").click()
    driver.find_element(By.CSS_SELECTOR, ".icon-csv").click()
    time.sleep(5)  # give the download time to finish
except Exception as e:
    print("Scraping failed:", e)
finally:
    # Always shut the browser down, whether or not the scrape succeeded
    driver.quit()

# Read the downloaded file and write it to the output dataset
devices_df = pd.read_csv("/tmp/BrowserStack - List of devices to test on.csv")
output_dataset.write_with_schema(devices_df)
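The fixed time.sleep(5) above may be too short on a slow connection and wasteful on a fast one. One possible alternative is to poll for the downloaded file until it shows up. The helper `wait_for_file` below is hypothetical, not part of Selenium or Dataiku, and it assumes the file only appears under its final name with a non-zero size once the download has finished.

```python
import os
import time


def wait_for_file(path, timeout=30.0, poll=0.5):
    """Return True once `path` exists and is non-empty, False if `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path) and os.path.getsize(path) > 0:
            return True
        time.sleep(poll)
    return False


# Example (path taken from the snippet above):
# if wait_for_file("/tmp/BrowserStack - List of devices to test on.csv"):
#     ...read the CSV with pandas...
```

Calling it right after the download click, in place of the sleep, would let the notebook continue as soon as the file is there and fail clearly when it never arrives.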