Common Issues with PySpark
What if the data file I try to upload through Jupyter Notebook exceeds 25 MB?
Jupyter Notebook allows you to upload files no larger than 25 MB. If the file exceeds 25 MB, you can use the wget approach instead (provided the data file has a downloadable link).
To use wget from a Jupyter notebook, go to New > Terminal; this gives you access to a bash terminal, where you can run wget directly.
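A minimal sketch, assuming a hypothetical download URL; replace it with your data file's actual link:

```bash
# download the data file into the current directory on the VM;
# the URL below is a placeholder for your file's downloadable link
wget https://example.com/data/mydata.csv
```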
If the file is already on your disk (but not in the Cloudera VM), you can upload it to the Cloudera VM using the shared vagrant folder (e.g. your MSBA desktop's c:/vagrant is shared with the Cloudera VM's /vagrant):
- copy your file to c:/vagrant on your MSBA desktop
- open bash from the Cloudera VM and list the /vagrant folder to verify that your file exists there
- copy or move your file to the intended directory, e.g. your home directory (see the sketch after this list)
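A minimal sketch of the bash steps, assuming a hypothetical file name mydata.csv:

```bash
# run from a bash terminal inside the Cloudera VM
ls /vagrant                # verify that mydata.csv appears in the shared folder
cp /vagrant/mydata.csv ~/  # copy it to your home directory
```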
If you encounter an error: The root scratch dir: /tmp/hive on HDFS should be writable
If you encounter the error The root scratch dir: /tmp/hive on HDFS should be writable, open a terminal and fix the directory's permissions.
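The standard fix is to make /tmp/hive world-writable; a sketch covering both the local-filesystem and HDFS cases:

```bash
# make the Hive scratch directory world-writable if it is on the local filesystem
sudo chmod -R 777 /tmp/hive
# if the scratch dir lives on HDFS, fix the permissions there instead
hdfs dfs -chmod -R 777 /tmp/hive
```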
Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
If at any point you see the error AnalysisException: u'java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;', please try:
- removing the *.lck files from the Hive metastore_db directory (see the sketch below)
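A minimal sketch, assuming metastore_db sits in the directory from which you launched the notebook (Spark creates it wherever the session was started):

```bash
# remove the stale Derby lock files left behind by a crashed session
rm metastore_db/*.lck
```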
If the above still does not work, try restarting the kernel (from your current Jupyter notebook's menu, Kernel > Restart).