Contains software installed on top of an Ubuntu 18.04 distribution:
Apache Spark 2.4.0
A virtualenv for Python 3.6 with a scientific Python stack (numpy, scipy, matplotlib, pandas, statsmodels, gensim, networkx, scikit-learn) plus IPython 6 + Jupyter notebook
R 3.5.1 with a few packages installed (rmarkdown, magrittr, tidyverse, data.table, ggplot2, caret, SparkR, sparklyr)
Spark notebook kernels for Python 3.6, Scala (SPylon) and R (IRKernel), in addition to the default "plain" (i.e. non-Spark-capable) Python 3.6 kernel.
A few small notebook extensions
A notebook startup daemon script with facilities to configure Spark execution mode
Spark pre-configured to use GraphFrames 0.7.0
Note that this is a base box; in particular, neither Spark nor the Jupyter notebook is fully configured. A complementary Vagrantfile builds on this base box to provide a fully functional Spark environment; it is available in the ml-vm-notebook repository.
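As a quick check after provisioning, the minimal Python sketch below reports whether the packages listed above can be imported. It assumes it is run with the virtualenv's Python 3 interpreter inside the VM; the `check_stack` helper and the package-to-module mapping (e.g. scikit-learn is imported as `sklearn`) are illustrative and not part of the box itself. It only tests importability and reads `__version__` where a module defines it; it does not validate exact versions.

```python
"""Hypothetical sanity check for the Python stack described for this box.

Assumption: executed with the virtualenv's python3 inside the VM.
"""
import importlib
import importlib.util

# Package name as listed in the description -> importable module name.
PACKAGES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "statsmodels": "statsmodels",
    "gensim": "gensim",
    "networkx": "networkx",
    "scikit-learn": "sklearn",
    "IPython": "IPython",
    "notebook": "notebook",
}


def check_stack(packages=PACKAGES):
    """Return {package: version string, "import error", or None if not found}."""
    report = {}
    for name, module_name in packages.items():
        # find_spec returns None when no top-level module with that name exists.
        if importlib.util.find_spec(module_name) is None:
            report[name] = None
            continue
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module is present but one of its dependencies is broken or missing.
            report[name] = "import error"
            continue
        # Most of these packages expose __version__; otherwise report "unknown".
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_stack().items():
        print("{:<14} {}".format(name, version if version else "MISSING"))
```

A `None` entry means the module was not found on the interpreter's path, which usually indicates the script was run outside the virtualenv.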
Contains software installed on top of an Ubuntu 18.04 distribution:
Apache Spark 2.3.2
A virtualenv for Python 3.6 with a scientific Python stack (numpy, scipy, matplotlib, pandas, statsmodels, gensim, networkx, scikit-learn) plus IPython 6 + Jupyter notebook
R 3.5.1 with a few packages installed (rmarkdown, magrittr, tidyverse, data.table, ggplot2, caret, SparkR, sparklyr)
Spark notebook kernels for Python 3.6, Scala (SPylon) and R (IRKernel), in addition to the default "plain" (i.e. non-Spark-capable) Python 3.6 kernel.
A few small notebook extensions
A notebook startup daemon script with facilities to configure Spark execution mode
GraphFrames 0.6.0
Note that this is a base box; in particular, neither Spark nor the Jupyter notebook is fully configured. A complementary Vagrantfile builds on this base box to provide a fully functional Spark environment; it is available in the ml-vm-notebook repository.
Changes: base box is Ubuntu 18.04 (with Python 3.6), Spark is 2.3.2 (custom build, including native BLAS bindings), IPython 7.1.1, R 3.5.1, updated R & Python packages, GraphFrames is 0.6.0
Contains software installed on top of an Ubuntu 16.04 distribution:
Apache Spark 2.3.0
A virtualenv for Python 3.5 with a scientific Python stack (numpy, scipy, matplotlib, pandas, statsmodels, gensim, networkx, scikit-learn) plus IPython 6 + Jupyter notebook
R 3.4.4 with a few packages installed (rmarkdown, magrittr, tidyverse, data.table, ggplot2, caret, SparkR, sparklyr)
Spark notebook kernels for Python 3.5, Scala (SPylon) and R (IRKernel), in addition to the default "plain" (i.e. non-Spark-capable) Python 3.5 kernel.
Note that this is a base box; in particular, neither Spark nor the Jupyter notebook is fully configured. A complementary Vagrantfile builds on this base box to provide a fully functional Spark environment; it is available in the ml-vm-notebook repository.
Changes: Spark is 2.3.0 (custom build, including native BLAS bindings), IPython 6.2.1, R 3.4.4, SPylon kernel for Scala, updated R & Python packages, added Cartopy, pydot instead of pydot-ng, Graphviz compiled from source, updated GraphFrames snapshot
Contains software installed on top of an Ubuntu 16.04 distribution:
Apache Spark 2.2.0
A virtualenv for Python 3.5 with a scientific Python stack (numpy, scipy, matplotlib, pandas, statsmodels, gensim, networkx, scikit-learn) plus IPython 6 + Jupyter notebook
R 3.4.2 with a few packages installed (rmarkdown, magrittr, dplyr, tidyr, data.table, ggplot2, caret, SparkR, sparklyr)
Spark notebook kernels for Python 3.5, Scala (SPylon) and R (IRKernel), in addition to the default "plain" (i.e. non-Spark-capable) Python 3.5 kernel.
Note that this is a base box; in particular, neither Spark nor the Jupyter notebook is fully configured. A complementary Vagrantfile builds on this base box to provide a fully functional Spark environment; it is available in the ml-vm-notebook repository.
Changes: based on Ubuntu 16.04, Spark is 2.2.0 (custom build, including native BLAS bindings), IPython 6.2.1, R 3.4.2, SPylon kernel for Scala, updated R & Python packages, added GraphFrames
Contains software installed on top of a CentOS 7.3 distribution:
Apache Spark 2.1.0
A virtualenv for Python 2.7.5 with a scientific Python stack (numpy, scipy, matplotlib, pandas, statsmodels, gensim, networkx, scikit-learn) plus IPython 5 + Jupyter notebook
R 3.3.2 with a few packages installed (rmarkdown, magrittr, dplyr, tidyr, data.table, ggplot2, caret, SparkR, sparklyr)
Spark notebook kernels for Python 2.7, Scala (Toree) and R (IRKernel), in addition to the default "plain" (i.e. non-Spark-capable) Python 2.7 kernel.
A notebook startup daemon script with facilities to configure Spark execution mode
Note that this is a base box; in particular, neither Spark nor the Jupyter notebook is fully configured. A complementary Vagrantfile builds on this base box to provide a fully functional Spark environment; it is available in the ml-vm-notebook repository.
Changes: based on CentOS 7.3, Spark is 2.1.0, IPython 5.3.0, R 3.3.2, uses OpenJDK 8 instead of Oracle Java, no Theano/Keras installation (left to child boxes), fixes in the manager script, custom Spark build including native BLAS bindings, sparklyr installed from GitHub, custom Toree build, optimized for upload