How to use this box with Vagrant:

In a Vagrantfile:

Vagrant.configure("2") do |config|
  config.vm.box = "paulovn/spark-base64"
  config.vm.box_version = "1.0.0"
end

Or from the command line:

vagrant init paulovn/spark-base64 \
  --box-version 1.0.0
vagrant up

This version was created almost 8 years ago.

Contains software installed on top of a CentOS 6.7 distribution:

  • Apache Spark 1.6.2
  • Python 2.7.8 from the Software Collections
  • A virtualenv for Python 2.7.8 with a scientific Python stack (scipy, numpy, matplotlib, pandas, statsmodels, gensim, networkx, scikit-learn, theano+keras) plus IPython 4 and Jupyter notebook
  • R 3.3.1 with a few packages installed (rmarkdown, magrittr, dplyr, tidyr, data.table, ggplot2, caret)
  • Spark notebook kernels for Python 2.7, Scala (Toree) and R (IRkernel), in addition to the default "plain" (i.e. non-Spark-capable) Python 2.7 kernel
  • A few small notebook extensions
  • A notebook startup daemon script with facilities to configure Spark execution mode
  • Two additional Spark external libraries (plus configuration to use the GraphFrames package)
    • The Kafka Spark Streaming artifact
    • The Spark CSV library

Note that this is a base box: in particular, neither Spark nor Jupyter notebook is fully configured. A complementary Vagrantfile builds on this base box to provide a fully functional Spark environment; it is available in the ml-vm-notebook repository.
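
As a rough illustration of what a Vagrantfile layered on this base box might contain, here is a minimal sketch. It is not the Vagrantfile from the ml-vm-notebook repository. The forwarded port (8888, Jupyter's usual default) and the memory and CPU figures are assumptions chosen for illustration, and the sketch does not include the provisioning needed to finish configuring Spark and Jupyter.

```ruby
# Illustrative sketch only: not the ml-vm-notebook Vagrantfile.
Vagrant.configure("2") do |config|
  # Base box and version, as given on this page
  config.vm.box = "paulovn/spark-base64"
  config.vm.box_version = "1.0.0"

  # ASSUMPTION: the notebook server listens on port 8888 inside the guest.
  # Expose it on the host's loopback interface only.
  config.vm.network "forwarded_port",
                    guest: 8888, host: 8888, host_ip: "127.0.0.1"

  # ASSUMPTION: resource sizes are placeholders; tune them for your workload.
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 2048 # MB of RAM for the VM
    vb.cpus = 2      # number of virtual CPUs
  end
end
```

With a file like this in the current directory, `vagrant up` would download the box if needed and boot the VM with those settings.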

Changes: Spark updated to 1.6.2, R updated to 3.3.1, added the graphviz Python package, added a system theanorc file, installed openblas, updated Python & R packages to the most recent available versions, reduced Spark log levels, improved the wrapper script.

1 provider for this version.
  • virtualbox
    unknown Hosted by Vagrant Cloud (2.44 GB)