How to use this box with Vagrant:

Add the box to a Vagrantfile:

Vagrant.configure("2") do |config|
  config.vm.box = "paulovn/spark-base64"
end

Or generate a Vagrantfile and start the VM from the command line:

vagrant init paulovn/spark-base64
vagrant up
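
Since the box is published only for the VirtualBox provider, VM resources can be adjusted in the same Vagrantfile. The following is a minimal sketch; the memory and CPU values are illustrative assumptions, not requirements stated for this box.

```ruby
# Vagrantfile sketch: same box as above, plus VirtualBox resource settings.
Vagrant.configure("2") do |config|
  config.vm.box = "paulovn/spark-base64"

  config.vm.provider "virtualbox" do |vb|
    vb.memory = 4096  # RAM in MB (assumed value, tune for your Spark workloads)
    vb.cpus = 2       # number of virtual CPUs (assumed value)
  end
end
```

Running vagrant up in the directory holding this file applies the settings when the VM is created.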

This version was created 9 months ago.

  • Base OS is Ubuntu 22.04
  • R is now 4.3.1
  • Python is 3.10.12
  • Spark is 3.4.1
1 provider for this version.
  • virtualbox
    amd64 Hosted by Vagrant Cloud (2.6 GB)

This version was created about 1 year ago.

  • Base OS updated to Ubuntu 22.04
    • R is now 4.xx
    • Python is 3.10.x
  • Spark is 3.3.2
  • Graphframes 0.8.3-SNAPSHOT assembly jar added to the VM
1 provider for this version.
  • virtualbox
    unknown Hosted by Vagrant Cloud (2.67 GB)

This version was created almost 3 years ago.

  • Base OS Updated to Ubuntu 20.04
    • R is now 4.0.3
    • Python is 3.8.5
  • Spark is 3.1.2
  • PyLucene is 8.6.1
1 provider for this version.
  • virtualbox
    unknown Hosted by Vagrant Cloud (2.51 GB)

This version was created over 3 years ago.

  • Base OS Updated to Ubuntu 20.04
    • R is now 4.0.3
    • Python is 3.8.5
  • Spark is 3.1.1
1 provider for this version.
  • virtualbox
    unknown Hosted by Vagrant Cloud (2.59 GB)

This version was created about 4 years ago.

  • Spark is 2.4.6
  • Base OS Updated (R is now 3.6.2)
  • Updated some Python packages. PyLucene is now 8.1.1
  • Added package for Brotli codec (to use in Parquet files)
  • Updated Spark config
    • Use the optimized version of Hadoop FileOutputCommitter
    • Packages for Spark 2.4.6
    • Compatibility setting for PyArrow >= 0.15.0
  • Refactored management scripts
    • Start jupyter service automatically at system startup
    • Wrap the actual Jupyter launch so that it waits until the notebook directory (shared folder) is ready
1 provider for this version.
  • virtualbox
    unknown Hosted by Vagrant Cloud (2.17 GB)

This version was created over 5 years ago.

Contains software installed on top of an Ubuntu 18.04 distribution:

  • Apache Spark 2.4.0
  • A virtualenv for Python 3.6 with a scientific Python stack (numpy, scipy, matplotlib, pandas, statsmodels, gensim, networkx, scikit-learn) plus IPython 6 + Jupyter notebook
  • R 3.5.1 with a few packages installed (rmarkdown, magrittr, tidyverse, data.table, ggplot2, caret, SparkR, sparklyr)
  • Spark notebook Kernels for Python 3.6, Scala (SPylon) and R (IRKernel), in addition to the default "plain" (i.e. non-Spark capable) Python 3.6 kernel.
  • A few small notebook extensions
  • A notebook startup daemon script with facilities to configure Spark execution mode
  • Spark pre-configured to use GraphFrames 0.7.0

Note that this is a base box: neither Spark nor Jupyter notebook is fully configured. A complementary Vagrantfile builds on this base box to provide a fully functional Spark environment; it is available in the ml-vm-notebook repository.
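
A complementary Vagrantfile of the kind described above might look roughly like the following. This is only a minimal sketch and not the actual file from the ml-vm-notebook repository: the forwarded port (8888, Jupyter's usual default), the host folder name, and the guest path are all assumptions chosen for illustration.

```ruby
# Hypothetical child Vagrantfile building on the base box.
Vagrant.configure("2") do |config|
  config.vm.box = "paulovn/spark-base64"

  # Assumed port: expose the notebook server to a browser on the host.
  config.vm.network "forwarded_port", guest: 8888, host: 8888

  # Assumed paths: share a host folder with the VM as the notebook directory.
  # create: true makes Vagrant create the host folder if it does not exist.
  config.vm.synced_folder "notebooks", "/home/vagrant/notebooks", create: true
end
```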

Changes: Spark 2.4.0, GraphFrames is 0.7.0

1 provider for this version.
  • virtualbox
    unknown Hosted by Vagrant Cloud (1.95 GB)

This version was created over 5 years ago.

Contains software installed on top of an Ubuntu 18.04 distribution:

  • Apache Spark 2.3.2
  • A virtualenv for Python 3.6 with a scientific Python stack (numpy, scipy, matplotlib, pandas, statsmodels, gensim, networkx, scikit-learn) plus IPython 6 + Jupyter notebook
  • R 3.5.1 with a few packages installed (rmarkdown, magrittr, tidyverse, data.table, ggplot2, caret, SparkR, sparklyr)
  • Spark notebook Kernels for Python 3.6, Scala (SPylon) and R (IRKernel), in addition to the default "plain" (i.e. non-Spark capable) Python 3.6 kernel.
  • A few small notebook extensions
  • A notebook startup daemon script with facilities to configure Spark execution mode
  • GraphFrames 0.6.0

Note that this is a base box: neither Spark nor Jupyter notebook is fully configured. A complementary Vagrantfile builds on this base box to provide a fully functional Spark environment; it is available in the ml-vm-notebook repository.

Changes: base box is Ubuntu 18.04 (with Python 3.6), Spark is 2.3.2 (custom build, including native BLAS bindings), IPython 7.1.1, R 3.5.1, updated R & Python packages, GraphFrames is 0.6.0

1 provider for this version.
  • virtualbox
    unknown Hosted by Vagrant Cloud (1.79 GB)

This version was created about 6 years ago.

Contains software installed on top of an Ubuntu 16.04 distribution:

  • Apache Spark 2.3.0
  • A virtualenv for Python 3.5 with a scientific Python stack (numpy, scipy, matplotlib, pandas, statsmodels, gensim, networkx, scikit-learn) plus IPython 6 + Jupyter notebook
  • R 3.4.4 with a few packages installed (rmarkdown, magrittr, tidyverse, data.table, ggplot2, caret, SparkR, sparklyr)
  • Spark notebook Kernels for Python 3.5, Scala (SPylon) and R (IRKernel), in addition to the default "plain" (i.e. non-Spark capable) Python 3.5 kernel.
  • A few small notebook extensions
  • A notebook startup daemon script with facilities to configure Spark execution mode
  • GraphFrames 0.6.0 snapshot

Note that this is a base box: neither Spark nor Jupyter notebook is fully configured. A complementary Vagrantfile builds on this base box to provide a fully functional Spark environment; it is available in the ml-vm-notebook repository.

Changes: Spark is 2.3.0 (custom build, including native BLAS bindings), IPython 6.2.1, R 3.4.4, SPylon kernel for Scala, updated R & Python packages, added Cartopy, pydot instead of pydot-ng, GraphViz compiled from source, updated GraphFrames snapshot

1 provider for this version.
  • virtualbox
    unknown Hosted by Vagrant Cloud (1.73 GB)

This version was created over 6 years ago.

Contains software installed on top of an Ubuntu 16.04 distribution:

  • Apache Spark 2.2.0
  • A virtualenv for Python 3.5 with a scientific Python stack (numpy, scipy, matplotlib, pandas, statsmodels, gensim, networkx, scikit-learn) plus IPython 6 + Jupyter notebook
  • R 3.4.2 with a few packages installed (rmarkdown, magrittr, dplyr, tidyr, data.table, ggplot2, caret, SparkR, sparklyr)
  • Spark notebook Kernels for Python 3.5, Scala (SPylon) and R (IRKernel), in addition to the default "plain" (i.e. non-Spark capable) Python 3.5 kernel.
  • A few small notebook extensions
  • A notebook startup daemon script with facilities to configure Spark execution mode
  • GraphFrames 0.6.0 snapshot

Note that this is a base box: neither Spark nor Jupyter notebook is fully configured. A complementary Vagrantfile builds on this base box to provide a fully functional Spark environment; it is available in the ml-vm-notebook repository.

Changes: based on Ubuntu 16.04, Spark is 2.2.0 (custom build, including native BLAS bindings), IPython 6.2.1, R 3.4.2, SPylon kernel for Scala, updated R & Python packages, added GraphFrames

1 provider for this version.
  • virtualbox
    unknown Hosted by Vagrant Cloud (1.59 GB)

This version was created over 7 years ago.

Contains software installed on top of a CentOS 7.3 distribution:

  • Apache Spark 2.1.0
  • A virtualenv for Python 2.7.5 with a scientific Python stack (numpy, scipy, matplotlib, pandas, statsmodels, gensim, networkx, scikit-learn) plus IPython 5 + Jupyter notebook
  • R 3.3.2 with a few packages installed (rmarkdown, magrittr, dplyr, tidyr, data.table, ggplot2, caret, SparkR, sparklyr)
  • Spark notebook Kernels for Python 2.7, Scala (Toree) and R (IRKernel), in addition to the default "plain" (i.e. non-Spark capable) Python 2.7 kernel.
  • A few small notebook extensions
  • A notebook startup daemon script with facilities to configure Spark execution mode

Note that this is a base box: neither Spark nor Jupyter notebook is fully configured. A complementary Vagrantfile builds on this base box to provide a fully functional Spark environment; it is available in the ml-vm-notebook repository.

Changes: based on CentOS 7.3, Spark is 2.1.0, IPython 5.3.0, R 3.3.2, use OpenJDK 8 instead of Oracle Java, no Theano/Keras installation (left to child boxes), fixes in manager script, custom build of Spark including native BLAS bindings, sparklyr installed from GitHub, custom Toree build, optimized for upload

1 provider for this version.
  • virtualbox
    unknown Hosted by Vagrant Cloud (1.55 GB)