Reducing the Size of Jupyter Notebooks


Jupyter notebooks are a common way to share data science projects. As well as the project source code, they contain the outputs generated by running it, including both text and graphical output, so a complete project can be shared in a single file.

A downside to this is the potentially large size of the resulting notebook files, because every image is embedded directly in the file. A notebook with many graphs and figures can reach tens of megabytes, or more for particularly complex projects.
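Notebooks are stored as JSON, and images in cell outputs are embedded as text: PNG and JPEG data is base64-encoded, which on its own makes it about a third larger than the raw image. To see how much of a particular notebook is image data, a minimal sketch like the one below can help. It assumes the nbformat 4 layout (a top-level "cells" list whose code cells hold "outputs" with a "data" dictionary keyed by MIME type). The function name image_share is hypothetical, and images attached to Markdown cells are not counted.

```python
import json
import os
from collections import Counter


def image_share(path):
    """Report how much of a notebook file is taken up by images in cell outputs.

    Returns (total file size in bytes, Counter of image MIME type -> stored characters).
    """
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)

    sizes = Counter()
    for cell in nb.get("cells", []):
        # Markdown and raw cells have no "outputs" key, so .get() yields [].
        for output in cell.get("outputs", []):
            for mime, payload in output.get("data", {}).items():
                if mime.startswith("image/"):
                    # A payload may be stored as one string or as a list of lines.
                    text = "".join(payload) if isinstance(payload, list) else payload
                    sizes[mime] += len(text)
    return os.path.getsize(path), sizes
```

For example, image_share("analysis.ipynb") (a placeholder filename) returns the file size and a per-type count of image characters; if the image total is close to the file size, compressing the images is where the savings will come from.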

This article explains how to reduce the size of your Jupyter notebooks with ipynbcompress, without sacrificing content.

What is ipynbcompress?

ipynbcompress is a Python package that reduces the size of Jupyter notebook files by compressing the images embedded in them. Images can be resized to a chosen width and saved as either PNG or JPEG.

It can be installed with pip:

pip install ipynbcompress

Basic Usage

ipynbcompress is used from the command line, with a variety of options to control how your notebook is compressed.

The simplest way to use the tool is to pass the path of the notebook to be compressed.

ipynb-compress your_notebook.ipynb

Note the hyphen in the command name, which is not present in the package name.

$ ipynb-compress your_notebook.ipynb
your_notebook.ipynb: 813 kilobytes decrease

The tool reports how much smaller the file has become.

If no output filename is specified, the original file is overwritten. To keep the original, pass a name for the output file as a second argument:

ipynb-compress your_notebook.ipynb your_compressed_notebook.ipynb

The compressed notebook is written to the new file, leaving the original untouched.

Advanced Options

Additional options are available to tweak the parameters used during compression. They are listed by running ipynb-compress --help:

Options:
  -w, --img-width INTEGER      Which width images should be resized to.
  -f, --img-format [png|jpeg]  Which compression to use on the images, valid options are png or jpeg (required libjpeg).

Resizing Images

The -w flag sets the width, in pixels, that the images in the notebook are resized to. The default is 2048 pixels.

ipynb-compress -w 1000 your_notebook.ipynb your_compressed_notebook.ipynb

Setting this to a lower value will reduce the resulting file size.
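Why does a lower width help so much? Scaling an image down while keeping its aspect ratio shrinks both dimensions, so halving the width leaves roughly a quarter of the pixels. Compressed image size does not track pixel count exactly, but it is a rough guide. The sketch below is illustrative arithmetic only, not ipynbcompress's own code; the function names are hypothetical, and leaving images that are already narrower than the target unchanged is an assumption.

```python
def resized_dimensions(width, height, target_width):
    """Dimensions after scaling an image down to target_width, keeping aspect ratio.

    Assumption: images already no wider than target_width are left as they are.
    """
    if width <= target_width:
        return width, height
    scale = target_width / width
    # Height shrinks by the same factor as width; keep at least one pixel.
    return target_width, max(1, round(height * scale))


def pixel_fraction(width, height, target_width):
    """Fraction of the original pixel count that remains after resizing."""
    new_w, new_h = resized_dimensions(width, height, target_width)
    return (new_w * new_h) / (width * height)
```

For example, resized_dimensions(2000, 1000, 1000) gives (1000, 500), and pixel_fraction(2000, 1000, 1000) gives 0.25.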

It’s possible to reduce the size of images directly in the code that produces them, but compressing the notebook afterwards has an advantage: the code still produces full-quality images, so they can be reproduced by re-running it.

Image Format

The -f flag sets the format used for the images in the notebook. By default, images are saved as PNG.

If libjpeg is installed on your system, you can switch to JPEG by passing jpeg after this flag:

ipynb-compress -f jpeg your_notebook.ipynb your_compressed_notebook.ipynb
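To compress several notebooks with the same settings, the command can be driven from Python. This is a minimal sketch, assuming ipynb-compress is on your PATH; build_command and compress_folder are hypothetical helper names, and only the -w and -f flags and the positional input and output arguments described above are used. Writing to a separate folder keeps the originals, as with the output filename argument.

```python
import subprocess
from pathlib import Path


def build_command(src, dst, img_width=None, img_format=None):
    """Build an ipynb-compress command line that writes a compressed copy of src to dst."""
    cmd = ["ipynb-compress"]
    if img_width is not None:
        cmd += ["-w", str(img_width)]
    if img_format is not None:
        cmd += ["-f", img_format]  # "png" or "jpeg" (jpeg needs libjpeg)
    cmd += [str(src), str(dst)]
    return cmd


def compress_folder(src_dir, out_dir, **options):
    """Compress every .ipynb file in src_dir into out_dir, leaving the originals untouched."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for nb in sorted(Path(src_dir).glob("*.ipynb")):
        # check=True raises CalledProcessError if ipynb-compress exits with an error.
        subprocess.run(build_command(nb, out / nb.name, **options), check=True)
```

For example, compress_folder("notebooks", "compressed", img_width=800, img_format="jpeg") would run the equivalent of ipynb-compress -w 800 -f jpeg for each notebook (the folder names are placeholders). Pointing out_dir at src_dir would overwrite the originals, so keep them separate.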

JPEG files tend to be smaller than PNG files, at the expense of quality, because JPEG compression is lossy.

Example

To test it out, we ran ipynbcompress on a notebook of approximately 9 MB. The notebook contained an analysis of a dataset and, as a result, a large number of graphs.

Running ipynb-compress with the default settings produced a file 7.6% smaller than the original.

Using a fresh copy of the original file, we repeated the test with an image width of 800 pixels. This led to a whopping 56% reduction in file size, albeit with a visible loss of image quality. The quality was still acceptable, though, and the full-quality images could be regenerated by re-running the code.
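To reproduce this kind of measurement on your own notebooks, compare the file sizes before and after compression. Using the figures above, a 56% reduction on a roughly 9 MB notebook leaves a file of about 4 MB (9 × 0.44 ≈ 4). A minimal sketch, with the hypothetical helper name size_change, that returns the bytes saved and the percentage reduction:

```python
import os


def size_change(original_path, compressed_path):
    """Return (bytes saved, percentage reduction) from original_path to compressed_path."""
    before = os.path.getsize(original_path)
    after = os.path.getsize(compressed_path)
    if before == 0:
        raise ValueError("original file is empty")
    saved = before - after
    return saved, 100 * saved / before
```

Writing the compressed notebook to a separate output file keeps both files available for this comparison.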

Conclusion

ipynbcompress is a valuable tool for anyone working with Jupyter notebooks, making them smaller and easier to share. In testing, using the advanced options allowed for a significant reduction in file size, demonstrating the usefulness of the tool.