Jupyter notebooks are a common way to share data science projects. As well as containing the project source code, they also include the outputs generated by running the code. This includes both text and graphical output, allowing complete projects to be shared in a single file.
A downside to this is the potentially large size of the resulting notebook files. With a large number of graphs and figures, notebooks can span tens of megabytes in size, or even larger for particularly complex projects.
This article explains how you can optimise the size of your Jupyter notebooks, without sacrificing content.
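To get a feel for where the size comes from, it can help to measure how much of a notebook is embedded image data. The following is a minimal sketch that relies on the standard notebook JSON layout (version 4), in which each code cell holds a list of outputs and rich outputs store their payloads under MIME-type keys such as image/png. The function names are hypothetical and not part of any tool discussed here.

```python
import json


def image_payload_chars(notebook: dict) -> int:
    """Return the total number of characters of embedded image data in cell outputs."""
    total = 0
    for cell in notebook.get("cells", []):
        # Markdown cells have no "outputs" key, so default to an empty list.
        for output in cell.get("outputs", []):
            # Rich outputs keep their payloads in a MIME-type -> data mapping.
            for mime, payload in output.get("data", {}).items():
                if mime.startswith("image/"):
                    # A payload may be one string or a list of strings.
                    if isinstance(payload, list):
                        payload = "".join(payload)
                    total += len(payload)
    return total


def load_notebook(path: str) -> dict:
    """Read a notebook file from disk as a plain dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling image_payload_chars(load_notebook("your_notebook.ipynb")) and comparing the result with the file size gives a rough idea of how much there is to gain from compressing images.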
What is ipynbcompress?
ipynbcompress is a Python package that optimises Jupyter Notebook files by compressing their contents. It helps reduce the file size by removing unnecessary metadata, stripping output cells, and optionally converting images to more efficient formats.
It can be installed using pip:
pip install ipynbcompress
Basic Usage
ipynbcompress is a command line tool, with a variety of options available to control how your notebook is compressed.
The simplest way to use the tool is to pass the path of the notebook to be compressed.
ipynb-compress your_notebook.ipynb
Note the hyphen in the command name, which is not present in the package name.
$ ipynb-compress your_notebook.ipynb
your_notebook.ipynb: 813 kilobytes decrease
When it finishes, the command reports how much smaller the file has become.
If an output filename is not specified, the original file will be overwritten. This can be prevented by passing the name to use for the output.
ipynb-compress your_notebook.ipynb your_compressed_notebook.ipynb
The new notebook will be saved as a new file, preserving the original.
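With both files on disk, the saving can also be checked independently of the tool's own report. This is a small sketch using only the Python standard library; the function names and file paths are placeholders chosen for illustration.

```python
import os
from typing import Tuple


def size_reduction(original_bytes: int, compressed_bytes: int) -> Tuple[int, float]:
    """Return (bytes saved, percentage saved) for a pair of file sizes."""
    saved = original_bytes - compressed_bytes
    # Guard against division by zero for an empty original file.
    percent = 100.0 * saved / original_bytes if original_bytes else 0.0
    return saved, percent


def compare_files(original_path: str, compressed_path: str) -> Tuple[int, float]:
    """Measure both files on disk and report the reduction."""
    return size_reduction(
        os.path.getsize(original_path),
        os.path.getsize(compressed_path),
    )
```

For example, compare_files("your_notebook.ipynb", "your_compressed_notebook.ipynb") returns the number of bytes saved and the saving as a percentage of the original size.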
Advanced Options
Additional options are available to tweak the parameters used during compression. These are listed by running the command with the --help flag.
Options:
  -w, --img-width INTEGER      Which width images should be resized to.
  -f, --img-format [png|jpeg]  Which compression to use on the images, valid options are png or jpeg (requires libjpeg).
Resizing Images
Using the -w flag, it’s possible to set the width the images contained within the notebook are resized to. By default, this value is 2048 pixels.
ipynb-compress -w 1000 your_notebook.ipynb your_compressed_notebook.ipynb
Setting this to a lower value will reduce the resulting file size.
It’s also possible to reduce the size of images directly in the code that produces them. Resizing with ipynb-compress instead leaves the source code untouched, so a higher quality version of each image can be reproduced by re-running the notebook.
Image Format
Using the -f flag, it’s possible to specify an output format for the contained images. By default, images are formatted as png files.
If you have libjpeg installed on your system, it’s possible to change this to jpeg by passing jpeg as the option following this flag.
ipynb-compress -f jpeg your_notebook.ipynb your_compressed_notebook.ipynb
jpeg files tend to be smaller than png files, at the expense of quality.
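When compressing notebooks from a script, the command line can be assembled programmatically. The sketch below uses only the flags and argument order shown above; the helper names build_command and compress are hypothetical, and it assumes that -w and -f can be given in the same call and that ipynb-compress is on your PATH.

```python
import subprocess
from typing import List, Optional


def build_command(
    source: str,
    output: Optional[str] = None,
    width: Optional[int] = None,
    img_format: Optional[str] = None,
) -> List[str]:
    """Assemble an ipynb-compress argument list from the options described above."""
    command = ["ipynb-compress"]
    if width is not None:
        command += ["-w", str(width)]
    if img_format is not None:
        # The help text lists png and jpeg as the only valid formats.
        if img_format not in ("png", "jpeg"):
            raise ValueError("img_format must be 'png' or 'jpeg'")
        command += ["-f", img_format]
    command.append(source)
    # Without an output name the tool overwrites the original file.
    if output is not None:
        command.append(output)
    return command


def compress(source: str, output: Optional[str] = None, **options) -> None:
    """Run the tool and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(source, output, **options), check=True)
```

For instance, compress("your_notebook.ipynb", "your_compressed_notebook.ipynb", width=1000, img_format="jpeg") would run the tool with both options while preserving the original file.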
Example
To test it out, we ran ipynbcompress on a notebook file which was approximately 9MB in size. The notebook held an analysis of a dataset and, as a result, a large number of graphs.
Running ipynb-compress with the default settings produced a file 7.6% smaller than the original.
With a fresh copy of the original file, the test was repeated using an image width of 800. This led to a whopping 56% reduction in file size, albeit with a visible reduction in image quality. The quality was still acceptable though, and the original quality images could be regenerated by re-running the code.
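To put those percentages in absolute terms, the short calculation below estimates the resulting file sizes. It takes the original size as exactly 9 MB, which is only an approximation of the figure quoted above, so the results are rough estimates rather than measured values.

```python
def size_after_reduction(original_mb: float, percent_reduction: float) -> float:
    """Estimate the file size left after a given percentage reduction."""
    return original_mb * (1 - percent_reduction / 100)


# Percentages taken from the two test runs described above.
for label, percent in [("default settings", 7.6), ("image width 800", 56)]:
    print(f"{label}: about {size_after_reduction(9, percent):.1f} MB")
```

This works out to roughly 8.3 MB with the default settings and roughly 4.0 MB with the image width set to 800.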
Conclusion
ipynbcompress is a valuable tool for anyone working with Jupyter Notebooks, helping to streamline and optimise notebooks for better performance and easier sharing. In testing, using the advanced options allowed for a significant reduction in file size, demonstrating the usefulness of the tool.