For me, the largest nuisance is that figures are stored as very long strings in the output of cells. This makes it almost impossible to manually resolve conflicts while merging two branches. The solution is simple: just clear the output of the cells before committing the notebook. I'm doing this with a simple script (which I found somewhere on the Internet).
#!/usr/bin/env python
from __future__ import print_function

import json
import sys


def main():
    """
    Usage: ./clear_output_ipynb.py filename.ipynb
    Output: filename.ipynb.cln
    Script to clear all output from Jupyter notebooks.
    """
    try:
        inFileName = sys.argv[1]
    except IndexError:
        print("ERROR: no file name entered")
        sys.exit(1)
    outFileName = inFileName + ".cln"

    with open(inFileName) as inFileHandle:
        data = json.load(inFileHandle)

    ## remove output from code cells, set execution count to None
    for cell in data["cells"]:
        if cell["cell_type"] == "code":
            cell["execution_count"] = None
            cell["outputs"] = []

    ## write cleaned data to file
    with open(outFileName, "w") as outFileHandle:
        json.dump(data, outFileHandle, indent=2, sort_keys=True)


if __name__ == "__main__":
    main()
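To see what the cleaning step does, here is a minimal sketch that applies the same loop to a tiny in-memory notebook. The notebook content and the helper name clear_outputs are made up for illustration; only the keys cells, cell_type, execution_count and outputs are taken from the script above.

```python
import json


def clear_outputs(notebook):
    """Same cleaning step as in the script: empty outputs, reset counters."""
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            cell["execution_count"] = None
            cell["outputs"] = []
    return notebook


# Hypothetical two-cell notebook; the long string stands in for an embedded figure.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "source": ["plot(x, y)"],
            "outputs": [
                {
                    "output_type": "display_data",
                    "data": {"image/png": "iVBORw0KGgo" * 1000},
                    "metadata": {},
                }
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

size_before = len(json.dumps(notebook))
clear_outputs(notebook)
size_after = len(json.dumps(notebook))

# The serialized notebook shrinks because the figure string is gone.
print(size_before, "->", size_after)
```

The markdown cell and the source of the code cell are left untouched, so a diff between two cleaned notebooks only shows changes to the text and code.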
The script clear_output_ipynb.py lives in the same folder (called notebooks) as my Jupyter notebooks. I don't track changes in the .ipynb files, but have "clean" copies of the notebooks (with extension .ipynb.cln) that are part of the Git repository.
To make life easy, I have two makefiles in my project folder called cln.makefile and nbs.makefile. Before I stage the changes in my notebooks, I first run
$ make -f cln.makefile
which runs the script clear_output_ipynb.py for each notebook in my notebooks folder.
NOTEBOOKS := $(wildcard notebooks/*.ipynb)
CLEAN_NOTEBOOKS := $(NOTEBOOKS:.ipynb=.ipynb.cln)

all: $(CLEAN_NOTEBOOKS)

%.ipynb.cln: %.ipynb
	notebooks/clear_output_ipynb.py $<
After I pull changes from a remote repository, or switch to another branch, I have to copy all .ipynb.cln files to .ipynb files. For this I have another makefile, and so I run
$ make -f nbs.makefile
before using and modifying the notebooks.
CLEAN_NOTEBOOKS := $(wildcard notebooks/*.ipynb.cln)
NOTEBOOKS := $(CLEAN_NOTEBOOKS:.ipynb.cln=.ipynb)

all: $(NOTEBOOKS)

%.ipynb: %.ipynb.cln
	cp $< $@
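For readers without make, the copy step can be approximated in Python. This is only a sketch: it mimics make's rule of rebuilding a target when it is missing or older than its source, and the function names is_stale and restore_all are made up for illustration.

```python
import glob
import os
import shutil


def is_stale(target, source):
    """True if target is missing or older than source (make's rebuild rule)."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(target) < os.path.getmtime(source)


def restore_all(folder="notebooks"):
    """Copy each stale X.ipynb.cln to X.ipynb and return the files written."""
    written = []
    for clean in sorted(glob.glob(os.path.join(folder, "*.ipynb.cln"))):
        notebook = clean[: -len(".cln")]  # strip the trailing ".cln"
        if is_stale(notebook, clean):
            shutil.copyfile(clean, notebook)
            written.append(notebook)
    return written
```

Calling restore_all() from the project folder would then play roughly the role of make -f nbs.makefile. As with the makefile, a working notebook that is older than its clean copy is overwritten, so any outputs it still holds are lost.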
Of course, sometimes I forget to clean the notebooks before committing, or I forget to make the .ipynb files. I've tried to automate the process of cleaning and copying with Git "hooks", but I have not been able to make that work. If somebody knows how, let me know!
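One way the cleaning step might be automated is a pre-commit hook written in Python. The following is an untested sketch, not a verified solution. It assumes the file is saved as .git/hooks/pre-commit and made executable, that Git runs it from the top of the working tree, and that the notebooks live in the notebooks folder as described above. The names write_clean_copy and main are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a Git pre-commit hook: refresh and stage the .ipynb.cln copies."""
import glob
import json
import os
import subprocess

NOTEBOOK_DIR = "notebooks"  # assumption: same layout as in the post


def write_clean_copy(path):
    """Write <path>.cln without cell outputs and return the new file name."""
    with open(path) as handle:
        data = json.load(handle)
    for cell in data["cells"]:
        if cell["cell_type"] == "code":
            cell["execution_count"] = None
            cell["outputs"] = []
    clean_path = path + ".cln"
    with open(clean_path, "w") as handle:
        json.dump(data, handle, indent=2, sort_keys=True)
    return clean_path


def main(notebook_dir=NOTEBOOK_DIR, stage=True):
    """Clean every notebook in notebook_dir and optionally stage the copies."""
    notebooks = sorted(glob.glob(os.path.join(notebook_dir, "*.ipynb")))
    clean_paths = [write_clean_copy(path) for path in notebooks]
    if stage and clean_paths:
        # Stage the refreshed copies so they become part of this commit.
        # A failing "git add" raises, which makes the hook exit non-zero
        # and aborts the commit.
        subprocess.check_call(["git", "add", "--"] + clean_paths)
    return clean_paths


if __name__ == "__main__":
    main()
```

Because the hook stages every cleaned copy, a commit would pick up changes from all notebooks, not only the ones being worked on; whether that is acceptable depends on the workflow. The copy step in the other direction could be attached to the post-checkout and post-merge hooks in a similar way.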