View on GitHub

meg-hackathon

MEG/EEG hackathon on Open Science, Coding Skills, etc.

Interoperability between Python, MATLAB, R, etc.

Sanne asked the question “I often switch during my analysis between different programs, such as Python, MATLAB, Julia, and R. How can I automate this and make easier analysis scripts that are easier to develop, run and maintain?”

Outline

Using wrappers and some glue code

It is common for larger projects (or analysis pipelines) to have complex dependencies. A strategy to deal with them is to implement then in a modular fashion in which a distinction is made between high- and low-level code. See e.g. http://www.fieldtriptoolbox.org/development/module/.

alt text

The distinction is in reality more complex, since in general there are more layers. It is common that low-level code encapsulates even lower-level code (e.g. in private functions), but there can also be higher level code such as wrappers made by the researcher himself.

In general you can think of the code being organized in a hierarchical structure with multiple layers. The following example in FieldTrip for example applies to EEG forward model leadfield computations using the external OpenMEEG software that is implemented as a C/C++ command-line application:

  1. fieldtrip/ft_prepare_headmodel, which calls
  2. fieldtrip/forward/ft_headmodel_openmeeg, which calls
  3. fieldtrip/external/om_assemble, which calls
  4. om_assemble Linux command-line executables

or using the external SIMBIO software that uses mex files with a mix of C and Fortran:

  1. fieldtrip/ft_prepare_headmodel, which calls
  2. fieldtrip/forward/ft_headmodel_simbio, which calls
  3. fieldtrip/external/simbio mex-files

Use the system() call

The simplest way to call external software when working in an interpreted enviroment is to use the system() call. This is available in Python as os.system, in MATLAB as the system function, In Julia using backticks and the run command, and in R as system.

In all cases the system() call executes another program as a command-line application, just like how it would be executed in the Linux or macOS terminal. So if you can write the piece of your analysis as a command-line application, you are good to go.

Transforming your code in a command-line application

Using the shebang syntax you can make scripts in any interpreted language executable. However, for some interpreters it is more common than for others.

Making Python scripts executable

Your Python script test1.py would simply start with

#!/usr/bin/env python

import sys
import os
import argparse
from glob import glob


parser = argparse.ArgumentParser()
parser.add_argument("inputfile", nargs='+', help="input file file the computation")
parser.add_argument('--foo', nargs=1, help='foo help')
parser.add_argument('--bar', action="store_true", default=False, help='bar help')
args = parser.parse_args()

# show how the command line arguments work
print(args)

# here comes your Python code that deals with
# 1. the command-line options and the input files
# 2. performs the computations
# 3. writes the results to the STDOUT output or (better) to an output file

Furthermore, your Python script should be made executable using chmod +x test1.py.

Making MATLAB scripts executable

Your MATLAB script test2.m cannot start with #!/usr/bin/env matlab, since the MATLAB interpreter does not take the name of the m-file directly as a command line argument. Rather, it requires commands to be passed in the -r option or in the -batch option. Furthermore, MATLAB does not understand the first line starting with # as a commented-out line.

To execute a piece of MATLAB code, you have to wrap it in a bit of BASH code like this

#!/usr/bin/env -S {1} ${2} bash

MATLAB=/Applications/MATLAB_R2020a.app/bin/matlab

TEMPSCRIPT=$(mktemp ${TMPDIR}matlabXXXXX).m
SCRIPTDIR=$(dirname ${TEMPSCRIPT})
SCRIPTNAME=$(basename ${TEMPSCRIPT} .m)

echo MATLAB=$MATLAB

echo TEMPSCRIPT=$TEMPSCRIPT
echo SCRIPTDIR=$SCRIPTDIR
echo SCRIPTNAME=$SCRIPTNAME

if [ -n $1 ]; then
  echo arg1="$1"
end

if [ -n $2 ]; then
  echo arg2="$2" >> ${TEMPSCRIPT}
end

# pass the first 4 command line options as MATLAB string variables
if [ -n $1 ]; then echo arg1 = \'$1\'\; >> ${TEMPSCRIPT}; fi
if [ -n $2 ]; then echo arg2 = \'$2\'\; >> ${TEMPSCRIPT}; fi
if [ -n $3 ]; then echo arg3 = \'$3\'\; >> ${TEMPSCRIPT}; fi
if [ -n $4 ]; then echo arg4 = \'$4\'\; >> ${TEMPSCRIPT}; fi

cat << EOF >> ${TEMPSCRIPT}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% MATLAB CODE BEGINS

% here comes your MATLAB code that deals with
% 1. the command-line options and the input files
% 2. performs the computations
% 3. writes the results to the STDOUT output or (better) to an output file

ver('matlab')
plot(randn(10))
pause(5)

% MATLAB CODE ENDS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

EOF

MATLABCMD=$(printf 'addpath(%s%s%s); %s' "'" "$SCRIPTDIR" "'"  "$SCRIPTNAME")

${MATLAB} -batch "${MATLABCMD}"

Try copying this in a file test2.m, make it executable with chmod +x test2.m and run it.

Note that the strategy above is also more or less how the evaluation of qsubcellfun and qsubfeval works, see here for details.

Making Julia scripts executable

This works similar to Python, using the shebang syntax. See also this post on passing options to Julia.

Making R scripts executable

This works similar to Python, using the shebang syntax. See for example this post.

Exchanging parameters and input/output data

Since you probably also have to deal with parameters, input files, and output files, it is recommended to make a native wrapper in environment “A” around the code you want to execute using environment “B”. The examples given above already include some code that explains how to parse the additional command-line arguments.

When calling Python from within MATLAB, you could write a MATLAB function test1.m that takes some input arguments, writes those to disk, and that executes the test1.py file (see above) using a system() call. From the MATLAB perspective, you are only dealing with the test1.m function that encapsulates all complexities.

When calling MATLAB from within Python, you would write a Python function test2 either in a separate module that you import, or in your main Python script. It takes some input arguments, writes those to disk or passes them as arguments, and it executes the test2.m file (see above) using a system() call.

Using API bindings for another language

In many programming (or data analysis) environments it is possible to execute code that is implemented in other programming languages.

Starting from working in
Julia can call external C and Fortran code.
Julia has packages providing support for calling code and manipulating data from C++, Java, MATLAB, Python, R (including compatibility between MixedModels.jl and lme4).
MATLAB has MEX files for C/C++ and Fortran code.
MATLAB can access and execute Java classes, which is also used in its own graphical user interface and desktop.
MATLAB can execute functions and objects in Python
Python has an API that allows programmers to extend it with C/C++ code
Python you can also execute Julia, R, and Stan code by using the relevant packages.
R can call external C and Fortran code.
R has interfaces to C++, Python, Julia and Stan (see also here).

The advantage of this tighter binding over using the system() call is that you can directly pass variables between the two; there is no need to transfer input and output data using temporary files.

Using an execution environment