
An introduction to SciPy

January 19, 2021

This article was contributed by Lee Phillips

SciPy is a collection of Python libraries for scientific and numerical computing. Nearly every serious user of Python for scientific research uses SciPy. Since Python is popular across all fields of science, and continues to be a prominent language in some areas of research, such as data science, SciPy has a large user base. On New Year's Eve, SciPy announced version 1.6 of the scipy library, which is the central component in the SciPy stack. That release gives us a good opportunity to delve into this software and give some examples of its use.

What is SciPy?

The name SciPy refers to a few related ideas. It is used in the titles of several international conferences related to the use of Python in scientific research. It is also the name of the scipy library, which contains modules for use in various areas of scientific and numerical computing. Some examples of these are:

  • scipy.integrate, with routines for numerical integration and for solving differential equations
  • scipy.fft, for Fourier transforms
  • scipy.linalg, for linear algebra
  • scipy.stats, for statistics
  • scipy.special, providing many special functions used in mathematical physics, such as the Airy, elliptic, Bessel, and hypergeometric functions
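
As a quick taste of scipy.special, here is how one might evaluate a Bessel function (a hypothetical snippet, not from the article; J₀(0) is exactly 1, and the function first crosses zero near x ≈ 2.405):

```python
from scipy.special import jv  # Bessel function of the first kind

# J_0 evaluated at the origin and near its first zero:
print(jv(0, 0.0))    # → 1.0
print(jv(0, 2.405))  # very close to zero
```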

Beyond scipy, the SciPy collection includes a number of useful components packaged together for use in scientific research. NumPy, the fundamental library that adds an array data type to Python and wraps the C and Fortran routines that make numerical work with Python practical, is one of those components, as is pandas, for data science, and the Python plotting library Matplotlib. The user interfaces IPython and Jupyter are also part of the collection, along with Cython, for building C extensions; there are others as well. Used in this sense, SciPy refers, then, to a software ecosystem: a large number of modules that can be used together to build solutions to research problems and to share them with others.

The idea behind SciPy is to allow the scientist, with a single install command, to have a big toolbox available. Aside from a few glitches, this mainly works as promised. Even with the residual version conflicts that crop up from time to time, the existence of the project largely frees the researcher from the arduous task of hunting down compatible versions of a dozen different libraries and installing them individually. In addition, SciPy provides a community as well as a central place to seek documentation and assistance.

Using SciPy

The range of applications for SciPy is enormous, so the small number of examples that there is space for here will naturally leave out almost everything that it can do. Nevertheless, a few examples will give a feel for how people make use of the project. The main example is similar to problems addressed in recent articles on Pluto and Julia, so that the reader can compare the methods of solution offered by the different projects.

Of course, the first step is to install everything. The project provides official installation instructions that go over a handful of ways to get the libraries in place, using more recent versions than will be provided by most distributions' package managers. I chose to use the pip install option within a fresh virtual environment. Afterward, I was left with an IPython that failed to do tab completion and crashed frequently. Some focused Google searching turned up a version conflict, something that the use of virtual environments is supposed to help prevent. The problem was fixed by following the advice to downgrade the completion module with the command:

    $ python -m pip install jedi==0.17.2 

The three main ways to use Python, and therefore the SciPy libraries, are by creating script files using any editor, by using a Jupyter notebook, or by using the IPython read-eval-print loop (REPL). Of course, the "ordinary" REPL is always available by typing python in the shell, but since IPython provides significant enhancements over that, there is little reason not to use it.

Our first example will be the numerical solution of the ordinary differential equation (ODE) that we attacked in the Julia article. The problem, called the logistic equation, is:

    f' = f - f²

where the prime signifies the time derivative.

The listing below shows the steps from the IPython session where this problem was set up and solved, interspersed with explanations of what the code is doing:

    from scipy.integrate import solve_ivp

First we import solve_ivp, which is SciPy's basic ODE numerical solver. scipy.integrate is a module that includes various routines for approximating integrals and finding numerical approximations to the solutions of ODEs.

    # Here we define a function that returns the RHS of the ODE:    
    def f(t, f):
        return [f - f**2]

The f() function, which describes the right-hand side of the equation, needs to return a list. In this case it just contains a single element, because the equation is first-order, meaning that it just has a first derivative. One must rewrite higher-order equations as systems of first-order equations, in which case the returned array will have an element for each equation in the system.
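
For instance, a second-order equation such as the harmonic oscillator x'' = -x would be rewritten as the pair x' = v, v' = -x, and the function would return a two-element list (a hypothetical sketch, not part of the article's session):

```python
from scipy.integrate import solve_ivp
import numpy as np

def oscillator(t, y):
    x, v = y          # unpack the state vector
    return [v, -x]    # one element per first-order equation

# Solve over one period, starting at x = 1, v = 0 (exact solution: cos t):
t = np.linspace(0, 2*np.pi, 50)
sol = solve_ivp(oscillator, [0, 2*np.pi], [1.0, 0.0], t_eval=t, rtol=1e-8)
```

sol.y then has one row per equation: sol.y[0] holds x and sol.y[1] holds v.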

    # We'll store the time interval for the solution here:
    tint = [0, 8]

    import numpy as np

    # linspace creates an array of equally-spaced values:
    t = np.linspace(0, 8, 100)

The variable t will contain an array from 0 to 8 in 100 equally spaced steps. These will be the times at which the solution gets calculated. If I only wanted to know the value of the solution at the end of a time interval, I wouldn't need to set up an array like this, but I intend to plot the solution, so I want its value at many points.

    # Now we can solve, storing the solution in sol:
    sol = solve_ivp(f, tint, [0.05], t_eval=t)

The third argument in the call to the solver is the initial condition at the first time value, which in this case is 0.05. It's in an array, like the function itself, to handle the case of a system of multiple equations. The return value, stored in sol, is an object with a handful of fields with information about the solution. The solution itself is stored in sol.y, with corresponding independent variable values in sol.t. In this case, sol.t should be identical to t, but if the solver is not supplied with an input time array, it will decide at what times to provide solution values, and sol.t will need to be consulted.
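
Since the logistic equation happens to have a closed-form solution, f(t) = f₀ / (f₀ + (1 - f₀)e⁻ᵗ), we can check the solver's output against it (a verification sketch, not part of the original session):

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, y):
    return [y[0] - y[0]**2]

t = np.linspace(0, 8, 100)
sol = solve_ivp(f, [0, 8], [0.05], t_eval=t)

# Exact solution with the same initial condition:
f0 = 0.05
exact = f0 / (f0 + (1 - f0) * np.exp(-t))
err = np.max(np.abs(sol.y[0] - exact))
print(err)  # small compared to the solution itself, which approaches 1
```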

    import matplotlib.pyplot as plt

    # A no-frills plot, using the array stored in fields of sol:
    plt.plot(sol.t, sol.y[0])

    # Here is the command to display the plot:
    plt.show()

The plot is shown below. If we had a system of equations rather than just one, we could plot the variables of interest by passing additional elements of the sol.y array into the plotting function.

[Logistic fn plot]

The scipy component of SciPy contains some interesting modules focused on various domains. For example, let's explore parts of the image processing library scipy.ndimage. The listing below is from another IPython session, where I read in an image and apply four transformations to it.

    from scipy import ndimage
    import matplotlib.pyplot as plt

    # Read an image file from disk
    bogy = plt.imread('bogart.jpg')

The transformations in the ndimage library operate on matrices; the imread() function reads the image and stores it in the variable bogy as a 1200×1200×3 numerical array: the original image dimensions, plus one plane for each of the primary colors. Each of the image transformation functions used below accepts the input image as its first argument, followed by parameters for the transformation. The sigma parameter in the calls to gaussian_filter() and gaussian_laplace() determines the strength of the transformation. The gaussian_laplace() filter first smooths the image, then uses second derivatives (the Laplacian) to detect changes in intensity.

Shown directly below is the original image; after each transformation, the resulting image is shown.

[Original Bogart]
    # The first transformation: Gaussian blurring:
    bogyBlurred = ndimage.gaussian_filter(bogy, sigma=17)
[Blurred]
    # The rotation function takes degrees:
    bogyTilted = ndimage.rotate(bogy, 45)
[Tilted]
    # This transformation can create sketch-like images
    # from photographs:
    bogySketched = ndimage.gaussian_laplace(bogy, sigma=3)
[Sketched]
    # The geometric_transform routine uses a mapping function
    # to transform coordinates:
    def mapping(t):
        return (t[0], t[1]**1.3, t[2])

    bogySqueezed = ndimage.geometric_transform(bogy, mapping)
[Squeezed]
    # Each processed image is stored in its own variable. We could
    # apply additional transformations to them or save them
    # to disk with the following:
    plt.imshow(bogyTilted)
    plt.savefig('bogyTilted.png')

The ndimage library has many more functions available; it has applications that go well beyond transforming images of noir movie stars. It can usefully be applied to such problems as automatically counting the number of bacteria on a microscope slide or measuring the dimensions of objects in a photograph.
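
The counting task mentioned above comes down to ndimage.label(), which finds connected regions in a binary image. Here is a minimal sketch on a synthetic array (my own example, not from the article):

```python
import numpy as np
from scipy import ndimage

# A tiny binary "image" containing two separate blobs:
img = np.zeros((8, 8))
img[1:3, 1:3] = 1   # a 2x2 blob
img[5:7, 4:7] = 1   # a 2x3 blob

labeled, nblobs = ndimage.label(img)   # assign each blob an integer label
sizes = ndimage.sum(img, labeled, range(1, nblobs + 1))  # pixels per blob
print(nblobs)          # → 2
print(np.sort(sizes))  # → [4. 6.]
```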

Changes in 1.6

In the release announcement, linked above, version 1.6 is described as "the culmination of 6 months of hard work. It contains many new features, numerous bug-fixes, improved test coverage and better documentation". As is usual in a large project of this sort, each item in the long list of changes is of interest only to those who happen to be using the module to which it applies. But it's worth noting here that many of the changes were for the ndimage library explored in the previous section, including an overhaul of the library's documentation.

Help and documentation

SciPy's wide adoption and long history mean that there is plenty of help and reference material available. The starting page for the official documentation is a convenient index to the documentation for each of SciPy's major components. Each of these components is its own project with its own team, and this index leads to the various project sites. For example, one of the components is SymPy, which I reviewed here recently; the index leads directly to the documentation maintained on the SymPy site.

The SciPy team maintains Planet SciPy, which is an aggregator of articles and posts about SciPy from around the web. Many of these are tutorials on various topics, both for the beginner and the expert. The SciPy Cookbook should not be overlooked. This is a collection of detailed, and sometimes extensive, code samples giving examples of the use of SciPy libraries to attack various problems. The examples range from starting points for the beginner to highly focused advanced applications.

Moving beyond official sources, the SciPy Lecture Notes is a comprehensive document that starts with overviews of the Python language and the SciPy ecosystem and continues to advanced topics in many areas. It has many contributing authors and seeks contributions from the community through GitHub. These well-illustrated Lecture Notes are called that because they are designed to be usable by teachers as projected slides in addition to being self-contained reading material. To this end, code is displayed in boxes with widgets that collapse each code sample into a more concise form, suppressing prompts and some output. This makes the slides less busy and more effective as accompaniments to lectures, while keeping all the details at hand for the online reader.

Physicist Robert Johansson maintains a set of lectures on scientific computing with Python on GitHub, which serves as a useful source of information on the use of SciPy. The lectures are in the form of Jupyter notebooks for download, but the author offers a PDF version as well.

For those who prefer books, Johansson's volume Numerical Python covers much of this ground. The Python Data Science Handbook is well regarded by data scientists, and the author, Jake VanderPlas, offers the full text free for reading both in conventional form and as a set of Jupyter notebooks. Another book by a physicist is Learning Scientific Programming with Python. Its author, Christian Hill, maintains a GitHub page with interesting case studies, in Jupyter form, related to topics mentioned in his book.

There is certainly a wealth of reference and tutorial documentation available in various forms. And when one gets stuck, one advantage of SciPy's wide adoption is that there is likely to be a way to get unstuck that can be found in sites like Stack Overflow.

SciPy or not?

Python, NumPy, and the other components that eventually became consolidated into SciPy paved the way for free software in the sciences. Although some other languages and libraries may now be superior, Python and SciPy laid the foundation for the adoption of them, partly due to the attraction of IPython/Jupyter notebooks. The Python ecosystem opened scientists' eyes to the advantages of free software as tools for research. Using free software means not only avoiding licensing fees, which are real obstacles in many parts of the world, but avoiding black boxes as well, thereby supporting openness, reliability, reproducibility, and confidence in results. The benefits to science are substantial.

If one has a good reason to use the Python ecosystem—perhaps the need for special capabilities of a particular library or being part of a group already programming in Python—then SciPy will be the obvious choice. But SciPy may not be the best choice for every scientist looking for free-software tools. Those who need only a free, fast, and sophisticated program for matrix computations, equation solving, and similar tasks, and prefer the simplicity of installing one thing that doesn't depend on other pieces, should consider Octave, reviewed here back in December. Mathematicians looking to supplement their pencil-and-paper skills may not need most of what's in SciPy. They may be better served by installing either SymPy or Maxima, both of which can do some numerical computation and graphing in addition to symbolic manipulations.

A scientist who is starting a new project and who doesn't need to interoperate with colleagues using Python should seriously consider taking up Julia as a high-level language. The Julia article linked above outlines some of the reasons for Julia's ascendancy in the sciences. The main attractions of Python, especially in fields such as data science, are some of its widely-used and well-tested libraries, but this is not a big reason to avoid Julia, which makes access to Python libraries almost as easy as using them from Python code.

A scientist with a laptop and an internet connection can install these tools and try them all out. The depth and breadth of free software available for scientific research these days would have been unimaginable a decade or so ago. Furthermore, there are no real obstacles to keeping all of that software installed, thus maintaining the ability to use the best tool to assist in any research task.

Index entries for this article
GuestArticles: Phillips, Lee
Python: Libraries
Python: Scientific computing



An introduction to SciPy

Posted Jan 19, 2021 19:36 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (8 responses)

Please, don't use Python for anything big. Large SciPy programs are slow and usually unmaintainable.

They start small and fast, but eventually you encounter a task that can't be performed without hand-rolled loops. So you either just eat the performance penalty of interpretation or go into an endless rabbit hole of C/Cython extensions. And this brings all the pain of struggling with build systems, distribution and so on.

Just stick with Julia or C/C++ if you can.

An introduction to SciPy

Posted Jan 19, 2021 23:04 UTC (Tue) by xxiao (guest, #9631) [Link] (6 responses)

really? the heavy duty is still executed in c/c++ modules, the python/scipy here are just glue. I'd like to profile any 'large application' you mentioned and see where the real bottleneck is; it probably has nothing to do with python/scipy if you use them 'correctly'.

An introduction to SciPy

Posted Jan 19, 2021 23:59 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

> really?
Yes.

> the heavy duty is still executed in c/c++ modules, the python/scipy here are just glue
The problem happens once you can't express an algorithm using existing primitives in scipy/numpy and are forced to do a tight loop in actual Python.

My most recent example: a dynamic programming-based optimization task. It can't be readily vectorized and the pure Python version is just too slow.

An introduction to SciPy

Posted Jan 20, 2021 14:36 UTC (Wed) by mikapfl (subscriber, #84646) [Link] (4 responses)

In that case, use https://numba.pydata.org/ . Just-in-time compilation happens under the hood, and the result is usually faster than any C++ code scientists write. Integration with the rest of python is great and you can easily drop the big and bad GIL. For me, it really is the missing piece in the high performance python ecosystem.

An introduction to SciPy

Posted Jan 20, 2021 18:36 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

These are all workarounds for inherent Python brokenness and they don't work well unless you tailor your code to do that. Cython is similar.

An introduction to SciPy

Posted Jan 21, 2021 15:55 UTC (Thu) by mikapfl (subscriber, #84646) [Link] (1 responses)

Well, all software is a workaround for hardware's inherent brokenness. I'm an engineer, I don't care if something is "just a workaround" as long as it works and is fast.

An introduction to SciPy

Posted Jan 21, 2021 21:23 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

The problem with workarounds is that the problem doesn't actually go away. And it is likely to bite you eventually, especially in large, long-lived codebases. We've had this issue with Cython before, when tightly optimized code suddenly started to work incorrectly.

Sometimes you have no choice but to use workarounds. But with a nice alternative (Julia) existing, why bother?

An introduction to SciPy

Posted Jan 21, 2021 16:03 UTC (Thu) by marcel.oliver (subscriber, #5441) [Link]

Just a data point: I have some, admittedly simple, Cython scripts that survived the Python 2 to 3 transition without change, which is more than can be said for a lot of other Python code, even though the transition of scientific software was generally relatively painless.

One of these is a simple tridiagonal solver, which is more than 3 times faster than Scipy's solve_banded (which shows that built-in functions are not always best as, in this case, the built-in solver is more general) and about 120 times faster than native Python.
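
For readers who haven't used it, solve_banded() stores only the matrix's diagonals; a minimal tridiagonal example (my own illustration, not the commenter's solver) looks like this:

```python
import numpy as np
from scipy.linalg import solve_banded

n = 5
# The classic -1/2/-1 tridiagonal matrix in banded storage:
# row 0 holds the superdiagonal, row 1 the main diagonal,
# row 2 the subdiagonal.
ab = np.zeros((3, n))
ab[0, 1:] = -1.0
ab[1, :] = 2.0
ab[2, :-1] = -1.0

b = np.ones(n)
x = solve_banded((1, 1), ab, b)  # (1, 1): one band below, one above
print(x)  # solution of the tridiagonal system
```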

So there are certainly speed and GIL issues in Python in serious need of fixing (maybe Numba is the way forward, but I have not had the need to try it yet). And, of course, Julia is an interesting and clean approach (that would probably have eaten Scipy's lunch if it had been around 10 years earlier) but the tradeoffs for switching ecosystems are still not so clear, especially since heavy-duty numerical code is still dominated, for good reason, by Fortran and C/C++.

The big advantage of Python is that the language and software landscape is not domain specific, yet it works very well and very naturally for array-based computation and data processing. So it's pretty seamless from symbolic computation (Sympy) via numerics to dedicated plotting, output, GUI, and networking components.

An introduction to SciPy

Posted Jan 19, 2021 23:21 UTC (Tue) by mhvk (subscriber, #86585) [Link]

It really depends. For work involving large arrays, etc., numpy/scipy will often do better than at least simple-minded hand-rolled C loops because of CPU vectorization in the numpy functions. It really helps not having to do that type of optimization oneself. But I'll readily agree that julia seems like it might have that benefit and be faster generally. I'm hoping to get around to playing with it.

Marten

(Writing this as a past numpy maintainer and current astropy maintainer.)

An introduction to SciPy

Posted Jan 19, 2021 23:57 UTC (Tue) by Kamiccolo (subscriber, #95159) [Link]

Ufff... Every single time someone calls NumPy a part of the Python ecosystem it makes me shake and spill some foam from my mouth... But yeah, this may just be personal preference and an ideological outlook.

Talking about math problems and Python, I'd say that SageMath is usually overlooked and really worth a mention. Despite being popular only in some geographical locations, for years it was the tool to go to. Again, personally :}

An introduction to SciPy

Posted Jan 20, 2021 3:23 UTC (Wed) by rillian (subscriber, #11344) [Link] (1 responses)

I just wanted to say I've really been enjoying these articles. As a casual science data user, it's great to have an overview of the various tools and what they're for.

Is astropy somewhere in the queue?

An introduction to SciPy

Posted Jan 20, 2021 4:14 UTC (Wed) by leephillips (subscriber, #100450) [Link]

Astropy is an interesting project, and certainly worth covering. For now I would suggest just staying tuned.

ODE: Unfortunate notation

Posted Jan 20, 2021 19:09 UTC (Wed) by tnoo (subscriber, #20427) [Link] (3 responses)

The notation of the ODE example is quite unfortunate. Better would be x' = x-x^2. Then the function would not consist of f's only. But it nicely illustrates the scoping: the f variable is only visible within the f function.

    def f(t, x):
        return x - x**2


ODE: Unfortunate notation

Posted Jan 20, 2021 20:31 UTC (Wed) by leephillips (subscriber, #100450) [Link] (2 responses)

I see your stylistic point; perhaps using another symbol would make it clearer that f is just a placeholder.

ODE: Unfortunate notation

Posted Jan 21, 2021 15:35 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Yes. The first time I read it I was thinking "how much monkey patching is going on to allow the arithmetic of functions?", but then I read the parameter names and it made sense.

ODE: Unfortunate notation

Posted Jan 23, 2021 16:56 UTC (Sat) by mkbosmans (subscriber, #65556) [Link]

Or simply call the function that defines the derivative of f: df(t, f).


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds