ASPIRE-Python Introduction¶
In this notebook we will introduce the core API components, then demonstrate basic usage corresponding to topics from Princeton’s MAT586.
Installation¶
ASPIRE can generally be installed on Linux, macOS, and Windows under Anaconda Python by following the instructions in the README. The instructions for developers are the most comprehensive. Windows support is provided, but Linux and macOS are generally recommended, with Linux being the most diversely tested platform.
Princeton Research Computing¶
ASPIRE requires some resources to run, so if you wouldn’t run
typical data science codes on your machine (a netbook, for example),
you may use Tiger/Adroit/Della at Princeton or another cluster.
After logging into Tiger, run module load anaconda3/2020.7 and
continue to follow the Anaconda instructions for developers in the
link above. Those instructions should create a working environment
for tinkering with the ASPIRE code found in this notebook.
Imports¶
First we import some typical scientific computing packages.
Along the way we will import relevant components from aspire.
Users may also import aspire once as a top level package.
import os
import matplotlib.pyplot as plt
import numpy as np
import aspire
from aspire.image import Image
API Primitives¶
The ASPIRE framework is a collection of modules containing interoperable extensible components. Underlying the more sophisticated components and algorithms are some core data structures. Sophisticated components are designed to interoperate by exchanging, consuming, or producing these basic structures. The most common structures encountered when starting out are:
Component | Description
---|---
 | Utility class for stacks of 1D arrays.
Image | Utility class for stacks of 2D arrays.
Volume | Utility class for stacks of 3D arrays.
Rotation | Utility class for stacks of 3D rotations.
Filter | Constructs and applies Image filters.
Basis | Basis conversions and operations.
Source | Produces primitive components.
Image Class¶
The Image class is a thin wrapper over Numpy arrays for a stack containing one
or more images (2D data). In this notebook we won’t be working
directly with the Image class a lot, but it will be one of the
fundamental structures behind the scenes. A lot of ASPIRE code
passes around Image and Volume instances.
Create an Image instance from random data.
img_data = np.random.random((100, 100))
img = Image(img_data)
print(f"img shape: {img.shape}") # Note this produces a stack of one.
print(f"str(img): {img}")
img shape: (1, 100, 100)
str(img): 1 float64 images arranged as a (1,) stack each of size 100x100.
Create an Image for a stack of 3 100x100 images.
img_data = np.random.random((3, 100, 100))
img = Image(img_data)
Most often, Images will behave like Numpy arrays, but you can
explicitly access the underlying Numpy array via asnumpy().
img.asnumpy()
array([[[4.34552719e-01, 7.29654729e-01, 8.36750600e-01, ...,
3.76354926e-01, 2.65495127e-01, 1.17138097e-01],
[2.00951906e-01, 6.31069882e-01, 4.65633635e-01, ...,
8.77719829e-01, 8.51196733e-01, 1.07659767e-01],
[6.35708110e-01, 7.68790710e-01, 9.45219394e-01, ...,
5.65141649e-01, 1.68244236e-01, 3.68954654e-01],
...,
[9.14639519e-01, 8.47973986e-01, 2.62996975e-01, ...,
1.22017013e-01, 3.07107304e-01, 8.50005769e-01],
[4.71047832e-01, 1.61708103e-01, 4.35955099e-01, ...,
8.76791338e-04, 9.20673941e-01, 9.41400066e-01],
[7.84916114e-01, 1.10613362e-01, 2.25092772e-01, ...,
5.20954672e-01, 5.42104200e-01, 2.49968869e-01]],
[[4.07851100e-01, 7.85901787e-01, 3.16291140e-01, ...,
9.83459648e-01, 3.92849708e-01, 8.39893498e-01],
[5.45577757e-01, 9.22437856e-02, 3.45380598e-01, ...,
1.43859671e-01, 2.29184541e-02, 4.49518945e-01],
[2.77721275e-01, 3.65091275e-01, 3.22263570e-01, ...,
8.99016179e-03, 5.00048055e-02, 7.01097541e-02],
...,
[4.48511124e-01, 1.84966934e-01, 1.61485915e-01, ...,
7.72619065e-01, 2.53127576e-01, 7.69409872e-01],
[4.26712111e-01, 8.35328180e-01, 6.44135568e-01, ...,
6.96762164e-01, 8.35673379e-01, 5.33005240e-01],
[9.71827390e-01, 7.36810171e-01, 1.64204089e-01, ...,
8.49511792e-01, 8.68211569e-01, 8.67439518e-01]],
[[4.27315008e-01, 5.09239348e-01, 6.46705828e-01, ...,
2.26147516e-01, 8.29368634e-01, 1.08701379e-01],
[6.89570594e-01, 6.55525881e-01, 7.30912469e-01, ...,
3.88491975e-01, 6.26856318e-01, 5.48353339e-01],
[3.38603338e-01, 5.36036924e-01, 1.72328864e-01, ...,
5.54249676e-01, 1.16676474e-01, 9.86072437e-01],
...,
[6.12168951e-01, 2.49433506e-01, 6.63894230e-01, ...,
4.25785351e-01, 6.18057969e-01, 6.16366949e-02],
[8.75509298e-01, 4.69250052e-01, 5.40214514e-02, ...,
9.30567672e-02, 8.62501125e-01, 8.17533180e-02],
[5.46407798e-01, 1.76727817e-01, 4.98008108e-01, ...,
1.84165486e-01, 9.54401201e-01, 2.77238715e-01]]])
Images have a built-in show() method, which works well for
peeking at data.
img.show()
Note
The user is responsible for using show responsibly. Avoid
asking for large numbers of images that you would not normally
plot. Ten or fewer is reasonable.
More examples using the Image class can be found in:
Volume Class¶
Like Image, the Volume class is a thin wrapper over Numpy arrays that provides
specialized methods for a stack containing one or more volumes (3D data).
Initialize Volume - load¶
A Volume may be instantiated with Numpy data similarly to
Image. Both Image and Volume provide save and load methods which
can be used to work with files. For Volumes, .map and .mrc are
currently supported. For .npy, Numpy can be used.
For example, in the following note we demonstrate instantiating an ASPIRE Volume
instance using Volume.load():
Note
Instantiate an ASPIRE Volume from file:
from aspire.volume import Volume
aspire_volume = Volume.load("/path/to/volume.mrc")
In addition to the Volume.load() method, a few common
starting datasets can be downloaded from EMDB using ASPIRE’s downloading
utility. Below we download the high resolution volume map EMDB-2660, sourced from
https://www.ebi.ac.uk/pdbe/entry/emdb/EMD-2660.
from aspire.downloader import emdb_2660
vol = emdb_2660()
Downsample Volume¶
Here we downsample the above volume to a desired image size (64 should be good).
img_size = 64
# Volume.downsample() returns a new Volume instance.
# We will use this lower resolution volume later, calling it `vol_ds`.
vol_ds = vol.downsample(img_size)
# L is often used as short hand for image and volume sizes (in pixels/voxels).
L = vol_ds.resolution
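As a rough illustration of what downsampling a volume can involve, the sketch below crops the centered Fourier transform of a cubic array using only Numpy. It shows the general Fourier-cropping technique under stated assumptions and is not a description of how Volume.downsample() is implemented; the function name fourier_downsample and the intensity scaling convention are hypothetical.

```python
import numpy as np


def fourier_downsample(vol, L_out):
    """Downsample a cubic (L_in, L_in, L_in) array to (L_out, L_out, L_out)
    by keeping only the central, low-frequency Fourier coefficients."""
    L_in = vol.shape[0]
    # Centered spectrum: the zero frequency sits at index L_in // 2.
    spectrum = np.fft.fftshift(np.fft.fftn(vol))
    start = L_in // 2 - L_out // 2
    keep = slice(start, start + L_out)
    cropped = spectrum[keep, keep, keep]
    # Back to real space; rescale so that the mean intensity is preserved
    # (an assumed convention, other packages may scale differently).
    small = np.fft.ifftn(np.fft.ifftshift(cropped)).real
    return small * (L_out / L_in) ** 3


# Toy input: a constant volume should stay constant after downsampling.
toy_vol = np.ones((8, 8, 8))
toy_ds = fourier_downsample(toy_vol, 4)
print(toy_ds.shape)
```

Cropping in Fourier space discards high frequencies instead of averaging neighbouring voxels, which avoids aliasing at the cost of two FFTs.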
Plot Data¶
For quick sanity checking purposes we can view some plots.
We’ll use three orthographic projections, one per axis.
orthographic_projections = np.empty((3, L, L), dtype=vol_ds.dtype)
for i in range(3):
    orthographic_projections[i] = np.sum(vol_ds, axis=(0, i + 1))
Image(orthographic_projections).show()
Rotation Class¶
While you may bring your own 3x3 matrices or generate them manually (say,
from your own Euler angles), ASPIRE has a Rotation class
which can generate random rotations for us. It also has
some other utility methods, including support for Rodrigues
rotations (i.e., axis-angle). Other ASPIRE components dealing with 3D
rotations will generally expect instances of Rotation.
A common task in computational cryo-EM is generating random
projections by applying random 3D rotations to a volume and projecting along the z-axis.
The following code will generate some random rotations,
and use the Volume.project() method to return an Image
instance representing the stack of projections. We can display
projection images using the Image.show() method.
from aspire.utils import Rotation
num_rotations = 2
rots = Rotation.generate_random_rotations(n=num_rotations, seed=12345)
We can access the Numpy array holding the actual stack of 3x3 matrices:
print(rots)
print(rots.matrices)
Rotation stack consisting of 2 elements of float32 type
[[[ 0.34373102 -0.62564886 0.7002946 ]
[-0.61950386 -0.7115173 -0.33159938]
[ 0.70573646 -0.3198542 -0.63216245]]
[[ 0.3189953 0.88963485 -0.3267902 ]
[ 0.67460066 0.02905637 0.7376108 ]
[ 0.6656996 -0.45574725 -0.5908794 ]]]
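Each 3x3 matrix printed above should be orthogonal with determinant +1. The sketch below generates a stack of random rotations with plain Numpy and checks both properties. It uses a QR-based construction chosen for illustration; it is not claimed to be the method behind Rotation.generate_random_rotations, and random_rotation_stack is a hypothetical helper name.

```python
import numpy as np


def random_rotation_stack(n, seed=None):
    """Return an (n, 3, 3) array of random rotation matrices."""
    rng = np.random.default_rng(seed)
    mats = np.empty((n, 3, 3))
    for k in range(n):
        q, r = np.linalg.qr(rng.standard_normal((3, 3)))
        # Resolve the sign ambiguity of the QR factorization column by column.
        q = q * np.sign(np.diag(r))
        # Orthogonal matrices have determinant +1 or -1; flip one column
        # when needed so the result is a proper rotation.
        if np.linalg.det(q) < 0:
            q[:, 0] = -q[:, 0]
        mats[k] = q
    return mats


toy_rots = random_rotation_stack(2, seed=12345)
# R @ R.T should be the identity for every matrix in the stack.
identity_check = toy_rots @ toy_rots.transpose(0, 2, 1)
print(np.allclose(identity_check, np.eye(3)))
print(np.allclose(np.linalg.det(toy_rots), 1.0))
```

The same two checks can be applied to rots.matrices to sanity check rotations obtained from other sources, such as hand-built Euler angle conversions.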
Using the Volume.project() method we compute
projections using the stack of rotations:
projections = vol.project(rots)
print(projections)
2 float32 images arranged as a (2,) stack each of size 360x360 with pixel_size=1.340000033378601 angstroms.
project() returns an Image instance, so we can call show.
projections.show()
Neat, we’ve generated random projections of some real data. This tutorial will go on to show how this can be performed systematically with other cryo-EM data simulation tasks.
The filter Package¶
Filters are a collection of classes which, once configured, can be applied to
Images, typically in an ImageSource pipeline which will be
discussed in a later section. Specifically, applying a Filter
convolves the filter with the images contained in the Image
instance.
classDiagram
    class Filter{
        +evaluate()
        +basis_mat()
        +scale()
        +evaluate_grid()
        +dual()
        +sign()
    }
    Filter o-- FunctionFilter
    Filter o-- ArrayFilter
    Filter o-- ScalarFilter
    Filter o-- ZeroFilter
    Filter o-- IdentityFilter
    Filter o-- CTFFilter
    CTFFilter o-- RadialCTFFilter
CTFFilter and RadialCTFFilter are the most common filters
encountered when starting out and are detailed in
CTF: Contrast Transfer Function. The other filters
are used behind the scenes in components like NoiseAdders or
more advanced customized pipelines. Several filters for internal or
advanced use cases are omitted from the diagram, but can be found in
the aspire.operators.filter module.
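The statement that applying a Filter convolves it with the images can be illustrated with plain Numpy: multiplying an image's Fourier transform by a transfer function is equivalent to a circular convolution in real space. The sketch below is an assumption-laden toy, not ASPIRE's Filter API; apply_transfer_function, identity, and gaussian_lowpass are hypothetical names, and the frequency grid (cycles per pixel) is chosen only for this example.

```python
import numpy as np


def apply_transfer_function(images, transfer_fn):
    """Filter a stack of square images, shape (n, L, L), in Fourier space."""
    L = images.shape[-1]
    freqs = np.fft.fftfreq(L)  # cycles per pixel, in FFT ordering
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    H = transfer_fn(fx, fy)  # (L, L) transfer function
    # Multiplication in Fourier space equals circular convolution in real space.
    return np.fft.ifft2(np.fft.fft2(images) * H).real


def identity(fx, fy):
    # Leaves every frequency untouched.
    return np.ones_like(fx)


def gaussian_lowpass(fx, fy, sigma=0.1):
    # Attenuates high frequencies; equals 1 at the zero frequency.
    return np.exp(-(fx**2 + fy**2) / (2 * sigma**2))


rng = np.random.default_rng(0)
stack = rng.random((3, 32, 32))
same = apply_transfer_function(stack, identity)
smooth = apply_transfer_function(stack, gaussian_lowpass)
print(stack[0].var(), smooth[0].var())
```

The identity function returns the input unchanged, while the low-pass function reduces the per-image variance because it removes high-frequency energy.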
Basis¶
ASPIRE provides a selection of Basis classes designed for
working with cryo-EM data in two and three dimensions. Most of
these basis implementations are optimized for efficient rotations,
often called the “steerable” property. As of this writing most
algorithms in ASPIRE are written to work well with the fast
Fourier-Bessel (FFB) basis classes FFBBasis2D and FFBBasis3D.
These correspond to the direct, slower reference classes
FBBasis2D and FBBasis3D.
Recently, a related Fourier-Bessel method using fast Laplacian
eigenfunction (FLE) transforms was integrated as FLEBasis2D.
Additional prolate spheroidal wave function (PSWF) methods are
available via FPSWFBasis2D and FPSWFBasis3D, but their
integration into other components like 2D covariance analysis is
incomplete, and slated for a future release.
The source Package¶
The aspire.source package contains a collection of data source interfaces.
Ostensibly, a Source is a producer of some primitive type, most
notably Image. ASPIRE components that consume (process) images
are designed to accept an ImageSource.
The first reason for this is to normalize the way a wide variety of
higher-level components interface. ImageSource instances have a
consistent property images which must be implemented to serve up
images dynamically using a square-bracket [] syntax familiar to
Numpy users. This supports batch computation among other things.
Source instances also store and serve up metadata like
rotations and dtype, and support pipelining transformations.
The second reason is so we can design an experiment using a
synthetic Simulation source or our own provided Numpy arrays via
ArrayImageSource and then later swap out the source for a large
experimental data set using something like RelionSource.
Experimental datasets can be too large to practically fit or process
entirely in memory, and force the use of iteratively-batched
approaches.
Generally, the source package attempts to make most of this
opaque to an end user. Ideally we can simply swap one source for
another. For now we will build up to the creation and application
of a synthetic data set based on the various manual interactions
above.
classDiagram
    class ImageSource{
        +L
        +n
        +dtype
        ...
        +images[]
        +cache()
        +downsample()
        +whiten()
        +phase_flip()
        +invert_contrast()
        +normalize_background()
        +save()
        +save_images()
        ...
    }
    ImageSource o-- ArrayImageSource
    ImageSource o-- Simulation
    ImageSource o-- RelionSource
    ImageSource o-- CoordinateSource
    CoordinateSource o-- BoxesCoordinateSource
    CoordinateSource o-- CentersCoordinateSource
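To make the idea of a source that serves images on demand through square brackets more concrete, here is a minimal Numpy-only sketch. ToyImageSource and _ImageAccessor are hypothetical classes written for this illustration and do not reflect how ImageSource is implemented; the point is only that nothing is held in memory until a slice is requested, which is what makes batched processing possible.

```python
import numpy as np


class _ImageAccessor:
    """Turns `source.images[key]` into an on-demand batch request."""

    def __init__(self, source):
        self._source = source

    def __getitem__(self, key):
        # Resolve ints, slices, or index lists against the full index range.
        indices = np.atleast_1d(np.arange(self._source.n)[key])
        return self._source._generate(indices)


class ToyImageSource:
    """Hypothetical source whose images are generated only when requested."""

    def __init__(self, n, L, seed=0):
        self.n, self.L, self.seed = n, L, seed
        self.images = _ImageAccessor(self)

    def _generate(self, indices):
        batch = np.empty((len(indices), self.L, self.L))
        for k, i in enumerate(indices):
            # Seeding by (seed, index) makes image i reproducible on every access.
            rng = np.random.default_rng([self.seed, int(i)])
            batch[k] = rng.standard_normal((self.L, self.L))
        return batch


toy_src = ToyImageSource(n=100, L=8)
batch_size = 25
# Process the whole source in batches without materializing all images at once.
batch_means = [
    toy_src.images[start : start + batch_size].mean()
    for start in range(0, toy_src.n, batch_size)
]
print(len(batch_means))
```

Because each image depends only on its index, any slice returns the same data no matter how the source is batched, which is the property that lets one source be swapped for another.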
Simulation Class¶
Generating realistic synthetic data sources is a common task. The
process of generating then projecting random rotations is integrated
into the Simulation class. Using Simulation, we can generate arbitrary numbers of
projections for use in experiments. Then additional features are
introduced which allow us to create more realistic data sources.
from aspire.source import Simulation
# Total images in our source.
num_imgs = 100
Generate a Simulation instance based on the original volume data.
sim = Simulation(n=num_imgs, vols=vol)
# Display the first 10 images
sim.images[:10].show() # Hi Res
Repeat for the lower resolution (downsampled) volume vol_ds.
sim = Simulation(n=num_imgs, vols=vol_ds)
sim.images[:10].show() # Lo Res
Note that both of those simulations have the same rotations because they
had the same seed by default. We recreate sim with a distinct
seed to get different random samples (of rotations).
sim = Simulation(n=num_imgs, vols=vol_ds, seed=42)
sim.images[:10].show()
We can also view the rotations used to create these projections.
print(sim.rotations)
[[[ 0.1374932 0.9594606 -0.24603027]
[ 0.9659938 -0.07497529 0.24745627]
[ 0.21897836 -0.27168727 -0.93714166]]
[[ 0.37807834 0.132081 0.9163031 ]
[ 0.40860063 0.8643426 -0.29318497]
[-0.83072394 0.48524892 0.27282083]]
[[ 0.86643094 0.48816934 -0.10482421]
[ 0.09854265 -0.37300435 -0.92258173]
[-0.48947603 0.7890237 -0.37128803]]
[[-0.35693905 0.45857018 -0.81382304]
[ 0.47638273 -0.6600464 -0.58085996]
[-0.803526 -0.5950228 0.01714141]]
[[ 0.15889233 0.93312883 0.32252717]
[-0.87645537 -0.01707341 0.48118034]
[ 0.45450988 -0.35913652 0.8151329 ]]
[[-0.32674402 -0.8130629 0.48183718]
[-0.38360518 0.58002734 0.7186204 ]
[-0.8637624 0.04996968 -0.5014156 ]]
[[-0.34698084 -0.1871273 0.9190145 ]
[ 0.5046363 0.78870445 0.35112315]
[-0.79053557 0.5856011 -0.1792341 ]]
[[-0.8140751 -0.09364705 0.57315964]
[-0.37379512 -0.6708314 -0.64051735]
[ 0.44447604 -0.7356735 0.5111023 ]]
[[ 0.45521852 0.5792567 -0.6761936 ]
[ 0.29612732 -0.81470734 -0.49855846]
[-0.8396932 0.02671366 -0.54240364]]
[[ 0.93491715 0.32658473 -0.13882485]
[ 0.21361442 -0.83031636 -0.5147267 ]
[-0.28337044 0.45157185 -0.84604025]]
[[ 0.43646738 -0.00271971 0.89971596]
[-0.24721187 -0.9618694 0.11701906]
[ 0.865091 -0.27349547 -0.420497 ]]
[[ 0.06204434 -0.68872344 0.7223645 ]
[-0.9629127 -0.23171897 -0.13822247]
[ 0.26258263 -0.686998 -0.6775574 ]]
[[-0.9549782 -0.15465972 0.2531738 ]
[ 0.02725631 -0.8954967 -0.44423273]
[ 0.2954212 -0.417332 0.8593953 ]]
[[-0.9362025 -0.29906207 0.1846258 ]
[ 0.32780164 -0.553521 0.7656113 ]
[-0.12677105 0.77728784 0.6162408 ]]
[[ 0.8579973 0.321847 0.40031877]
[-0.46083617 0.13809928 0.8766748 ]
[ 0.2268714 -0.93666595 0.26680756]]
[[-0.88663757 -0.37402767 0.27198744]
[ 0.45935175 -0.64413124 0.6116297 ]
[-0.05357084 0.6672318 0.7429212 ]]
[[-0.74606377 0.6106348 -0.26554468]
[-0.55859023 -0.35687232 0.748745 ]
[ 0.36244422 0.7069422 0.60734415]]
[[-0.16853625 0.61572605 -0.7697252 ]
[ 0.9855458 0.11886839 -0.12070518]
[ 0.01717467 -0.7789427 -0.6268599 ]]
[[ 0.7561905 -0.33249253 -0.563582 ]
[ 0.5429612 0.79951745 0.2568366 ]
[ 0.3651974 -0.50022054 0.785118 ]]
[[ 0.77301854 -0.58071446 -0.25536844]
[ 0.25237784 -0.08780504 0.9636367 ]
[-0.5820204 -0.8093584 0.07868452]]
[[-0.28316504 -0.7468036 -0.6017491 ]
[ 0.8560133 0.08614567 -0.5097256 ]
[ 0.43250299 -0.6594417 0.6148803 ]]
[[-0.14112386 0.9097205 0.3905032 ]
[-0.8198382 -0.3284985 0.46899253]
[ 0.5549318 -0.2539634 0.7921826 ]]
[[-0.456592 -0.85562974 -0.2437653 ]
[-0.4374926 -0.02264298 0.89893687]
[-0.7746767 0.5170929 -0.36399308]]
[[-0.90090746 0.11707234 -0.4179232 ]
[-0.10193092 0.8789243 0.4659422 ]
[ 0.4218718 0.4623701 -0.7798962 ]]
[[-0.3212181 -0.4951543 -0.80724293]
[-0.94361854 0.2393791 0.22865206]
[ 0.08001903 0.8351765 -0.54412967]]
[[ 0.97377276 0.06877314 0.21687981]
[ 0.22443582 -0.13386007 -0.96525127]
[-0.03735181 0.98861104 -0.14578445]]
[[ 0.3553472 -0.9034085 0.23996124]
[ 0.5433438 0.4085248 0.7334064 ]
[-0.76059574 -0.13023247 0.63602954]]
[[ 0.61806667 0.37726 -0.68968725]
[-0.56937826 0.8197457 -0.06184895]
[ 0.54203504 0.4309197 0.7214612 ]]
[[ 0.29776493 0.94447404 -0.13894197]
[ 0.9458294 -0.31160298 -0.09116108]
[-0.129394 -0.10427082 -0.9860957 ]]
[[ 0.19716634 0.21056637 0.9574901 ]
[-0.7070152 -0.64604855 0.2876643 ]
[ 0.6791575 -0.73367774 0.0214946 ]]
[[-0.5650243 0.2975922 -0.7695365 ]
[ 0.7782365 -0.11755076 -0.616871 ]
[-0.27403557 -0.94742846 -0.16517805]]
[[ 0.28065482 0.8733582 0.39809337]
[ 0.47991556 -0.4868837 0.72981185]
[ 0.83121234 -0.013774 -0.55578446]]
[[ 0.8024307 0.02780645 0.5960972 ]
[-0.1587728 -0.9529598 0.25818375]
[ 0.5752358 -0.30181855 -0.7602692 ]]
[[ 0.32265848 -0.30083787 0.8974342 ]
[-0.05781392 -0.9526389 -0.29855752]
[ 0.94474816 0.04444793 -0.32476965]]
[[ 0.4935579 -0.7422682 0.4532532 ]
[ 0.85551316 0.5081454 -0.09942588]
[-0.15651786 0.4368365 0.8858194 ]]
[[-0.8976416 -0.2857157 0.33556828]
[-0.41022733 0.26334086 -0.8731352 ]
[ 0.1610996 -0.92142177 -0.35359412]]
[[-0.9217232 0.19335265 -0.3362159 ]
[-0.33698434 0.02991741 0.9410348 ]
[ 0.19201028 0.9806731 0.03758125]]
[[ 0.24043691 -0.61969984 0.74710256]
[ 0.35552713 0.7724179 0.5262804 ]
[-0.90321124 0.13907799 0.40603787]]
[[-0.79218084 -0.47242948 -0.38634163]
[ 0.16552027 0.4430016 -0.88110876]
[ 0.5874117 -0.76194483 -0.27274075]]
[[-0.71742296 0.624828 -0.30804914]
[-0.68107444 -0.7220424 0.12162399]
[-0.14643046 0.29706025 0.9435642 ]]
[[ 0.87354153 -0.40243074 0.27381516]
[ 0.33415928 0.9048438 0.26380917]
[-0.35392484 -0.13895038 0.9248946 ]]
[[ 0.48421115 0.11260304 -0.8676751 ]
[ 0.27090228 -0.96224743 0.02630238]
[-0.83195645 -0.24779108 -0.49643534]]
[[ 0.10466753 -0.18718329 0.9767329 ]
[-0.49893624 0.83970237 0.21438888]
[-0.86029494 -0.509767 -0.00550296]]
[[ 0.62265253 -0.1254365 0.7723791 ]
[ 0.46602726 -0.7334752 -0.49480575]
[ 0.6285876 0.66804177 -0.39824337]]
[[-0.07320143 -0.99607444 -0.04977208]
[-0.4329861 -0.01321644 0.9013037 ]
[-0.8984234 0.08752733 -0.43031895]]
[[ 0.06439321 0.97830254 -0.19692044]
[ 0.938444 -0.12647285 -0.32144594]
[-0.33937648 -0.16409987 -0.92622614]]
[[-0.32914323 0.8692355 -0.36890975]
[-0.34743685 0.2517924 0.9032653 ]
[ 0.87803894 0.4254765 0.21912868]]
[[-0.03135481 0.12190165 -0.99204683]
[ 0.20567426 -0.9705069 -0.12575543]
[-0.97811806 -0.20798153 0.00535803]]
[[ 0.7707965 0.4763445 -0.42304692]
[ 0.4696665 -0.8735372 -0.12785189]
[-0.43044877 -0.10014319 -0.89704245]]
[[ 0.62561727 -0.6937239 0.3568615 ]
[-0.5623879 -0.08402125 0.8225936 ]
[-0.54066896 -0.7153234 -0.44270706]]
[[-0.0384063 -0.82296616 0.5667907 ]
[ 0.98650855 -0.12155963 -0.10965481]
[ 0.15914106 0.5549324 0.8165318 ]]
[[ 0.71410507 -0.68704677 0.13424107]
[-0.28213766 -0.45796394 -0.8430109 ]
[ 0.6406655 0.5641239 -0.52087617]]
[[-0.6900714 -0.31053716 0.65373397]
[-0.5909326 0.76325965 -0.26121518]
[-0.41785175 -0.56656986 -0.7102102 ]]
[[ 0.56709814 -0.23533636 0.7893139 ]
[ 0.71543354 -0.3340903 -0.6136274 ]
[ 0.40811095 0.9126886 -0.02109447]]
[[ 0.12042121 0.97353137 -0.1942559 ]
[-0.98601013 0.09457708 -0.13725594]
[-0.1152508 0.20806682 0.9713009 ]]
[[ 0.26161245 -0.60064805 0.75550044]
[-0.5359779 -0.74138415 -0.40382826]
[ 0.80267465 -0.29928502 -0.51588935]]
[[ 0.4752947 -0.37172344 0.7974438 ]
[-0.1643763 0.852889 0.4955409 ]
[-0.86433524 -0.36660883 0.3442711 ]]
[[ 0.83131063 0.47794443 0.28371063]
[-0.4780851 0.35454985 0.80357265]
[ 0.28347355 -0.80365634 0.5232392 ]]
[[ 0.5751311 0.03900156 0.817131 ]
[-0.27594915 -0.93107 0.23866446]
[ 0.7701144 -0.36274996 -0.5247249 ]]
[[-0.6807762 -0.6099538 -0.4055862 ]
[ 0.07149926 -0.6063983 0.79194 ]
[-0.7289936 0.51013476 0.45643273]]
[[ 0.27400538 0.6167242 -0.73795146]
[ 0.63375145 0.46136382 0.6208885 ]
[ 0.72338104 -0.63780457 -0.2644337 ]]
[[ 0.02657236 -0.9912923 -0.12897094]
[ 0.27007505 -0.11709963 0.955692 ]
[-0.96247256 -0.06022682 0.2646117 ]]
[[ 0.75919664 0.46289822 0.4575431 ]
[ 0.21626568 0.4836275 -0.84813535]
[-0.61388075 0.7428524 0.2670594 ]]
[[ 0.13419726 0.7730551 -0.6199814 ]
[ 0.01575051 0.62389755 0.7813474 ]
[ 0.99082947 -0.11461972 0.07154934]]
[[ 0.7969387 -0.5938283 -0.11070973]
[-0.44239753 -0.6985636 0.5623996 ]
[-0.41130662 -0.39922032 -0.8194205 ]]
[[-0.2271311 -0.6608967 -0.71528107]
[ 0.9729143 -0.12155605 -0.19662634]
[ 0.04300295 -0.74056715 0.670605 ]]
[[ 0.8058346 0.03859079 0.5908818 ]
[-0.50750047 -0.4691095 0.7227583 ]
[ 0.3050801 -0.88229644 -0.35843983]]
[[-0.84541976 -0.47147918 0.25094387]
[-0.48244947 0.47254407 -0.737526 ]
[ 0.22914612 -0.74458677 -0.62696296]]
[[ 0.17157505 0.919783 0.35293233]
[ 0.95262057 -0.24622187 0.17857464]
[ 0.25114956 0.30557165 -0.9184497 ]]
[[ 0.03022777 -0.19667654 0.98000234]
[ 0.96350527 -0.25516328 -0.0809276 ]
[ 0.26597717 0.9466837 0.18178587]]
[[-0.9041375 0.4069089 0.1302327 ]
[-0.2578785 -0.276723 -0.9257014 ]
[-0.34063777 -0.8705455 0.3551287 ]]
[[ 0.76876014 -0.6344005 0.0808938 ]
[-0.632561 -0.7356355 0.24229503]
[-0.09420373 -0.23743702 -0.96682435]]
[[ 0.04193976 -0.00535577 0.9991058 ]
[-0.7404277 0.6712408 0.03467938]
[-0.6708263 -0.74122 0.02418612]]
[[-0.6622805 0.6703488 0.33468935]
[ 0.2214073 0.6018422 -0.7673101 ]
[-0.7157956 -0.43407184 -0.5470085 ]]
[[ 0.00894121 -0.96657425 -0.2562309 ]
[ 0.2980998 0.24716657 -0.92197895]
[ 0.9544928 -0.06813876 0.29034555]]
[[-0.09509465 -0.9904526 -0.09980346]
[-0.64432764 0.13766429 -0.75225693]
[ 0.75881416 -0.00722948 -0.6512671 ]]
[[-0.93059725 0.34470695 0.12314969]
[-0.2402517 -0.32136244 -0.91597235]
[-0.27616635 -0.8819883 0.38187546]]
[[ 0.48192862 0.10002194 0.87048286]
[-0.66315746 -0.60768265 0.43697143]
[ 0.57268405 -0.7878562 -0.22652936]]
[[ 0.90507764 -0.29453734 -0.30672824]
[ 0.39986134 0.83494043 0.37813416]
[ 0.14472517 -0.46488953 0.87346 ]]
[[ 0.48447588 -0.70774704 0.51417625]
[-0.8526892 -0.25072426 0.4583214 ]
[-0.19545916 -0.66047823 -0.7249581 ]]
[[-0.64288104 0.45185107 0.6184938 ]
[-0.23015156 0.6562045 -0.71862775]
[-0.73057115 -0.6043395 -0.31786725]]
[[ 0.30517054 -0.8374807 -0.4533178 ]
[-0.89213204 -0.0849021 -0.44372523]
[ 0.33312368 0.53983116 -0.773053 ]]
[[-0.32206497 0.9112062 -0.25686076]
[-0.869473 -0.17734958 0.46104652]
[ 0.37455428 0.37182042 0.8493872 ]]
[[ 0.3610673 -0.7101037 0.6044693 ]
[ 0.7567135 0.60192007 0.2551017 ]
[-0.5449909 0.36530122 0.7546787 ]]
[[ 0.27847555 0.90301734 -0.3271255 ]
[ 0.5723424 0.11749125 0.811554 ]
[ 0.7712817 -0.41322574 -0.48411667]]
[[-0.3412885 -0.83527166 -0.43109557]
[ 0.17580529 -0.50726277 0.84366876]
[-0.9233714 0.21214558 0.3199681 ]]
[[ 0.2919039 -0.9513335 -0.09877586]
[ 0.5871929 0.25977072 -0.76663136]
[ 0.75498116 0.16578221 0.63444436]]
[[-0.1322668 -0.75247324 -0.645205 ]
[ 0.2528478 0.60377645 -0.75599074]
[ 0.9584224 -0.26313114 0.11040162]]
[[ 0.62349445 -0.19195779 0.75789636]
[ 0.7480926 -0.13520728 -0.6496741 ]
[ 0.2271831 0.9720449 0.05930121]]
[[-0.33402812 0.42084855 -0.84339297]
[ 0.6074924 0.7802727 0.14875284]
[ 0.720679 -0.4626672 -0.5162954 ]]
[[-0.7137928 -0.55692726 0.424655 ]
[-0.4062189 0.8231522 0.39674512]
[-0.57051384 0.11069094 -0.8137945 ]]
[[ 0.04948919 0.98905236 -0.13901873]
[-0.8053667 -0.04280363 -0.59122956]
[-0.5907075 0.14122052 0.7944315 ]]
[[-0.87180173 0.4881775 0.04055187]
[-0.41275278 -0.6874722 -0.597509 ]
[-0.26381218 -0.5376473 0.8008361 ]]
[[ 0.44658735 -0.04983776 -0.89335096]
[-0.64957654 0.6685733 -0.3620222 ]
[ 0.615313 0.74197435 0.26620296]]
[[ 0.68048185 0.72213393 0.12436624]
[ 0.31681323 -0.13690118 -0.938556 ]
[-0.6607373 0.67807126 -0.32194036]]
[[-0.29298428 0.08057309 -0.95271623]
[ 0.15083717 0.98785996 0.03715903]
[ 0.94414425 -0.132818 -0.30158085]]
[[-0.19727235 -0.42594904 -0.8829785 ]
[ 0.97395647 -0.18783951 -0.12698455]
[-0.1117693 -0.8850332 0.4519114 ]]
[[-0.58162045 0.6032319 -0.545737 ]
[-0.80308783 -0.53260213 0.2671795 ]
[-0.12948951 0.59367186 0.7942205 ]]
[[-0.43661723 0.6471804 0.62491834]
[-0.77699894 -0.621401 0.10066503]
[ 0.45347333 -0.4416088 0.77417284]]
[[ 0.05435671 -0.76187783 0.64543587]
[ 0.69972587 0.49019638 0.51970303]
[-0.71234053 0.42337883 0.5597511 ]]]
Given any Source, we can also take slices using typical slicing
syntax, or provide our own iterable of indices.
sim_evens = sim[0::2]
sim_odds = sim[1::2]
# We can also generate random selections.
# Shuffle indices then take the first 5.
shuffled_inds = np.random.choice(sim.n, 5, replace=False)
sim_shuffled_subset = sim[shuffled_inds]
Underneath those slices, ASPIRE relies on IndexedSource, which
we can also call directly to remap indices.
from aspire.source import IndexedSource
sim_shuffled_subset = IndexedSource(sim, shuffled_inds)
The noise Package¶
The aspire.noise package contains several useful classes for generating and estimating different types of noise.
NoiseAdder¶
NoiseAdder subclasses are used to add common or customized noise
to Simulation image generation pipelines.
WhiteNoiseAdder¶
WhiteNoiseAdder is the most common type of synthetic noise.
from aspire.noise import WhiteNoiseAdder
Get the sample variance, then create a NoiseAdder based on that variance.
var = np.var(sim.images[:].asnumpy())
print(f"Sample Variance: {var}")
target_noise_variance = 10.0 * var
print(f"Target Noise Variance: {target_noise_variance}")
white_noise_adder = WhiteNoiseAdder(target_noise_variance)
Sample Variance: 3.230215224903077e-05
Target Noise Variance: 0.0003230215224903077
We can customize Sources by adding stages to their generation
pipeline. In the case of a Simulation source, we want to corrupt
the projection images with noise. Internally the
WhiteNoiseAdder creates a ScalarFilter which is applied to
(convolved with) a Gaussian random sample. Similar to before, if
you require a different sample, this can be controlled via a
seed.
# Creating the new simulation with this additional noise is easy:
sim = Simulation(n=num_imgs, vols=vol_ds, noise_adder=white_noise_adder)
# These should be rather noisy now ...
sim.images[:10].show()
WhiteNoiseEstimator¶
We can estimate the noise across an ImageSource, and
we’ve generated a simulation with known noise variance.
Let’s see how the estimate compares.
In this case, we know the noise to be white, so we can proceed directly to
WhiteNoiseEstimator.
The noise estimators consume from an ImageSource.
The white noise estimator should log a diagnostic variance value.
Internally, it also uses the estimation results to build a
Filter which can be used in more advanced denoising methods.
from aspire.noise import WhiteNoiseEstimator
noise_estimator = WhiteNoiseEstimator(sim)
noise_estimator.estimate()
0.00031902745831757784
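One simple way to estimate a white noise variance is to look only at pixels far from the particle, for example the corners outside a centered disc. The sketch below demonstrates that idea on toy data with Numpy. Whether WhiteNoiseEstimator works this way internally is an assumption made for illustration; background_variance and its bg_radius parameter are hypothetical names.

```python
import numpy as np


def background_variance(images, bg_radius=1.0):
    """Variance of pixels outside a centered disc, for a stack (n, L, L)."""
    L = images.shape[-1]
    grid = np.linspace(-1.0, 1.0, L)
    x, y = np.meshgrid(grid, grid, indexing="ij")
    background = np.hypot(x, y) >= bg_radius  # corner pixels of each image
    return images[:, background].var()


rng = np.random.default_rng(0)
n, L, true_variance = 50, 32, 0.04
grid = np.linspace(-1.0, 1.0, L)
x, y = np.meshgrid(grid, grid, indexing="ij")
signal = (np.hypot(x, y) < 0.5).astype(float)  # toy particle in the center
noisy = signal + rng.normal(scale=np.sqrt(true_variance), size=(n, L, L))

estimate = background_variance(noisy)
print(f"true: {true_variance}, estimated: {estimate:.4f}")
```

Restricting the estimate to the background keeps the signal from inflating the variance, which is why the estimate lands close to the known value even though the particle itself has high contrast.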
A Custom FunctionFilter¶
We will now apply some more interesting noise, using a custom
function, and then apply a whitening process to our data.
Using FunctionFilter we can create our own custom functions to
apply in a pipeline. Here we want to apply a custom noise function.
We will use a function of two variables for this example.
from aspire.noise import CustomNoiseAdder
from aspire.operators import FunctionFilter
def noise_function(x, y):
    return 1e-7 * np.exp(-(x * x + y * y) / (2 * 0.3**2))
# In Python, functions are first class objects. We take advantage of
# that to pass this function around as a variable. The function is
# evaluated later, internally, during pipeline execution.
custom_noise = CustomNoiseAdder(noise_filter=FunctionFilter(noise_function))
# Create yet another Simulation source to tinker with.
sim = Simulation(n=num_imgs, vols=vol_ds, noise_adder=custom_noise)
sim.images[:10].show()
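A common way to produce noise with a prescribed power spectrum is to shape white Gaussian noise in Fourier space by the square root of that spectrum. The Numpy sketch below shows this under stated assumptions: the frequency grid spanning [-1, 1), the helper names add_colored_noise, noise_psd, and flat_psd, and the interpretation of the function as a power spectral density are all illustrative choices, not a description of CustomNoiseAdder internals.

```python
import numpy as np


def noise_psd(x, y):
    # Same functional form as noise_function in the tutorial code.
    return 1e-7 * np.exp(-(x * x + y * y) / (2 * 0.3**2))


def flat_psd(x, y):
    # Constant spectrum, which corresponds to white noise of variance 4.
    return np.full_like(x, 4.0)


def add_colored_noise(images, psd_fn, rng):
    """Add Gaussian noise whose power spectrum follows psd_fn."""
    L = images.shape[-1]
    freqs = 2 * np.fft.fftfreq(L)  # assumed frequency grid spanning [-1, 1)
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    amplitude = np.sqrt(psd_fn(fx, fy))
    white = rng.standard_normal(images.shape)
    # Shape the white noise spectrum, then return to real space.
    colored = np.fft.ifft2(np.fft.fft2(white) * amplitude).real
    return images + colored


clean = np.zeros((10, 16, 16))
noisy_flat = add_colored_noise(clean, flat_psd, np.random.default_rng(0))
noisy_gauss = add_colored_noise(clean, noise_psd, np.random.default_rng(1))
print(noisy_flat.var(), noisy_gauss.var())
```

With a flat spectrum the procedure reduces to scaling white noise by a constant, which gives a quick sanity check before trying a frequency-dependent function.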
Noise Whitening¶
We will now combine a more advanced noise estimation technique with
an ImageSource preprocessing method whiten.
First an anisotropic noise estimate is performed.
from aspire.noise import AnisotropicNoiseEstimator
# Estimate noise.
aiso_noise_estimator = AnisotropicNoiseEstimator(sim)
Applying the Simulation.whiten() method requires passing a
corresponding NoiseEstimator instance. Then we can inspect some
of the whitened images. While noise is still present, we can see a
dramatic change.
# Whiten based on the estimated noise.
sim = sim.whiten(aiso_noise_estimator)
What do the whitened images look like?
sim.images[:10].show()
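Conceptually, whitening estimates the noise power spectrum and then divides each Fourier coefficient by its square root so that the resulting spectrum is flat. The Numpy sketch below applies this to a toy stack of pure colored noise. It is a simplified illustration, not the algorithm behind AnisotropicNoiseEstimator or whiten(); estimate_psd, whiten_stack, and the toy spectrum are hypothetical.

```python
import numpy as np


def estimate_psd(images):
    """Average periodogram of a stack (n, L, L): mean |FFT|^2 / L^2."""
    L = images.shape[-1]
    return np.mean(np.abs(np.fft.fft2(images)) ** 2, axis=0) / L**2


def whiten_stack(images, psd, eps=1e-12):
    """Divide each Fourier coefficient by the square root of the noise PSD."""
    return np.fft.ifft2(np.fft.fft2(images) / np.sqrt(psd + eps)).real


rng = np.random.default_rng(0)
L = 16
freqs = np.fft.fftfreq(L)
fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
true_psd = 0.01 + np.exp(-(fx**2 + fy**2) / (2 * 0.1**2))  # toy colored PSD
white = rng.standard_normal((200, L, L))
colored = np.fft.ifft2(np.fft.fft2(white) * np.sqrt(true_psd)).real

psd_before = estimate_psd(colored)
psd_after = estimate_psd(whiten_stack(colored, psd_before))
# Ratio of largest to smallest spectral power, before and after whitening.
print(psd_before.max() / psd_before.min(), psd_after.max() / psd_after.min())
```

In real data the spectrum would be estimated from background regions rather than from whole images, since the images also contain signal; the toy stack here is noise only so that the flattening effect is easy to verify.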
Common Image Corruptions¶
Simulation provides several configurable types of common cryo-EM
image corruptions. Users should be aware that amplitude and offset
corruption is enabled by default.
Amplitudes¶
Simulation automatically generates random amplitude variability.
To disable, set amplitudes=1.
Offsets¶
Simulation automatically generates random offsets.
To disable, set offsets=0.
Noise¶
By default, no noise corruption is configured.
To enable, see NoiseAdder components.
CTF¶
By default, no CTF corruption is configured.
To enable, we must configure one or more CTFFilter instances.
Usually we will create a range of filters for a variety of
defocus levels.
from aspire.operators import RadialCTFFilter
# Radial CTF Filter params.
defocus_min = 15000 # unit is angstroms
defocus_max = 25000
defocus_ct = 7
# Generate several CTFs.
ctf_filters = [
    RadialCTFFilter(pixel_size=vol_ds.pixel_size, defocus=d)
    for d in np.linspace(defocus_min, defocus_max, defocus_ct)
]
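To give a feel for what a radial CTF looks like as a function of defocus, the sketch below evaluates one common textbook form of the contrast transfer function with Numpy. Sign and phase conventions differ between packages, so this is not claimed to match RadialCTFFilter; the function names and the default voltage, spherical aberration, amplitude contrast, and pixel size values are hypothetical choices for illustration.

```python
import numpy as np


def electron_wavelength(voltage_kv):
    """Relativistic electron wavelength in angstroms for a voltage in kV."""
    volts = voltage_kv * 1e3
    return 12.2643247 / np.sqrt(volts + 0.978466e-6 * volts**2)


def radial_ctf(k, defocus, voltage_kv=200.0, cs_mm=2.0, alpha=0.1):
    """CTF at spatial frequency k (1/angstrom); defocus in angstroms."""
    lam = electron_wavelength(voltage_kv)
    cs = cs_mm * 1e7  # spherical aberration, mm -> angstroms
    # Phase aberration from defocus and spherical aberration.
    chi = np.pi * lam * defocus * k**2 - 0.5 * np.pi * cs * lam**3 * k**4
    # Mix of phase contrast and amplitude contrast (fraction alpha).
    return np.sqrt(1 - alpha**2) * np.sin(chi) - alpha * np.cos(chi)


pixel_size = 5.0  # angstroms per pixel, hypothetical
k = np.linspace(0.0, 0.5 / pixel_size, 200)  # up to the Nyquist frequency
curves = np.array([radial_ctf(k, d) for d in np.linspace(15000, 25000, 7)])
print(curves.shape)
```

Larger defocus values make the curve oscillate faster, which moves the zero crossings to lower frequencies; using a range of defocus levels means different images lose information at different frequencies.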
Combining into a Simulation¶
Here we’ll combine the parameters above into a new simulation.
sim = Simulation(
    n=num_imgs,
    vols=vol_ds,
    amplitudes=1,
    offsets=0,
    noise_adder=white_noise_adder,
    unique_filters=ctf_filters,
    seed=42,
)
# Simulation has two unique accessors: ``clean_images``, which disables
# noise, and ``projections``, which are clean uncorrupted projections.
# Both act like calls to ``images`` and return show-able ``Image``
# instances.
Clean projections.
sim.projections[:3].show()
Images with only CTF applied.
sim.clean_images[:3].show()
And now the first four corrupted images.
sim.images[:4].show(columns=2, colorbar=False)
Note
Above, the show call has been customized as a 2 column grid
without a colorbar legend.
Real Experimental Data - RelionSource¶
Now that we have some basics, we can try to replace the simulation with a real experimental data source.
from aspire.source import RelionSource
src = RelionSource(
    "data/sample_relion_data.star",
    data_folder="",
    pixel_size=5.0,
    max_rows=1024,
)
Add downsampling to the src pipeline.
src = src.downsample(img_size)
RelionSource will auto-populate CTFFilter instances from the
STAR file metadata when available. Having these filters allows us to
perform a phase flipping correction.
src = src.phase_flip()
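Phase flipping is a simple CTF correction: wherever the transfer function is negative, the sign of the corresponding Fourier coefficient is flipped back, while amplitudes are left untouched. The Numpy sketch below illustrates this on toy data. The oscillating function H is a stand-in for a real CTF, and phase_flip here is a hypothetical helper, not the ImageSource.phase_flip() implementation.

```python
import numpy as np


def phase_flip(images, H):
    """Multiply each image spectrum by the sign of a real transfer function H."""
    sign = np.where(H >= 0, 1.0, -1.0)
    return np.fft.ifft2(np.fft.fft2(images) * sign).real


L = 32
freqs = np.fft.fftfreq(L)
fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
H = np.cos(40.0 * (fx**2 + fy**2))  # toy oscillating, CTF-like function

rng = np.random.default_rng(0)
stack = rng.random((4, L, L))
# Corrupt the images with H, then correct the sign changes.
corrupted = np.fft.ifft2(np.fft.fft2(stack) * H).real
flipped = phase_flip(corrupted, H)

# After flipping, the effective transfer function is |H|, which is never negative.
reference = np.fft.ifft2(np.fft.fft2(stack) * np.abs(H)).real
print(np.allclose(flipped, reference))
```

Phase flipping does not restore the attenuated amplitudes, so frequencies near the zeros of the transfer function remain weak; it only removes the contrast inversions.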
Display the experimental data images.
src.images[:10].show()
Pipeline Roadmap¶
Now that the primitives have been introduced we can explore higher-level components. The higher-level components are designed to be modular and cacheable (to memory or disk) to support experimentation with entire pipelines or focused algorithmic development on specific components. Most pipelines will follow a flow of data and components moving mostly left to right in the table below. This table is not exhaustive, but represents some of the most common components.
Image Processing | | | Ab initio |
---|---|---|---|---
Data | Preprocessing | Denoising | Orientation | 3D Reconstruction
Simulation | NoiseEstimator | Class Averaging | CLSyncVoting | MeanVolumeEstimator
RelionSource | downsample | cov2d (CWF) | CLSymmetryC2 |
CoordinateSource | whiten | | CLSymmetryC3C4 |
 | phase_flip | | CLSymmetryCn |
 | normalize_background | | CommonlineSDP |
 | CTFEstimator | | |
We’re now ready to explore a small example end-to-end ab initio pipeline using simulated data: Ab-initio Pipeline Demonstration.
Larger simulations and experiments based on EMPIAR data can be found in Experiments.