aspire.source package

Submodules

aspire.source.coordinates module

class aspire.source.coordinates.BoxesCoordinateSource(files, particle_size=None, max_rows=None, B=0, symmetry_group=None)

Bases: CoordinateSource

Represents a data source consisting of micrographs and coordinate files in box format.

Parameters:
  • files – A list of tuples of the form (path_to_mrc, path_to_coord)

  • max_rows – Maximum number of particles to read. (If None, will attempt to load all particles)

  • symmetry_group – A SymmetryGroup object or string corresponding to the symmetry of the molecule.

Particle_size:

Desired size of cropped particles (will override the size specified in coordinate file)

class aspire.source.coordinates.CentersCoordinateSource(files, particle_size, max_rows=None, B=0, symmetry_group=None)

Bases: CoordinateSource

Represents a data source consisting of micrographs and coordinate files specifying particle centers only. Files can be text (.coord) or STAR files.

Parameters:
  • files – A list of tuples of the form (path_to_mrc, path_to_coord)

  • max_rows – Maximum number of particles to read. (If None, will

Particle_size:

Desired size of cropped particles (mandatory)

attempt to load all particles) :param symmetry_group: A SymmetryGroup object or string corresponding to the symmetry of the molecule.

class aspire.source.coordinates.CoordinateSource(files, particle_size, max_rows, B, symmetry_group)

Bases: ImageSource, ABC

Base class defining common methods for data sources consisting of full micrographs coupled with files specifying the locations of picked particles in each micrograph.

Broadly, there are two ways this information is represented. Sometimes each coordinate is simply the (X,Y) center location of the picked particle. This is sometimes stored in a .coord text file, and sometimes in a STAR file. The particle box is then computed using an externally supplied box size. These sources may be loaded via the CentersCoordinateSource class for both filetypes.

Other formats adhere to the box file specification, which specifies a particle via four numbers: (lower left X coordinate, lower left Y coordinate, X size, Y size). These can be loaded via the BoxesCoordinateSource class.

Regardless of source, the coordinates of each particle are represented internally in the box format.

Particle information is extracted from the micrographs and coordinate files and put into a common data structure (self.particles).

The _images() method, called via ImageSource.images() crops the particle images out of the micrograph and returns them as a stack. This also allows the CoordinateSource to be saved to an .mrcs stack.

A cryo-EM ImageSource object that supplies images along with other parameters for image manipulation.

Parameters:
  • L – resolution of (square) images (int)

  • n – The total number of images available Note that images() may return a different number of images based on its arguments.

  • metadata – A Dataframe of metadata information corresponding to this ImageSource’s images

  • memory – str or None The path of the base directory to use as a data store or None. If None is given, no caching is performed.

  • symmetry_group – A SymmetryGroup instance or string indicating the underlying symmetry of the molecule. Defaults to the IdentitySymmetryGroup, which represents an asymmetric particle, if none provided.

  • pixel_size – Pixel size of the images in angstroms, default None.

import_aspire_ctf(ctf)

Import CTF information from STAR file(s) generated by ASPIRE’s CTF Estimator.

Parameters:

ctf – A path or iterable of paths to STAR files from ASPIRE’s CTF Estimator. Note that number of files provided must match number of micrographs in this CoordinateSource.

import_relion_ctf(ctf)

Import CTF information Relion micrograph STAR files containing CTF information.

Parameters:

ctf – Path to a Relion micrograph STAR file containing CTF information. Note that number of files provided must match number of micrographs in this CoordinateSource.

aspire.source.image module

class aspire.source.image.ArrayImageSource(im, metadata=None, angles=None, symmetry_group=None, pixel_size=None)

Bases: ImageSource

An ImageSource object that holds a reference to an underlying Image object (a thin wrapper on an ndarray) representing images. It does not produce its images on the fly, but keeps them in memory. As such, it should not be used where large Image objects are involved, but can be used in situations where API conformity is desired.

Note that this class does not provide an _images method, since it populates the _cached_im attribute which, if available, is consulted directly by the parent class, bypassing _images.

Initialize from an Image object.

Parameters:
  • im – An Image or Numpy array object representing image data served up by this ImageSource. In the case of a Numpy array, attempts to create an ‘Image’ object.

  • metadata – A Dataframe of metadata information corresponding to this ImageSource’s images

  • angles – Optional n-by-3 array of rotation angles corresponding to im.

  • symmetry_group – A SymmetryGroup instance or string indicating the underlying symmetry of the molecule.

  • pixel_size – Pixel size of the images in angstroms, default None.

class aspire.source.image.ImageSource(L, n, dtype='double', metadata=None, memory=None, symmetry_group=None, pixel_size=None)

Bases: ABC

When creating an ImageSource object, a ‘metadata’ table holds metadata information about all images in the ImageSource. The number of rows in this metadata table will equal the total number of images supported by this ImageSource (available as the ‘n’ attribute), though reading/writing of images is usually done in chunks.

This metadata table is implemented as a dictionary of numpy arrays.

The ‘values’ in this metadata table are usually primitive types (floats/ints/strings) that are suitable for being read from STAR files, and being written to STAR files. The columns corresponding to these fields begin with a single underscore ‘_’.

In addition, the metadata table may also contain references to Python objects. Filter objects, for example, are stored in this metadata table as references to unique Filter objects that correspond to images in this ImageSource. Several rows of metadata may end up containing a reference to a small handful of unique Filter objects, depending on the values found in other columns (identical Filter objects). For example, a smaller number of CTFFilter objects may apply to subsets of particles depending on the unique “_rlnDefocusU”/”_rlnDefocusV” Relion parameters.

A cryo-EM ImageSource object that supplies images along with other parameters for image manipulation.

Parameters:
  • L – resolution of (square) images (int)

  • n – The total number of images available Note that images() may return a different number of images based on its arguments.

  • metadata – A Dataframe of metadata information corresponding to this ImageSource’s images

  • memory – str or None The path of the base directory to use as a data store or None. If None is given, no caching is performed.

  • symmetry_group – A SymmetryGroup instance or string indicating the underlying symmetry of the molecule. Defaults to the IdentitySymmetryGroup, which represents an asymmetric particle, if none provided.

  • pixel_size – Pixel size of the images in angstroms, default None.

property amplitudes
property angles

Rotation angles in radians.

Returns:

Rotation angles in radians, as a n x 3 array

cache(batch_size=512)

Computes all queued pipeline transformations and stores the generated images in an array. This trades memory for fast image access, and is useful when images will be repeatedly queried since it avoids recomputing on-the-fly.

property class_distances

Returns table of class image distances as (src.n, n_nbors) Numpy array.

Follows same layout as class_indices but holds floats representing the distance (returned by classifier) to the zeroth image in each class.

Note n_nbors is managed by self.classifier and used here for documentation.

Returns:

Numpy array, self.dtype.

property class_indices

Returns table of class image indices as (src.n, n_nbors) Numpy array.

Each row reprsents a class, with the columns ordered by smallest class_distances from the reference image (zeroth columm).

Note n_nbors is managed by self.classifier and used here for documentation.

Returns:

Numpy array, integers.

property class_refl

Returns table of class image reflections as (src.n, n_nbors) Numpy array.

Follows same layout as class_indices but holds booleans that are True when the image should be reflected before averaging.

Note n_nbors is managed by self.classifier and used here for documentation.

Returns:

Numpy array, boolean.

downsample(L)
estimate_noise_power(sample_n=None, support_radius=None, batch_size=512)

Estimate the noise energy of sample_n images using WhiteNoiseEstimator.

Parameters:
  • sample_n – Number of images used for estimate. Defaults to all images in source.

  • support_radius – Pixel radius used for masking signal support. Default of None will compute inscribed circle, self.L // 2.

  • batch_size – Images per batch, defaults 512.

Returns:

Estimated noise energy (variance).

estimate_signal_mean_energy(sample_n=None, support_radius=None, batch_size=512, image_accessor=None)

Estimate the signal mean of sample_n projections.

Parameters:
  • sample_n – Number of images used for estimate. Defaults to all images in source.

  • support_radius – Pixel radius used for masking signal support. Default of None will compute inscribed circle, self.L // 2.

  • batch_size – Images per batch, defaults 512.

  • image_accessor – Optionally override images. Defaults self.images.

Returns:

Estimated signal mean

estimate_signal_power(sample_n=None, support_radius=None, batch_size=512, signal_power_method='estimate_signal_mean_energy', image_accessor=None)

Estimate the signal energy of sample_n projections using prescribed method.

Parameters:
  • sample_n – Number of images used for estimate. Defaults to all images in source.

  • support_radius – Pixel radius used for masking signal support. Default of None will compute inscribed circle, self.L // 2.

  • batch_size – Images per batch, defaults 512.

  • signal_power_method – Method used for computing signal energy. Defaults to mean via estimate_signal_mean_energy. Can use variance method via estimate_signal_var.

  • image_accessor – Optionally override images. Defaults self.images.

Returns:

Estimated signal variance.

estimate_signal_var(sample_n=None, support_radius=None, batch_size=512, image_accessor=None)

Estimate the signal variance of sample_n projections.

Parameters:
  • sample_n – Number of images used for estimate. Defaults to all images in source.

  • support_radius – Pixel radius used for masking signal support. Default of None will compute inscribed circle, self.L // 2.

  • batch_size – Images per batch, defaults 512.

  • image_accessor – Optionally override images. Defaults self.images.

Returns:

Estimated signal variance.

estimate_snr(sample_n=None, support_radius=None, batch_size=512, noise_power=None, signal_power_method='estimate_signal_mean_energy')

Estimate the SNR of the simulated data set using estimated signal power / noise power.

Note signal power depends on choice of signal_power_method, but differences should be small in practice when background noise is zero centered.

Parameters:
  • sample_n – Number of images used for estimate. Defaults to all images in source.

  • support_radius – Pixel radius used for masking signal support. Default of None will compute inscribed circle, self.L // 2.

  • batch_size – Images per batch, defaults 512.

  • signal_power_method – Method used for computing signal energy. Defaults to mean via estimate_signal_mean_energy. Can use variance method via estimate_signal_var.

Returns:

Estimated signal to noise ratio.

property filter_indices
get_metadata(metadata_fields=None, indices=None, default_value=None, as_dict=False)

Get metadata field information of this ImageSource for a selection of fields of indices. The default should return the entire metadata table.

Parameters:
  • metadata_fields – A string, or list of strings, representing the metadata field(s) to be queried. Defaults to None, which yields all populated columns.

  • indices – A list of 0-based indices indicating the indices for which to get metadata. If indices is None, then values corresponding to all indices in this Source object are returned.

  • default_value – Default scalar value to use for any fields not found in the metadata. If None, no default value is used, and missing field(s) cause a RuntimeError.

  • as_dict – Boolean indicating whether we want to return metadata as a dictionary (True), or as a numpy ndarray (False). In the latter case, all returned values are typecast to a common numpy dtype, so use with caution.

Returns:

An ndarray of values (any valid np types) representing metadata info. If is_dict is True, then returns a dictionary mapping metadata names to numpy arrays of values.

has_metadata(metadata_fields)

Find out if one or more metadata fields are available for this ImageSource.

Parameters:

metadata_fields – A string, of list of strings, representing the metadata field(s) to be queried.

Returns:

Boolean value indicating whether the field(s) are available.

im_backward(im, start, weights=None, symmetry_group=None)

Apply adjoint mapping to set of images

Parameters:
  • im – An Image instance to which we wish to apply the adjoint of the forward model.

  • start – Start index of image to consider

  • weights – Optional vector of weights to apply to images. Weights should be length self.n.

  • symmetry_group – A SymmetryGroup instance. If supplied, uses symmetry to increase number of images used in back-projectioon.

Returns:

An L-by-L-by-L volume containing the sum of the adjoint mappings applied to the start+num-1 images.

property images

Subscriptable property which returns the images contained in this source corresponding to the indices given.

invert_contrast(batch_size=512)

invert the global contrast of images

Check if all images in a stack should be globally phase flipped so that the molecule corresponds to brighter pixels and the background corresponds to darker pixels. This is done by comparing the mean in a small circle around the origin (supposed to correspond to the molecule) with the mean of the noise, and making sure that the mean of the molecule is larger. From the implementation level, we modify the ImageSource in-place by appending a Multiple filter to the generation pipeline.

Parameters:

batch_size – Batch size of images to query.

Returns:

On return, the ImageSource object has been modified in place.

property n
property n_ctf_filters

Return the number of CTFFilters found in this Source.

normalize_background(bg_radius=1.0, do_ramp=True)

Normalize the images by the noise background

This is done by shifting the image density by the mean value of background and scaling the image density by the standard deviation of background. From the implementation level, we modify the ImageSource in-place by appending the Add and Multiple filters to the generation pipeline.

Parameters:
  • bg_radius – Radius cutoff to be considered as background (in image size)

  • do_ramp – When it is True, fit a ramping background to the data and subtract. Namely perform normalization based on values from each image. Otherwise, a constant background level from all images is used.

Returns:

On return, the ImageSource object has been modified in place.

property offsets
phase_flip()

Perform phase flip on images in the source object using CTF information. If no CTFFilters exist this will emit a warning and otherwise no-op.

property rotations

Returns rotation matrices.

Returns:

Rotation matrices as a n x 3 x 3 array

save(starfile_filepath, batch_size=512, save_mode=None, overwrite=None)

Save the output metadata to STAR file and/or images to MRCS file.

Parameters:
  • starfile_filepath – Path to STAR file where we want to save metadata of image_source

  • batch_size – Batch size of images to query. Note, batch_size=1 implies single MRC extension .mrc, while batch_size>=1 implies stack MRC extension .mrcs.

  • save_mode – Whether to save all images in a single or multiple files in batch size. Default is multiple, supply ‘single’ for single mode.

  • overwrite – Options to control overwrite behavior (default is None): - True: Overwrites the existing file if it exists. - False: Raises an error if the file exists. - None: Renames the old file by appending a time/date stamp.

Returns:

A dictionary containing “starfile”–the path to the saved starfile– and “mrcs”, a list of the saved particle stack MRC filenames.

save_images(starfile_filepath, filename_indices=None, batch_size=512, overwrite=False)

Save an ImageSource to MRCS files

Note that .mrcs files are saved at the same location as the STAR file.

Parameters:
  • filename_indices – Filename list for save all images

  • starfile_filepath – Path to STAR file where we want to save image_source

  • batch_size – Batch size of images to query from the ImageSource object. if save_mode is not single, images in the same batch will save to one MRCS file.

  • overwrite – Whether to overwrite any .mrcs files found at the target location.

Returns:

None

save_metadata(starfile_filepath, batch_size=512, save_mode=None)

Save updated metadata to a STAR file

Parameters:
  • starfile_filepath – Path to STAR file where we want to save image_source

  • batch_size – Batch size of images to query from the ImageSource object. Every batch_size rows, entries are written to STAR file.

  • save_mode – Whether to save all images in a single or multiple files in batch size.

Returns:

None

property selection_indices
set_metadata(metadata_fields, values, indices=None)

Modify metadata field information of this ImageSource for selected indices

Parameters:
  • metadata_fields – A string, or list of strings, representing the metadata field(s) to be modified

  • values – A scalar or vector (of length |indices|) of replacement values.

  • indices – A list of 0-based indices indicating the indices for which to modify metadata. If indices is None, then all indices in this Source object are modified. In this case, values should either be a scalar or a vector of length equal to the total number of images, |self.n|.

Returns:

On return, the metadata associated with the specified indices has been modified.

property states
property symmetry_group

A SymmetryGroup instance associated with the symmetry type of the ImageSource object. Access rotation matrices of the symmetry_group via symmetry_group.matrices.

update(**kwargs)

Update certain properties that modify the underlying metadata, and return a new ImageSource object with the new properties. The original object is unchanged.

vol_forward(vol, start, num)

Apply forward image model to volume

Parameters:
  • vol – A volume instance.

  • start – Start index of image to consider

  • num – Number of images to consider

Returns:

The images obtained from volume by projecting, applying CTFs, translating, and multiplying by the amplitude.

whiten(noise_estimate=None, epsilon=None)

Modify the ImageSource in-place by appending a whitening filter to the generation pipeline.

When no noise_estimate is provided, a WhiteNoiseEstimator will be instantiated at this preprocessing stage behind the scenes.

Parameters:
  • noise_estimate – Optional, NoiseEstimator or Filter. When passed a NoiseEstimator the filter attribute will be queried. Alternatively, the noise PSD may be passed directly as a Filter object.

  • epsilon – Threshold used to determine which frequencies to whiten and which to set to zero. By default all PSD values in the noise_estimate less than eps(self.dtype) are zeroed out in the whitening filter.

Returns:

On return, the ImageSource object has been modified in place.

class aspire.source.image.IndexedSource(src, indices, memory=None)

Bases: ImageSource

Map into another into ImageSource.

Instantiates a new source along given indices.

Parameters:
  • src – ImageSource to be used as the source.

  • index_map – index_map

  • memory – str or None The path of the base directory to use as a data store or None. If None is given, no caching is performed.

class aspire.source.image.OrientedSource(src, orientation_estimator=None)

Bases: IndexedSource

Source for oriented 2D images using orientation estimation methods.

Constructor of an oriented ImageSource object. Orientation is determined by performing orientation estimation using a supplied orientation_estimator.

Parameters:
  • src – Source used for orientation estimation

  • orientation_estimator – CLOrient3D subclass used for orientation estimation. Default uses the CLSync3N method.

get_metadata(metadata_fields=None, indices=None, default_value=None, as_dict=False)

Get metadata field information of this ImageSource for a selection of fields of indices. The default should return the entire metadata table.

Parameters:
  • metadata_fields – A string, or list of strings, representing the metadata field(s) to be queried. Defaults to None, which yields all populated columns.

  • indices – A list of 0-based indices indicating the indices for which to get metadata. If indices is None, then values corresponding to all indices in this Source object are returned.

  • default_value – Default scalar value to use for any fields not found in the metadata. If None, no default value is used, and missing field(s) cause a RuntimeError.

  • as_dict – Boolean indicating whether we want to return metadata as a dictionary (True), or as a numpy ndarray (False). In the latter case, all returned values are typecast to a common numpy dtype, so use with caution.

Returns:

An ndarray of values (any valid np types) representing metadata info. If is_dict is True, then returns a dictionary mapping metadata names to numpy arrays of values.

property offsets
save_metadata(starfile_filepath, batch_size=512, save_mode=None)

Save updated metadata to a STAR file

Parameters:
  • starfile_filepath – Path to STAR file where we want to save image_source

  • batch_size – Batch size of images to query from the ImageSource object. Every batch_size rows, entries are written to STAR file.

  • save_mode – Whether to save all images in a single or multiple files in batch size.

Returns:

None

aspire.source.micrograph module

class aspire.source.micrograph.ArrayMicrographSource(micrographs, dtype=None, pixel_size=None)

Bases: MicrographSource

Instantiate a MicrographSource with micrographs.

Parameters:
  • micrographs – Numpy array (count, L, L) where L represents the size in pixels.

  • dtype – Data type of returned MicrographSource. Default of None will infer dtype from the array. Explicitly setting dtype will attempt a cast. Currently only float32 and float64 are supported. Note, due to limitations of common MRC implementations, saving is limited to single precision.

  • pixel_size – Pixel size of the images in angstroms, default None.

class aspire.source.micrograph.DiskMicrographSource(micrographs_path, dtype=None, pixel_size=None)

Bases: MicrographSource

Instantiate a MicrographSource with micrographs_path.

Parameters:
  • micrographs_path – Path string representing directory of file, or list of file path strings.

  • dtype – Data type of returned MicrographSource. Default of None will attempt to infer dtype from the first micrograph. Explicitly setting dtype will attempt a cast when returning micrographs. Currently only float32 and float64 are supported. Note, due to limitations of common MRC implementations, saving is limited to single precision.

class aspire.source.micrograph.MicrographSimulation(volume, micrograph_size=4096, micrograph_count=1, particles_per_micrograph=100, particle_amplitudes=None, projection_angles=None, seed=None, ctf_filters=None, noise_adder=None, boundary=None, interparticle_distance=None)

Bases: MicrographSource

A cryo-EM MicrographSimulation object that supplies micrographs.

dtype and particle_box_size are inferred from volume, where dtype is the data type of the micrographs and particle_box_size is the size of the particle images.

Parameters:
  • volumeVolume instance to be used in Simulation. An (L,L,L) Volume will generate (L,L) particle images.

  • micrograph_size – Size of micrograph in pixels, defaults to 4096.

  • micrograph_count – Number of micrographs to generate (integer). Defaults to 1.

  • particles_per_micrograph – The amount of particles generated for each micrograph. Defaults to 10.

  • particle_amplitudes – Optional, amplitudes to pass to Simulation. Default None uses Simulation defaults. When provided must be array with size particles_per_micrograph * micrograph_count.

  • projection_angles – Optional, projection rotation angles to pass to Simulation. Default None uses Simulation defaults. When provided must have shape (particles_per_micrograph * micrograph_count, 3).

  • seed – Random seed.

  • noise_adder – Append instance of NoiseAdder to generation pipeline.

  • ctf_filters – Optional list of Filter objects to apply to particles. This list should be 1, n_micrographs, or particles_per_micrograph * micrograph_count. These will apply filters to all particles, per-micrograph, or per-particle respectively. Default None will not apply any additional filters.

  • boundary – Set boundaries for particle centers, positive values move the boundary inward from the edge of the micrograph. Defaults to half of the particle size (particle_box_size // 2).

  • interparticle_distance – Set minimum distance between particle centers, in pixels. Defaults to particle_box_size.

Returns:

A MicrographSimulation object.

property clean_images

Returns the micrographs without any noise.

Returns:

An ImageAccessor for the unnoisy micrographs.

get_micrograph_index(particle_index)

Using the global ID of the particle, returns the micrograph ID and the local particle ID.

Parameters:

particle_id – Global ID of the particle.

Returns:

The micrograph ID and the local ID of the particle.

get_particle_indices(micrograph_index, particle_index=None)

Using the micrograph ID, returns every global particle ID from that micrograph. Returns specific global IDs if the local IDs are given.

Parameters:
  • micrograph_index – ID of the micrograph.

  • particle_index – Local ID of the particle.

Returns:

The global ID of the particle.

save(path, name_prefix='micrograph', overwrite=True)

Save micrograph simulation to path.

Currently saves micrographs to .mrc. Saves simulated centers, projection rotations, and CTF filter parameters when applicable.

Parameters:
  • path – Directory to save data.

  • name_prefix – Optional, name prefix string for micrograph files.

  • overwrite – Optional, bool. Allow writing to existing directory, and overwriting existing files.

Returns:

List of tuples [(.mrc, .star)..], compatible with CentersCoordinateSource.

class aspire.source.micrograph.MicrographSource(micrograph_count, micrograph_size, dtype, pixel_size=None)

Bases: ABC

asnumpy()

Return micrograph data as dense Numpy array.

Returns:

Numpy array.

property images

Returns micrographs.

Returns:

An ImageAccessor for the noisy micrographs.

save(path, name_prefix='micrograph', overwrite=None)

Save micrographs to path.

path should be a directory. If it does not exist, ASPIRE will attempt to create it.

Currently saves micrographs to .mrc.

Parameters:
  • path – Directory to save data.

  • name_prefix – Optional, name prefix string for micrograph files.

  • overwrite – Options to control overwrite behavior (default is None): - True: Overwrites the existing path if it exists. - False: Raises an error if the path exists. - None: Renames the old path by appending a time/date stamp.

Returns:

List of saved .mrc files.

show(*args, **kwargs)

Helper function to display micrograph. See Image.show().

aspire.source.relion module

class aspire.source.relion.RelionSource(filepath, data_folder=None, pixel_size=1, B=0, n_workers=-1, max_rows=None, symmetry_group=None, memory=None, dtype=None)

Bases: ImageSource

A RelionSource represents a source of picked and cropped particles stored as slices in a .mrcs stack. It must be instantiated via a STAR file, which–at a minumum–lists the particles in each .mrcs stack in the _rlnImageName column. The STAR file may also contain Relion-specific metadata columns. This information is read into dictionaries containing rows for each particle specifying its location and its metadata. The metadata table may be augmented or modified via helper methods found in ImageSource. It may store, for example, Filter objects added during preprocessing.

Load STAR file at given filepath

Parameters:
  • filepath – Absolute or relative path to STAR file

  • data_folder – Path to folder w.r.t which all relative paths to .mrcs files are resolved. If None, the folder corresponding to filepath is used.

  • pixel_size – the pixel size of the images in angstroms (Default 1)

  • B – the envelope decay of the CTF in inverse square angstrom (Default 0)

  • n_workers – Number of threads to spawn to read referenced .mrcs files (Default -1 to auto detect)

  • max_rows – Maximum number of rows in STAR file to read. If None, all rows are read. Note that this refers to the max number of images to load, not the max. number of .mrcs files (which may be equal to or less than the number of images).

  • symmetry_group – A SymmetryGroup object or string corresponding to the symmetry of the molecule.

  • memory – str or None The path of the base directory to use as a data store or None. If None is given, no caching is performed.

  • dtype – Optional datatype override. Default None infers dtype from underlying image (MRC) files. Can be used to upcast STAR files for processing in double precision.

populate_metadata()

Relion STAR files may contain a large number of metadata columns in addition to the locations of particles. We read this into a dict and add some of our own columns for convenience.

aspire.source.simulation module

class aspire.source.simulation.Simulation(L=None, n=1024, vols=None, states=None, unique_filters=None, filter_indices=None, offsets=None, amplitudes=None, dtype=None, C=2, angles=None, seed=0, memory=None, noise_adder=None, symmetry_group=None, pixel_size=None)

Bases: ImageSource

A Simulation represents a synthetic dataset of realistic cryo-EM images with corresponding metadata. The images are generated via projections of a supplied Volume object, vols, over orientations define by the Euler angles, angles. Various types of corruption, such as noise and CTF effects, can be added to the images by supplying a Filter object to the noise_filter or unique_filters arguments.

A Simulation object that supplies images along with other parameters for image manipulation.

Parameters:
  • L – Resolution of projection images (integer). Default is 8. If a Volume is provided L and vols.resolution must agree.

  • n – The number of images to generate (integer).

  • vols – A Volume object representing a stack of volumes. Default is generated with volume.volume_synthesis.AsymmetricVolume.

  • states – A 1d array of n integers in the interval [0, C). The i’th integer indicates the volume stack index used to produce the i’th projection image. Default is a random set.

  • unique_filters – A list of Filter objects to be applied to projection images.

  • filter_indices – A 1d array of n integers indicating the unique_filter indices associated with each image. Default is a random set of filter indices, .ie the filters from unique_filters are randomly assigned to the stack of images.

  • offsets – A n-by-2 array of coordinates to offset the images. Default is a normally distributed set of offsets. Set offsets = 0 to disable offsets.

  • amplitude – A 1d array of n amplitudes to scale the projection images. Default is a random set in the interval [2/3, 3/2]. Set amplitude = 1 to disable amplitudes.

  • dtype – dtype for the Simulation

  • C – Number of Volumes used to generate projection images. The default is C=2. If a Volume object is provided this parameter is overridden and self.C = self.vols.n_vols.

  • angles – A n-by-3 array of Euler angles for use in projection. Default is a random set.

  • seed – Random seed.

  • memory – str or None. The path of the base directory to use as a data store or None. If None is given, no caching is performed.

  • noise_adder – Optionally append instance of NoiseAdder to generation pipeline.

  • symmetry_group – A SymmetryGroup instance or string indicating symmetry of the molecule.

  • pixel_size – Pixel size of the images in angstroms, default None.

Returns:

A Simulation object.

property clean_images

Return projections with filters/shifts/amplitudes applied, but without noise. Subscriptable property.

covar_true()
eigs()

Eigendecomposition of volume covariance matrix of simulation

Returns:

A 2-tuple: eigs_true: The eigenvectors of the volume covariance matrix in the form of Volume instance. lambdas_true: The eigenvalues of the covariance matrix in the form of a (C-1)-by-(C-1) diagonal matrix.

eval_clustering(vol_idx)

Evaluate clustering estimation using an adjusted Rand score.

Parameters:

vol_idx – Indexes of the volumes determined (0-indexed)

Returns:

Accuracy [-0.5, 1] in terms of proportion of correctly assigned labels. Identical clusters (up to a permutation) have a score of 1, random labeling will be close to 0, and discordant clusterings will be negative.

eval_coords(mean_vol, eig_vols, coords_est)

Evaluate coordinate estimation

Parameters:
  • mean_vol – A mean volume in the form of a Volume instance.

  • eig_vols – A set of eigenvolumes in an Volume instance.

  • coords_est – The estimated coordinates in the affine space defined centered at mean_vol and spanned by eig_vols.

Returns:

Dictionary containing error, relative error, and correlation for each set of estimated coordinates.

eval_covar(covar_est)
eval_eigs(eigs_est, lambdas_est)

Evaluate covariance eigendecomposition accuracy

Parameters:
  • eigs_est – The estimated volume eigenvectors in an L-by-L-by-L-by-K array.

  • lambdas_est – The estimated eigenvalues in a K-by-K diagonal matrix (default diag(ones(K, 1))).

Returns:

eval_mean(mean_est)
eval_vol(vol_true, vol_est)
eval_volmat(volmat_true, volmat_est)

Evaluate volume matrix estimation accuracy

Parameters:
  • volmat_true – The true volume matrices in the form of an L-by-L-by-L-by-L-by-L-by-L-by-K array.

  • volmat_est – The estimated volume matrices in the same form.

Returns:

mean_true()
property projections

Return projections of generated volumes, without applying filters/shifts/amplitudes/noise

Parameters:
  • start – start index (0-indexed) of the start image to return

  • num – Number of images to return. If None, all images are returned.

  • indices – A numpy array of image indices. If specified, start and num are ignored.

Returns:

An Image instance.

true_signal_power(*args, **kwargs)

Estimate the signal power of clean_images.

For usage, see ImageSource.estimate_signal_power.

Returns:

Estimated signal power of clean_images

true_snr(*args, **kwargs)

Compute SNR using true_signal_power and the noise power known to simulation.

See Simulation.true_signal_power() for parameters.

vol_coords(mean_vol=None, eig_vols=None)

Coordinates of simulation volumes in a given basis

Parameters:
  • mean_vol – A mean volume in the form of a Volume Instance (default mean_true).

  • eig_vols – A set of k volumes in a Volume instance (default eigs).

Returns:

Module contents