An assembly is a collection of fitting functions. This provides the model representation that is the basis of the park fitting engine.
Models can range from very simple one dimensional theory functions to complex assemblies of multidimensional datasets from different experimental techniques, each with their own theory function and a common underlying physical model.
First define the models you want to work with. In the example below we will use an example of a simple multilayer system measured by specular reflection of xrays and neutrons. The gold depth is the only fitting parameter, ranging from 10-30 A. The interface depths are tied together using expressions. In this case the expression is a simple copy, but any standard math functions can be used. Some model developers may provide additional functions for use with the expression.
Example models:
import reflectometry.model1d as refl
xray = refl.model('xray')
xray.incident('Air',rho=0)
xray.interface('iAu',sigma=5)
xray.layer('Au',rho=124.68,depth=[10,30])
xray.interface('iSi',sigma=5)
xray.substrate('Si',rho=20.07)
datax = refl.data('xray.dat')
neutron = refl.model('neutron')
neutron.incident('Air',rho=0)
neutron.interface('iAu',sigma='xray.iAu')
neutron.layer('Au',rho=4.66,depth='xray.Au.depth')
neutron.interface('iSi',sigma='xray.iSi')
neutron.substrate('Si',rho=2.07)
datan = refl.data('neutron.dat')
As you can see from the above, parameters can be set to a value if the parameter is fixed, to a range if the parametemr is fitted, or to a string expression if the parameter is calculated from other parameters. See park.Parameter.set for further details.
Having constructed the models, we can now create an assembly:
import park
assembly = park.Assembly([(xray,datax), (neutron,datan)])
Note: this would normally be done in the context of a fit using fit = park.Fit([(xray,datax), (neutron,datan)]), and later referenced using fit.assembly.
Individual parts of the assembly are accessable using the model number 0, 1, 2... or by the model name. In the above, assembly[0] and assembly[‘xray’] refer to the same model. Assemblies have insert and append functions for adding new models, and “del model[idx]” for removing them.
Once the assembly is created computing the values for the system is a matter of calling:
assembly.eval()
print "Chi**2",assembly.chisq
print "Reduced chi**2",assembly.chisq/assembly.degrees_of_freedom
plot(arange(len(assembly.residuals)), assembly.residuals)
This defines the attributes residuals, degrees_of_freedom and chisq, which is what the optimizer uses as the cost function to minimize.
assembly.eval uses the current values for the parameters in the individual models. These parameters can be changed directly in the model. In the reflectometry example above, you could set the gold thickness using xray.layer.Au.depth=156, or something similar (the details are model specific). Parameters can also be changed through the assembly parameter set. In the same example, this would be assembly.parameterset[‘xray’][‘Au’][‘depth’]. See parameter set for details.
In the process of modeling data, particularly with multiple datasets, you will sometimes want to temporarily ignore how well one of the datasets matches so that you can more quickly refine the model for the other datasets, or see how particular models are influencing the fit. To temporarily ignore the xray data in the example above use:
assembly.parts[0].isfitted = False
The model itself isn’t ignored since its parameters may be needed to compute the parameters for other models. To reenable checking against the xray data, you would assign a True value instead. More subtle weighting of the models can be controlled using assembly.parts[idx].weight, but see below for a note on model weighting.
Changing the weight is equivalent to scaling the error bars on the given model by a factor of weight/n where n is the number of data points. It is better to set the correct error bars on your data in the first place than to adjust the weights. If you have the correct error bars, then you should expect roughly 2/3 of the data points to lie within one error bar of the theory curve. If consecutive data points have largely overlapping errorbars, then your uncertainty is overestimated.
Another case where weights are adjusted (abused?) is to compensate for systematic errors in the data by forcing the errorbars to be large enough to cover the systematic bias. This is a poor approach to the problem. A better strategy is to capture the systematic effects in the model, and treat the measurement of the independent variable as an additional data point in the fit. This is still not statistically sound as there is likely to be a large correlation between the uncertainty of the measurement and the values of all the other variables.
That said, adjusting the weight on a dataset is a quick way of reducing its influence on the entire fit. Please use it with care.
Bases: object
Collection of fit models.
Assembly implements the park.fit.Objective interface.
See park.assembly for usage.
Instance variables:
These fields are defined for the individual models as well, with degrees of freedom adjusted to the length of the individual data set. If the model is not fitted or the weight is zero, the residual will not be calculated.
The residuals fields are available only after the model has been evaluated.
Interrupt the current function evaluation.
Forward this to the currently executing model if possible.
Extend result from the fit with the calculated parameters.
Add a model to the end of set.
Parameters: |
|
---|
Return the covariance matrix inv(J’J) at point p.
Recalculate the theory functions, and from them, the residuals and chisq.
Note: | Call this after the parameters have been updated. |
---|
Return an alphabetical list of the fitting parameters.
This function is called once at the beginning of a fit, and serves as a convenient place to precalculate what can be precalculated such as the set of fitting parameters and the parameter expressions evaluator.
Add a model to a particular position in the set.
Returns true if the parameter set is in a feasible region of the modeling space.
Query if a particular model is fitted.
Set isfitted to value if value is supplied.
Parameters: |
|
---|
Returns the derivative wrt the fit parameters at point p.
Numeric derivatives are calculated based on step, where step is the portion of the total range for parameter j, or the portion of point value p_j if the range on parameter j is infinite.
Details to send back to the fitting client on an improved fit.
status is ‘start’, ‘step’ or ‘end’ depending if this is the first result to return, an improved result, or the final result.
[Not implemented]
Set the parameters resulting from the fit into the parameter set, and update the calculated expression.
The parameter values may be retrieved by walking the assembly.parameterset tree, checking each parameter for isfitted, iscomputed, or isfixed. For example:
assembly.set_result(result)
for p in assembly.parameterset.flatten():
if p.isfitted():
print "%s %g in [%g,%g]"%(p.path,p.value,p.range[0],p.range[1])
elif p.iscomputed():
print "%s computed as %g"%(p.path.p.value)
This does not calculate the function or the residuals for these parameters. You can call assembly.eval() to do this. The residuals will be set in assembly[i].residuals. The theory and data are model specific, and can be found in assembly[i].fitness.data.
Return parameter uncertainty.
This is just the sqrt diagonal of covariance matrix inv(J’J) at point p.
Query the weight on a particular model.
Set weight to value if value is supplied.
Parameters: |
|
---|---|
Returns: | model weight |
Bases: object
Container for theory and data.
The fit object compares theory with data.
TODO: what to do with fittable metadata (e.g., footprint correction)?
Set parameters in the model.
User convenience function. This allows a user with an assembly of models in a script to for example set the fit range for parameter ‘a’ of the model:
assembly[0].set(a=[5,6])
Raises KeyError if the parameter is not in parameterset.
Park 1-D data objects.
The class Data1D represents simple 1-D data objects, along with an ascii file loader. This format will work well for many uses, but it is likely that more specialized models will have their own data file formats.
The minimal data format for park must supply the following methods:
- residuals(fn)
- returns the residuals vector for the model function.
- residuals_deriv(fn_deriv,par)
- returns the residuals vector for the model function, and for the derivatives with respect to the given parameters.
The function passed is going to be model.eval or in the case where derivatives are available, model.eval_deriv. Normally this will take a vector of dependent variables and return the theory function for that vector but this is only convention. The fitting service only uses the parameter set and the residuals method from the model.
The park GUI will make more demands on the interface, but the details are not yet resolved.
Bases: object
Data representation for 1-D fitting.
Attributes
Notes on calc_x
The contribution of Q to a resolution of width dQo at point Qo is:
p(Q) = 1/sqrt(2 pi dQo**2) exp ( (Q-Qo)**2/(2 dQo**2) )
We are approximating the convolution at Qo using a numerical approximation to the integral over the measured points, with the integral is limited to p(Q_i)/p(0)>=0.001.
Sometimes the function we are convoluting is rapidly changing. That means the correct convolution should uniformly sample across the entire width of the Gaussian. This is not possible at the end points unless you calculate the theory function beyond what is strictly needed for the data. For a given dQ and step size, you need enough steps that the following is true:
(n*step)**2 > -2 dQ**2 * ln 0.001
The choice of sampling density is particularly important near critical points where the shape of the function changes. In reflectometry, the function goes from flat below the critical edge to O(Q**4) above. In one particular model, calculating every 0.005 rather than every 0.02 changed a value above the critical edge by 15%. In a fitting program, this would lead to a somewhat larger estimate of the critical edge for this sample.
Sometimes the theory function is oscillating more rapidly than the instrument can resolve. This happens for example in reflectivity measurements involving thick layers. In these systems, the theory function should be oversampled around the measured points Q. With a single thick layer, oversampling can be limited to just one period 2 pi/d. With multiple thick layers, oscillations will show interference patterns and it will be necessary to oversample uniformly through the entire width of the resolution. If this is not accommodated, then aliasing effects make it difficult to compute the correct model.
Load a multicolumn datafile.
Data should be in columns, with the following defaults:
x,y or x,y,dy or x,dx,y,dy
Note that this resets the selected fitting points calc_x and the computed results calc_y and fx.
Data is sorted after loading.
Any extra keyword arguments are passed to the numpy loadtxt function. This allows you to select the columns you want, skip rows, set the column separator, change the comment character, amongst other things.
Over/under sampling support.
Compute the calc_x points required to adequately sample the function y=f(x) so that the value reported for each measured point is supported by the resolution. minstep is the minimum allowed sampling density that should be used.
Compute the residuals of the data wrt to the given function.
y = fn(x) should be a callable accepting a list of points at which to calculate the function, returning the values at those points.
Any resolution function will be applied after the theory points are calculated.
Compute residuals and derivatives wrt the given parameters.
fdf = fn(x,pars=pars) should be a callable accepting a list of points at which to calculate the function and a keyword argument listing the parameters for which the derivative will be calculated.
Returns a list of the residuals and the derivative wrt the parameter for each parameter.
Any resolution function will be applied after the theory points and derivatives are calculated.
A selection vector for points to use in the evaluation of the residuals, or None if all points are to be used.
Functions for manipulating dependencies.
Parameter values must be updated in the correct order. If parameter A depends on parameter B, then parameter B must be evaluated first.
Order elements from pairs so that b comes before a in the ordered list for all pairs (a,b).
Differential evolution test implementation Implements the Scheme_DE_rand_1 variant
- F: float
- weighting factor which determines the magnitude of perturbation occurring in each generation.
- crossover_rate: float
- In general, 0 < crossover_rate < 1. Usually considerably lower than 1. Convergence slows as the value increases, but higher crossover rates may be necessary when
- func: w = f(p)
- The function to be minimized, of the form w = f(p), where p is a vector of length func_dim and w is a scalar
- func_dim: int
- The dimension of the objective function
- pop: array
- The initial population. This should be an iterable composed of Rank-1 numpy arrays. The population size must be at least 4, preferably much greater.
- l_bound, u_bound: vector
- arrays of size func_dim representing the upper and lower bounds for the parameters in the solution vectors
- tol: float
- if (max(pop_values) - min(pop_values) <= conv), the population has converged and the evolution will stop
Adjust F to account for stagnation of the population
Perform a crossover between the candidate and the ith member of the previous generation.
Evolve the population for numgens generations, or until it converges. Returns the best vector from the run
find the most fit gene in the current population
Generate a random gene within the bounds constraints. Used to replace out-of-bounds genes
check for convergence
Determine whether the gene meets the bounds constraints
generate r1, r2, and r3 randomly from the range [0, NP-1] such that they are distinct values not equal to i
Run differential evolution, returning x,fx,ncalls
generate a random population of vectors within the given bounds. dimension indicates the length of the vectors. l_bound and u_bound should be vectors of length dimension (any iterable container should work)
Functions for manipulating expressions.
Build and return a function to evaluate all parameter expressions in the proper order.
This function is not terribly sophisticated, and it would be easy to trick. However it handles the common cases cleanly and generates reasonable messages for the common errors.
This code has not been fully audited for security. While we have removed the builtins and the ability to import modules, there may be other vectors for users to perform more than simple function evaluations. Unauthenticated users should not be running this code.
Parameter names are assumed to contain only _.a-zA-Z0-9#[]
The list of parameters is probably something like:
parset.setprefix()
pars = parset.flatten()
Note that math uses acos while numpy uses arccos. To avoid confusion we allow both.
Should try running the function to identify syntax errors before running it in a fit.
Use help(fn) to see the code generated for the returned function fn. dis.dis(fn) will show the corresponding python vm instructions.
Returns a list of pair-wise dependencies from the parameter expressions.
For example, if p3 = p1+p2, then find_dependencies([p1,p2,p3]) will return [(p3,p1),(p3,p2)]. For base expressions without dependencies, such as p4 = 2*pi, this should return [(p4, None)]
This parameter set has no constraints between the parameters.
Find the parameter substitution we need so that expressions can be evaluated without having to traverse a chain of model.layer.parameter.value
Replace all occurrences of symbol s with mapping[s] for s in mapping.
Given an expression string and a symbol table, return the set of symbols used in the expression. Symbols are only returned once even if they occur multiple times. The return value is a set with the elements in no particular order.
This is the first step in computing a dependency graph.
Fitting service interface.
A fit consists of a set of models and a fitting engine. The models are collected in an assembly, which manages the parameter set and the constraints between them. The models themselves are tightly coupled to the data that they are modeling and the data is invisible to the fit.
The fitting engine can use a variety of methods depending on model.
The fitter can be run directly on the local machine:
import park
M1 = park.models.Peaks(datafile=park.sampledata('peak.dat'))
M1.add_peak('P1', 'gaussian', A=[4,6], mu=[0.2, 0.5], sigma=0.1)
result = park.fit(models=[M1])
print result
The default settings print results every time the fit improves, and print a global result when the fit is complete. This is a suitable interface for a fitting script.
For larger fit jobs you will want to run the fit on a remote server. The model setup is identical, but the fit call is different:
service = park.FitService('server:port')
result = park.fit(models=[M1], service=service)
print result
Again, the default settings print results every time the fit improves, and print a global result when the fit is complete.
For long running fit jobs, you want to be able to disconnect from the server after submitting the job, and later reconnect to fetch the results. An additional email field will send notification by email when the fit starts and ends, and daily updates on the status of all fits:
service = park.FitService('server:port')
service.notify(email='me@my.email.address',update='daily')
fit = park.Fit(models=[M1])
id = service.submit_job(fit, jobname='peaks')
print id
The results can be retrieved either by id returned from the server, or by the given jobname:
import park
service = park.FitService('server:port',user='userid')
fitlist = service.retrieve('peaks')
for fit in fitlist:
print fit.summary()
The fit itself is a complicated object, including the model, the optimizer, and the type of uncertainty analysis to perform.
When used from a graphical user interface, a different programming interface is needed. In this case, the user may want to watch the progress of the fit and perhaps stop it. Also, as fits can take some time to complete, the user would like to be able to set up additional fits and run them at the same time, switching between them as necessary to monitor progress.
Bases: object
Fit job.
This implements park.job.Job.
Bases: object
Abstract interface for a fitness optimizer.
A fitter has a single method, fit, which takes an objective function (park.fit.Objective) and a handler.
For a concrete instance see park.fitmc.FitMC.
Global optimizer.
This function should return immediately
Bases: object
Simple interface to the local job queue. Currently supports start and wait. Needs to support stop and status. Also, needs to be a proper queue, and needs to allow jobs to run in separate processes according to priority, etc. All the essentials of the remote queuing system without the remote calls.
Unlike the remote queue, the local queue need not maintain persistence.
Wait for the job to complete. This is used in scripts to impose a synchronous interface to the fitting service.
Bases: object
Abstract interface to the fitness function for the park minimizer classes.
Park provides a specific implementation park.assembly.Assembly.
TODO: add a results() method to return model specific info to the TODO: fit handler.
Halts the current function evaluation, and has it return inf. This will be called from a separate thread. If the function contains an expensive calculation, it should reset an abort flag before each evaluation and test it periodically.
This method is optional.
Returns a list of fit parameters. Each parameter has a name, an initial value and a range.
See park.fitresult.FitParameter for an example.
On each function evaluation a new parameter set will be passed to the fitter, with values in the same order as the list of parameters.
Some fitters, notably Levenberg-Marquardt, operate directly on the residuals vector. If the individual residuals are not available, then LM cannot be used.
This method is optional.
Returns residuals and derivatives with respect to the given parameters.
If these are unavailable in the model, then they can be approximated by numerical derivatives, though it is generally better to use a derivative free optimizer such as coliny or cobyla which can use the function evaluations more efficiently. In any case, your objective function is responsible for calculating these.
This method is optional.
Multiple minima example
Start a fit with a set of models. The model set must be in a form accepted by park.assembly.Assembly.
This is a convenience function which sets up the default optimizer and uses the local fitting engine to do the work. Progress reports are printed as they are received.
The choice of fitter, service and handler can be specified by the caller.
The default fitter is FitMC, which is a monte carlo Nelder-Mead simplex local optimizer with 100 random start points.
The default handler does nothing. Instead, ConsoleUpdate could be used to report progress during the fit.
The default service is to run in a separate thread with FitThread. Note that this will change soon to run in a separate process on the local machine so that python’s global interpreter lock does not interfere with parallelism.
Run a monte carlo fit.
This procedure maps a local optimizer across a set of n initial points. The initial parameter value defined by the fitness parameters defines one initial point. The remainder are randomly generated within the bounds of the problem.
localfit is the local optimizer to use. It should be a bounded optimizer following the park.fitmc.LocalFit interface.
handler accepts updates to the current best set of fit parameters. See park.fitresult.FitHandler for details.
Bases: park.fitresult.FitHandler
Print progress to the console.
Model had an error; print traceback
Called when a result is observed which is better than previous results from the fit.
Number of seconds between improvement updates
Record whether results improved since last update
Report on progress.
Number of seconds between progress updates
Bases: object
Abstract interface for fit thread handler.
The methods in this class are called by the optimizer as the fit progresses.
Note that it is up to the optimizer to call the fit handler correctly, reporting all status changes and maintaining the ‘done’ flag.
Fit was aborted.
True when the fit job is complete
Model had an error; print traceback
Fit is complete; best results are reported
Called when a result is observed which is better than previous results from the fit.
result is a FitResult object, with parameters, #calls and fitness.
Called each cycle of the fit, reporting the current and the expected amount of work. The meaning of these values is optimizer dependent, but they can be converted into a percent complete using (100*current)//expected.
Progress is updated each iteration of the fit, whatever that means for the particular optimization algorithm. It is called after any calls to improvement for the iteration so that the update handler can control I/O bandwidth by suppressing intermediate improvements until the fit is complete.
The current best result of the fit
Bases: object
Fit result for an individual parameter.
Return parameter range string.
E.g., ” Gold .....|.... 5.2043 in [2,7]”
Bases: object
Container for reporting fit results.
Return the covariance matrix inv(J’J) at point p.
Number of function calls
Covariance matrix
Value of the goodness of fit metric
Fit parameter list, each with name, range and value attributes.
Parameter vector
Return the covariance matrix inv(J’J) at point p.
Parameter uncertainties
Format numbers nicely for printing.
Usage:
>> from danse.common.util.formatnum import *
>> v,dv = 757.2356,0.01032
>> print format_uncertainty_pm(v,dv)
757.235 +/- 0.010
>> format_uncertainty_compact(v,dv)
757.235(10)
>> format_uncertainty(v,dv)
757.235(10)
Set format_uncertainty.compact to False to use the +/- format by default, otherwise leave it at True for compact value(##) format.
Given value v and uncertainty dv, return a string v +/- dv.
The returned string uses only the number of digits warranted by the uncertainty in the measurement.
If the uncertainty is 0 or not otherwise provided, the simple %g floating point format option is used.
Infinite and indefinite numbers are represented as inf and NaN.
Given value v and uncertainty dv, return the compact representation v(##), where ## are the first two digits of the uncertainty.
The returned string uses only the number of digits warranted by the uncertainty in the measurement.
If the uncertainty is 0 or not otherwise provided, the simple %g floating point format option is used.
Infinite and indefinite numbers are represented as inf and NaN.
Asynchronous message streams.
When you have multiple listeners to a process, some of which can connect and disconnect at different times, you need to dispatch the incoming messages to all the listeners. When a listener joins a running stream, they need to get an immediate status update for the computation. This is stored as the stream header. Then, without dropping any messages, the listeners should receive all subsequent messages in the stream in order.
The message stream is multi-channelled, and indexed by a key. Within the service framework the key is likely to be the jobid associated with the message stream.
The contents of the message stream are expected to be monitor messages (see park.monitor for details), with the stream header being a park.monitor.Join message. When a new listener is registered, the header is immediately put on the queue (if there is one), then all subsequent message are sent until the listener calls hangup().
To attach to a message stream you need an object which accepts put(). An asynchronous queue object is a good choice since it allows you to buffer the messages in one thread and pull them off as needed in another. You can also use a park.monitor.Monitor object to process the messages directly.
Define a park fitting model.
The simplest sort of fitting model is something like the following:
import numpy
import park
def G(x,mu,sigma):
return numpy.exp(-0.5*(x-mu)**2/sigma**2)
class Gauss(park.Model):
parameters = ['center','width','scale']
def eval(self,x):
return self.scale * G(x,self.center,self.width)
It has a function which is evaluated at a series of x values and a set of adjustable parameters controlling the shape of f(x).
You can check your module with something like the following:
$ ipython -pylab
from gauss import Gauss
g = Gauss(center=5,width=1,scale=10)
x = asarray([1,2,3,4,5])
y = g(x)
plot(x,y)
This should produce a plot of the Gaussian peak.
You will then want to try your model with some data. Create a file with some dummy data, such as gauss.dat:
# x y
4 2
5 6
6 7
7 3
In order to match the model to data, you need to define a fitness object. This shows the difference between the model and the data, which you can then plot or sum to create a weighted chisq value:
f = park.Fitness(g, 'gauss.dat')
plot(f.data.fit_x, f.residuals())
The data file can have up to four columns, either x,y or x,y,dy or x,dx,y,dy where x,y are the measurement points and values, dx is the instrument resolution in x and dy is the uncertainty in the measurement value y. You can pass any keyword arguments to data.load that are accepted by numpy.loadtxt. For example, you can reorder columns or skip rows. Additionally, you can modify data attributes directly x,y,dx,dy and calc_x. See help on park.Data1D for details.
Once you have your model debugged you can use it in a fit:
g = Gauss(center=[3,5],scale=7.2,width=[1,3])
result = park.fit((g, 'gauss.dat'))
result.plot()
In this example, center and width are allowed to vary but scale is fixed.
Existing models can be readily adapted to Park:
class Gauss(object):
"Existing model"
center,width,scale = 0,1,1
def __init__(self,**kw):
for k,v in kw.items(): setattr(self,k,v)
def eval(self,x):
return self.scale *G(x,self.center,self.width)
class GaussAdaptor(Gauss,Model):
"PARK adaptor"
parameters = ['center','width','scale']
def __init__(self,*args,**kw):
Model.__init__(self,*args)
Gauss.__init__(self,**kw)
g = GaussAdaptor(center=[3,5],scale=7.2,width=[1,3])
result = park.fit((g, 'gauss.dat'))
result.plot()
Models can become much more complex than the ones described above, including multilevel models where fitting parameters can be added and removed dynamically.
In many cases the park optimizer will need an adaptor for pre-existing models. The adaptor above relies on python properties to translate model.par access into model._par.get() and model._par.set() where _par is the internal name for par. This technique works for simple static models, but will not work well for sophisticated models which have, for example, a dynamic parameter set where the model parameters cannot be set as properties. A solution to this problem is to subclass the park.Parameter and override the value attribute as a property.
Once models are defined they can be used in a variety of contexts, such as simultaneous fitting with constraints between the parameters. With some care in defining the model, computationally intensive fits can be distributed across multiple processors. We provide a simple user interface for interacting with the model parameters and managing fits. This can be extended with model specialized model editors which format the parameters in a sensible way for the model, or allow direct manipulation of the model structure. The underlying fitting engine can also be used directly from your own user interface.
Bases: object
Model definition.
The model manages attribute access to the fitting parameters and also manages the dataset.
List of parameters for which the model can calculate derivatives. The derivs The model function can compute the derivative with respect to this parameter. The function model.derivs(x,[p1,p2,...]) will return (f(x),df/dp1(x), ...). The parameters and their order are determined by the fitting engine.
Note: This is a property of the model, not the fit. Numerical derivatives will be used if the parameter is used in an expression or if no analytic derivative is available for the parameter. Automatic differentiation on parameter expressions is possible, but beyond the scope of this project.
The minimum fittng model if you choose not to subclass park.Model requires parameterset and a residuals() method.
Evaluate the model at x.
This method needs to be specialized in the model to evaluate the model function. Alternatively, the model can implement is own version of residuals which calculates the residuals directly instead of calling eval.
Evaluate the model and derivatives wrt pars at x.
pars is a list of the names of the parameters for which derivatives are desired.
This method needs to be specialized in the model to evaluate the model function. Alternatively, the model can implement is own version of residuals which calculates the residuals directly instead of calling eval.
Set the initial value for a set of parameters.
E.g., model.set(width=3,center=5)
Asychronous execution monitoring service.
Long running computations need to convey status information to the user. This status can take multiple forms, such as output to the console or activity on a GUI, or even mail to your inbox.
park.monitor defines several standard message types:
`Start` for job start
`Join` first message when joining an already running job
`Progress` for job activity
`Improvement` for partial results
`Complete` for final result
`Abort` when job is killed
`Error` when job has an error
`Log` for various debugging messages
Individual services may have specialized message types.
park.monitor also defines Monitor to process the various kinds of messages, and dispatch them to the various user defined handlers.
For each message type, the Monitor dispatcher will look for a function named onMonitorQQQ where QQQ is the message type. For example, onMonitorStart(self, message) will be called in response to a Start message. If onMonitorQQQ is not defined, then onMonitorMessage will be called. The default behaviour of onMonitorMessage is to print the message on the console.
Log messages are sent to the standard system logger. See logging in the python standard library for details.
The Monitor class has methods for onMonitorStart(message), etc. In panel, be sure to have methods for onMonitorStart(message), onMonitorProgress(message), etc., for the kinds of monitor messages the application will send. The catch-all method is onMonitorMessage.
See park.monitor for details on the message types. Individual services may have additional message types.
Bases: object
Messages that are received during the processing of the job.
Standard message types:
`Start`, `Progress`, `Improvement`, `Complete`, `Error`, `Abort`, `Log`
Specific job types may have their own monitor messages.
The messages themselves should all produce nicely formatted results in response to str(message).
The message dispatch calls on<Class>(message) if the on<Class> method exists for the message type. If not, then dispatch calls otherwise(message). By default onLog(message) submits the log record to the logger.
Subclass Monitor to define your own behaviours.
Called when the job sends a logging record.
The logging record contains a normal python logging record.
The default behaviour is to tie into the application logging system using:
logger = logging.getLogger(message.record.name)
logger.handle(message.record)
Logging levels are set in the job controller.
What to do if the message handler is not found.
Default is to ignore the message.
Called from thread when new message has arrived.
Parameters and parameter sets.
Parameter defines an individual parameter, and ParameterSet groups them into a hierarchy.
Individual models need to provide a parameter set with the correct properties, either by using park.ParameterSet in their model definition, or by providing a wrapper which can translate assignment to parameter.value into the appropriate change in the wrapped model. See wrapper.py for an example.
Bases: object
A parameter is a box for communicating with the fitting service. Parameters can have a number of properties,
Parameters have a number of properties:
name of the parameter within the parameter set.
The name is read only. You can rename a parameter but only in the context of the parameter set which contains it, using parameterset.rename(par,name). This will change all expressions containing the named parameter.
expression for the parameter in the model. This is a string containing a formula for updating the parameter value based on the values of other parameters in the system. The expression is ignored if ‘calculated’ is False.
Note: the list of symbols available to the expression evaluator defaults to the contents of the math module. The caller will be able to override this within the fitting fitting class.
function to return the negative log likelihood of seeing a particular parameter value. 2*likelihood(value) will be added to the total cost function for the particular parameter set during the fit. This will be on par with the probabilty of seeing the particular theory function given the observed datapoints when performing the fit (the residual term is closely related to the log likelihood of the normal distribution).
Note: we are minimizing chi^2 = sum [ ((y-f(x;p))/dy)^2 ] rather than -log P = sum [ ((y-f(x;p))/dy)^2/2 + log(2 pi dy^2) ], where P is the probability of seeing f(x;p) given y,dy as the mean and standard deviation of a normal distribution. Because chi^2_p = - 2 * log P_p + constant, the minima of p are the same for chi^2 and negative log likelihood. However, to weight the likelihood properly when adding likelihood values to chisq, we need the extra factor of 2 mentioned above. The usual statistical implications of normalized chi^2 will of course be suspect, both because the assumption of independence between the points in chi^2 (which definitely do not hold for the new ‘point’ p_k), and because of the additional 2 log(2 pi dp_k^2) constant, but given the uncertainty in the estimate of the distribution parameters, this is likely a minor point.
Return the current value for a parameter.
Return true if the value is in the range
parameter name
parameter range
Set a parameter to a value, a range or an expression. If it is a value, the parameter will be fixed for the fit. If it is a range, the value will be varying for the fit. If it is an expression, the parameter will be calculated from the values of other parameters in the fit.
Raises ValueError if the value could not be interpreted.
Set the full path to the parameter as used in expressions involving the parameter name.
Return parameter range string.
E.g., ” Gold .....|.... 5.2043 in [2,7]”
Bases: list
The set of parameters used to fit theory to data.
ParameterSet forms a hierarchy of parameters. The parameters themselves are referred to by the path through the hierarchy, usually:
fitname.component.parameter
Though more or fewer levels are permitted. Parameters are assumed to have a unique label throughout the fit. This is required so that expressions tying the results of one fit to another can uniquely reference a parameter.
Attributes:
Lookup parameter from dotted path
Return the subset of the parameters which are calculated
Return the subset of the paramters which are varying
Return the subset of the parameters which are fixed
Iterate over the elements in depth first order.
Gather all additional symbols that can be used in expressions.
For example, if reflectometry provides a volume fraction function volfrac(rho1,rho2,frac) to compute densities, then this function can be added as a context dictionary to the reflectometry parameter set. Note that there is no guarantee which function will be used if the same function exists in two separate contexts.
parameter name
Rename the parameter to something new. Called from root of the parameter hierarchy, rename the particular parameter object to something else.
This changes the internal name of the parameter, as well as all expressions in which it occurs. If the parameter is actually a parameter set, then it renames all parameters in the set.
Return the subset of the parameters which have a likelihood function associated with them.
Fill in the full path name for all the parameters in the tree.
Note: this function must be called from the root parameter set to build proper path names.
This is required before converting parameter expressions into function calls.
Bases: park.fit.Fitter
Differential evolution optimizer
This implements park.fit.Fitter.
Step size along difference vector
Amount of mixing in population
Maximum number of iterations
Number of active points per dimension
Fit tolerance
A park model implementing a multipeak fitter.
* WARNING * this example was used to inform the design process, and has not yet been updated to correspond to the current implementation. Do not use this as a tutorial.
This is an example model showing how to put together a multi-part fit objective function.
group main: | Peaks |
---|---|
group peaks: | Gaussian, Lorentzian, Voigt |
group backgrounds: | |
Constant, Quadratic, Linear |
Bases: object
Constant background: C
Bases: object
Gaussian peak: scale exp ( -0.5 (x-center)**2 / sigma**2 )
Bases: object
Linear background: B x**2 + C
Bases: object
Lorentzian peak (HWHM): scale/pi gamma/((x-center)**2 + gamma**2)
Bases: park.model.Model
Peak fitter
Add a peak to the model.
The name of the peak is used to distinguish it from other peaks in the model, and appears as part of the parameter name in constraint expressions. For example, if name is ‘P3’ and this is part of model ‘M1’ then the sigma parameter of a gaussian peak would be ‘M1.P3.sigma’.
The peak can be specified either by peak type and initial arguments or by the peak itself, which is a function of x with a parameters attribute.
Additional peak types can be registered. The complete dictionary of available types is in Peaks.peak_types.
Returns the raw theory value at x, unconvoluted and unweighted.
The residuals will be calculated by park.Model.residuals by calling
Register a new peak generator. This will return a callable object with a parameter set that can be used in the peak fitting class.
The individual peak functions a number of attributes:
Remove the named peak from the model.
Bases: object
Quadratic background: A x**2 + B x + C
Bases: object
Voigt peak (HWHM,sigma): A [G(sigma) * L(gamma)](x-center)
Register peak types with the Peaks model.
Return the voigt function, which is the convolution of a Lorentz function with a Gaussian.
Parameters: |
|
---|
Ref: W.I.F. David, J. Appl. Cryst. (1986). 19, 63-64
Note: adjusted to use stddev and HWHM rather than FWHM parameters
Parallel map-reduce implementation using threads.
Bases: object
Abstract interface to map-reduce accumulator function.
Exception seen on executing map or reduce. The collector can adjust the accumulated result appropriately to reflect the error.
Called when all parts have been accumulated
Bases: object
Abstract interface to map-reduce mapper function.
Stop the mapper
Apply function mapper to all inputs.
This is the serial version of a parallel iterator, yielding the next sequence value as soon as it is available. There is no guarantee that the order of the inputs will be preserved in the parallel version, so don’t depend on it!
Apply function mapper to inputs, accumulating the results in collector.
Collector is a function which accepts the result of mapper(item) for each item of inputs. There is no guarantee that the outputs will be received in order.
The map is executed in a separate thread so the function returns to the caller immediately.
Collect all outputs, calling collector(item) for each item in the sequence.
True if the mapper cost should be profiled.
Downhill simplex optimizer.
Minimize a function using Nelder-Mead downhill simplex algorithm.
This optimizer is also known as Amoeba (from Numerical Recipes) and the Nealder-Mead simplex algorithm. This is not the simplex algorithm for solving constrained linear systems.
Downhill simplex is a robust derivative free algorithm for finding minima. It proceeds by choosing a set of points (the simplex) forming an n-dimensional triangle, and transforming that triangle so that the worst vertex is improved, either by stretching, shrinking or reflecting it about the center of the triangle. This algorithm is not known for its speed, but for its simplicity and robustness, and is a good algorithm to start your problem with.
Parameters:
- f : callable f(x,*args)
- The objective function to be minimized.
- x0 : ndarray
- Initial guess.
- bounds : (ndarray,ndarray) or None
- Bounds on the parameter values for the function.
- radius: float
- Size of the initial simplex. For bounded parameters (those which have finite lower and upper bounds), radius is clipped to a value in (0,0.5] representing the portion of the range to use as the size of the initial simplex.
Returns: Result (park.simplex.Result)
- x : ndarray
- Parameter that minimizes function.
- fx : float
- Value of function at minimum: fopt = func(xopt).
- iters : int
- Number of iterations performed.
- calls : int
- Number of function calls made.
- success : boolean
- True if fit completed successfully.
Other Parameters:
- xtol : float
- Relative error in xopt acceptable for convergence.
- ftol : number
- Relative error in func(xopt) acceptable for convergence.
- maxiter : int=200*N
- Maximum number of iterations to perform. Defaults
- update_handler : callable
- Called after each iteration, as callback(xk,fxk), where xk is the current parameter vector and fxk is the function value. Returns True if the fit should continue.
Notes
Uses a Nelder-Mead simplex algorithm to find the minimum of function of one or more variables.
version and package information about PARK.
Asynchronous monitoring service for wx applications.
Define a monitor using park.wxmonitor.wxMonitor(panel) where panel is the window which will receive the monitor updates.
In panel, be sure to have methods for onMonitorStart(message), onMonitorProgress(message), etc., for the kinds of monitor messages the application will send. The catch-all method is onMonitorMessage, which by default will print the messages on the console. If you don’t catch onMonitorLog messages then the log messages will be sent to the standard python logger.
See park.monitor for details on the message types.
The following defines a panel which responds to monitor messages:
import wx
class Panel(wx.Panel):
def __init__(self, *args, **kw):
wx.Panel.__init__(self, *args, **kw)
self.text = wx.TextCtrl(self, size=(200,100), style=wx.TE_MULTILINE)
self.gauge = wx.Gauge(self, range=100)
sizer = wx.BoxSizer(wx.VERTICAL)
sizer.Add(self.text, 0, wx.LEFT | wx.EXPAND)
sizer.Add(self.gauge, 0, wx.LEFT | wx.EXPAND)
self.SetSizer(sizer)
self.text.SetValue('starting value')
def onMonitorMessage(self, message):
self.text.SetValue(str(message))
def onMonitorStart(self, message):
self.text.SetValue(str(message))
self.gauge.SetValue(0)
def onMonitorProgress(self, message):
self.text.SetValue(str(message))
self.gauge.SetValue(int(100*message.complete/message.total))
def onMonitorComplete(self, message):
self.text.SetValue(str(message))
self.gauge.SetValue(100)
We can put this panel in a simple app:
app = wx.PySimpleApp()
frame = wx.Frame(None, -1, 'Test Monitor')
panel = Panel(frame)
frame.Show()
Next we attach attach the monitor to this panel and feed some messages from another thread:
import time,thread
import park.wxmonitor, park.monitor
from park.monitor import Start, Progress, Improvement, Complete
monitor = park.wxmonitor.wxMonitor(panel)
msgs = [Start(), Progress(1,10), Progress(3,10),
Improvement('Better!'), Progerss(6,10), Complete('Best!')]:
def message_stream(monitor,msgs):
time.sleep(1)
for message in msgs:
monitor.put(message)
time.sleep(1)
thread.start_new_thread(message_stream, (monitor,msgs))
app.MainLoop()
You should see the progress bar jump from 10% to 30% to 60% then all the way to the end.
Bases: park.monitor.Monitor
Attach a job monitor to a panel.
The monitor will perform callbacks to onMonitorStart(message), onMonitorProgress(message), etc. if the associated method is defined. If the type specific method is not defined, then the monitor will call onMonitorMessage(message). Otherwise the message is dropped.
See park.monitor for a description of the usual messages.
Dispatch the event from the asynchronous monitor. This is running in the GUI thread.
Called when the job sends a logging record.
The logging record contains a normal python logging record.
The default behaviour is to tie into the application logging system using:
logger = logging.getLogger(message.record.name)
logger.handle(message.record)
Logging levels are set in the job controller.
Generic message handler: do nothing.
Intercept an event received from an asynchronous monitor. This is running in the asynchronous thread.
PARK fitting service.
The PARK fitting service is a set of python packages to support fitting in datasets. Using common python infrastructure such as scipy optimizers, numpy arrays, and matplotlib plotting, park allows you to define models, associate them with datasets and simultaneously fit them. Park provides a simple job queue to manage multiple fits using separate processes or running across the network on separate machines.
The latest version of Park is available from http://www.reflectometry.org/danse/park.
Currently this is supplied as source from a zip file. You can also retrieve the latest version from svn:
svn co svn://danse.us/park/branches/park-1.2
If you are installing from source, you will need a python environment with numpy and scipy. For the GUI version, you will also need matplotlib and wx.
You will need a C compiler to build the resolution convolution function. On Windows this may require installing MinGW and adding distutils.cfg to your distutils directory:
[build]
compiler = mingw
Once you have the required supporting packages, use the following to build and install:
python setup.py install
To get started with park, you will need to first define the models that you are using. These can be very basic models, listing all possible fitting parameters. When setting up the fit later, you will be able to combine models, using parameter expressions to relate the values in one model with those in another. See park.model for details.
Once your models are constructed you can use them in a fit. See park.fit for details.
Important classes and functions: park.model.Model, park.parameter.ParameterSet, park.fit.Fit
group models: | model, assembly, parameter, data |
---|---|
group examples: | peaks |
group optimizer: | |
fit, fitresult, fitmc, simplex | |
group support: | expression, deps, version, setup, serial |
group server: | fitservice |