
How is the LogSumExp function used in machine learning?

The LogSumExp (LSE) function is a smooth maximum – a smooth approximation to the maximum function, used mainly in machine learning algorithms. It is defined as the logarithm of the sum of the exponentials of its arguments: $\mathrm{LSE}(x_1, \dots, x_n) = \log(\exp x_1 + \cdots + \exp x_n)$. In tropical analysis, this is the sum in the log semiring.
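As a sketch of the definition above, here is a minimal NumPy implementation (not from the original text); it subtracts the maximum before exponentiating, the standard trick that keeps the computation from overflowing:

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log(sum(exp(x))).

    Shifting by max(x) before exponentiating avoids overflow:
    log(sum(exp(x))) = m + log(sum(exp(x - m))) for any m.
    """
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

# The naive form np.log(np.sum(np.exp([1000.0, 1000.0]))) overflows to inf,
# but the shifted version returns the exact value 1000 + log(2).
print(logsumexp([1000.0, 1000.0]))
```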

How to use LogSumExp on a masked array?

NumPy has a logaddexp function which is very similar to logsumexp but handles only two arguments; logaddexp.reduce is similar to this function but may be less stable. Notice that logsumexp does not directly support masked arrays. To use it on a masked array, convert the mask into zero weights:
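A short sketch of the mask-to-weights conversion described above, assuming SciPy's scipy.special.logsumexp and its `b` (weights) argument:

```python
import numpy as np
from scipy.special import logsumexp

# A masked array: the second entry should be excluded from the sum.
x = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# Convert the mask into zero weights and pass them via `b`:
# logsumexp(a, b=w) computes log(sum(w * exp(a))), so masked entries
# contribute w * exp(a) = 0 and drop out of the sum.
w = (~x.mask).astype(float)
result = logsumexp(x.data, b=w)

# Equivalent to logsumexp over the unmasked values [1.0, 3.0] only.
print(result)
```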

Is the log function in LogSumExp strictly convex?

The LogSumExp function is convex, and is strictly monotonically increasing everywhere in its domain (but not strictly convex everywhere). When one argument $x_j$ dominates the others,
$$\mathrm{LSE}(x) = \log\Big(\sum_i \exp x_i\Big) \approx \log(\exp x_j) + \Big(\sum_{i \neq j} \exp x_i\Big)\Big/\exp x_j \approx x_j = \max_i x_i.$$
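The approximation above can be checked numerically; this small sketch (values chosen for illustration) also verifies the standard sandwich bound max(x) < LSE(x) ≤ max(x) + log(n):

```python
import numpy as np

x = np.array([0.5, 2.0, 10.0])  # 10.0 dominates the other entries
lse = np.log(np.sum(np.exp(x)))

# LSE is squeezed between the max and the max plus log(n),
# so when one component dominates, LSE(x) is very close to max(x).
print(lse, x.max())  # lse is only slightly above 10.0
```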

Who are the authors of the pmtk toolkit?

PMTK is a collection of Matlab/Octave functions, written by Matt Dunham, Kevin Murphy, and various other contributors. The toolkit is primarily designed to accompany Kevin Murphy's textbook Machine Learning: A Probabilistic Perspective, but it can also be used independently of the book.

How is the log-sum-exp function similar to the max function?

The log-sum-exp function can be thought of as a smoothed version of the max function: whereas the max function is not differentiable at points where the maximum is achieved in two different components, the log-sum-exp function is infinitely differentiable everywhere. Plots of the two functions illustrate this connection.
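The differentiability claim above can be made concrete: the gradient of log-sum-exp is the softmax, which is well defined even at the tie points where max has no gradient. A small sketch (not from the original text):

```python
import numpy as np

def lse_grad(x):
    """Gradient of log-sum-exp, i.e. the softmax of x.

    Well defined everywhere, including ties such as x = [1, 1],
    where max(x) is not differentiable.
    """
    e = np.exp(x - np.max(x))  # max-shift for numerical stability
    return e / e.sum()

# At a tie, the gradient splits the weight evenly between the tied components.
print(lse_grad(np.array([1.0, 1.0])))  # [0.5, 0.5]
```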

Is the convex conjugate of LogSumExp the negative entropy?

The convex conjugate of LogSumExp is the negative entropy (restricted to the probability simplex). The LSE function is often encountered when the usual arithmetic computations are performed on a logarithmic scale, as with log probabilities.
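A brief sketch of the log-scale use mentioned above (illustrative values, not from the original text): normalizing probabilities whose direct values underflow to zero, by working entirely with log probabilities and LSE as the log normalizer:

```python
import numpy as np

# Unnormalized log probabilities, e.g. per-class log-likelihood + log-prior.
# np.exp(-1000.0) underflows to 0.0, so normalizing in probability space
# would compute 0/0; in log space the normalization is exact.
log_p = np.array([-1000.0, -1000.5, -1002.0])

# log of the normalizing constant Z = sum(exp(log_p)), via the max-shift trick.
m = log_p.max()
log_Z = m + np.log(np.sum(np.exp(log_p - m)))

# Subtracting log_Z in log space and exponentiating yields a valid distribution.
posterior = np.exp(log_p - log_Z)
print(posterior.sum())  # sums to 1
```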