Here’s a link to a book trailer which was straightforward to create. Since my books are compiled with a Python-based framework, changing the compilation process to produce a video file instead of a book PDF was just a few dozen lines of code.

If you’re interested in getting a copy of the book, please check out the listing here.

Also, if you’ve enjoyed *Data Science for Babies*, please consider leaving a positive review on Amazon and letting others know what you think!

Some mornings around 7am, Silas stumbles into my office, still half asleep, and asks,

*“Dad, did you write another cookie book?”*

A moment passes as I finish typing out a line of code, and I usually answer something like,

*“No, Buddy, not today.”*

However, recently I was able to say, *“Yup, a draft is ready! Can you read it and make suggestions?”* and then see a smile pop onto his face.

A few revisions later, following lots of good feedback from Silas and Maribeth, I’m happy to release *“Entrepreneurship for Young Minds”*, a book about entrepreneurship for young kids, as the name suggests.

Although it’s hard to capture all aspects of entrepreneurship in a small number of pages, I’ve drawn from a few themes which I think are important:

- Looking for solutions to problems
- Persevering in the midst of failure
- Avoiding complacency
- Dreaming big

As with my previous books, I’ve written the story to be both educational and fun, and I hope it will inspire future entrepreneurs and young minds who will grow up to change the world.

Please check it out here, and feel free to let me know your thoughts if you do.

The LZSS code I’m presenting here is based on this GitHub project, but I’ve created my own fork with some improvements and optimizations.

For some background on LZSS, Wikipedia has a pretty good description.

The remainder of this post will walk through an implementation of compression followed by decompression.

The main function for compression is fairly short:

```
from bitarray import bitarray


def compress(data: bytes) -> bytes:
    output_buffer = bitarray(endian="big")
    i = 0
    while i < len(data):
        if match := find_longest_match(data, i):
            match_distance, match_length = match
            output_buffer.append(IS_MATCH_BIT)
            dist_hi, dist_lo = match_distance >> 4, match_distance & 0xF
            output_buffer.frombytes(bytes([dist_hi, (dist_lo << 4) | (match_length - LENGTH_OFFSET)]))
            i += match_length
        else:
            output_buffer.append(not IS_MATCH_BIT)
            output_buffer.frombytes(bytes([data[i]]))
            i += 1
    output_buffer.fill()  # Pad to complete the last byte
    return output_buffer.tobytes()
```

This function takes in a `bytes` object of uncompressed bytes and returns a `bytes` object of compressed bytes. To understand how this code works, we’ll walk through a few compression examples.

For our first compression example, we’ll compress the `bytes` object `b"zzzzz"`.

On the first iteration of our loop `while i < len(data):`, `find_longest_match(data, i)` will return `None` because we’re looking at the first byte in our raw data and there are no previous bytes to match against. We’ll walk through `find_longest_match` in more detail later. So, we append a single bit, `0`, followed by the byte `b"z"` (`IS_MATCH_BIT` is a `bool` set to `True`).

On the second loop iteration, things are a bit more interesting. Since our next 4 bytes `b"zzzz"` share a common value with our first byte `b"z"`, `find_longest_match` will return a non-`None` `Tuple` containing a `match_distance` equal to 1 and a `match_length` equal to 4.

Let’s discuss what these values mean. A `match_distance` equal to 1 means that we found a match which is just one position back from our current position. A `match_length` equal to 4 means that we matched 4 bytes. In this case, it means that our match of `b"z"` is repeated 4 times.

Since we successfully found a match, we append a single bit value of 1, `IS_MATCH_BIT`, followed by the values for `match_distance` and `match_length` packed into 16 bits (12 bits for `match_distance` and 4 bits for `match_length`).

Overall, `b"zzzzz"` gets packed into 9 bits + 17 bits == 26 bits! Since we pad our `bytes` object with `output_buffer.fill()`, our final output is 4 bytes, compressed down from 5 bytes.
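We can sanity-check that 4-byte figure by packing the two tokens by hand. This sketch is not part of the compressor; it just mimics the bit layout described above (a `0` flag plus the literal byte, then a `1` flag plus a 12-bit distance and a 4-bit length field, where `LENGTH_OFFSET` is a constant explained later in the post):

```python
LENGTH_OFFSET = 2  # minimum encodable match length (defined with the other constants)

# Token 1: literal flag (0) + the 8 bits of b"z"
bits = "0" + format(ord("z"), "08b")
# Token 2: match flag (1) + 12-bit distance + 4-bit (length - LENGTH_OFFSET)
distance, length = 1, 4
bits += "1" + format(distance, "012b") + format(length - LENGTH_OFFSET, "04b")
# Pad with zeros to a whole number of bytes, as output_buffer.fill() does
bits += "0" * (-len(bits) % 8)

packed = int(bits, 2).to_bytes(len(bits) // 8, "big")
assert len(packed) == 4  # 26 bits of payload pads out to 4 bytes
```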

For our second example, we’ll use a set of bytes which are a little bit more interesting, `b"wxyzwxy"` (although still pretty boring, I know).

For the first 4 bytes, `b"w"`, `b"x"`, `b"y"`, and `b"z"`, `find_longest_match` returns `None` for each byte, so we pack these 4 bytes into a total of 4 * 9 bits == 36 bits.

For our next byte, `b"w"`, `find_longest_match` returns a `match_distance` of 4 and a `match_length` of 3. Thus, the remaining bytes `b"wxy"` are compressed to 17 bits. Overall, `b"wxyzwxy"` compresses down to 36 + 17 == 53 bits, down from 56 bits (not a significant change, but hang on for the next example).

For this 3rd and final example, let’s suppose our `bytes` object is `b"wxyzwxyzwxy"`.

As with the previous example, the first 4 bytes will require a total of 36 bits to store. However, the remaining `b"wxyzwxy"` will be summarized with a `match_distance` of 4 and a `match_length` of 7 such that our entire `bytes` object compresses down to 36 + 17 == 53 bits (7 bytes after padding) from 88!

Based on the previous 3 examples, you probably have an idea of what `find_longest_match` is doing, and now we’ll walk through the specifics of how it actually works. Below is its code:

```
from typing import Optional, Tuple


def find_longest_match(data: bytes, current_position: int) -> Optional[Tuple[int, int]]:
    end_of_buffer = min(current_position + MATCH_LENGTH_MASK + LENGTH_OFFSET, len(data))
    search_start = max(0, current_position - WINDOW_SIZE)
    # Try candidates from longest to shortest; the shortest candidate has length LENGTH_OFFSET
    for match_candidate_end in range(end_of_buffer, current_position + LENGTH_OFFSET - 1, -1):
        match_candidate = data[current_position:match_candidate_end]
        for search_position in range(search_start, current_position):
            if match_candidate == get_wrapped_slice(data[search_position:current_position], len(match_candidate)):
                return current_position - search_position, len(match_candidate)
    return None
```

The outer loop of this function, `for match_candidate_end in ...`, is used to create match candidates, starting with the longest possible candidate and shrinking the candidate length by 1 on every iteration.

For example, if we are working with input data `b"ghxyz"` and `current_position` is 1, corresponding to `b"h"`, our first value for `match_candidate` will be `b"hxyz"`. Since our match search starts and ends with `b"g"`, a match won’t be found, and our next `match_candidate` will be `b"hxy"`.

The inner loop of this function, `for search_position in ...`, checks to see if `match_candidate` is identical to any previous byte sequence. The function `get_wrapped_slice` allows us to find matches where the search sequence is actually shorter than `match_candidate`, as we saw in the first compression example involving `b"zzzzz"`. The code for `get_wrapped_slice` can be seen here:

```
def get_wrapped_slice(x: bytes, num_bytes: int) -> bytes:
    """
    Examples:
        f(b"1234567", 5) -> b"12345"
        f(b"123", 5) -> b"12312"
    """
    repetitions = num_bytes // len(x)
    remainder = num_bytes % len(x)
    return x * repetitions + x[:remainder]
```
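Putting the two functions together, we can verify the matches described in the earlier examples. This standalone sketch inlines the constants (they’re declared in the next section):

```python
from typing import Optional, Tuple

MATCH_LENGTH_MASK = 0xF   # 4-bit length field
WINDOW_SIZE = 0xFFF       # 12-bit distance field
LENGTH_OFFSET = 2         # minimum match length

def get_wrapped_slice(x: bytes, num_bytes: int) -> bytes:
    return x * (num_bytes // len(x)) + x[: num_bytes % len(x)]

def find_longest_match(data: bytes, current_position: int) -> Optional[Tuple[int, int]]:
    end_of_buffer = min(current_position + MATCH_LENGTH_MASK + LENGTH_OFFSET, len(data))
    search_start = max(0, current_position - WINDOW_SIZE)
    # Longest candidates first; the shortest allowed candidate has length LENGTH_OFFSET
    for match_candidate_end in range(end_of_buffer, current_position + LENGTH_OFFSET - 1, -1):
        match_candidate = data[current_position:match_candidate_end]
        for search_position in range(search_start, current_position):
            if match_candidate == get_wrapped_slice(data[search_position:current_position], len(match_candidate)):
                return current_position - search_position, len(match_candidate)
    return None

assert find_longest_match(b"zzzzz", 1) == (1, 4)    # wrap-around match from the first example
assert find_longest_match(b"wxyzwxy", 4) == (4, 3)  # second example
assert find_longest_match(b"ghxyz", 1) is None      # no earlier bytes match
```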

Also, if you’re wondering about the `LENGTH_OFFSET` constant, it exists because we only consider substrings of length 2 and greater and just output any substring of length 1 as a literal (9 bits uncompressed is better than a 17-bit reference for the flag, distance, and length). Since lengths 0 and 1 are unused, we can encode lengths 2-17 in only 4 bits.

Here is the declaration for `LENGTH_OFFSET` along with our other constants:

```
MATCH_LENGTH_MASK: Final[int] = 0xF
WINDOW_SIZE: Final[int] = 0xFFF
IS_MATCH_BIT: Final[bool] = True
# We only consider substrings of length 2 and greater, and just
# output any substring of length 1 (9 bits uncompressed is better than a 17-bit
# reference for the flag, distance, and length)
# Since lengths 0 and 1 are unused, we can encode lengths 2-17 in only 4 bits.
LENGTH_OFFSET: Final[int] = 2
```

That’s all there is to it for the compression code!

The decompression code is much shorter overall and is covered with the single function shown below:

```
from bitarray import bitarray


def decompress(compressed_bytes: bytes) -> bytes:
    data = bitarray(endian="big")
    data.frombytes(compressed_bytes)
    assert data, f"Cannot decompress {compressed_bytes}"
    output_buffer = []
    while len(data) >= 9:  # Anything less than 9 bits is padding
        if data.pop(0) != IS_MATCH_BIT:
            byte = data[:8].tobytes()
            del data[:8]
            output_buffer.append(byte)
        else:
            hi, lo = data[:16].tobytes()
            del data[:16]
            distance = (hi << 4) | (lo >> 4)
            length = (lo & MATCH_LENGTH_MASK) + LENGTH_OFFSET
            for _ in range(length):
                output_buffer.append(output_buffer[-distance])
    return b"".join(output_buffer)
```
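For instance, the two bytes of the match field produced for `b"zzzzz"` (distance 1, length 4) decode back to the original values:

```python
MATCH_LENGTH_MASK = 0xF
LENGTH_OFFSET = 2

hi, lo = 0x00, 0x12  # the 16-bit match field from the b"zzzzz" example
distance = (hi << 4) | (lo >> 4)
length = (lo & MATCH_LENGTH_MASK) + LENGTH_OFFSET
assert (distance, length) == (1, 4)
```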

Essentially, this function is just a `while` loop that decodes bytes until only padding remains.

For our first compression example of `b"zzzzz"`, our compressed bits should look like `00111101 01000000 00000100 10000000`. To make this a bit more readable, I’ll change the spacing between groups of bits: `0 01111010 1 00000000 00010010 000000`.

In the order in which they appear:

- `0` corresponds to the no-match flag, `not IS_MATCH_BIT`,
- `01111010` corresponds to the binary representation of `b"z"`,
- `1` corresponds to the match flag, `IS_MATCH_BIT`,
- `00000000` corresponds to the most significant bits of `match_distance`,
- `00010010` corresponds to the least significant bits of `match_distance` and the bits for `match_length`, and
- `000000` corresponds to padding bits.

On the first iteration of our `while` loop, the leading `0` is popped, so the `if` branch is executed and `01111010` is interpreted and stored as `b"z"`.

On the second iteration of our loop, `1` is popped, so the `else` branch is executed and the bits `00000000 00010010` are parsed into match `distance` and `length` values.

This simple loop:

```
for _ in range(length):
    output_buffer.append(output_buffer[-distance])
```

elegantly utilizes the match `distance` and `length` values just decoded.
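To see why indexing with `output_buffer[-distance]` handles overlapping matches correctly, consider decoding the `b"zzzzz"` example: after the literal `b"z"`, the match token (distance 1, length 4) copies the most recent byte four times, re-reading bytes it has just written:

```python
output_buffer = [b"z"]   # the literal decoded so far
distance, length = 1, 4  # the match token from the b"zzzzz" example

for _ in range(length):
    output_buffer.append(output_buffer[-distance])

assert b"".join(output_buffer) == b"zzzzz"
```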

That’s all there is to it! If you check out the repo containing this code, you’ll see that everything fits cleanly into less than 100 lines of code.

If you notice any mistakes or have any feedback, please feel free to reach out.

I’ve titled this subsequent book *“DevOps for Sponges”*, and it teaches software development and operations
concepts such as version control, automation, testing, deployment, and incremental design.

Why is the book *“for Sponges”*? Children are likened to sponges because of their ability to absorb information,
and this book was written primarily for young minds.

The book is now available on Amazon, so please check it out and consider sharing with young, future engineers!

To make my own work a bit more relatable, I recently wrote a book to explain data science in the simplest terms possible with the hope of inspiring and educating the youngest of minds.

With very basic examples, this book covers regression, statistical overfitting & underfitting, classification, data visualization, and time series analysis.

The book is now available on Amazon, so please check it out and consider sharing with a young, possibly future data scientist.

Special thanks to my wife and son for proofreading and to my daughter who loves cookies almost more than anything!

If you haven’t read my previous post on performing the Haar wavelet transform, be sure to check it out to develop a good foundation for the content we’ll be exploring in this post. I also found this post to be a helpful reference when reading up on LGT transforms.

The LGT (5/3) wavelet transform we’ll be working with is a bit more complex than the
Haar wavelet transform but offers characteristics which may be preferable depending
on the application.
One application, for example, is
the lossless version of JPEG 2000 compression.
The *(5/3)* part of the transform name
refers to the 5 low pass filter coefficients and 3
high pass filter coefficients that we’ll discuss shortly.

We’re going to perform our LGT wavelet transform by iteratively decomposing a signal into low and high frequency components. We’ll refer to the low frequency component as $l_{i}$ and to the high frequency component as $h_{i}$ and will calculate them with these 2 equations:

$l_{i} = -x_{2i-1} + 2x_{2i} + 6x_{2i+1} + 2x_{2i+2} - x_{2i+3}$

$h_{i} = -x_{2i-1} + 2x_{2i} - x_{2i+1}$

Technically, $l_i$ and $h_i$ should have multiplying coefficients of $\frac 1 8$ and $\frac 1 2$, respectively, but we drop these coefficients so that all values of $l_i$ and $h_i$ are guaranteed to be integers when all values of $x_i$ are integers.

These components $l_i$ and $h_i$ can be calculated with the following Python code:

```
lowpass = [-1, 2, 6, 2, -1]
highpass = [-1, 2, -1]
x_lo = convolve(x, lowpass, mode="mirror")[1::2]
x_hi = convolve(x, highpass, mode="mirror")[0::2]
```
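As a quick check (assuming `convolve` above refers to `scipy.ndimage.convolve1d` with `mode="mirror"`; both filters are symmetric, so convolution and correlation agree), we can confirm that an interior output sample matches the defining equations:

```python
import numpy as np
from scipy.ndimage import convolve1d

x = np.array([3, 1, 4, 1, 5, 9, 2, 6], dtype=np.int64)

# Low-pass outputs sit at odd indices, high-pass outputs at even indices
x_lo = convolve1d(x, [-1, 2, 6, 2, -1], mode="mirror")[1::2]
x_hi = convolve1d(x, [-1, 2, -1], mode="mirror")[0::2]

# Compare an interior sample (i = 1) against the equations for l_i and h_i
i = 1
assert x_lo[i] == -x[2*i-1] + 2*x[2*i] + 6*x[2*i+1] + 2*x[2*i+2] - x[2*i+3]
assert x_hi[i] == -x[2*i-1] + 2*x[2*i] - x[2*i+1]
```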

Conversely, we can reconstruct our original signal with the following equations:

$x_{2i+1} = \frac {-h_{i} + l_{i} - h_{i + 1}} 8$

$x_{2i} = \frac {-4h_{i-1} + 4l_{i-1} + 24h_{i} + 4l_{i} - 4h_{i + 1}} {64}$

The Python code for applying these equations is a bit more involved but still relatively straightforward:

```
inv_lowpass = [-1, 1, -1]
inv_highpass = [-4, 4, 24, 4, -4]
interleaved = np.empty(len(x))
interleaved[0::2] = x_hi
interleaved[1::2] = x_lo
x_odds = convolve(interleaved, inv_lowpass, mode="mirror")
x_evens = convolve(interleaved, inv_highpass, mode="mirror")
x_reconstructed = np.empty(len(x))
x_reconstructed[0::2] = x_evens[0::2] // 64
x_reconstructed[1::2] = x_odds[1::2] // 8
```
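Chaining the forward and inverse steps together (again assuming `scipy.ndimage.convolve1d`) shows that the interior of the signal is reconstructed exactly despite the dropped $\frac 1 8$ and $\frac 1 2$ normalization factors; the divisions by 8 and 64 land on exact multiples:

```python
import numpy as np
from scipy.ndimage import convolve1d

x = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8], dtype=np.int64)

# Forward transform (unnormalized 5/3 analysis filters)
x_lo = convolve1d(x, [-1, 2, 6, 2, -1], mode="mirror")[1::2]
x_hi = convolve1d(x, [-1, 2, -1], mode="mirror")[0::2]

# Inverse transform
interleaved = np.empty(len(x), dtype=np.int64)
interleaved[0::2] = x_hi
interleaved[1::2] = x_lo
x_odds = convolve1d(interleaved, [-1, 1, -1], mode="mirror")
x_evens = convolve1d(interleaved, [-4, 4, 24, 4, -4], mode="mirror")

x_rec = np.empty(len(x), dtype=np.int64)
x_rec[0::2] = x_evens[0::2] // 64
x_rec[1::2] = x_odds[1::2] // 8

# Interior samples reconstruct exactly; boundary samples additionally depend
# on how the mirror extension lines up
assert np.array_equal(x_rec[2:-2], x[2:-2])
```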

Applying these equations to images is straightforward since we can first apply them along the horizontal axis followed by the vertical axis. For example, working across the columns of an image $X$ with a pixel value of $x_{r,c}$ at row $r$ and column $c$ leverages these equations:

$l_{r,c} = -x_{r,2c-1} + 2x_{r,2c} + 6x_{r,2c+1} + 2x_{r,2c+2} - x_{r,2c+3}$

$h_{r,c} = -x_{r,2c-1} + 2x_{r,2c} - x_{r,2c+1}$

The corresponding code looks like this:

```
lo = convolve(image, [lowpass], mode="mirror")[:, 1::2]
hi = convolve(image, [highpass], mode="mirror")[:, 0::2]
image[:, : cols // 2] = norm_image(lo)
image[:, cols // 2 :] = norm_image(hi)
```

Note that `norm_image` is for visualization purposes.

Now let’s apply this transform a single time to the following image:

The transformed image (after converting the image to grayscale) looks like this:

This output looks similar to what we’d see with an equivalent Haar transform.

Below is a class which makes it easy to iteratively transform an image both horizontally and vertically.

```
class WaveletImage:
    def __init__(self, image: ndarray, axis: int = 1, levels: int = 2) -> None:
        self.axis = axis
        self.lo, self.hi = self.transform(image, self.axis, levels)

    @property
    def pixels(self) -> ndarray:
        lo = norm_image(self.lo if isinstance(self.lo, ndarray) else self.lo.pixels)
        hi = norm_image(self.hi if isinstance(self.hi, ndarray) else self.hi.pixels)
        return np.concatenate([lo, hi], axis=self.axis)

    @staticmethod
    def convolve(x: ndarray, kernel: List[int], axis: int, index: int) -> ndarray:
        k = np.array([kernel])
        if axis == 0:
            k = k.T
        y = convolve(x, k, mode="mirror")
        if axis == 0:
            return y[index::2]
        elif axis == 1:
            return y[:, index::2]
        else:
            raise ValueError(f"axis '{axis}' must be 0 or 1")

    def lowpass(self, x: ndarray, axis: int) -> ndarray:
        return self.convolve(x, [-1, 2, 6, 2, -1], axis, 1)

    def highpass(self, x: ndarray, axis: int) -> ndarray:
        return self.convolve(x, [-1, 2, -1], axis, 0)

    def inv_lowpass(self, x: ndarray, axis: int) -> ndarray:
        return self.convolve(x, [-1, 1, -1], axis, 1)

    def inv_highpass(self, x: ndarray, axis: int) -> ndarray:
        return self.convolve(x, [-4, 4, 24, 4, -4], axis, 0)

    def inverse_transform(self) -> ndarray:
        lo: ndarray = self.lo if isinstance(self.lo, ndarray) else self.lo.inverse_transform()
        hi: ndarray = self.hi if isinstance(self.hi, ndarray) else self.hi.inverse_transform()
        x = interleave(hi, lo, self.axis)
        x_evens = self.inv_highpass(x, self.axis) // 64
        x_odds = self.inv_lowpass(x, self.axis) // 8
        return interleave(x_evens, x_odds, self.axis)

    def transform(self, x: ndarray, axis: int, levels: int) -> Tuple[ndarray, ndarray]:
        lo = self.lowpass(x, axis)
        hi = self.highpass(x, axis)
        lo = WaveletImage(lo, abs(axis - 1), levels - axis) if levels else lo
        hi = WaveletImage(hi, axis=0, levels=0) if axis == 1 else hi
        return lo, hi
```

If we transform our image with `levels=2` as shown with this code:

```
image = WaveletImage(image, levels=2).pixels
plt.imshow(image, cmap="gray")
plt.show()
```

We get the following:

That’s all for this post! If you have any feedback, please feel free to reach out.

Also, if you enjoyed this post, you may also enjoy my posts DIY Metropolis-Hastings and DIY pseudorandom number generator.

Why do we care about wavelet transforms? At a high level, wavelet transforms allow you to analyze the frequency content of your data while achieving different temporal (or spatial) resolutions for different frequencies. They’re useful in a variety of applications including JPEG 2000 compression.

According to the uncertainty principle for signal processing, there is a trade-off between temporal/spatial resolution and frequency resolution. As we’ll see later in this post, our wavelet transformed image will have low spatial resolution with high frequency resolution for lower frequencies but high spatial resolution with low frequency resolution for higher frequencies. A Fourier transform, in contrast, would produce medium spatial resolution with medium frequency resolution for all frequencies.

With that background in place, we’ll demonstrate these concepts with an implementation of the Haar wavelet transform.

For our wavelet implementation, an image will be iteratively decomposed into low and high frequency components.
The low and high frequency components are created such that the original image can be reconstructed without any loss of information.
The low frequency component we’ll refer to as the *sum* component ($s_{r,c}$) and the high frequency component we’ll refer to as the *difference* component ($d_{r,c}$). They are computed from neighboring pairs of pixels:

$s_{r,c} = x_{r,2c} + x_{r,2c+1}$

$d_{r,c} = x_{r,2c} - x_{r,2c+1}$

where $x_{5,3}$ would be the pixel value at the $5^{th}$ row and $3^{rd}$ column of the original image.

This can be represented with the following Python code:

```
evens = x[:, 0::2]
odds = x[:, 1::2]
s = evens + odds
d = evens - odds
```

Reversing this transform is relatively easy:

$x_{r,2c} = \frac {s_{r,c} + d_{r,c}} 2$

$x_{r,2c+1} = \frac {s_{r,c} - d_{r,c}} 2$

Similarly, the Python code looks like this:

```
x[:, 0::2] = (s + d) // 2
x[:, 1::2] = (s - d) // 2
```
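A tiny round trip confirms that no information is lost: $s$ and $d$ always have the same parity, so the integer divisions recover the even and odd columns exactly. A minimal sketch:

```python
import numpy as np

x = np.array([[5, 3, 8, 2],
              [1, 7, 4, 6]], dtype=np.int64)

evens, odds = x[:, 0::2], x[:, 1::2]
s, d = evens + odds, evens - odds  # forward (unnormalized) Haar step

rec = np.empty_like(x)
rec[:, 0::2] = (s + d) // 2        # (s + d) / 2 == evens
rec[:, 1::2] = (s - d) // 2        # (s - d) / 2 == odds

assert np.array_equal(rec, x)
```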

Now let’s apply this transform a single time to the following image:

The transformed image (after converting the image to grayscale) looks like this:

As is usually done for images, we can also perform the same transform across rows instead of columns. I.e.

$s_{r,c} = x_{2r,c} + x_{2r+1,c}$

$d_{r,c} = x_{2r,c} - x_{2r+1,c}$

Which gives us this:

Typically, the low frequency component in the upper left corner of the image will undergo additional stages of decomposition as shown in the next section.

With the computational pieces just discussed, we can put together the following class which decomposes images into an arbitrary number of levels as we’ll visualize shortly.

```
class WaveletImage:
    def __init__(self, image: ndarray, axis: int = 1, levels: int = 2) -> None:
        self.axis = axis
        self.lo, self.hi = self.transform(image, self.axis, levels)

    @property
    def pixels(self) -> ndarray:
        lo = norm_image(self.lo if isinstance(self.lo, ndarray) else self.lo.pixels)
        hi = norm_image(self.hi if isinstance(self.hi, ndarray) else self.hi.pixels)
        return np.concatenate([lo, hi], axis=self.axis)

    def inverse_transform(self) -> ndarray:
        lo: ndarray = self.lo if isinstance(self.lo, ndarray) else self.lo.inverse_transform()
        hi: ndarray = self.hi if isinstance(self.hi, ndarray) else self.hi.inverse_transform()
        evens = (lo + hi) // 2
        odds = (lo - hi) // 2
        return interleave(evens, odds, axis=self.axis)

    @staticmethod
    def transform(image: ndarray, axis: int, levels: int) -> Tuple[ndarray, ndarray]:
        if axis == 0:
            evens, odds = image[::2, :], image[1::2, :]
        elif axis == 1:
            evens, odds = image[:, ::2], image[:, 1::2]
        else:
            raise ValueError(f"axis '{axis}' must be 0 or 1")
        lo = WaveletImage(evens + odds, abs(axis - 1), levels - axis) if levels else evens + odds
        hi = WaveletImage(evens - odds, axis=0, levels=0) if axis == 1 else evens - odds
        return lo, hi


def norm_image(x: ndarray) -> ndarray:
    return (x - x.min()) / (x.max() - x.min())


def interleave(a: ndarray, b: ndarray, axis: int) -> ndarray:
    rows, cols = a.shape
    rows, cols = (rows * 2, cols) if axis == 0 else (rows, cols * 2)
    out = np.empty((rows, cols), dtype=a.dtype)
    if axis == 0:
        out[0::2] = a
        out[1::2] = b
    elif axis == 1:
        out[:, 0::2] = a
        out[:, 1::2] = b
    else:
        raise ValueError("interleave only supports axis of 0 or 1")
    return out
```

If we create a `WaveletImage` object with `levels=1`, the result will look like the image we just created in the last section, which has both horizontal and vertical decompositions, resulting in 2x2 tiles.

However, things get a bit more interesting when we transform with `levels=2`. We can see what the result looks like with:

```
image = WaveletImage(image, levels=2).pixels
plt.imshow(image, cmap="gray")
plt.show()
```

Which gives us this:

Pretty cool, right? We can add an additional level of decomposition by setting `levels=3` as shown below.

```
image = WaveletImage(image, levels=3).pixels
plt.imshow(image, cmap="gray")
plt.show()
```

And the result looks like this:

As mentioned at the beginning of this post, low frequency components have higher frequency resolution at the cost of lower spatial resolution, while high frequency components have higher spatial resolution at the cost of lower frequency resolution.

You can see the difference in spatial resolution between low and high frequency components by comparing the low frequency component in the upper left-hand corner, which is at $\frac 1 {64}$ the resolution of the original image, with the high frequency component in the lower right-hand corner, which is at $\frac 1 4$ the resolution of the original image.

That’s all for this post! If you have any feedback, please feel free to reach out.

Also, if you enjoyed this post, you may also enjoy my posts LGT wavelet transform from scratch, DIY Metropolis-Hastings and DIY pseudorandom number generator.

If you haven’t read my previous post on pseudorandom number generation, be sure to check it out since we’ll be reusing the pseudorandom number generator discussed there.

Metropolis-Hastings is a Markov chain Monte Carlo technique to sample from distributions for which we don’t know how to perform direct sampling. In particular, we can use this algorithm to sample from a distribution without a known normalization constant.

For this post, we’ll sample from a Gaussian distribution without needing to know the Gaussian normalization constant of $\frac 1 {\sigma \sqrt {2 \pi}}$.

Here is the high-level summary of how we will accomplish this:

1. Set the initial sample $x_0 = 0$ (this is arbitrary)
2. Propose a new sample $x_{proposed}$
3. Use $g(x)$, which is proportional to the target distribution $f(x)$, to probabilistically accept or reject $x_{proposed}$ over $x_0$
4. Set $x_1$ equal to the value of $x_0$ or $x_{proposed}$ based on the outcome of step 3
5. Repeat steps 2-4 to determine the next sample, $x_2$, and continue repeating for $x_3, x_4, ..., x_N$

Of the 5 steps above, steps 2 and 3, **sample proposal** and **sample selection**, are the least straightforward
and will be explained in this section and the next.
In the **sample proposal** step, we generate a new sample which might be returned as the next output from
the algorithm while in the **sample selection** step we probabilistically return either the newly proposed
sample or the sample which was returned in the previous iteration of the algorithm.

**Sample proposal** can be straightforward, and for this tutorial we’ll use a very simple method. Given a previous sample value of $x_t$, the proposed sample will have a value of

$x_{proposed} = x_t + \mathcal{U}_{[- \frac 1 2,\frac 1 2]}$

where $\mathcal{U}_{[- \frac 1 2,\frac 1 2]}$ is a continuous uniform random variable sampled between $- \frac 1 2$ and $\frac 1 2$. For the first iteration of Metropolis-Hastings, $x_0$ will be an arbitrary value such as $0$ but should ideally have a probability $\gt 0$ under the target distribution.

The equivalent Python code for **sample proposal** is simple:

```
def propose_sample(current_sample: float) -> float:
return current_sample + uniform() - 0.5
proposed_sample = propose_sample(x0)
```

`uniform()` returns a continuous random variable with values between $0$ and $1$ ($\mathcal{U}_{[0,1]}$), and the code behind it can be read about here.

In step 3, **sample selection**, we probabilistically choose either $x_{proposed}$ or
$x_t$ to be the next sample value $x_{t+1}$.

In order to perform sample selection, we need a function $g(x)$ which is proportional to our distribution of interest $f(x)$. I.e. $g(x) \propto f(x)$ such that:

$\frac {f(a)} {f(b)} = \frac {cg(a)} {cg(b)} = \frac {g(a)} {g(b)}$

Since the target distribution from which we’d like to sample is the Gaussian distribution $f(x)=\frac 1 {\sigma \sqrt {2 \pi}} e^{-\frac 1 2 ({\frac {x - \mu} \sigma})^2}$, we’ll choose $g(x)=e^{-\frac 1 2 ({\frac {x - \mu} \sigma})^2}$.

To determine whether $x_{t+1}$ takes the value of $x_t$ or $x_{proposed}$, we sample a value $u$ between 0 and 1 from a continuous uniform distribution $\mathcal{U}_{[0,1]}$ and let $x_{t+1}=x_{proposed}$ if $u \leq \frac {g(x_{proposed})} {g(x_t)}$, else $x_{t+1}=x_t$.
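As a concrete check of this rule with our Gaussian $g(x)$ (using $\mu=0$, $\sigma=1$): a proposal closer to the mean yields a ratio above 1 and is always accepted, while a proposal farther away is accepted only some of the time:

```python
import math

def g(x: float) -> float:
    # Unnormalized Gaussian with mu = 0, sigma = 1
    return math.exp(-(x ** 2) / 2)

# Moving toward the mean: ratio > 1, so any u in [0, 1] accepts the proposal
assert g(0.5) / g(1.0) > 1

# Moving away from the mean: ratio < 1, so the proposal is accepted
# with probability g(x_proposed) / g(x_t)
ratio = g(2.0) / g(0.5)
assert 0 < ratio < 1
```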

This logic can be seen in the following Python code:

```
import math
from typing import Tuple


def score(x: float, mu: float = 0, sigma: float = 1) -> float:
    norm_x = (x - mu) / sigma
    return math.exp(-(norm_x ** 2) / 2)


def get_next_sample(current_sample: float, current_sample_score: float) -> Tuple[float, float]:
    # Calculate proposed value for x_{t+1}
    proposed_sample = propose_sample(current_sample)
    proposed_sample_score = score(proposed_sample)
    # NOTE: This code was written with simplicity in mind, but there is no reason to
    # sample from the uniform distribution if proposed_sample_score > current_sample_score
    if uniform() <= (proposed_sample_score / current_sample_score):
        current_sample = proposed_sample
        current_sample_score = proposed_sample_score
    return current_sample, current_sample_score


# x0 is the arbitrary starting point
x0 = 0
x0_score = score(x0)
x1, x1_score = get_next_sample(x0, x0_score)
x2, x2_score = get_next_sample(x1, x1_score)
```

Lastly, we’ll implement a stateful class to generate samples:

```
class MetropolisHastings:
    def __init__(self, x0: float = 0) -> None:
        self.sample = x0
        self.sample_score = score(self.sample)

    def __call__(self) -> float:
        self.sample, self.sample_score = get_next_sample(self.sample, self.sample_score)
        return self.sample
```

Now we can generate a set of samples and make sure the distribution looks as expected.

```
from typing import Callable, Iterator

from tqdm import tqdm


def gen_samples(f: Callable[[], float]) -> Iterator[float]:
    for _ in tqdm(range(1000000)):
        yield f()


metropolis = MetropolisHastings()
gaussian_samples = list(gen_samples(metropolis))
```

Sure enough, our Gaussian samples look pretty good.

That’s all for this post! If you have any feedback, please feel free to reach out.

Veritasium has a great video about the logistic map, particularly how it can demonstrate chaotic behavior. If you haven’t seen his video before, I highly recommend you check it out.

Anyways, this relatively simple equation known as the logistic map looks like this:

$x_{n+1}=rx_n(1-x_n)$

For certain values of $r$, the values of $x$ follow a predictable pattern, but for other values of $r$, such as when $r=4$, the values of $x$ become unpredictable and chaotic.

Through this unpredictable behavior, the logistic map provides a promising approach to pseudorandom number
generation as discussed in the paper
*Logistic map: A possible random-number generator*
.

As suggested in this paper, the following mapping can be applied to all $x$ values such that they provide a uniform distribution:

$y_n = {\frac 1 \pi} \arccos (1 - 2x_n)$

Implementing the couple of equations in the previous section is pretty straightforward, as shown in the following Python class.

```
import math
from typing import Final


class RandomNumber:
    r: Final[int] = 4

    def __init__(self, seed: float = 1 / 3) -> None:
        self.x = seed

    @staticmethod
    def map_to_uniform(x: float) -> float:
        return math.acos(1 - 2 * x) / math.pi

    @staticmethod
    def calc_logistic_map(r: int, x: float) -> float:
        return r * x * (1 - x)

    def __call__(self) -> float:
        self.x = self.calc_logistic_map(self.r, self.x)
        return self.map_to_uniform(self.x)
```

Using this class, we can easily sample a bunch of uniformly distributed random variables.

```
rand = RandomNumber()
rands = [rand() for _ in range(1000000)]
```

Now let’s visualize these samples on a histogram:

Looks great! What does the distribution look like if we sum the value of 2 random variables? I.e.

$\textit{X} = \mathcal{U}_0 + \mathcal{U}_1$

Or in Python:

```
rand = RandomNumber()
rands = [sum(rand() for _ in range(2)) for _ in range(100000)]
```

Gives us this:

Oh no! What’s going on? The sum of 2 uniformly distributed random variables should give us a triangular-looking distribution, not this.

As it turns out, consecutive samples from our pseudorandom number generator are correlated with one another. In the next section we’ll address this issue.
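We can make the correlation visible without any plotting. Substituting $x_n = \sin^2(\pi z_n)$ shows that the mapped outputs follow a tent map ($y_{n+1} = 2y_n$ if $y_n < \frac 1 2$, else $2 - 2y_n$), so the sum of two consecutive samples can never exceed 1.5, whereas sums of independent uniforms would range up to 2. A minimal sketch of the no-skip generator demonstrates this:

```python
import math

class RandomNumber:
    """No-skip logistic-map generator from the previous section."""
    r = 4

    def __init__(self, seed: float = 1 / 3) -> None:
        self.x = seed

    def __call__(self) -> float:
        self.x = self.r * self.x * (1 - self.x)
        return math.acos(1 - 2 * self.x) / math.pi

rand = RandomNumber()
sums = [sum(rand() for _ in range(2)) for _ in range(10000)]

# Consecutive samples satisfy y_n + y_{n+1} <= 1.5, which is why the
# histogram of pair sums is not triangular
assert max(sums) <= 1.5 + 1e-9
```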

Here is an improved implementation of our pseudorandom number generator which drops intermediate values to reduce correlation between consecutive samples.

```
class RandomNumber:
    """@article{phatak1995logistic,
    title={Logistic map: A possible random-number generator},
    author={Phatak, SC and Rao, S Suresh},
    journal={Physical review E},
    volume={51},
    number={4},
    pages={3670},
    year={1995},
    publisher={APS}}"""

    r: Final[int] = 4
    # The random numbers generated from this class should be uniformly distributed between 0 and 1
    upper_limit: Final[float] = 1
    lower_limit: Final[float] = 0
    theoretical_mean: Final[float] = (upper_limit + lower_limit) / 2
    theoretical_variance: Final[float] = (upper_limit - lower_limit) ** 2 / 12

    def __init__(self, seed: float = 1 / 3, skip: int = 22) -> None:
        if skip < 0:
            raise ValueError(f"'{skip}' is not an acceptable value for 'skip'. 'skip' must be >= 0.")
        self.x = seed
        # NOTE: A skip of at least 1 is necessary to remove obvious sample-to-sample correlation,
        # but a skip of around 22 removes all single precision correlation (see referenced paper).
        self.skip = skip

    @staticmethod
    def map_to_uniform(x: float) -> float:
        return math.acos(1 - 2 * x) / math.pi

    @staticmethod
    def calc_logistic_map(r: int, x: float) -> float:
        return r * x * (1 - x)

    def __call__(self) -> float:
        for _ in range(self.skip + 1):
            self.x = self.calc_logistic_map(self.r, self.x)
        return self.map_to_uniform(self.x)
```

With this new implementation, the following code:

```
rand = RandomNumber()
rands = [sum(rand() for _ in range(2)) for _ in range(100000)]
```

Gives us this:

Excellent! Also, under the central limit theorem, the sum of 100 uniformly distributed random variables should be approximately Gaussian. I.e.:

$\mathcal{N}_{\mu,\,\sigma^2} \approx \sum_{i=0}^{99} \mathcal{U}_i$

This can be expressed in the following Python class:

```
class PseudoGaussian:
    def __init__(self, N: int = 100) -> None:
        self.N = N
        self.rand = RandomNumber()

    def __call__(self) -> float:
        return sum(self.rand() for _ in range(self.N))

    @property
    def theoretical_mean(self) -> float:
        return self.N * self.rand.theoretical_mean

    @property
    def theoretical_variance(self) -> float:
        return self.N * self.rand.theoretical_variance
```

And running this short snippet:

```
rand = PseudoGaussian(N=100)
rands = [rand() for _ in range(100000)]
```

Gives us this:

Sure enough, our output distribution looks very Gaussian!

I hope you’ve enjoyed this post. If you have any feedback, please feel free to reach out.

Also, you may be interested in this post where I use the pseudorandom number generator in a Metropolis-Hastings implementation.

I’m not the first and certainly won’t be the last to write about adding $\KaTeX$ to your Jekyll blog. I found this blog post particularly helpful when enabling $\KaTeX$ for my blog and would encourage you to check it out as well.

The first step is to add the *kramdown-math-katex* rendering package to your Gemfile such that your Gemfile looks something like this:

```
source 'https://rubygems.org'

gem "jekyll", "~> 3.9.0"
gem "jekyll-feed"
gem "jekyll-sitemap"
gem "kramdown-parser-gfm"
gem "kramdown-math-katex" # <- New package
```

This package translates math blocks designated with *$$* into appropriate HTML.
For example, it can translate this text for Euler’s Equation

```
$$e^{ix} = \cos x + i \sin x$$
```

into the following HTML, partially copied below:

```
<span class="katex"><span class="katex-mathml">...</span>
```

The HTML should ultimately look something like this when viewed through your browser:

$e^{ix} = \cos x + i \sin x$

Although we made the rendering package available through step 1 above, we need to tell kramdown to leverage this package when translating our Markdown files into HTML.
You should specify **“math_engine: katex”** in *_config.yml* as shown below:

```
kramdown:
  # Use KaTeX to render math equations
  math_engine: katex
```

After steps 1 and 2, kramdown will parse our Markdown into appropriate HTML, but we will also need some additional files so that the HTML will display properly.

You will need to download $\KaTeX$ fonts and katex.css.

I saved these files in a folder called *katex*, but how you manage these resources is up to you.

```
katex
└── 0.15.2
    ├── fonts
    └── katex.css
```

In the HTML header definition for your blog posts (e.g. in *_layouts/default.html*), you’ll need to add the following:

```
{% if page.katex %}
  <link rel="stylesheet" type="text/css" href="{{ site.baseurl }}/katex/0.15.2/katex.css" />
{% endif %}
```

The `{% if page.katex %}` conditional allows you to control which blog posts actually include the CSS necessary for $\KaTeX$. So, any blog post that uses $\KaTeX$ will need **“katex: True”** added to the front matter like this:

```
---
layout: post
title: Blogging with KaTeX
katex: True
---
```

That’s all there is to it! Hopefully these steps will work for you and you’ll be up and running with $\KaTeX$.
