I don’t think I can adequately cover the last 5+ years in this short post, but I will say that it has been quite an adventure.

It’s been a road with many challenges yet also many successes.

A pivotal milestone was the creation of MedCognetics, bringing together a team passionate about transforming healthcare and women’s health.

And today, I’m pleased to share this team’s latest milestone of FDA clearance for our breast cancer screening service.

I also want to recognize those who made this milestone possible: not only the fantastic team at MedCognetics, but also some incredible individuals at UTDallas, our partners at UTSW, and many others.

It’s been an absolute honor to work alongside and build relationships with these people.

Thank you, everyone. I look forward to what we can accomplish in the next five years.

]]>The food I ate and the places I saw were unforgettable. Pictures do no justice, particularly for the Taj Mahal.

Visiting the Taj Mahal is the experience of observing, approaching, and finally stepping into a painting. To explore it means to feel as though you’ve entered into another’s imagination.

However, my favorite part wasn’t experiencing the food and the places.

The best part was the people.

The absolute highlights of my trip are the incredible hospitality, sharing of stories, and laughing with others.

Seeing the world is fantastic. But my favorite thing about traveling is the reminder that no matter where you find yourself on Earth, you’ll find people, just like you and me, running this race called life. We have so much in common, despite our differences.

Thank you, India. I look forward to returning someday.

]]>Here’s a link to a book trailer which was straightforward to create. Since my books are compiled with a Python-based framework, changing the compilation process to produce a video file instead of a book PDF was just a few dozen lines of code.

If you’re interested in getting a copy of the book, please check out the listing here.

Also, if you’ve enjoyed *Data Science for Babies*, please consider leaving a positive review on Amazon and letting others know what you think!

Some mornings around 7am, Silas stumbles into my office, still half asleep, and asks,

*“Dad, did you write another cookie book?”*

A moment passes as I finish typing out a line of code, and I usually answer something like,

*“No, Buddy, not today.”*

However, recently I was able to say, *“Yup, a draft is ready! Can you read it and make suggestions?”* and then see a smile pop onto his face.

A few revisions later, following lots of good feedback from Silas and Maribeth, I’m happy to release *“Entrepreneurship for Young Minds”*, a book about entrepreneurship for young kids, as the name suggests.

Although it’s hard to capture all aspects of entrepreneurship in a small number of pages, I’ve drawn from a few themes which I think are important:

- Looking for solutions to problems
- Persevering in the midst of failure
- Avoiding complacency
- Dreaming big

As with my previous books, I’ve written the story to be both educational and fun, and I hope it will inspire future entrepreneurs and young minds who will grow up to change the world.

Please check it out here, and feel free to let me know your thoughts if you do.

]]>The LZSS code I’m presenting here is based on this GitHub project, but I’ve created my own fork with some improvements and optimizations.

For some background on LZSS, Wikipedia has a pretty good description.

The remainder of this post will walk through an implementation of compression followed by decompression.

The main function for compression is fairly short:

```
from bitarray import bitarray


def compress(data: bytes) -> bytes:
    output_buffer = bitarray(endian="big")
    output_buffer.fromlist = lambda x: output_buffer.frombytes(bytes(x))
    i = 0
    while i < len(data):
        if match := find_longest_match(data, i):
            match_distance, match_length = match
            output_buffer.append(IS_MATCH_BIT)
            dist_hi, dist_lo = match_distance >> 4, (match_distance) & 0xF
            output_buffer.fromlist([dist_hi, (dist_lo << 4) | (match_length - LENGTH_OFFSET)])
            i += match_length
        else:
            output_buffer.append(not IS_MATCH_BIT)
            output_buffer.fromlist([data[i]])
            i += 1
    output_buffer.fill()  # Pad to complete last byte
    return output_buffer.tobytes()
```

This function takes in a `bytes` object of uncompressed bytes and returns a `bytes` object of compressed bytes. To understand how this code works, we’ll walk through a few compression examples.

For our first compression example, we’ll compress the `bytes` object `b"zzzzz"`.

On the first iteration of our loop, `while i < len(data):`, `find_longest_match(data, i)` will return `None` because we’re looking at the first byte in our raw data and there are no previous bytes to match against. We’ll walk through `find_longest_match` in more detail later. So, we append a single bit, `0`, followed by the byte `b"z"` (`IS_MATCH_BIT` is a `bool` set to `True`, so `not IS_MATCH_BIT` is the `0` bit).

On the second loop iteration, things are a bit more interesting. Since our next 4 bytes `b"zzzz"` share a common value with our first byte `b"z"`, `find_longest_match` will return a non-`None` `Tuple` containing a `match_distance` equal to 1 and `match_length` equal to 4.

Let’s discuss what these values mean. `match_distance` equal to 1 means that we found a match which is just one previous position away from our current position. `match_length` equal to 4 means that we matched 4 bytes. In this case, it means that our match of `b"z"` is repeated 4 times.

Since we successfully found a match, we append a single bit value of 1, `IS_MATCH_BIT`, followed by values for `match_distance` and `match_length` packed into 16 bits (12 bits for `match_distance` and 4 bits for `match_length`).

Overall, `b"zzzzz"` gets packed into 9 bits + 17 bits == 26 bits! Since we pad our `bytes` object with `output_buffer.fill()`, our final output is 4 bytes, compressed down from 5 bytes.

For our second example, we’ll use a set of bytes which are a little bit more interesting, `b"wxyzwxy"` (although still pretty boring, I know).

For the first 4 bytes, `b"w"`, `b"x"`, `b"y"`, and `b"z"`, `find_longest_match` returns `None` for each byte, so we pack these 4 bytes into a total of 4 * 9 bits == 36 bits.

For our next byte, `b"w"`, `find_longest_match` returns `match_distance` of 4 and `match_length` of 3. Thus, the remaining bytes `b"wxy"` are compressed to 17 bits. Overall, `b"wxyzwxy"` compresses down to 36 + 17 == 53 bits, down from 56 bits (not a significant change, but hang on for the next example).

For this 3rd and final example, let’s suppose our `bytes` object is `b"wxyzwxyzwxy"`.

As with the previous example, the first 4 bytes will require a total of 36 bits to store. However, the remaining `b"wxyzwxy"` will be summarized with a `match_distance` of 4 and `match_length` of 7 such that our entire `bytes` object compresses down to 36 + 17 == 53 bits (56 bits after padding), down from 88!
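The bit counts in these three examples come from two fixed costs: 9 bits per literal byte and 17 bits per match. Here is a minimal sketch of that arithmetic; the helper `compressed_size_bits` and the two constants are made up for illustration and are not part of the LZSS code.

```python
from typing import Tuple

LITERAL_BITS = 1 + 8     # flag bit + raw byte
MATCH_BITS = 1 + 12 + 4  # flag bit + match_distance + match_length


def compressed_size_bits(num_literals: int, num_matches: int) -> Tuple[int, int]:
    """Return (bits before padding, bits after padding to a whole byte)."""
    raw_bits = num_literals * LITERAL_BITS + num_matches * MATCH_BITS
    padded_bits = -(-raw_bits // 8) * 8  # round up to a multiple of 8
    return raw_bits, padded_bits


print(compressed_size_bits(1, 1))  # b"zzzzz": (26, 32)
print(compressed_size_bits(4, 1))  # b"wxyzwxy" and b"wxyzwxyzwxy": (53, 56)
```

The second and third examples cost the same number of bits because a longer match still fits in a single 17-bit reference.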

Based on the previous 3 examples, you probably have an idea of what `find_longest_match` is doing, and now we’ll walk through specifics of how it actually works. Below is its code:

```
def find_longest_match(data: bytes, current_position: int) -> Optional[Tuple[int, int]]:
    end_of_buffer = min(current_position + MATCH_LENGTH_MASK + LENGTH_OFFSET, len(data))
    search_start = max(0, current_position - WINDOW_SIZE)
    # Try the longest candidate first; the shortest candidate has LENGTH_OFFSET bytes
    for match_candidate_end in range(end_of_buffer, current_position + LENGTH_OFFSET - 1, -1):
        match_candidate = data[current_position:match_candidate_end]
        for search_position in range(search_start, current_position):
            if match_candidate == get_wrapped_slice(data[search_position:current_position], len(match_candidate)):
                return current_position - search_position, len(match_candidate)
    return None

The outer loop of this function, `for match_candidate_end in ...`, is used to create match candidates, starting with the longest possible candidate and shrinking the candidate length by 1 after every iteration.

For example, if we are working with input data `b"ghxyz"`, and `current_position` is 1, corresponding to `b"h"`, our first value for `match_candidate` will be `b"hxyz"`. Since our match search starts and ends with `b"g"`, a match won’t be found and our next `match_candidate` will be `b"hxy"`.
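To make the candidate order concrete, here is a small sketch that lists the candidates for a given position, longest first. It assumes the shortest candidate has 2 bytes, as the `LENGTH_OFFSET` constant describes, and it ignores the 17-byte upper limit on match length for brevity. The helper name `match_candidates` is hypothetical.

```python
from typing import List

LENGTH_OFFSET = 2  # shortest match length considered


def match_candidates(data: bytes, current_position: int) -> List[bytes]:
    """List the candidates starting at current_position, longest first."""
    shortest_end = current_position + LENGTH_OFFSET
    return [
        data[current_position:end]
        for end in range(len(data), shortest_end - 1, -1)
    ]


print(match_candidates(b"ghxyz", 1))  # [b'hxyz', b'hxy', b'hx']
```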

The inner loop of this function, `for search_position in ...`, checks to see if `match_candidate` is identical to any previous byte sequences. The function `get_wrapped_slice` allows us to find matches where the search sequence is actually shorter than `match_candidate`, as we saw in the first compression example involving `b"zzzzz"`. The code for `get_wrapped_slice` can be seen here:

```
def get_wrapped_slice(x: bytes, num_bytes: int) -> bytes:
    """
    Examples:
        f(b"1234567", 5) -> b"12345"
        f(b"123", 5) -> b"12312"
    """
    repetitions = num_bytes // len(x)
    remainder = num_bytes % len(x)
    return x * repetitions + x[:remainder]
```
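As a quick check of how the wrapped slice connects to the earlier examples, the sketch below repeats the function and applies it to the search windows from the `b"zzzzz"` and `b"wxyzwxyzwxy"` examples. The variable names outside the function are only for illustration.

```python
def get_wrapped_slice(x: bytes, num_bytes: int) -> bytes:
    repetitions = num_bytes // len(x)
    remainder = num_bytes % len(x)
    return x * repetitions + x[:remainder]


data = b"zzzzz"
current_position = 1
match_candidate = data[current_position:]  # b"zzzz"
search_window = data[0:current_position]   # b"z", shorter than the candidate
wrapped = get_wrapped_slice(search_window, len(match_candidate))
print(wrapped == match_candidate)  # True

# The 4-byte window b"wxyz" wraps around to cover a 7-byte candidate
print(get_wrapped_slice(b"wxyz", 7))  # b'wxyzwxy'
```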

Also, if you’re wondering about the `LENGTH_OFFSET` constant, it exists because we only consider substrings of length 2 and greater and just output any substring of length 1 (9 bits uncompressed is better than a 17-bit reference for the flag, distance, and length). Since lengths 0 and 1 are unused, we can encode lengths 2-17 in only 4 bits.

Here is the declaration for `LENGTH_OFFSET` along with our other constants:

```
MATCH_LENGTH_MASK: Final[int] = 0xF
WINDOW_SIZE: Final[int] = 0xFFF
IS_MATCH_BIT: Final[bool] = True
# We only consider substrings of length 2 and greater, and just
# output any substring of length 1 (9 bits uncompressed is better than a 17-bit
# reference for the flag, distance, and length)
# Since lengths 0 and 1 are unused, we can encode lengths 2-17 in only 4 bits.
LENGTH_OFFSET: Final[int] = 2
```
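To see how these constants shape the 16-bit match reference, here is a small round-trip sketch. The helpers `pack_match` and `unpack_match` are hypothetical names; they only isolate the bit arithmetic already used in `compress` and `decompress`.

```python
from typing import Tuple

MATCH_LENGTH_MASK = 0xF
LENGTH_OFFSET = 2


def pack_match(match_distance: int, match_length: int) -> bytes:
    """Pack a 12-bit distance and a 4-bit (length - LENGTH_OFFSET) into 2 bytes."""
    dist_hi, dist_lo = match_distance >> 4, match_distance & 0xF
    return bytes([dist_hi, (dist_lo << 4) | (match_length - LENGTH_OFFSET)])


def unpack_match(packed: bytes) -> Tuple[int, int]:
    """Reverse pack_match, returning (distance, length)."""
    hi, lo = packed
    distance = (hi << 4) | (lo >> 4)
    length = (lo & MATCH_LENGTH_MASK) + LENGTH_OFFSET
    return distance, length


print(pack_match(1, 4).hex())              # 0012
print(unpack_match(pack_match(4095, 17)))  # (4095, 17)
```

The largest values that fit are a distance of 4095 (`WINDOW_SIZE`) and a length of 17, which is why the length is stored with the offset subtracted.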

That’s all there is to it for the compression code!

The decompression code is much shorter overall and is covered with the single function shown below:

```
def decompress(compressed_bytes: bytes) -> bytes:
    data = bitarray(endian="big")
    data.frombytes(compressed_bytes)
    assert data, f"Cannot decompress {compressed_bytes}"
    output_buffer = []
    while len(data) >= 9:  # Anything less than 9 bits is padding
        if data.pop(0) != IS_MATCH_BIT:
            byte = data[:8].tobytes()
            del data[:8]
            output_buffer.append(byte)
        else:
            hi, lo = data[:16].tobytes()
            del data[:16]
            distance = (hi << 4) | (lo >> 4)
            length = (lo & MATCH_LENGTH_MASK) + LENGTH_OFFSET
            for _ in range(length):
                output_buffer.append(output_buffer[-distance])
    return b"".join(output_buffer)
```

Essentially, this function is just a `while` loop that decodes bytes until only padding remains.

For our first compression example of `b"zzzzz"`, our compressed bits should look like `00111101 01000000 00000100 10000000`. To make this a bit more readable, I’ll change the spacing between groups of bits: `0 01111010 1 00000000 00010010 000000`.

In the order in which they appear:

- `0` corresponds to the no match flag, `not IS_MATCH_BIT`,
- `01111010` corresponds to the binary representation for `b"z"`,
- `1` corresponds to the match flag, `IS_MATCH_BIT`,
- `00000000` corresponds to the most significant bits for `match_distance`,
- `00010010` corresponds to the least significant bits for `match_distance` and bits for `match_length` (stored as `match_length - LENGTH_OFFSET`), and
- `000000` corresponds to padding bits.

On the first iteration of our `while` loop, the leading `0` is popped so the `if` branch is executed and `01111010` is interpreted/stored as `b"z"`.

On the second iteration of our loop, `1` is popped so the `else` branch is executed and the bits `00000000 00010010` are parsed into match `distance` and `length` values (1 and 4, respectively).
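The bit layout for `b"zzzzz"` can be reproduced without `bitarray` by building the stream as a string of `0` and `1` characters. This is only a sketch for checking the layout by hand; the helper `to_bits` is a made-up name, and the flag values follow `IS_MATCH_BIT = True`.

```python
LENGTH_OFFSET = 2


def to_bits(value: int, width: int) -> str:
    """Format value as a zero-padded binary string of the given width."""
    return format(value, f"0{width}b")


# Literal: no-match flag followed by the raw byte b"z"
literal = "0" + to_bits(ord("z"), 8)
# Match: match flag, 12-bit distance of 1, 4-bit (length 4 - LENGTH_OFFSET)
match = "1" + to_bits(1, 12) + to_bits(4 - LENGTH_OFFSET, 4)

stream = literal + match
stream += "0" * (-len(stream) % 8)  # pad to a whole number of bytes

print(" ".join(stream[i:i + 8] for i in range(0, len(stream), 8)))
# 00111101 01000000 00000100 10000000
```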

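This simple loop:

```
for _ in range(length):
    output_buffer.append(output_buffer[-distance])
```

elegantly utilizes the match `distance` and `length` values just decoded. Because every appended byte is visible to the next iteration, the copy still works when `length` is greater than `distance`, as with the run of `b"z"` bytes in our first example.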
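Here is a minimal sketch of that copy loop on its own, applied to the matches from the first and third compression examples. The wrapper function `copy_match` is a hypothetical name used only to make the loop easy to call.

```python
from typing import List


def copy_match(output_buffer: List[bytes], distance: int, length: int) -> None:
    """Append `length` bytes, each copied from `distance` positions back."""
    for _ in range(length):
        output_buffer.append(output_buffer[-distance])


first = [b"z"]
copy_match(first, distance=1, length=4)
print(b"".join(first))  # b'zzzzz'

third = [b"w", b"x", b"y", b"z"]
copy_match(third, distance=4, length=7)
print(b"".join(third))  # b'wxyzwxyzwxy'
```

In both cases the match is longer than its distance, so the loop ends up re-reading bytes that it appended a few iterations earlier.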

That’s all there is to it! If you check out the repo containing this code, you’ll see that everything fits cleanly into less than 100 lines of code.

If you notice any mistakes or have any feedback, please feel free to reach out.

]]>I’ve titled this subsequent book *“DevOps for Sponges”*, and it teaches software development and operations
concepts such as version control, automation, testing, deployment, and incremental design.

Why is the book *“for Sponges”*? Children are likened to sponges because of their ability to absorb information,
and this book was written primarily for young minds.

The book is now available on Amazon, so please check it out and consider sharing with young, future engineers!

]]>To make my own work a bit more relatable, I recently wrote a book to explain data science in the simplest terms possible with the hope of inspiring and educating the youngest of minds.

With very basic examples, this book covers regression, statistical overfitting & underfitting, classification, data visualization, and time series analysis.

The ~~book is available on Amazon~~
2nd edition is now available on Amazon,
so please check it out and consider sharing with a young, possibly future data scientist.

Special thanks to my wife and son for proofreading and to my daughter who loves cookies almost more than anything!

]]>If you haven’t read my previous post on performing the Haar wavelet transform, be sure to check it out to develop a good foundation for the content we’ll be exploring in this post. I also found this post to be a helpful reference when reading up on LGT transforms.

The LGT (5/3) wavelet transform we’ll be working with is a bit more complex than the
Haar wavelet transform but offers characteristics which may be preferable depending
on the application.
One application, for example, is
the lossless version of JPEG 2000 compression.
The *(5/3)* part of the transform name
refers to the 5 low pass filter coefficients and 3
high pass filter coefficients that we’ll discuss shortly.

We’re going to perform our LGT wavelet transform by iteratively decomposing a signal into low and high frequency components. We’ll refer to the low frequency component as $l_{i}$ and to the high frequency component as $h_{i}$ and will calculate them with these 2 equations:

$l_{i} = -x_{2i-1} + 2x_{2i} + 6x_{2i+1} + 2x_{2i+2} - x_{2i+3}$

$h_{i} = -x_{2i-1} + 2x_{2i} - x_{2i+1}$

Technically $l_i$ and $h_i$ should have multiplying coefficients of $\frac 1 8$ and $\frac 1 2$, respectively, but we drop these coefficients so that all values of $l_i$ and $h_i$ are guaranteed to be integers when all values of $x_i$ are integers.

These components $l_i$ and $h_i$ can be calculated with the following Python code:

```
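import numpy as np
from scipy.ndimage import convolve  # assumed import; provides the "mirror" boundary mode

lowpass = [-1, 2, 6, 2, -1]
highpass = [-1, 2, -1]
x_lo = convolve(x, lowpass, mode="mirror")[1::2]
x_hi = convolve(x, highpass, mode="mirror")[0::2]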
```

Conversely, we can reconstruct our original signal with the following equations:

$x_{2i+1} = \frac {-h_{i} + l_{i} - h_{i + 1}} 8$

$x_{2i} = \frac {-4h_{i-1} + 4l_{i-1} + 24h_{i} + 4l_{i} -4h_{i + 1}} {64}$

The Python code for applying these equations is a bit more involved but still relatively straightforward:

```
inv_lowpass = [-1, 1, -1]
inv_highpass = [-4, 4, 24, 4, -4]
interleaved = np.empty(len(x))
interleaved[0::2] = x_hi
interleaved[1::2] = x_lo
x_odds = convolve(interleaved, inv_lowpass, mode="mirror")
x_evens = convolve(interleaved, inv_highpass, mode="mirror")
x_reconstructed = np.empty(len(x))
x_reconstructed[0::2] = x_evens[0::2] // 64
x_reconstructed[1::2] = x_odds[1::2] // 8
```
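To check that the forward and inverse equations really undo each other, here is a dependency-free sketch that uses plain Python lists and a hand-written mirror index in place of `scipy.ndimage.convolve`. It assumes an even-length integer signal, and the helper names (`mirror_index`, `filtered`, `forward`, `inverse`) are made up for this example.

```python
from typing import List, Tuple


def mirror_index(i: int, n: int) -> int:
    """Reflect an out-of-range index about the edge samples (no edge repeat)."""
    period = 2 * (n - 1)
    i = abs(i) % period
    return period - i if i >= n else i


def filtered(x: List[int], kernel: List[int], center: int) -> int:
    """Apply a symmetric, odd-length kernel centered on x[center]."""
    half = len(kernel) // 2
    return sum(k * x[mirror_index(center + j - half, len(x))] for j, k in enumerate(kernel))


def forward(x: List[int]) -> Tuple[List[int], List[int]]:
    lo = [filtered(x, [-1, 2, 6, 2, -1], i) for i in range(1, len(x), 2)]
    hi = [filtered(x, [-1, 2, -1], i) for i in range(0, len(x), 2)]
    return lo, hi


def inverse(lo: List[int], hi: List[int]) -> List[int]:
    interleaved = [0] * (2 * len(lo))
    interleaved[0::2] = hi
    interleaved[1::2] = lo
    out = [0] * len(interleaved)
    for i in range(0, len(out), 2):  # even samples
        out[i] = filtered(interleaved, [-4, 4, 24, 4, -4], i) // 64
    for i in range(1, len(out), 2):  # odd samples
        out[i] = filtered(interleaved, [-1, 1, -1], i) // 8
    return out


signal = [3, 7, 1, 8, 2, 9, 4, 6]
lo, hi = forward(signal)
print(inverse(lo, hi) == signal)  # True
```

The round trip is exact because the numerators in the inverse equations are always multiples of 8 and 64, so the floor divisions lose nothing.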

Applying these equations to images is straightforward since we can first apply them along the horizontal axis followed by the vertical axis. For example, working across the columns of an image $X$ with a pixel value of $x_{r,c}$ at row $r$ and column $c$ leverages these equations:

$l_{r,c} = -x_{r,2c-1} + 2x_{r,2c} + 6x_{r,2c+1} + 2x_{r,2c+2} - x_{r,2c+3}$

$h_{r,c} = -x_{r,2c-1} + 2x_{r,2c} - x_{r,2c+1}$

The corresponding code looks like this:

```
lo = convolve(image, [lowpass], mode="mirror")[:, 1::2]
hi = convolve(image, [highpass], mode="mirror")[:, 0::2]
image[:, : cols // 2] = norm_image(lo)
image[:, cols // 2 :] = norm_image(hi)
```

Note that `norm_image` is for visualization purposes.

Now let’s apply this transform a single time to the following image:

The transformed image (after converting the image to grayscale) looks like this:

This output looks similar to what we’d see with an equivalent Haar transform.

Below is a class which makes it easy to iteratively transform an image both horizontally and vertically.

```
class WaveletImage:
    def __init__(self, image: ndarray, axis: int = 1, levels: int = 2) -> None:
        self.axis = axis
        self.lo, self.hi = self.transform(image, self.axis, levels)

    @property
    def pixels(self) -> ndarray:
        lo = norm_image(self.lo if isinstance(self.lo, ndarray) else self.lo.pixels)
        hi = norm_image(self.hi if isinstance(self.hi, ndarray) else self.hi.pixels)
        return np.concatenate([lo, hi], axis=self.axis)

    @staticmethod
    def convolve(x: ndarray, kernel: List[int], axis: int, index: int) -> ndarray:
        k = np.array([kernel])
        if axis == 0:
            k = k.T
        y = convolve(x, k, mode="mirror")
        if axis == 0:
            return y[index::2]
        elif axis == 1:
            return y[:, index::2]
        else:
            raise ValueError(f"axis '{axis}' must be 0 or 1")

    def lowpass(self, x: ndarray, axis: int) -> ndarray:
        return self.convolve(x, [-1, 2, 6, 2, -1], axis, 1)

    def highpass(self, x: ndarray, axis: int) -> ndarray:
        return self.convolve(x, [-1, 2, -1], axis, 0)

    def inv_lowpass(self, x: ndarray, axis: int) -> ndarray:
        return self.convolve(x, [-1, 1, -1], axis, 1)

    def inv_highpass(self, x: ndarray, axis: int) -> ndarray:
        return self.convolve(x, [-4, 4, 24, 4, -4], axis, 0)

    def inverse_transform(self) -> ndarray:
        lo: ndarray = self.lo if isinstance(self.lo, ndarray) else self.lo.inverse_transform()
        hi: ndarray = self.hi if isinstance(self.hi, ndarray) else self.hi.inverse_transform()
        x = interleave(hi, lo, self.axis)
        x_evens = self.inv_highpass(x, self.axis) // 64
        x_odds = self.inv_lowpass(x, self.axis) // 8
        return interleave(x_evens, x_odds, self.axis)

    def transform(self, x: ndarray, axis: int, levels: int) -> Tuple[ndarray, ndarray]:
        lo = self.lowpass(x, axis)
        hi = self.highpass(x, axis)
        lo = WaveletImage(lo, abs(axis - 1), levels - axis) if levels else lo
        hi = WaveletImage(hi, axis=0, levels=0) if axis == 1 else hi
        return lo, hi
```

If we transform our image with `levels=2` as shown with this code:

```
image = WaveletImage(image, levels=2).pixels
plt.imshow(image, cmap="gray")
plt.show()
```

We get the following:

That’s all for this post! If you have any feedback, please feel free to reach out.

Also, if you enjoyed this post, you may also enjoy my posts DIY Metropolis-Hastings and DIY pseudorandom number generator.

]]>Why do we care about wavelet transforms? At a high level, wavelet transforms allow you to analyze the frequency content of your data while achieving different temporal (or spatial) resolutions for different frequencies. They’re useful in a variety of applications including JPEG 2000 compression.

According to the uncertainty principle for signal processing, there is a trade-off between temporal/spatial resolution and frequency resolution. As we’ll see later in this post, our wavelet transformed image will have low spatial resolution with high frequency resolution for lower frequencies but high spatial resolution with low frequency resolution for higher frequencies. A Fourier transform, in contrast, would produce medium spatial resolution with medium frequency resolution for all frequencies.

With that background in place, we’ll demonstrate these concepts with an implementation of the Haar wavelet transform.

For our wavelet implementation, an image will be iteratively decomposed into low and high frequency components.
The low and high frequency components are created such that the original image can be reconstructed without any loss of information.
The low frequency component we’ll refer to as the *sum* component ($s_{r,c}$)
and the high frequency component we’ll refer to as *difference* ($d_{r,c}$).

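Working across the columns of an image, these are calculated as:

$s_{r,c} = x_{r,2c} + x_{r,2c+1}$

$d_{r,c} = x_{r,2c} - x_{r,2c+1}$

where $x_{5,3}$ would be the pixel value at the $5^{th}$ row and $3^{rd}$ column of the original image.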

This can be represented with the following Python code:

```
evens = x[:, 0::2]
odds = x[:, 1::2]
s = evens + odds
d = evens - odds
```

Reversing this transform is relatively easy:

$x_{r,2c} = \frac {s_{r,c} + d_{r,c}} 2$

$x_{r,2c+1} = \frac {s_{r,c} - d_{r,c}} 2$

Similarly, the Python code looks like this:

```
x[:, 0::2] = (s + d) // 2
x[:, 1::2] = (s - d) // 2
```
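A tiny worked example makes the round trip easy to verify by hand. This sketch uses a single row stored in a plain Python list instead of a NumPy array, and the sample values are arbitrary.

```python
row = [5, 3, 8, 8, 0, 7]

evens, odds = row[0::2], row[1::2]
s = [e + o for e, o in zip(evens, odds)]  # sums:        [8, 16, 7]
d = [e - o for e, o in zip(evens, odds)]  # differences: [2, 0, -7]

restored = [0] * len(row)
restored[0::2] = [(a + b) // 2 for a, b in zip(s, d)]
restored[1::2] = [(a - b) // 2 for a, b in zip(s, d)]

print(s, d)             # [8, 16, 7] [2, 0, -7]
print(restored == row)  # True
```

The floor division is exact because `s + d` is always twice the even sample and `s - d` is always twice the odd sample.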

Now let’s apply this transform a single time to the following image:

The transformed image (after converting the image to grayscale) looks like this:

As is usually done for images, we can also perform the same transform across rows instead of columns. I.e.

$s_{r,c} = x_{2r,c} + x_{2r+1,c}$

$d_{r,c} = x_{2r,c} - x_{2r+1,c}$

Which gives us this:

Typically, the low frequency component in the upper left corner of the image will undergo additional stages of decomposition as shown in the next section.

With the computational pieces just discussed, we can put together the following class which decomposes images into an arbitrary number of levels as we’ll visualize shortly.

```
class WaveletImage:
    def __init__(self, image: ndarray, axis: int = 1, levels: int = 2) -> None:
        self.axis = axis
        self.lo, self.hi = self.transform(image, self.axis, levels)

    @property
    def pixels(self) -> ndarray:
        lo = norm_image(self.lo if isinstance(self.lo, ndarray) else self.lo.pixels)
        hi = norm_image(self.hi if isinstance(self.hi, ndarray) else self.hi.pixels)
        return np.concatenate([lo, hi], axis=self.axis)

    def inverse_transform(self) -> ndarray:
        lo: ndarray = self.lo if isinstance(self.lo, ndarray) else self.lo.inverse_transform()
        hi: ndarray = self.hi if isinstance(self.hi, ndarray) else self.hi.inverse_transform()
        evens = (lo + hi) // 2
        odds = (lo - hi) // 2
        return interleave(evens, odds, axis=self.axis)

    @staticmethod
    def transform(image: ndarray, axis: int, levels: int) -> Tuple[ndarray, ndarray]:
        if axis == 0:
            evens, odds = image[::2, :], image[1::2, :]
        elif axis == 1:
            evens, odds = image[:, ::2], image[:, 1::2]
        else:
            raise ValueError(f"axis '{axis}' must be 0 or 1")
        lo = WaveletImage(evens + odds, abs(axis - 1), levels - axis) if levels else evens + odds
        hi = WaveletImage(evens - odds, axis=0, levels=0) if axis == 1 else evens - odds
        return lo, hi


def norm_image(x: ndarray) -> ndarray:
    return (x - x.min()) / (x.max() - x.min())


def interleave(a: ndarray, b: ndarray, axis: int) -> ndarray:
    rows, cols = a.shape
    rows, cols = (rows * 2, cols) if axis == 0 else (rows, cols * 2)
    out = np.empty((rows, cols), dtype=a.dtype)
    if axis == 0:
        out[0::2] = a
        out[1::2] = b
    elif axis == 1:
        out[:, 0::2] = a
        out[:, 1::2] = b
    else:
        raise ValueError("interleave only supports axis of 0 or 1")
    return out
```

If we create a `WaveletImage` object with `levels=1`, the result will look like the image we just created in the last section that has both horizontal and vertical decompositions, resulting in 2x2 tiles.

However, things get a bit more interesting when we transform with `levels=2`. We can see what the result looks like with:

```
image = WaveletImage(image, levels=2).pixels
plt.imshow(image, cmap="gray")
plt.show()
```

Which gives us this:

Pretty cool, right? We can add an additional level of decomposition by setting `levels=3` as shown below.

```
image = WaveletImage(image, levels=3).pixels
plt.imshow(image, cmap="gray")
plt.show()
```

And the result looks like this:

As mentioned at the beginning of this post, low frequency components have higher frequency resolution at the cost of lower spatial resolution while high frequency components have higher spatial resolution at the cost of lower frequency resolution.

You can see the difference in spatial resolution between low and high frequency components by comparing the low frequency component in the upper left hand corner, which is at $\frac 1 {64}$ resolution of the original image, versus the high frequency component in the lower right hand corner, which is at $\frac 1 4$ resolution of the original image.
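Those fractions follow from halving both the rows and the columns at every level. Here is a rough sketch of the arithmetic, assuming image dimensions that divide evenly; the helper `tile_shapes` and the 512x512 size are made up for illustration.

```python
from fractions import Fraction
from typing import List, Tuple


def tile_shapes(rows: int, cols: int, levels: int) -> List[Tuple[int, int]]:
    """Shape of the tiles produced at each level of decomposition."""
    return [(rows // 2 ** k, cols // 2 ** k) for k in range(1, levels + 1)]


shapes = tile_shapes(512, 512, levels=3)
print(shapes)  # [(256, 256), (128, 128), (64, 64)]

first_rows, first_cols = shapes[0]   # high frequency tile in the lower right
last_rows, last_cols = shapes[-1]    # low frequency tile in the upper left
print(Fraction(first_rows * first_cols, 512 * 512))  # 1/4
print(Fraction(last_rows * last_cols, 512 * 512))    # 1/64
```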

That’s all for this post! If you have any feedback, please feel free to reach out.

Also, if you enjoyed this post, you may also enjoy my posts LGT wavelet transform from scratch, DIY Metropolis-Hastings and DIY pseudorandom number generator.

]]>If you haven’t read my previous post on pseudorandom number generation, be sure to check it out since we’ll be reusing the pseudorandom number generator discussed there.

Metropolis-Hastings is a Markov chain Monte Carlo technique to sample from distributions for which we don’t know how to perform direct sampling. In particular, we can use this algorithm to sample from a distribution without a known normalization constant.

For this post, we’ll sample from a Gaussian distribution without needing to know the Gaussian normalization constant of $\frac 1 {\sigma \sqrt {2 \pi}}$.

Here is the high-level summary of how we will accomplish this:

1. Set initial sample $x_0 = 0$ (this is arbitrary)
2. Propose new sample $x_{proposed}$
3. Use $g(x)$, which is proportional to target distribution $f(x)$, to probabilistically accept or reject $x_{proposed}$ over $x_0$
4. Set $x_1$ equal to the value of $x_0$ or $x_{proposed}$ based on the outcome of step 3.
5. Repeat steps 2-4 to determine the next sample, $x_2$, and continue repeating for $x_3, x_4, ..., x_N$.

Of the 5 steps above, steps 2 and 3, **sample proposal** and **sample selection**, are the least straightforward
and will be explained in this section and the next.
In the **sample proposal** step, we generate a new sample which might be returned as the next output from
the algorithm while in the **sample selection** step we probabilistically return either the newly proposed
sample or the sample which was returned in the previous iteration of the algorithm.

**Sample proposal** can be straightforward, and for this tutorial we’ll use a very simple method. Given
a previous sample value of $x_t$, the proposed sample will have a value of

$x_{proposed} = x_t + \mathcal{U}_{[- \frac 1 2,\frac 1 2]}$

where $\mathcal{U}_{[- \frac 1 2,\frac 1 2]}$ is a continuous uniform random variable sampled between $- \frac 1 2$ and $\frac 1 2$. For the first iteration of Metropolis-Hastings, $x_0$ will be an arbitrary value such as $0$ but should ideally have a probability $\gt 0$ for the target distribution.

The equivalent Python code for **sample proposal** is simple:

```
def propose_sample(current_sample: float) -> float:
    return current_sample + uniform() - 0.5


proposed_sample = propose_sample(x0)
```

`uniform()` returns a continuous random variable with values between $0$ and $1$ ($\mathcal{U}_{[0,1]}$), and the code behind it can be read about here.

In step 3, **sample selection**, we probabilistically choose either $x_{proposed}$ or
$x_t$ to be the next sample value $x_{t+1}$.

In order to perform sample selection, we need a function $g(x)$ which is proportional to our distribution of interest $f(x)$. I.e. $g(x) \propto f(x)$ such that:

$\frac {f(a)} {f(b)} = \frac {cg(a)} {cg(b)} = \frac {g(a)} {g(b)}$

Since the target distribution from which we’d like to sample is the Gaussian distribution $f(x)=\frac 1 {\sigma \sqrt {2 \pi}} e^{-\frac 1 2 ({\frac {x - \mu} \sigma})^2}$, we’ll choose $g(x)=e^{-\frac 1 2 ({\frac {x - \mu} \sigma})^2}$.

To determine whether $x_{t+1}$ takes the value of $x_t$ or $x_{proposed}$, we sample a value $u$ between 0 and 1 from a continuous uniform distribution $\mathcal{U}_{[0,1]}$ and let $x_{t+1}=x_{proposed}$ if $u \leq \frac {g(x_{proposed})} {g(x_t)}$, else $x_{t+1}=x_t$.

This logic can be seen in the following Python code:

```
import math
from typing import Tuple


def score(x: float, mu: float = 0, sigma: float = 1) -> float:
    norm_x = (x - mu) / sigma
    return math.exp(-(norm_x ** 2) / 2)


def get_next_sample(current_sample: float, current_sample_score: float) -> Tuple[float, float]:
    # Calculate proposed value for x_{t+1}
    proposed_sample = propose_sample(current_sample)
    proposed_sample_score = score(proposed_sample)
    # NOTE: This code was written with simplicity in mind, but there is no reason to
    # sample from the uniform distribution if proposed_sample_score > current_sample_score
    if uniform() <= (proposed_sample_score / current_sample_score):
        current_sample = proposed_sample
        current_sample_score = proposed_sample_score
    return current_sample, current_sample_score


# x0 is the arbitrary starting point
x0 = 0
x0_score = score(x0)
x1, x1_score = get_next_sample(x0, x0_score)
x2, x2_score = get_next_sample(x1, x1_score)
```

Lastly, we’ll implement a stateful class to generate samples:

```
class MetropolisHastings:
    def __init__(self, x0: float = 0) -> None:
        self.sample = x0
        self.sample_score = score(self.sample)

    def __call__(self) -> float:
        self.sample, self.sample_score = get_next_sample(self.sample, self.sample_score)
        return self.sample
```

Now we can generate a set of samples and make sure the distribution looks as expected.

```
def gen_samples(f: Callable[[], float]) -> Iterator[float]:
    for _ in tqdm(range(1000000)):
        yield f()


metropolis = MetropolisHastings()
gaussian_samples = list(gen_samples(metropolis))
```

Sure enough, our Gaussian samples look pretty good.
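Besides looking at a histogram, a quick numerical check is to compare the sample mean and standard deviation against the target values of 0 and 1. The sketch below is self-contained and makes two substitutions that are assumptions of this example: Python’s `random.Random` stands in for the custom `uniform()` generator, and the helper `sample_chain` folds the proposal and selection steps into one loop.

```python
import math
import random
import statistics
from typing import List

rng = random.Random(0)  # stand-in for the post's uniform() generator


def score(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Unnormalized Gaussian density g(x)."""
    norm_x = (x - mu) / sigma
    return math.exp(-(norm_x ** 2) / 2)


def sample_chain(num_samples: int, x0: float = 0.0) -> List[float]:
    samples = []
    current, current_score = x0, score(x0)
    for _ in range(num_samples):
        proposed = current + rng.random() - 0.5  # sample proposal
        proposed_score = score(proposed)
        # Sample selection: accept with probability min(1, g(proposed) / g(current))
        if rng.random() <= proposed_score / current_score:
            current, current_score = proposed, proposed_score
        samples.append(current)
    return samples


samples = sample_chain(200_000)
sample_mean = statistics.fmean(samples)
sample_std = statistics.pstdev(samples)
print(sample_mean, sample_std)  # should land close to 0 and 1
```

Successive samples are correlated because each proposal moves at most 0.5 away from the current value, so the estimates converge more slowly than they would with independent draws.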

That’s all for this post! If you have any feedback, please feel free to reach out.

]]>