16. August 2013

A Bloom Filter in c#

A bloom filter is a probabilistic data structure meant for checking whether a given entry does not occur in a list. It is meant to be quite fast, and is used as a way of not doing costly queries when it can be determined that no results will be returned. E.g., if you could turn this:

costly_lookup(key)

Into this:

if (!cheap_check_that_key_isnt_there()) {
    costly_lookup()
}

Then that’s a win.

more

13. August 2013

Skip List Implementation in PHP

A skip list is similar to a linked list, but it is always sorted and maintains multiple pointers. The multiple pointers allow fast traversal of the list so that you can quickly look for elements, essentially performing a binary search on the data.

more

26. July 2013

Finding near-solutions to Fermat’s Last Theorem in C#

Fermat’s Last Theorem States that there are no non-trivial integer solutions solutions to the equation:


x^{n} + y^{n} = z^{n}, n > 2

This theorem went unproven for centuries, and was proven to be true in 1995 by Andrew Wiles. The Treehouse of Horror VI episode of the Simpsons aired the same year. People with a keen interest in humourous background gags could have done a freeze frame and seen the following (source:

The gag here is that if you punched in

178212 + 184112

to a pocket calculator, you would get the same answer as if you punched in:

192212

So this equation would seem to be true, but it’s not. It’s just really close, and the reason you see the same number is due to floating point rounding. I.e. it’s not actually true, but it’s close enough to fool the calculators of the day. Sadly I don’t know who found this near-solution to give credit. I remember hearing a commentary by David X. Cohen, who is a mathematician and writer for The Simpsons and Futurama, mentioned some details about this.

Finding near-solutions to FLT is actually quite easy though. You just have to do a brute force search and you can output whatever numbers you find that are close enough to what you are looking for. Just define your ranges of exponents and bases and loop through, looking for solutions that match a certain threshold. Here’s an example written in C#:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {

            string lookingFor = ".00000000";
            for (double exponent = 11; exponent < 15; exponent++)
            {

                for (double doing = 701; doing < 20002; doing++)
                {

                    double result = Math.Pow(doing, exponent);

                    for (double x = Math.Floor(doing * 0.5); x < Math.Floor(doing * .95); x++)
                    {
                        double y1 = Math.Ceiling(Math.Pow(result - Math.Pow(x, exponent), 1 / exponent));
                        double y2 = Math.Floor(Math.Pow(result - Math.Pow(x, exponent), 1 / exponent));

                        // check the high number
                        double sum = Math.Pow(Math.Pow(y1, exponent) + Math.Pow(x, exponent), 1 / exponent);
                        string sumstring = sum.ToString();
                        if (sumstring.IndexOf(lookingFor) > -1)
                        {
                            Console.WriteLine(doing + "^" + exponent + " = " + x + "^" + exponent + " + " + y1 + "^" + exponent + ", actually: " + sumstring);
                        }

                        // then the low.
                        sum = Math.Pow(Math.Pow(y2, exponent) + Math.Pow(x, exponent), 1 / exponent);
                        sumstring = sum.ToString();
                        if (sumstring.IndexOf(lookingFor) > -1)
                        {
                            Console.WriteLine(doing + "^" + exponent + " = " + x + "^" + exponent + " + " + y2 + "^" + exponent + ", actually: " + sumstring);
                        }
                    }
                }
            }

            Console.ReadLine();
        }
    }
}

And doing this you can find some interesting near-solutions like:

[

19639^{11} = 15797^{11} + 19469^{11}, actually: 19639.0000000074

]

[

4472^{12} = 3987^{12} + 4365^{12}, actually: 4472.00000000706

]

[

14051^{13} = 11184^{13} + 13994^{13}, actually: 14051.0000000076

]

24. July 2013

A demonstration of the usefulness of Memoization in Lua

Memoization is a programming technique where you save expensive computations in memory to speed up function execution time. E.g. If you were writing a CMS and you wanted a getSignedInUserName() method, you wouldn’t want to make two database calls to show the user name at the top and bottom of the page, so you’d save it in memory for use later.

My canonical example of the speed increase you can get from this technique is with a Fibonacci calculator. Here is a non-optimized version:

fib = function(n)
    if n==1 or n==0 then
        return 1
    else
        return fib(n-1)+fib(n-2)
    end
end

print(fib(40))

This takes about 22 seconds to run on my development machine. Here’s a rewritten calculator that uses memoization:

results = {}
fib = function(n)
    if results[n] then
        return results[n]
    else
        if n==1 or n==0 then
            result = 1
        else
            result = fib(n-1)+fib(n-2)
        end
        results[n] = result
        return result
    end
end

print(fib(40))

This finishes in well under one second. In fact, calling the memoized function with fib(100) finishes in under a second too. The real benefit here is that you aren’t clogging up your call stack with hundreds of thousands of calls, each waiting on other calls. By having previous results on hand, the function can easily move on to the next step.

19. July 2013

Parsing mathematical expressions and calculating the result

Given a string like 2 * 2 + 2, how would you calculate the value? It’s necessary to tokenize the string, parse those tokens, and apply the order of operations to the parsed data.

Here is the function to throw a string at.

<?php

function calculate($input) {
    $tokens = tokenize($input);
    $parsed = parse_tokens($tokens);
    $result = calculate_from_parsed($parsed);
    return $result;
}
</pre>

And I just rely on PHP&#8217;s built in tokenizer, which I throw a wrapper around.

<pre class="brush: php; title: ; notranslate" title="">function tokenize($input) {

    $tokens = token_get_all("<?php $input");
    return $tokens;
}

So here is the first function that really does anything. Note that only pedmas rules are followed. The parser doesn’t deal with anything like exponentiation, trig functions, etc.

function parse_tokens($tokens) {

    if (!is_array($tokens)) {
        // invalid input.
        return false;
    }

    $expecting = null;

    $parsed_tokens = array();
    $skip_to = 0;

    foreach($tokens as $token_number => $token) {
        if ($token_number < $skip_to) continue;
        if (is_array($token) && isset($token[0])) {
            switch($token[0]) {
                case 305 :
                case 306 :
                    if (!is_null($expecting) && $expecting != 'number') {
                        exit('error 1: unexpected token ' . print_r($token, true) . "nn");
                    }
                    $expecting = 'operator';
                    $parsed_tokens[] = $token[1];
                case 372:
                case 375:
                    // whitespace
                    continue;

                default :
                    exit('error: unhandled token ' . print_r($token, true) . "nn");
            }
        } else {

            if (!is_null($expecting) && $expecting != 'operator' && $token != '(' && $token != ')') {
                exit('error 2: unexpected token ' . print_r($token, true) . "nn");
            }
            switch($token) {
                case '(' : 

                    $new_tokens = array();
                    $parentheses_count = 1;
                    for ($i = $token_number + 1; $i < count($tokens); $i++) {
                        if ($tokens[$i] == '(') {
                            $parentheses_count ++;

                        } else if ($tokens[$i] == ')') {
                            $parentheses_count --;
                        }

                        if ($parentheses_count != 0) {
                            $new_tokens[] = $tokens[$i];
                        } else {
                            $skip_to = $i;
                            $expecting = 'operator';
                            break;
                        }
                    }

                    $parsed_tokens[] = parse_tokens($new_tokens);
                    $expecting = 'operator';
                    break;
                case '+' :
                case '-' :
                case '*' :
                case '/' :
                    $parsed_tokens[] = $token;
                    $expecting = 'number';
                    break;
            }
        }

    }
    return $parsed_tokens;
}

Now run through the parsed tokens, apply the order of operations, and run until there is a result.

function calculate_from_parsed($parsed_tokens) {

    if (count($parsed_tokens) == 1 && !is_array($parsed_tokens[0])) {
        return $parsed_tokens[0];
    } else if (count($parsed_tokens) == 1 && is_array($parsed_tokens[0])) {
        return calculate_from_parsed($parsed_tokens[0]);
    } else {
        foreach($parsed_tokens as $token_number => $parsed_token) {
            if (is_array($parsed_token)) {
                $parsed_tokens[$token_number] = calculate_from_parsed($parsed_token);
            }
        }

        while (count($parsed_tokens) > 1) {
            $continue = false;
            foreach($parsed_tokens as $token_number => $parsed_token) {
                $previous_token_pair = get_previous_token_pair($parsed_tokens, $token_number);
                $previous_token = $previous_token_pair[0];
                $previous_token_index = $previous_token_pair[1];
                if ($parsed_token == '*' || $parsed_token == '/') {
                    if ($parsed_token == '*') {
                        $parsed_tokens[$token_number] = $previous_token * $parsed_tokens[$token_number + 1];
                        unset($parsed_tokens[$previous_token_index], $parsed_tokens[$token_number + 1]);
                        $continue = true;
                        break;
                    } else if ($parsed_token == '/') {
                        $parsed_tokens[$token_number] = $previous_token / $parsed_tokens[$token_number + 1];
                        unset($parsed_tokens[$previous_token_index], $parsed_tokens[$token_number + 1]);
                        $continue = true;
                        break;
                    }
                }
            }
            if ($continue) continue;

            $parsed_tokens = array_values($parsed_tokens);

            foreach($parsed_tokens as $token_number => $parsed_token) {
                $previous_token_pair = get_previous_token_pair($parsed_tokens, $token_number);
                $previous_token = $previous_token_pair[0];
                $previous_token_index = $previous_token_pair[1];
                if ($parsed_token == '+' || $parsed_token == '-') {
                    if ($parsed_token == '+') {
                        $parsed_tokens[$token_number] = $previous_token + $parsed_tokens[$token_number + 1];
                        unset($parsed_tokens[$previous_token_index], $parsed_tokens[$token_number + 1]);
                        break;
                    } else if ($parsed_token == '-') {
                        $parsed_tokens[$token_number] = $previous_token - $parsed_tokens[$token_number + 1];
                        unset($parsed_tokens[$previous_token_index], $parsed_tokens[$token_number + 1]);
                        break;
                    }
                }
            }

            $parsed_tokens = array_values($parsed_tokens);
        }
        if (count($parsed_tokens) == 1) {
            $parsed_tokens = array_values($parsed_tokens);
            return $parsed_tokens[0];
        }
    }
}

And since we pop results out of the array as we go we need a helper function for retrieving the previous populated element in an array.

function get_previous_token_pair($tokens, $token_number) {
    $return = false;
    for ($i = $token_number - 1; $i > -1; $i--) {
        if (isset($tokens[$i])) {
            $return = array($tokens[$i], $i);
            break;
        }
    }
    return $return;
}

And here is a little test suite, all of which pass:

$tests = array(
    array('(1.1 + ((1)))', 2.1),
    array('2 + 2', 4),
    array('1 + 1 + 1 + 1 + 1    + 1 + 1 + 1 + 1 + 1      + 1 +1 + 1 + 1 + 1', 15),
    array('2*(1 + 1 + 1 + 1 + 1)    + 3*(1 + 1 + 1 + 1 + 1)      + 4*(1 +1 + 1 + 1 + 1)', 45),
    array('(1 + 1 + 1 + 1 + 1)*2 + (1 + 4)*3 + (1 +1 + .5+.5 + 1 + 1)*5', 50),
    array('2 * 2', 4),
    array('2 * (1-1)', 0),
    array('2 * 2 + 2', 6),
    array('2+ 2 * 2', 6),
    array('(2+ 2) * 2 + 2', 10),
    array('(2+ 3 + 1) / 3 + 7', 9),
    array('2+ 2 * 2 + 2', 8),
    array('729 / 3 / 3 / 3 / 3', 9),
    array('729 / 3 / (3 / 3) / 3', 81),
    array('2 + 1', 3)
);

foreach($tests as $test) {
    $input = $test[0];
    echo "$input = ";
    $result = $test[1];
    $generated_result = calculate($input);
    if ($generated_result == $result) {
        echo "$generated_result passedn";
    } else {
        echo "$generated_result <-- FAILEDn";
    }
}

10. July 2013

Viable promises in PHP using pthreads

I’ve looked at making promises in PHP before, but it was a bit pointless due to PHP’s synchronous nature.

But PHP isn’t necessarily synchronous. You can add threading capabilities by installing pthreads. Using this library it is possible to set up functioning promises (with a few limitations) in PHP.

Pthreads supports stacking threads, which is essentially a promise. As such, this blog post is essentially a rehash of any of the pthreads examples of Stackable. The basic idea is to set up a class that inherits from Worker, initialize an instance, start it going, then stack on an instance of a class inheriting from Stackable.

My implementation of this pattern will allow the passing of arbitrary functions to these classes, which is what the promise pattern is all about. It works, but there is a limitation. You can pass function names in (e.g. count_to_a_million), but not closures (e.g function() {echo "foo";}). [Aside: for some reason PHP calls anonymous functions closures even though they are only related concepts.] It appears that pthreads has some hidden serialization of parameters going on under the hood, and PHP does not support serialization of closures. Because of this, my implementation only supports the passing of function names (although it could be modified to accept parameters as well, as those could be serializable).

Here are the classes:

<?php

class PromiseClass extends Worker {
    private $_promise = null;
    public function run() {
        $func = $this->_promise;
        $func();
    }
    public function __construct($promise) {
        $this->_promise = $promise;
    }
}

class ThenClass extends Stackable {
    private $_promise = null;
    public function __construct($promise) {
        $this->_promise = $promise;
    }
    public function run() {
        $func = $this->_promise;
        $func();
    }

}

As you can see PromiseClass and ThenClass have the same extended properties and methods, but are based on different classes.

Here is how to use these classes to implement Promises in PHP:

function then_function() {
    echo "and then...n";
}

function promise_function() {
    echo "promise function called...n";
}

$promiser = new PromiseClass('promise_function');

$then = new ThenClass('then_function');

$promiser->start();
$promiser->stack($then);

for ($i = 0; $i < 20; $i++) {
    echo "testn";
}

The output for the above example should be something like the following (the outputs of “test” are there to show that this work is asynchronous):

promise function called...
test
test
...
test
and then...
test
test
...

Promises are most useful as a means of waiting for data before performing an action with it. In that regard this is not an ideal solution as it would require some form of kludge. If PHP had proper closures it would be reasonable but in this form it would likely need to require use of the global operator which is never nice.