21. October 2013

A PHP spell checker

How to Write a Spelling Corrector, by Peter Norvig, is a popular resource for instructions on how to produce spell check functionality. That page does some statistical Analysis of how this algorithm works and is definitely worth reading.

I’ve rewritten this alghorithm in PHP. It’s not as eloquent as the python code in the original, but I suspect it gains a tiny bit of readability in the trade-off for terseness.

more

09. October 2013

A Lattice class

Consider if you need a 7-dimensional lattice data structure. An array is great for one-dimensional data but once you add dimensions things quickly get difficult to manage. Here’s a data structure that takes away some of that headache:

more

27. September 2013

Using Shared Strings to Reduce Memory Usage

As of Excel 2007, files are saved in the Open XML format. This format is comprised of a grouping of XML files and assets, which are then zipped up and given the .xlsx extension. It’s a lot more readable from other programs than an old fashioned .xls file.

One means that was used to reduce the file size was setting up a shared strings table. Strings stored in a spreadsheet are given a numeric index and this numeric index is then stored in the xml file. In general, if a string is reused frequently the overhead of the shared string map will be payed off by the saving of only storing string indices.

more