Learning mathematica from scratch

I have many defaults - IMHO the #1 being that I am not as good in math as I used to be 15 years ago (I used to do my integrals by hand - that was *before* med school). Even worse is the fact that I am not using symbolic computation software to supplement my failings.

Serendipity lead me to Mathematica, after a Hacker News post. At this moment I’m still very new with it, but since the reading list I found on Quora was not satisfying, maybe it is an opportunity to help fellow Mathematica newbies?

My suggestions : start with http://www.indiana.edu/~statmath/math/mma/gettingstarted/printable.pdf where you get simple examples and a good explanation of what to expect.

If you dig economics, follow that by http://fmwww.bc.edu/ec-p/wp364.pdf, which will explain in plain words why it is a good to use Mathematica, and then http://www.economicsnetwork.ac.uk/cheer/ch19/kochanski.pdf to play with a Cournot duopoly and do some iterations, or even better (and simpler) http://capone.mtsu.edu/jee/pdf/hodgin.PDF to graph total costs and average costs.

You may then need some tips, at least to understand all of it - http://www.bandgap.cs.rice.edu/personal/adrice_swong/public/WebPages/mathematica/MathTips.htm provides helpful things, like setting the plot range or plotting 2 functions on the same graph.

With that done, you may then benefit from a longer and more detailed text, such as http://www.econ.ucdavis.edu/faculty/kdsalyer/LECTURES/Ecn200e/mathematica_intro.pdf

The text is a bit incomplete, for example there is nothing about bases (hexadecimal, binary). You should complete that that with the “official” guide, http://reference.wolfram.com/mathematica/guide/LanguageOverview.html - for example http://reference.wolfram.com/mathematica/guide/Syntax.html if you are interested in bases.

If you are like me, you may have to look for some references. My total differential knowledge was a bit stale so I checked the excellent http://www.solitaryroad.com/c353.html page and after that read the critic of the wikipedia current entry on http://spin0r.wordpress.com/2013/01/04/the-difference-between-partial-and-total-derivatives/

I am now starting the tutorial on http://www.physics.umd.edu/courses/CourseWare/EssentialMathematica/before learning how to program with Mathematica. More on that later.

Running PostgreSQL on Mac OSX Mountain Lion in 2 minutes

I don’t like OSX Server- I tried to live with it, but it was more pain that it was worth. With Mountain Lion, I missed having a PostgresSQL server on my laptop. But I didn’t like the idea of using homebrew either - I best like Rudix approach : unpack and you are ready to go.

So I took PostgreSQL from apple “opensource” website, recompiled it with XCode, and made a nice tarball for everyone to play with. Installation takes about 2 minutes currently, and it integrates well with an “out of the box” Mountain Lion OSX. Here we will explain what to do if you want to do the same.

First, as root we will clean the existing Apple default files (it’s a broken PostgreSQL installation missing binaries) and unpack my binaries. Since we will use rm as root, you may want to backup your system first (better be safe that sorry):

sudo bash

rm /System/Library/LaunchDaemons/org.postgresql.postgres_alt.plist

rm /Library/Preferences/org.postgresql.postgres.plist

curl https://dl.dropboxusercontent.com/u/984275/osx-postgresql-91/OSXLion-Postgresql-9_1-clean.zip > /postgres.zip

cd / && unzip postgres.zip

You can also get the sources I used, and recompile them with XCode yourself : https://dl.dropboxusercontent.com/u/984275/osx-postgresql-91/PostgreSQL-56.sources.tar or even the build (maybe you just want to edit one file and recompile ?) from https://dl.dropboxusercontent.com/u/984275/osx-postgresql-91/PostgreSQL-56.build.tgz.

My XCode was missing AEP.make, so it’s included in PostgreSQL-56.sources.tar. If you don’t trust my binaries, simply grap AEP.make rebuilt everything using XCode and Apple original sourcetree from http://www.opensource.apple.com/source/PostgreSQL/PostgreSQL-56/

Unzipping the OSXLion-Postgresql-9_1-clean.zip into / will create a /clean directory, so you can manually move the files to /usr if you want to check them first.

The largest part of the work is done. We will now use these binaries to populate the default database:

sudo -u _postgres initdb -D /var/pgsql -U _postgres

But that’s not enough. If you try to start postgres manually and want to use Apple defaults, it will fail silently as you can see in your system.log

To see why, start it manually with:

sudo -u _postgres /bin/bash

postgres  -D /var/pgsql -c unix_socket_directory=/var/pgsql_socket/ -c logging_collector=on -c log_connections=on -c log_lock_waits=on -c log_statement=ddl -c log_line_prefix=%t  -c listen_addresses= -c log_directory=/Library/Logs -c log_filename=PostgreSQL.log -c unix_socket_group=_postgres -c unix_socket_permissions=0770

You will see the logfile is causing a problem:

2013-03-07 13:54:28 ASTFATAL:  could not open log file “/Library/Logs/PostgreSQL.log”: Permission denied

Simply create it as root and chown it for postgres to use with:

touch /Library/Logs/PostgreSQL.log

chown _postgres:_postgres /Library/Logs/PostgreSQL.log

Restart postgres manually with the previous command - success !!- but from now on, you can just start it automagically on reboot with launchd:

launchctl load /System/Library/LaunchDaemons/org.postgresql.postgres.plist 

If you plan to connect as your normal user, you may want to join the existing _postgres group with:

dscl . -append /Groups/_postgres GroupMembership guylhem

Then, before connecting under your normal username, test postgres on the default database with:

sudo -u _postgres psql template1

Now you can create databases, users and provide permissions as you would on any unix. Just be carefull not to use too much space on /var/pgsql

For example, I created a guylhem database and matching username with:

CREATE USER guylhem WITH PASSWORD ‘secret’;

CREATE DATABASE guylhem;

GRANT ALL PRIVILEGES ON DATABASE guylhem to guylhem;

Now you can close the root session, and load all kind of data under your username. If you also want Apache HTTP server, check http://en.blog.guylhem.net/post/48296064787/enabling-the-local-web-server-on-osx-mountain-lion

You can also try Induction.app locally - it is a nice app. Be sure to check the worldbank data article to have something to test your new PostgresSQL : see http://en.blog.guylhem.net/post/45890117071/how-to-handle-wdi-data-from-the-world-bank and to extract data for an augmented Solow growth model (Mankiw style!) http://en.blog.guylhem.net/post/47150641009/querying-the-world-bank-sql-database-to-make-an

A final warning : we are not using the latest version here. For local test and development, I did not see the need to have the latest security patches, especially when considering it could encourage readers to put that online. That is not safe. You should not use binaries you wget from a stranger on the web on a production setup. Just use XCode yourself from Apple sourcetree.

Adding a Hacker News ticker to your website

A little while ago, I had this weird idea of making contact easier with people who read my essays, thanks to a single website were my “online interactions” would be gathered. But in the time of Facebook updates or Twitter comments, I had to gather them too.

So I create guylhem.net, but the only problem was I was not an avid Facebook and Twitter user yet, so I added my blog posts there too.

A couple of years later, while I’m still not really into Facebook and Twitter, another problem emerged: since I have started to use Hacker News, more and more of my comments are created there - and even more than on my blog, since there is more interaction with other comments on HN. (at least, this validates post hoc the “guylhem.net” idea!)

The solution was simple - adding a HN ticker, with my latest posts !

While there is a lot of code to do that with Facebook, Twitter, Linkedin, etc. I could not find any code to do that with HN, so I did what every hacker would do and created one myself.

HN does not offer a JSON API, so started I playing with simple regexes like with curl to create a static file I could include, for ex: curl https://news.ycombinator.com/threads?id=guylhem  | grep “id=guylhem">guylhem</a>” g|grep “| on”

This was not a satisfactory long term solution, since it requires server-side scripts.

I also thought about client-side parsing, but since were are talking about non-JSON files, cross domain scripting will prevent anything unless you add something like the following to your server (see https://www.bionicspirit.com/blog/2011/03/24/cross-domain-requests.html),

Access-Control-Allow-Methods: GET, POST, OPTIONS

Access-Control-Allow-Credentials: true

Access-Control-Allow-Origin: news.ycombinator.com

Access-Control-Allow-Headers: Content-Type, *

Turns out the solution was simpler than imagined - thriftdb provides an API, and a reverse time query with the following gives a JSON object:

http://api.thriftdb.com/api.hnsearch.com/items/_search?limit=10&pretty_print=true&filter[fields][username]=guylhem&sortby=create_ts%20desc

With that in hand, creating a ticker was easy. Simply edit the following PHP code to get links to your own comments -  obviously at least replace “guylhem” with your own username and date_default_timezone_set with your location:

date_default_timezone_set(‘America/Martinique’);

$user = “guylhem”;

$count = 10;

$date_format = ‘Y-m-d H:i:s’;

$url =’http://api.thriftdb.com/api.hnsearch.com/items/_search?limit=’.$count.’&filter[fields][username]=’.$user.’&sortby=create_ts%20desc’;

$json=file_get_contents($url);

$decode = json_decode($json);

foreach ($decode->results as $comment) {

$datetime = date_create($comment->item->create_ts);

$date = date_format($datetime, $date_format).”\n”;

$article_id=$comment->item->discussion->id;

echo ‘<a href=”https://news.ycombinator.com/item?id=’.$article_id.’”>’;

echo $date . ’ - Comment on “’ . $comment->item->discussion->title .’”’;

echo “</a><br>”;

}

You can create a fancier HTML version (marquee text and the like), personally I like it simple. The goal is just to let my friends know when the latest update was (french blog - 2011 - oops ; HN - today 10 minutes ago - much better) and which topics I enjoy.

Enabling the local web server on OSX mountain lion

You can no longer use “web sharing” in preferences. Too bad, that was easy, quick and dirty.

It is still possible to use a local web server, but you have to :

  • Edit the conf file as explained in http://osxdaily.com/2012/09/02/start-apache-web-server-mac-os-x/
  • Add PHP or Perl as explained in http://osxdaily.com/2012/09/10/enable-php-apache-mac-os-x/
  • Either do “apachect start”, or to make that permananent, turn it to enabled in /System/Library/LaunchDaemons/org.apache.httpd.plist, then unload and load apache with: sudo launchctl unload -w /System/Library/LaunchDaemons/org.apache.httpd.plist ; sudo launchctl load -w /System/Library/LaunchDaemons/org.apache.httpd.plist

Querying the World Bank SQL database to make an augmented Solow growth model

A few weeks ago we created an SQL database with all the world bank data. You can do all king of cool things with it!

For example, let’s look at the brazilian GDP in 2000 USD, per year:

SELECT year as x, data as y from wdi where country=’BRA’ and indicator=’NY.GDP.MKTP.KD’ order by year;

What about putting next to each other this GDP, per year, for France and Germany?

SELECT fra.year as x, ger.data AS y2, fra.data AS y1                                                                        FROM wdi as fra, wdi as ger

WHERE ger.country=’DEU’ and

 fra.country=’FRA’ and

 ger.indicator=’NY.GDP.MKTP.KD’ and

 fra.indicator=’NY.GDP.MKTP.KD’ and

 ger.year=fra.year;

Even better, let’s add more than 2 columns - say enough to make an augmented Solow growth model, using the Gross fixed capital formation and a Mankiw like “SCHOOL” variable, for Peru:

select year,

MAX(case when indicator=’NY.GDP.MKTP.KD’ then data else NULL end) as Y,

MAX(case when indicator=’SP.POP.TOTL’ then data else NULL end) as population,

MAX(case when indicator=’SP.POP.1564.TO.ZS’ then data else NULL end) as population_1564_percent,

MAX(case when indicator=’NE.GDI.FTOT.ZS’ then data else NULL end) as GFCF_percent,

MAX(case when indicator=’SL.EMP.1524.SP.ZS’ then data else NULL end) as SCHOOL_employ,

MAX(case when indicator=’SE.SEC.ENRR’ then data else NULL end) as SCHOOL_enroll

from wdi where country=’PER’ group by year order by year;

You may not have enough datapoints to do an OLS regression for a single country, but the data is more complete with other countries - or you can take them all together.

Enjoy!

Linux does it better

Filevault access is sometimes a problem to service OSX computers. Guess what - there’s some nice code out there on http://code.google.com/p/libfvde/ to solve that very problem. Enjoy.

How to handle WDI data from the World Bank ?

Recently, I wanted to get some data to answer a question I had.

I started with the graphs done by google using WDI data : try http://www.google.com/publicdata/explore?ds=d5bncppjof8f9_&ctype=l&met_y=sh_xpd_pcap_pp_kd#!ctype=l&strail=false&bcs=d&nselm=h&met_y=completion_rate&fdim_y=education_level:2&scale_y=lin&ind_y=false&rdim=region&idim=region:LAC&idim=country:ARG:BRA:CHL:PER&ifdim=region&hl=en_US&dl=en_US&ind=false  

Ain’t that colorful and pretty? As good as they look, these graphs did not provide any definitive answer, just an intuition.

The next step was therefore to perform some good old number crunching (you know the drill - think different, think econometrics!).

Of course, to properly regress a model, besides the intuition you need the actual data.

The worldbank is quite developer friendly : you can get a nice CSV file at http://databank.worldbank.org/data/download/WDI_csv.zip but unless you are well versed in CSV to SQL conversion, it will not be very useful for your purposes. Alternatively, if you select the XLS file, you will spent a lot of time with a spreadsheet.

Some website like http://fmwww.bc.edu/ec-p/data/obsolete/wdi97.html refer to an elusive “old version”, which was available as an Oracle database on CDROM, but there does not seems to be any option to download anything close to that.

“No problemo” - that will be the topic of this blog entry : let’s create a good SQL database to build our econometric model. Let’s create something just as good as the Oracle database the wordbank no longer provides, and if we are clever enough, maybe even better :-)

As usual when you create a database, your first question is how big the fields should be given your data. This is important, because if you oversize your fields - say using a numeric where a smallint would do, you will create unnecessary slowness when you perform calculations.

In the WDI_Data.CSV file, one will quickly notice the indicators are using floating point arithmetic with a 16 chars significant.

Let’s load the other files first.

Analysing and deciding on the optimal field size is an art by itself, an art I do not pretend to master - here I will just refer you to the results of a custom analysis perl script I wrote, based on PostgreSQL 9 data type specification, to make the following best guesses to create my tables:

create table WDI_CS_Notes (

CountryCode char(3),

SeriesCode char(16),

DESCRIPTION varchar);

create table WDI_Country (

countrycode char(4),

long_name char(74),

income_group char(21),

region char(27),

lending_category char(6),

other_groups char(10),

currency_unit char(43),

latest_population_census char(22),

latest_household_survey char(95),

special_notes char(256),

national_accounts_base_year char(120),

national_accounts_reference_year smallint,

system_of_national_accounts char(63),

sna_price_valuation char(37),

alternative_conversion_factor char(19),

balance_of_payments_manual_in_use char(52),

external_debt_reporting_status char(12),

system_of_trade char(21),

government_accounting_concept char(32),

imf_data_dissemination_standard char(43),

source_of_most_recent_income_and_expenditure_data char(91),

vital_registration_complete char(4),

latest_agricultural_census char(10),

latest_industrial_data smallint,

latest_trade_data smallint,

latest_water_withdrawal_data char(28),

alpha_code char(3),

wb_code char(3),

table_name char(47),

short_name char(47)

);

create table WDI_Description (

world_development_indicators text

);

create table WDI_Data (

country_name char(46),

country_code char(3),

indicator_name char(129),

indicator_code char(20),

y1960 double precision,

(….)

y2012 double precision

);

create table WDI_Footnotes (

countrycode char(3),

seriescode char(20),

timecode char(6),

description text);

create table WDI_Series (

seriescode char(20),

indicator_name char(129),

short_definition text, 

long_definition text, 

source char(255),

topic char(98),

dataset text, 

unit_of_measure text, 

power_code text, 

periodicity char(6),

base_period char(17),

derivation_method text, 

aggregation_method char(18),

conceptual_implications_and_limitations char(151),

notes_from_original_source char(255),

general_comments char(255),

reference_period text, 

other_data_characteristics text);

Then simply populate the databases with :

set client_encoding to ‘latin1’;

\COPY WDI_CS_Notes FROM ‘~/WDI_csv/WDI_CS_Notes WITH DELIMITER ‘,’ CSV HEADER;

\COPY WDI_Country FROM ‘~/WDI_csv/WDI_Country WITH DELIMITER ‘,’ CSV HEADER;

\COPY WDI_Data FROM ‘~/WDI_csv/WDI_Data WITH DELIMITER ‘,’ CSV HEADER;

\COPY WDI_Description FROM ‘~/WDI_csv/WDI_Description WITH DELIMITER ‘,’ CSV HEADER;

\COPY WDI_Footnotes FROM ‘~/WDI_csv/WDI_Footnotes WITH DELIMITER ‘,’ CSV HEADER;

\COPY WDI_Series FROM ‘~/WDI_csv/WDI_Series WITH DELIMITER ‘,’ CSV HEADER;

It will work. but you will suffer.

The WDI_data table you have just sucks. It may be easy to update, but besides that it is an ugly baby which will cause you troubles, unless you love pivot table and spending time in your queries.

Take my cheap tip: Use the following perl script intead to create a better wdi.csv  matching table, which you can then load and index :

#!/usr/bin/perl

use strict;

use warnings;

use Data::Dumper;

use Tie::Handle::CSV;

use Scalar::Util::Numeric qw(isint isfloat);

my $file = $ARGV[0] or die “Need to get WDI Data CSV file on the command line\n”;

my $csv = Tie::Handle::CSV->new($file, header => 1, sep_char => “,”);

my @fields=$csv->header;

my %result;

while( my $record = <$csv> ) {

my $cc=$record->{‘Country Code’};

my $ic=$record->{‘Indicator Code’};

 foreach my $field (@{ $fields[0]}) {

   if (isint ($field)) {

    if (length($record->{$field})>1) {

      print “$cc,$ic,$field,” . $record->{$field} . “\n”;

   } # if defined

  } # if isnum

 } # foreach field

} # while record

To load and index it, use:

create table wdi (country char(3), indicator char(20), year smallint, data double precision);

\COPY WDI_Series FROM ‘~/WDI_csv/WDI_Series WITH DELIMITER ‘,’ CSV HEADER;

Now you have something that looks similar to the specifications of the Oracle database -  let’s optimize queries with some B-trees :

create index on wdi (country);

create index on wdi (indicator);

create index on wdi (year);

You will spend more disk space using that approach, but your queries will be much simpler and faster :

 Schema |      Name       | Type  |  Owner  |  Size   | Description 

————+————————-+———-+————-+————-+——————-

 public | wdi             | table | guylhem | 360 MB  | 

 public | wdi_country     | table | guylhem | 264 kB  | 

 public | wdi_cs_notes    | table | guylhem | 480 kB  | 

 public | wdi_data        | table | guylhem | 116 MB  | 

 public | wdi_description | table | guylhem | 48 kB   | 

 public | wdi_footnotes   | table | guylhem | 46 MB   | 

 public | wdi_series      | table | guylhem | 1792 kB | 

I’m also spending some space on the index:

 Schema |       Name        | Type  |  Owner  | Table |  Size  | Description 

————+—————————-+———-+————-+———-+————+——————-

 public | wdi_country_idx   | index | guylhem | wdi   | 118 MB | 

 public | wdi_indicator_idx | index | guylhem | wdi   | 214 MB | 

 public | wdi_year_idx      | index | guylhem | wdi   | 118 MB | 

All done, you can now query your database and quickly get your results, check this :

select * from wdi where country=’FRA’                                                       and indicator=’NY.GDP.MKTP.KD’ order by year desc;

We get the GDP per capita in 2000 USD for France, all 52 values being return in less than 10 milliseconds.

Let’s check the query.

explain analyze select * from wdi where country=’FRA’                                                       and indicator=’NY.GDP.MKTP.KD’ order by year desc;

 Sort  (cost=1804.88..1805.23 rows=138 width=41) (actual time=7.965..7.974 rows=52 loops=1)

   Sort Key: year

   Sort Method: quicksort  Memory: 29kB

   ->  Bitmap Heap Scan on wdi  (cost=1268.58..1799.98 rows=138 width=41) (actual time=7.908..

         Recheck Cond: ((country = ‘FRA’::bpchar) AND (indicator = ‘NY.GDP.MKTP.KD’::bpchar))

         ->  BitmapAnd  (cost=1268.58..1268.58 rows=138 width=0) (actual time=7.891..7.891 row

               ->  Bitmap Index Scan on wdi_country_idx  (cost=0.00..511.89 rows=27604 width=0

                     Index Cond: (country = ‘FRA’::bpchar)

               ->  Bitmap Index Scan on wdi_indicator_idx  (cost=0.00..756.37 rows=27604 width

                     Index Cond: (indicator = ‘NY.GDP.MKTP.KD’::bpchar)

 Total runtime: 8.042 ms

image

8 ms is good enough, considering I’m running PostgreSQL and the query GUI ON MY MACBOOK AIR LAPTOP!!! (BTW  visual query look great using induction.app)

I can now try to create my models, safe in the belief I will not be limited by the availability of the raw numbers, or by  speed of the database  :-)

PostgreSQL new goodies

Check the slides on Postgres: The Bits You Haven’t Found - many interesting features are coming in the 9.x series - windowing functions are what I upgraded a 8.x server to 9.x a bit while ago.

Yet windowing functions are just the tip of the iceberg.

Fixing google voice “download plugin” with safari on OSX Mountain Lion

If your google voice session always ask you to download the plugin - then works when you click on the link *without* actually downloading the plugin, but only once - as mentioned on http://productforums.google.com/forum/#!topic/gmail/rMhjdK3q0eY, then the problem is an Adobe Flash permission.

Simply remove ~/Library/Preferences/Macromedia/Flash\ Player/macromedia.com/support/flashplayer/sys/settings.sol and you will be done after Safari is restarted.

Before :

^@<BF>^@^@^G<A4>TCSO^@^D^@^@^@^@^@^Hsettings^@^@^@^@^@^PcrossdomainAllow^A^@^@^@^RautoUpdateInterval^@@^\^@^@^@^@^@^@^@^@^Nsafefullscreen^A^@^@^@^QcrossdomainAlways^A^@^@^@^QwindowlessDisable^A^@^@^@^QdisallowP2PUplink^A^@^@^@^RautoUpdateDisabled^A^@^@^@^Mdefaultalways^A^@^@^@^ZsecureCrossDomainCacheSize^@<BF><F0>^@^@

^@^@^@^@^@^@^XautoUpdateDefaultUpdated^A^A^@^@^Qdefaultmicrophone^B^@^@^@^@^XallowThirdPartyLSOAccess^A^A^@^@^Oechosuppression^A^@^@^@^Gdomains (…)

After you remove it, the file will be recreated:

^@<BF>^@^@^A<E6>TCSO^@^D^@^@^@^@^@^Hsettings^@^@^@^@^@^Dgain^@@I^@^@^@^@^@^@^@^@^Oechosuppression^A^@^@^@^Qdefaultmicrophone^B^@^@^@^@^Mdefaultcamera^B^@^@^@^@

^Mdefaultklimit^@@Y^@^@^@^@^@^@^@^@^Mdefaultalways^A^@^@^@^QwindowlessDisable^A

^@^@^@^RautoUpdateDisabled^A^@^@^@^XautoUpdateDefaultUpdated^A^A^@^@^RautoUpdateInterval^@@^\^@^@^@^@^@^@^@^@^SautoUpdateLastCheck^@Bs<D4>_^B<EF><DD>^B^@^@^PcrossdomainAllow^A^@^@^@^QcrossdomainAlways^A^@^@^@^ZsecureCrossDomainCacheSize^@

<BF><F0>^@^@^@^@^@^@^@^@^XallowThirdPartyLSOAccess^A^A^@^@^LtrustedPaths^C^@^@

      ^@^@^Nsafefullscreen^A^@^@^@^QdisallowP2PUplink^A^@^@^@^XauthorizedFeaturesExpiry^@^@^@^@^@^@^@^@^@^@