Daily Archives: October 11, 2004

Magic Numbers

You learn something new every day. It seems that some strings of random numbers
are more random than others. That’s kind of interesting, but not really a surprise
when you think about it. Whenever we look at the characteristics of a string
of random digits occurring in Pi or e or some other irrational number, we are
looking at only a tiny fraction of the digits. Actually, it may not even be
accurate to describe it as a fraction.

The linked article describes how mathematician Steven Pincus made some interesting
discoveries when looking at the randomness of the first 280,000 digits of Pi,
the square root of 2, and several other irrational numbers. However, even 280,000
isn’t really a fraction of an infinite number, now is it? How many digits
would it take before you had a representative sample of an infinite string?
I’m not a mathematician, but I’m guessing it would take an infinite string.

But before you wrap your head too tightly around that, consider what Pincus
observed when he started comparing these strings of digits: some have higher
levels of entropy (randomness), some lower. Then he started looking for the
same characteristic of entropy in real-world strings of numbers, such as you
might get from tracking, say, the stock market. He discovered that the stock
market hits its highest level of entropy right before a crash.
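The measure Pincus developed for this is called approximate entropy (ApEn). Roughly, it asks how often patterns of length m repeat compared with patterns of length m + 1: a regular sequence repeats predictably, an irregular one doesn’t. Here’s a minimal sketch in Python using exact matching on digits — a simplification of his tolerance parameter, not his exact procedure:

```python
import math
from collections import Counter

def apen(seq, m):
    """Approximate entropy with exact matching (a simplified,
    tolerance-zero variant suitable for discrete digit strings).
    Higher values mean a more irregular sequence."""
    def phi(k):
        # Average log-frequency of each length-k template in the sequence.
        templates = [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]
        counts = Counter(templates)
        n = len(templates)
        return sum(math.log(counts[t] / n) for t in templates) / n
    return phi(m) - phi(m + 1)

# First 100 digits of Pi vs. an obviously patterned string of equal length.
PI_100 = ("31415926535897932384626433832795028841971693993751"
          "05820974944592307816406286208998628034825342117067")
pi_digits = [int(c) for c in PI_100]
periodic = [1, 2, 3] * 33 + [1]

print(apen(pi_digits, 2))   # noticeably positive
print(apen(periodic, 2))    # very close to zero
```

With only 100 digits this is a toy comparison — Pincus used 280,000 — but the periodic string already scores near zero while Pi’s digits don’t.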

Pincus observes that entropy

appears to be a potentially useful marker of system stability, with rapid
increases possibly foreshadowing significant changes in a financial variable.

He goes on to conclude:

Independent of whether one chooses technical analysis, fundamental analysis,
or model building, a technology to directly quantify subtle changes in serial
structure has considerable real-world utility, allowing an edge to be gained…
And this applies whether the market is driven by earnings or by perceptions,
for both short- and long-term investments.

Expect to hear a lot more about entropy and financial markets in the near future.
The movie Pi,
which I thought was well-made and entertaining, but suffered from a silly premise,
may just turn out to be prescient.


via GeekPress

Encyclopedia Galactica

Via KurzweilAI, check out this modest proposal made at the Web 2.0 conference
in San Francisco:

Universal access to all human knowledge could be had for around $260m, a
conference about the web’s future has been told.

The idea of access for all was put forward by visionary Brewster Kahle, who
suggested starting by digitally scanning all 26 million books in the US Library
of Congress.

In his speech, Mr Kahle pointed out that most books are out of print most
of the time and only a tiny proportion are available on bookshop shelves.

He estimated that the scanned images would take up about a terabyte of space
and cost about $60,000 (£33,000) to store. Instead of needing a huge
building to hold them, the entire library could fit on a single shelf.

This is a tremendous idea, and the cost of doing it is only going to go down.
The initial scanning work is the only part of the plan that’s likely to present
much of an expense. According to Moore’s Law, that $60,000 price tag for
storage should be somewhere around $2,000
eight years from now. If the estimate for the robot scanner is accurate, and
it follows a less robust drop in price — say halving once every four years
— we would be looking at a price tag of around $65 million in the same
period of time. Pretty doable, I’d say.
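These projections are easy to check with back-of-envelope arithmetic. The halving periods below are my assumptions — roughly 18 months for storage, per the common reading of Moore’s Law, and the four-year guess above for scanning — not figures from the article:

```python
# Back-of-envelope projection: cost halves every `halving_years` years.
def projected_cost(cost_2004, halving_years, years_out):
    return cost_2004 / 2 ** (years_out / halving_years)

storage_2004 = 60_000        # Kahle's storage estimate
project_2004 = 260_000_000   # total estimate, dominated by scanning

for years in (8, 14):        # i.e., roughly 2012 and 2018
    print(years,
          round(projected_cost(storage_2004, 1.5, years)),
          round(projected_cost(project_2004, 4.0, years)))
```

The storage line drops below $2,000 by 2012 and below $100 by 2018; scanning falls to exactly $65 million after two four-year halvings.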

Unfortunately, the legal concept of public domain is rapidly diminishing,
while copyright terms are lengthened and controls are made more expansive.
As John Bloom observed a while back in The New Republic:

In the name of Mickey Mouse and other American icons, we have gradually lengthened
that 14-year limit on copyrights. At one time it was as much as 99 years,
then scaled back to 75 years, then — in one of the most anti-American
acts of the last century — suspended entirely in 1998. The Sonny Bono
Copyright Term Extension Act of that year says simply that there will be no
copyright expirations for 20 years, meaning that everything published between
1923 and 1943 will not be released into the public domain. Presumably they’ll
take up the matter again in 2018 and decide whether any of these books, movies,
or songs are ever set free. There are 400,000 of them.

So Kahle’s observation that few of these books are still on the shelf will
be beside the point. A scanned-in Library of Congress could conceivably serve
as a back-up to the print archive, providing an excellent disaster recovery
resource, but it would probably not be possible to distribute the whole archive,
only those parts created before 1923.

Of course, there’s hope that, when the copyright issue is reviewed again by
Congress (presumably in 2018), the public will be more aware of what’s going
on and will not stand for any more expansions of copyright controls. Failing
that, maybe we could get an exception to copyright law into place. Perhaps we
could make this backup of the Library of Congress exempt from all copyright
restrictions as long as it’s used by schools and public libraries.

By 2018, the storage for a copy of the entire Library of Congress
online should cost less than $1000; even the cost of creating the archive
would be $15 million or less. We could put the entire Library of Congress
in every school in America.