Machine intelligence: the earthmoving equipment of the information age, and the future of meaningful lives

Everybody today is overwhelmed by information

What we need more than anything else today is heavy-lifting equipment for the mountains of information debris we find piling up around us every day.

Most people in the developed world are overwhelmed right now by the information firehose, and few people are living lives in which the balance of their time is a good reflection of their deepest priorities.

Machine intelligence is the earthmoving equipment of the information age

The primary reason we need to build machine intelligence in the information age is that it will help us magnify our information-processing capabilities by orders of magnitude, the way that physical machines like cars/planes/bulldozers/cranes have long magnified our physical strength by orders of magnitude.

Fundamentally, I see continuing symbiotic augmentation of human intelligence -- with human biological intelligence in the driver's seat -- as a far more plausible future than that envisioned by the "Singularity Extremists" (in spite of being a graduate of Singularity University myself).

Technology so far has always served humanity as a toolset -- as an extension of our own intelligence and physical capabilities. Technology hasn't yet started serving itself, because it hasn't yet become sentient. Personally, though, I disagree with many futurists that the "coming technological singularity" is a distinct point in time: an exponential curve doesn't actually have an uptick or cusp or "knee of the curve" -- it has exactly the same overall shape regardless of what vertical scale you stretch it to. I also disagree that machines will eventually become more intelligent than humans (after which, the argument goes, we would render ourselves redundant, because artificial intelligence would be able to innovate faster than we can). My view is that technology will always serve as a toolset for human intelligence, regardless of how "smart" it begins to act or seem, and regardless of how powerful it becomes -- just as a car or plane is a large and powerful machine that is ultimately a prosthetic extension of the body, and therefore of the human brain controlling it.

We will always "keep up" with any intelligence we create, because the sum of (our brain + a machine brain) will always be more advanced than just the machine brain alone.

The real reason we need AI: we need a networked I/O interface for our brain

The most important roles that so-called "AGIs" (Artificial General Intelligences) will serve in the future are:
  1. To deliver only the highest-quality, most-important, most distilled/focused information possible through the bandwidth-limited channels of our physical senses, and in particular through our visual channel, which has the highest bandwidth of all our senses [input], and
  2. To empower us to transform enormous quantities of external information in meaningful ways through a few simple and easy-to-learn actions [output].
This will give us the power tools to actively reshape the piles of information debris in our lives so that we can each build something beautiful, and so that we can live lives that are true to our deepest values and priorities.

We must build intelligent machines for the future of humanity

We must build these intelligent information prostheses, because it is far too easy to get distracted and overwhelmed today. If we want to continue living meaningful lives while still swimming in information, then without such tools we will reach the end of our lives and wonder why we didn't spend more time on the things that mattered most.


The commoditization of technology, and when to open the source

The point of open source is not to kill competition, it's to enable innovation at a higher level

I have exclusively used Free (as in freedom) and open source technologies for more than 15 years, and have contributed to a number of open source projects. However, virtually every person in the developed world now uses open source technologies every single day, usually without even knowing it. We carry open source cellphone software in our pockets if we use Android, we use open source browser technology if we browse the Web with Chrome or Firefox, and at the very least, we request pages from open source web servers running on top of open source operating systems when we view the majority of pages on the Web.

At face value, it seems counter-intuitive that a company could survive if it were to open source its products or technologies: if a product is available for free in source form, competitors could take those products or technologies and use them to compete without doing their own development work, and/or sell the originating company's products themselves. Consumers could obtain and use products for free without purchasing them, by building the products themselves from source or by using versions built and shared by other consumers.

However, at least from an economics perspective, the point of open source is not to kill competition, it's to enable competition and innovation to take place at a higher level of abstraction than was previously possible.

Frontier science eventually transitions from public to private development

In the early days of space exploration, you couldn't just buy an off-the-shelf rocket engine; only governments with multi-billion-dollar budgets could design and build rockets that could put an object into orbit, much less a man on the moon. However, in 2004 a private company, Scaled Composites, won the $10M Ansari X-Prize for putting a civilian into space twice within a two-week period. The US government has subsequently awarded multiple heavy-launch contracts to another private company, Elon Musk's SpaceX, to replace its own aging heavy-lift technologies, and recently canceled its own costly Constellation program, which was intended to replace the shuttle. The US Government is still debating what technology should replace the shuttle and its launch system, but it is becoming increasingly clear that we have reached an era in which the US Government is no longer able to compete with commercial offerings. This is a complete inversion of the situation in the early days of the space race, when there was no way the private sector could afford to compete with the financial resources the government was able to throw at open-ended space research.

Commoditization is an inevitable process for all technology

Every industry goes through this transition towards the commoditization of technology, which is the stage at which a technology is commercially available in off-the-shelf form, and/or when the knowledge or parts required to build the technology are freely available and anybody with sufficient skill in the art can build it if they expend sufficient effort. When a technology has become commoditized, it becomes the newest "greatest common denominator" for the industry.

The process of commoditization has already happened in the PC hardware space (with the development of generic PCs that took the bottom out of the mainframe and minicomputer markets and made computers accessible to all), it has long been happening in the operating system space (with Linux and BSD), it continues to happen in the web technologies space (with browsers, web servers, web toolkits, the evolution of browser extensions into web standards, etc.), and is currently happening at an alarming rate in the mobile space (with Android in particular).  There are many examples of this in every industry.


Price is usually the first thing to go; a little later, it becomes clear that technologies that are free but closed-source, and the communities they serve, stand to benefit substantially from opening the source.

Commoditization of technology enables innovation at a higher level

The interesting thing about the commoditization of technology is that it affords all players a new solid foundation of common-denominator technology to build upon, and innovation can then move up to the next level of abstraction. Nobody builds their own web server anymore -- it doesn’t make sense to reinvent that wheel when there are several incredibly powerful, commoditized open source options; everybody just uses (and innovates on top of) Apache/Cherokee etc. In the mobile space, the availability and commoditization of remarkably rich app development platforms such as Android has led to a Cambrian explosion of diversity in mobile applications that was simply not possible before powerful and featureful mobile operating systems became a commodity on the majority of cellphones.

Those companies that do not innovate on top of commoditized technologies will perish

Most companies are very familiar with the maxim "innovate or perish." However, many companies also fear that the commoditization of technology, combined with the open sourcing of commoditized technology, is a recipe for destroying business models and even entire industries. It is understandable that this would be a big concern for many industries, especially if their market is small. However, some companies divert too much energy towards holding onto the IP they already have, doing whatever they can to resist the forces that would lead to the commoditization of those technologies. Progress in that market ends up becoming mired. Ironically, this allows the company to continue to operate within its comfort zone -- but it is not a sustainable model, and it is certainly not a growth model.

Nobody wins in the long term if a company is attempting to suppress innovation in order to survive. History has shown that the companies that understand that commoditization is an inevitable process, disruptive though it is -- the companies that learn to adapt nimbly to the changing tech landscape and begin building one level above the current level of abstraction -- are the very companies that survive and become the next generation of market leaders. Those companies that cease to innovate, or cease to move their level of innovation above the level of commodity, are the very companies that inevitably perish due to disruption from below. (The defining work in this area is of course The Innovator's Dilemma by Clayton Christensen.)

Companies are born, grow, age and die

There is nothing ad hoc about when companies go through the different stages of innovating, resisting innovation, and then perishing -- in fact, it turns out that these stages are completely predictable. Geoffrey West of the Santa Fe Institute describes this process briefly but eloquently in his recent TED Talk:

  • It is possible to measure something similar to the "metabolic rate" for companies, and, if analyzed this way, all companies are born, grow, age and die just like biological organisms. 
  • Companies demonstrate a sigmoidal growth curve overall (just like any other organism), with slow initial growth, followed by a growth spurt during the company's "teenage years", followed by a decline in growth (senescence), followed by eventual collapse and death. 
  • As companies grow, they benefit from the added efficiency of economies of scale -- efficiencies brought about by the introduction of bureaucracy and levels of administration. 
  • However, eventually every company enters a stage where their metabolism slows down [the company ceases to innovate] and the company soon cannot sustain growth, collapses and dies.

Companies typically produce their best innovations in their "prime years", before their growth slows. In their later years, they become bogged down by the very bureaucracy that granted the company economies of scale in its "young adult years". Old companies have so much corporate inertia and bureaucracy that they cease to innovate and collapse under their own weight. Restated, it is in the early years of a company's life that they are able to produce technologies that have the potential to become commodities. In their older years, companies often fight commoditization (i.e. cease to innovate) and eventually lose that battle.

Today, commodity technology inevitably ends up open sourced or ceases to be relevant. It doesn't take a huge jump in logic to realize that if you fight open source, you are in a real sense fighting innovation, and it's a good sign that you have entered the life stage of corporate senescence. Old companies typically try to stifle the innovativeness of their competitors through patent litigation, rather than moving up a level and innovating on top of the new common, commoditized platform (which is doubly ironic, because patents are supposed to encourage innovation rather than stifle it). Such companies lack creativity, skill, nimbleness and innovativeness of their own, and will soon die.

It is worth noting that even though biological organisms and companies all follow the same growth curve, the lifespan of organisms and of companies depends upon their size. Thus, even though death is inevitable, it would make sense that corporate lifespan can be increased by doing whatever is possible to ensure that a company is better able to scale.

How to know when a technology has become commoditized

It's usually clear when a technology has become a commodity. One or more of the following scenarios will hold true:

  • The secret of how to build the technology is out of the bag (it's no longer protected by virtue of being a trade secret)
  • The technology has become easy to duplicate: your competitors are all starting to build the same technology
  • The technology is available for free (designs for or an implementation for the technology are widely available)
  • Alternative implementations of the technology are beginning to surface that either compete head-on with your technology, or promise their own different but competitive feature sets and strengths
  • The technology is regarded as a component part that other more interesting things can be built from
  • The industry is beginning to find ways to work around the need for your technology.
[Image: bottled and canned air, for sale in China for up to $860 for a jar of air from the French countryside]

It makes a lot of sense for everybody to use and contribute to a common shared source code base once a technology has become commoditized. The right way to look at this is not "we're making our competitor's life easier", but rather, "we're making the world a better place for everybody, and enabling a whole new generation of innovative products that can be built upon this platform".  There's almost always plenty of room up there at the next level for everybody.

Embracing open source doesn't require abandoning your business model, it requires rethinking it

There is a lot of discussion about when or even whether a technology should be open sourced. Open source business models typically revolve around expertise for hire (e.g. support contracts) or value-added repackaging of open source products. However, several points should be noted:

  1. Software that is partially or completely open source can still be sold, sometimes with additional premium features in the commercial version (depending on actual licensing). "Open source" or "Free (as in freedom) software" does not have to mean "free (as in price) software".
  2. Opening the source code of a piece of software can actually help a business in unexpected ways by revitalizing the entire software ecosystem that the company operates within.
  3. Ideally, open source software also attracts contributions by others in the community and sometimes even brings in contributions from competitors.  "Win-win" is always a healthier mentality than "dog eat dog".
Companies need to learn to respond when competitors commoditize or open source technologies in areas where they are seeking to remain competitive, and should not keep trying hard to sell something that is not clearly better than an open or free alternative. For example, the operating system space was commoditized over a decade ago, but Microsoft is still deriving a large percentage of its income from sales of Windows as a base operating system, even though profit margins are quickly eroding as computing moves online and people use a plethora of devices running open source operating systems to access the Web.

Related reminder: If you're not cannibalizing your own business model from below with the next big thing, then somebody else is. (See how Amazon cannibalized their own book sales with the Kindle for a good example of how to approach an eroding business model.)

A nonexhaustive list of situations when opening the source of a product may be helpful:

  • When the software’s general availability is crucial to the basic vitality of the ecosystem in which a company operates. (e.g. Google created and released their own browser as open source, upping the ante for everybody in terms of both features and speed, and as a result of this competition, all other major browsers became significantly more powerful and featureful within a couple of years. In the end, whatever browser people end up choosing, even if it isn't Chrome, Google still wins.) 
  • When a piece of software has ceased to give a company a competitive edge but would be immensely useful to others. 
  • When a company feels it can still stay ahead by innovating on top of the open sourced code, but there is a clear need in the community for the code and it makes sense to share it. 
  • When a piece of software is being retired but could still prove useful to someone. (a.k.a. “Abandonware”)


On hierarchical learning and building a brain

The brain is a hierarchical learning system, and knowledge representation is inherently hierarchical

Many systems in the human brain are structured hierarchically, with feedback loops between the levels of hierarchy. Hierarchically structuring a system creates a far more compact and flexible recognition engine, controller or model than alternatives. Knowledge and learning are inherently hierarchical, with generalizations as higher-level constructs and specifics as lower-level constructs. Assimilating new knowledge often requires breaking old hierarchies (undergoing a paradigm shift) and restructuring existing knowledge in terms of new generalizations, so that the new knowledge can be properly incorporated. The hierarchical structuring of reasoning may not be surprising given that the wiring of the brain itself is highly hierarchical in structure.


Strengths of hierarchical learning

Hierarchical reasoning systems can be immensely dextrous, just as an articulated arm is dextrous: each joint in an arm has only one or two degrees of freedom, but these degrees of freedom compound, overall yielding an immensely rich range of motion. In a hierarchical reasoning system, global, mid-level and local reasoning engines collaborate in feedback with each other to converge on a single hypothesis or plan of action that is consistent at multiple levels of abstraction. Each level only needs to absorb a small amount of noise, yet the overall system can be immensely resilient due to the synergistic combining of degrees of freedom. Note that the feedback between the layers of hierarchy can be either excitatory or inhibitory, sensitizing or desensitizing other levels towards certain hypotheses based on the current best-guess set of hypotheses at the current level. (This is effectively a manifestation of Bayesian priors.)
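The interplay of bottom-up evidence and top-down priors described above can be sketched in a tiny toy model (entirely hypothetical data, not any particular system): a noisy letter-level detector produces ambiguous hypotheses, and a word-level prior re-weights them so the two levels converge on a consistent interpretation.

```python
# Hypothetical toy: a higher (word) level re-weights noisy hypotheses from a
# lower (letter) level -- top-down feedback acting as a Bayesian prior
# over bottom-up evidence.

# Bottom-up: per-position letter likelihoods from a noisy local detector.
# Position 1 is completely ambiguous between 'a' and 'o' in isolation.
letter_likelihoods = [
    {"c": 0.9, "e": 0.1},
    {"a": 0.5, "o": 0.5},
    {"t": 0.8, "r": 0.2},
]

# Top-down: a prior over the higher level's vocabulary.
word_priors = {"cat": 0.5, "cot": 0.1, "car": 0.4}

def word_posterior(word):
    """Posterior score: word prior times the product of letter likelihoods."""
    score = word_priors.get(word, 0.0)
    for pos, letter in enumerate(word):
        score *= letter_likelihoods[pos].get(letter, 0.0)
    return score

best = max(word_priors, key=word_posterior)
print(best)  # the word level resolves the ambiguity

# Feedback: the winning word hypothesis sensitizes the letter level,
# disambiguating position 1 in favor of one letter.
resolved_letter = best[1]
```

Each level alone is weak (the letter level cannot decide position 1; the word level alone would just pick its strongest prior), but combined in feedback they converge on a single multi-level-consistent hypothesis.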

Also, consistent false positives and false negatives at individual levels can actually help improve performance of a hierarchical reasoning system, because each level can learn to work with systematic quirks of higher and lower levels. A hierarchical learning system is more powerful than the sum of its parts, because error correction is innate.

Note that modularity and hierarchy are ubiquitous across all of biology, at all levels of complexity, and this appears to be a direct result of trying to minimize communication cost. Biology is particularly good at producing a massive fan-out of emergent complexity at every layer of hierarchy, and then packaging up that complexity inside a module, and presenting only a small (but rich) "API" to the outer layers of complexity. This can be observed in the modularity of proteins, organelles, cells, organs and organisms.

Learning is a process of information compression

It is interesting to note that hierarchical learning is related to progressive encoding, as found in JPEG and other image compression algorithms, where an image is encoded at low resolution first, and then progressively refined by adding in the detail that remains as the difference between lower-order approximations and the original image. Progressive encoding isn't just useful for letting you see the overall content of an image before the complete image has loaded -- it also increases compression ratios by decreasing local dynamic range.
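The dynamic-range point can be made concrete with a one-dimensional sketch (made-up sample values, block averaging standing in for the JPEG machinery): a coarse pass plus small residuals reconstructs the signal exactly, and the residuals span a far smaller range than the raw samples.

```python
# Progressive encoding sketch: store a coarse approximation first, then only
# the small residuals needed to refine it. The residuals have a much smaller
# dynamic range than the raw samples, which is what improves compression.

signal = [100, 104, 98, 102, 180, 176, 184, 179, 60, 58, 63, 61]

BLOCK = 4
coarse = [sum(signal[i:i + BLOCK]) // BLOCK for i in range(0, len(signal), BLOCK)]
residuals = [v - coarse[i // BLOCK] for i, v in enumerate(signal)]

# Reconstruction is exact: coarse pass + residual pass.
reconstructed = [coarse[i // BLOCK] + r for i, r in enumerate(residuals)]

signal_range = max(signal) - min(signal)          # wide: raw sample values
residual_range = max(residuals) - min(residuals)  # narrow: refinement detail
print(signal_range, residual_range)
```

The residuals cluster near zero, so they can be stored in far fewer bits per sample than the original values -- the "decreasing local dynamic range" effect described above.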

In fact, in general, learning is a process of information compression -- Grandmaster chess players never evaluate all possible moves, they compress a large number of move sequences and board positions into a much smaller number of higher-order concepts, patterns and strategies -- so it would make sense that learning is innately hierarchical.

Error correction improves accuracy of inference

As noted, error correction is innate in hierarchical systems. In fact, the principles of error correction, as defined in information theory, can be directly applied to machine learning. In my own tests, adding error correction to the output codes of a handwriting recognition system can decrease error rates by a factor of three, with no other changes to the system.
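A minimal sketch of error-correcting output codes (the general technique referred to above, not the author's actual handwriting system): each class is assigned a codeword with a large minimum Hamming distance from the others, and decoding picks the nearest codeword, so a flipped output bit still yields the right class.

```python
# Error-correcting output codes (ECOC), illustrative codebook only.
# Minimum Hamming distance between codewords here is 4, so any single
# corrupted output bit is corrected by nearest-codeword decoding.

codebook = {
    "0": (0, 0, 0, 0, 0, 0, 0),
    "1": (1, 1, 1, 1, 0, 0, 0),
    "2": (0, 0, 1, 1, 1, 1, 0),
    "3": (1, 1, 0, 0, 1, 1, 1),
}

def hamming(a, b):
    """Number of positions in which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(outputs):
    """Map a (possibly corrupted) output bit vector to the nearest class."""
    return min(codebook, key=lambda c: hamming(codebook[c], outputs))

# The classifier emits the codeword for class "2" with one bit flipped:
noisy = (0, 1, 1, 1, 1, 1, 0)
print(decode(noisy))  # still decodes to "2"
```

In a real classifier, each codeword bit would be a separate binary output unit; the redundancy in the codebook is what absorbs individual output errors.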

However, adding error correction to a system typically implies adding redundancy to decrease entropy, by increasing the minimum Hamming distance between codewords etc. This principle lies in direct tension with the fact that learning is a process of information compression, because information compression, as we know it, typically deals with removing redundancy.

A further conundrum is presented by the fact that traditional sequence compression, where redundancy is minimized and entropy maximized, dramatically increases the brittleness of a data stream: flipping bits in a zipped file is much more likely to render the original file unreadable than flipping bits in the uncompressed file. However, biology seems to welcome "data corruption", as evidenced by how resilient the genome is to mutation (mutation actually helps species adapt over time), and as evidenced by how well the brain works with uncertainty.
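The brittleness claim is easy to demonstrate (a quick sketch using Python's zlib; any deflate-style compressor behaves similarly): flip a single bit in a compressed stream and decompression typically fails outright or fails its checksum, whereas the same flip in the raw bytes would corrupt only one character.

```python
import zlib

# Flip one bit in a zlib-compressed stream and observe that the entire
# stream is rendered unusable -- redundancy has been squeezed out, so
# every bit carries load-bearing information.

data = b"the quick brown fox jumps over the lazy dog " * 50
compressed = bytearray(zlib.compress(data))

# Flip a single bit in the middle of the compressed stream.
compressed[len(compressed) // 2] ^= 0x01

try:
    recovered = zlib.decompress(bytes(compressed))
    corrupted = recovered != data   # stream decoded, but to the wrong data
except zlib.error:
    corrupted = True                # stream unreadable or checksum failed

print(corrupted)
```

The genome, by contrast, tolerates point mutations precisely because it is not maximally entropy-packed in this sense.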

The CS approach to information compression increases brittleness

The most interesting theoretical approach to unifying the two apparently opposing forces of error correction and information compression is Kolmogorov complexity or "algorithmic information theory", which states that it may require significantly less space to describe a program that generates a given sequence or structure than is needed to directly represent it (or sequence-compress it). Algorithmic compression may be used by the brain to dramatically increase compression ratios of structure, making room for redundancy. (It is certain that algorithmic compression is used in the genome, because there are 30,000 immensely complex cells in your body for every base pair of DNA in your genome.)
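The gap between a generative description and a direct (or sequence-compressed) representation can be seen with a toy example. The "program" below is just an illustrative Python expression, not a true Kolmogorov-complexity measurement (which is uncomputable in general):

```python
import zlib

# A short generative description of a structure can be far smaller than the
# structure itself, and even smaller than its sequence-compressed form.

sequence = "01" * 5000            # 10,000 characters of highly regular structure
program = '"01" * 5000'           # an 11-character generative description

sequence_compressed = zlib.compress(sequence.encode())

print(len(program), len(sequence_compressed), len(sequence))
```

Sequence compressors like zlib can only exploit local repetition; a generative description can exploit the global rule that produced the data, which is the sense in which algorithmic compression "makes room for redundancy."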

The criticality of feedback in the learning process

Feedback (and the ability to respond to feedback by updating a model or by changing future behavior) is the single most critical element of a learning system -- in fact, without feedback, it is impossible to learn anything. However, the brain consists of probably trillions of nested feedback loops, and the emergent behavior of a system incorporating even just a few linked feedback loops can be hard to characterize. It is critical to understand how the vast number of interacting feedback loops in the brain work together harmoniously at different scales if we hope to build a brain-like system. We have a lot of work to do to understand the global and local behavioral rules in the brain that together lead to its emergent properties.

The use of feedback in machine learning has so far mostly been limited to the minimization of training error (through backpropagation or an analogous process). However, time-series feedback loops in recurrent networks are also immensely powerful, and these can be trained using backpropagation through time. Recurrent networks and other time-series-based models can be used for temporal prediction (see "Memory-prediction framework" in Wikipedia). Prediction is a fundamental property of intelligence: the brain is constantly simulating the world around it on a subconscious level, comparing the observed to the expected, then updating the predictive model to minimize the error and taking corrective action based on unexpected contingencies. Temporal prediction is a core concept in Jeff Hawkins' book On Intelligence; however, Jeff's HTM framework is far too rigid and discrete in its implementation of these ideas to be generally applicable to substantial real-world problems.
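The predict/compare/update loop can be reduced to a few lines (a deliberately minimal sketch -- a scalar predictor, not a recurrent network): the model forecasts the next observation, measures its surprise, and nudges its internal state to reduce future error.

```python
# Minimal temporal-prediction loop: forecast, compare observed vs. expected,
# and apply a corrective update proportional to the prediction error.

def run_predictor(observations, learning_rate=0.3):
    estimate = 0.0      # the model's current belief about the world
    errors = []
    for observed in observations:
        error = observed - estimate        # surprise: observed minus expected
        errors.append(abs(error))
        estimate += learning_rate * error  # update the model to reduce error
    return errors

# On a steady signal, surprise decays toward zero as the model adapts.
errors = run_predictor([10.0] * 20)
print(errors[0], errors[-1])
```

The same skeleton underlies far richer systems; recurrent networks effectively learn the update rule and the state representation rather than having them fixed by hand.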

How to build a brain

Building a computational structure with multiscale, feedback-based, predictive properties similar to those in the brain is critical to creating machine intelligence that will be useful in the human realm. Until we figure out how to do this, we're stuck with machine learning amounting to nothing more than the process of learning arbitrary function approximators.

Jeff Hawkins' HTM framework looks a lot more like a big Boolean logic network than the soft, fuzzy Bayesian belief network present in the brain. The basic ideas behind HTM are sound, but we need to replace HTM's regular, binarized, absolute-coordinate grid system with something more amorphous, reconfigurable and fuzzy, and we need to propagate Bayesian beliefs rather than binary signals. Building such a system so that it has the desired behavior will be a hard engineering challenge, but the resulting system should be, ironically, much closer to the principles Jeff describes in his own book.

Most importantly, however, we will have built something that functions a lot more like the human brain than most existing machine learning algorithms -- something that, through having a cognitive "impedance" correctly matched to the human brain, will more naturally interface with and extend our own intelligence.



Why I'm not keeping my Samsung Series 5 ChromeBook

I received my free Samsung Series 5 ChromeBook (the ChromeOS notebook given out to all Google I/O attendees) a couple of days ago, and was excited to try it out. I decided not to keep it however, because of multiple hardware issues:
  • The touchpad requires you to physically click to generate a mouse click event. It's hard to depress the pad with your pointer finger because of the high spring constant, but if you use your thumb, the wide contact area of the edge of the thumb can hit the pad non-simultaneously, and as a result (as the contact area grows) the mouse cursor can jump away from the click target before you have actually depressed the touchpad, so you miss the target and/or click on something else instead. This is one of the biggest non-starters for me -- it's extremely frustrating. I'm not the only person who has reported this problem.
  • External mouse support doesn't seem to be enabled yet -- at least, the mouse I tried didn't even power up -- which means I can't work around the touchpad behavior.
  • Thhe keyboardd typpes doubble letterss freqquently. Not sure if it's a hardware bounce or a software glitch.  It feels like it might be a hardware bounce, which implies this may not be fixable.
  • There are no Delete, Home/End or PageUp/PageDown keys. There are alternatives to Delete (Shift+Backspace) and PageUp/PageDown (Alt+Up/Down), but there is no alternative to Home/End. This makes editing text a pain, because you frequently have to reach for the touchpad or use Ctrl+Left/Right to move a word at a time to the beginning or end of what you're editing.
  • The keyboard isn't super-nice to type on.  I can't put my finger on why exactly.
  • Scrolling complex pages can be just slow enough that it bugs me, and opening lots of tabs at once (e.g. when the session is restored each time you log in) can slow down the machine.  Even when not restoring a session, there were times when a tab would lock up for 30 seconds, generating a dialog stating "The following tabs are not responding; kill them?".
  • It's really quite heavy (though construction quality is very solid).
On the positive side, battery life is awesomely long, and, other than the speed issues on complex pages and/or when restoring many tabs at once (which is the tradeoff for long battery life, I guess), the Web experience is very smooth and fluid when interacting with most individual pages. The browser came with not just Flash but also the Google Talk voice/video plugin installed, which was a nice addition.

Overall, my experience using the ChromeBook for an entire day while sitting in a conference was just frustrating enough that I have decided not to keep it.


Whole-organism integrative expressome for C. elegans enables in silico study of developmental regulation

Thesis Defense: Whole-organism integrative expressome for C. elegans enables in silico study of developmental regulation

Author: Luke A. D. Hutchison
Co-Advisors: Prof. Isaac S. Kohane, Prof. Bonnie A. Berger

Date: Tuesday May 17, 2011
Time: 11am - 12:15pm
Location: MIT CSAIL Stata Center, Patil/Kiva seminar room, 32-G449

Short abstract:  [tl;dr]

The C. elegans nematode has been extensively studied as a model organism since the 1970s, and is the only organism for which the complete cell division tree and the genome are both available. These two datasets were integrated with a number of other datasets available at WormBase.org, such as the anatomy ontology, gene expression profiles extracted from 8000 peer-reviewed papers, and metadata about each gene, to produce the first ever whole-organism, cell-resolution map of gene expression across the entire developmental timeline of the organism, with the goal of finding genomic features that regulate cell division and tissue differentiation. Contingency testing was performed to find correlations between thousands of gene attributes (e.g. the presence or absence of a specific 8-mer in the 3' UTR, the GC-content of the sequence upstream of the transcriptional start site, etc.) and thousands of cell attributes (e.g. whether cells that express specific genes die through apoptosis, whether cells become neurons or not, whether cells move in the anterior or posterior direction after division). The resulting database of contingency test scores allows us to quickly ask a large number of biologically-interesting questions, like, “Does the length of introns of expressed genes increase across the developmental timeline?”; “Across what period of development and in which cell types is this specific gene most active?”; “Do regulatory motifs exist that switch on or off genes in whole subtrees of the cell pedigree?”; “Which genes are most strongly implicated in apoptosis?”, etc. This whole-organism expressome enables direct and powerful in silico analysis of development.

Long Abstract:

The C. elegans nematode has been extensively studied as a model organism since the 1970s. C. elegans was also the first organism to have its genome fully sequenced, and it is the only organism for which the complete tree of cell divisions is known, from the zygote to the fully-developed adult worm. By integrating these two datasets with a number of other datasets available at WormBase.org, it is possible to start looking for a mapping from the C. elegans genome to its cell division tree, i.e. to identify genomic regulators of cell fate and cell phenotype.

Two different versions of the cell fate tree for C. elegans were linked and merged to maximize the metadata available for each cell, then the cell fate tree was cross-linked with the anatomy ontology, or hierarchical map of containment and relatedness of the worm's anatomical features. Reachability analysis was performed on the anatomy ontology to obtain a list of organs and tissue types that each cell is part of. A dataset of reported expression levels of thousands of genes in different tissue types and organs, as extracted from the gene expression results in 8000 peer-reviewed papers, was cross-linked with the anatomy ontology, and gene expression reported at tissue or organ level was propagated through the anatomy ontology to the individual cells that comprise those anatomical features. A gene metadata database was also integrated to provide metadata about the genes active in each cell. This combination of the two linked cell fate trees, the anatomy ontology, the gene expression database and the gene metadata database yields the first whole-organism, cell-resolution map of gene expression across the entire developmental timeline of the organism.

Given this integrated database of gene expression, contingency testing was performed to find correlations between thousands of different potential gene attributes (e.g. the presence or absence of a specific 8-mer in the 3' UTR, the GC-content of the sequence upstream of the transcriptional start site, etc.) and thousands of different potential cell attributes (e.g. whether cells that express specific genes die through apoptosis, whether they become neurons or not, whether they merge into syncytia, whether they move in the anterior or posterior direction after division). The resulting database of contingency test scores allows us to quickly ask a large number of biologically interesting questions, like "Does the length of the introns of expressed genes increase across the developmental timeline?"; "Across what period of development and in which cell types is this specific gene most active?"; "Do regulatory motifs exist that switch on or off genes in whole subtrees of the cell pedigree?"; "Which genes are most strongly implicated in apoptosis?"; "Which genes cause cells to stop dividing and become leaf nodes in the cell pedigree?", etc. In querying for genes correlated with apoptosis in cells or daughter cells, for example, the database lists a large number of genes that have not previously been implicated in apoptosis. This whole-organism expressome enables direct and powerful in silico analysis of development on an unprecedented scale.
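
As a concrete illustration of the scoring described above, here is a minimal sketch (my own illustration, not the actual pipeline code) of a 2x2 chi-squared contingency test between one hypothetical gene attribute (say, "contains a given 8-mer in the 3' UTR") and one hypothetical cell attribute (say, "dies through apoptosis"). The class name and all counts are made up for illustration:

```java
public class ContingencyTest {
    // chi-squared statistic for the 2x2 table {{a, b}, {c, d}}
    // rows: gene attribute present / absent
    // cols: cell attribute present / absent
    public static double chiSquared(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double num = n * Math.pow(a * d - b * c, 2);
        double den = (double) (a + b) * (c + d) * (a + c) * (b + d);
        return num / den;
    }

    public static void main(String[] args) {
        // Illustrative counts only: 30 of 40 attribute-bearing cells are
        // apoptotic, vs 20 of 60 cells without the attribute
        double chi2 = chiSquared(30, 10, 20, 40);
        System.out.printf("chi^2 = %.2f%n", chi2);
        // 3.841 is the 1-degree-of-freedom critical value at p = 0.05
        System.out.println(chi2 > 3.841 ? "significant at p < 0.05" : "not significant");
    }
}
```

In the real pipeline this score would be computed for every (gene attribute, cell attribute) pair and stored, so that queries like those listed above reduce to sorting the score table.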

Finally, the increase in the amount of biological data being produced per year is far outstripping Moore's Law, but more importantly, language support for easily building large parallel data manipulation pipelines, like the one described above, is sorely lacking. As a result, cores sit unused, or programmers spend inordinate amounts of time manually parallelizing their code to make use of the available cores -- an error-prone process. This is often termed "the multicore dilemma". The data transformation pipeline that integrates these various C. elegans data sources exhibited a number of repeating design patterns that directly gave rise to a new paradigm for building implicitly-parallelizing programming languages, known as Flow. The Flow paradigm is not central to the thesis research itself, but will be briefly described if there is time at the end of the defense.


On Intel 3D chips and human 4D brains. (And flying cars.)

Intel today announced a breakthrough in making "3D chips". Many news outlets are misconstruing this to mean chips built out of 3D bricks of logic gates. Actually it's nothing of the sort. These chips are still fundamentally 2D (or close to 2D) in layout; they just have conductive rails with U-shaped 3D cross sections. This makes for better transistors because today's feature sizes are already smaller than the wavelength of the light used in lithography, so it's getting harder and harder to manufacture conductive elements that are unbroken and not "blobby". Giving the rails a U-shaped cross section triples the available conductive area, and yields rails that are more uniform and less blobby in profile, which means less random variance in the resistance of each wire. So we have a factor of 3 improvement in area, and maybe a factor of 10 improvement in uniformity of current flow. That should give us, maybe, 10 more years' jump on Moore's Law ;)

Yes, this is a big manufacturing achievement, but as stated above, these are not 3D chips. Still, anything that gets you out of a single 2D plane is a huge step forward in terms of graph layout: no graph that contains K_3,3 (the complete bipartite graph with 3 nodes in each part) or K_5 (the complete graph on 5 nodes) as a subgraph can be embedded in the 2D plane. This is exactly why roads have intersections and traffic jams -- except in the case of freeways, which achieve unimpeded flow by becoming a spaghetti mess of raised bridges. And this is also why we need flying cars.
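
The non-planarity of K_5 and K_3,3 can be seen with a quick edge-counting argument from Euler's formula: a simple planar graph on v >= 3 vertices has at most 3v - 6 edges, and at most 2v - 4 if it is also bipartite. A small sketch of this check (class and method names are mine, purely illustrative):

```java
public class PlanarityBounds {
    // Necessary (not sufficient) condition for planarity from Euler's
    // formula: e <= 3v - 6 for simple graphs, e <= 2v - 4 if bipartite
    public static boolean fitsPlanarBound(int v, int e, boolean bipartite) {
        return bipartite ? e <= 2 * v - 4 : e <= 3 * v - 6;
    }

    public static void main(String[] args) {
        // K_5: 5 vertices, C(5,2) = 10 edges; bound is 3*5 - 6 = 9
        System.out.println(fitsPlanarBound(5, 10, false));  // fails the bound: non-planar
        // K_3,3: 6 vertices, 3*3 = 9 edges; bipartite bound is 2*6 - 4 = 8
        System.out.println(fitsPlanarBound(6, 9, true));    // fails the bound: non-planar
    }
}
```

Note the bound is only a necessary condition; Kuratowski's theorem gives the full characterization, which is exactly the K_5 / K_3,3 statement above.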

So current chips already are not truly 2D -- if they were, you simply could not produce them; it would be mathematically impossible. There are at least 3 layers of silicon (because that's how you create a transistor anyway). On a macro scale, most motherboards these days have something like 10-15 layers, because that's the only feasible way to lay out all the wires. The more layers you can add, the less constrained the layout. So we really just need to figure out ways to stack maybe 10-15 layers of silicon, and we'd have a huge win.

I suspect that with truly 3D chips -- where you try to pack all the gates into a volume that's closer to a cube than a wafer -- the biggest problems will be power density issues, and those issues will be major. We already can't wick away heat fast enough, and with today's (nearly-) 2D chips, the surface-area-to-volume ratio (the critical quantity for heat transfer) is already about as high as it can get. So chips would have to run far more slowly if they were 3D in structure. (This is why the brain operates at something like 200 Hz and is massively parallel -- but it also consumes only about 20 W of power, so it is orders of magnitude more energy efficient than today's computers.) On the up-side, all the interconnected parts of a truly 3D chip would be physically closer together, so data path lengths would be much shorter than in current chips.

This is one huge advantage of the brain -- its wires don't have to deal with 2D crossing issues, and interconnect distances are dramatically reduced -- and it is also why, if our brains were 4D, they could operate much, much faster: in fact, if we had 3D neurons embedded in a 4D space, every pair of neurons could be interconnected with an axonal distance of close to zero :-)

From an energy density and energy efficiency standpoint though, the brain is looking more and more amazing all the time...


Spoiler alert: Code to solve GoogleNexus Challenge #4

SPOILER ALERT: Don't read this if you haven't yet given up on GoogleNexus challenge #4!
It's a great puzzle, try to figure it out first.


Yesterday's GoogleNexus Challenge took you to this page, which linked to this map of urban rail systems. There were actually two puzzles embedded in the challenge. The first involved the colored circles at the top, which corresponded to line crossings; the numbers gave the index of the letter to take from each crossing's name. Reading off these letters spelled LONDON. Then there was a list of seemingly random words which are actually anagrams of London Tube stop names, each missing one letter. The missing letters (one per line) spell out the answer, "Please send me a Nexus S so Google goodies I can access".

You could totally do this by hand if you're good with anagrams, but since comparing 44 anagrams (that are each missing a letter) to 306 tube stops is not my idea of fun, I wrote code to do it.

Here's the code I wrote to solve the puzzle (after copying the list of tube stops from Wikipedia). It crosses out the anagram's letters one at a time from each candidate station name, then outputs the single letter left over.

Don't judge my code, I wrote this at light speed...

import java.util.ArrayList;

public class Main {

    static String[] stations = {

    "Acton Town", "Aldgate", "Aldgate East", "All Saints", "Alperton", "Amersham", "Angel", "Archway", "Arnos Grove", "Arsenal",
            "Baker Street", "Balham", "Bank", "Barbican", "Barking", "Barkingside", "Barons Court", "Bayswater", "Beckton",
            "Beckton Park", "Becontree", "Belsize Park", "Bermondsey", "Bethnal Green", "Blackfriars", "Blackhorse Road",
            "Blackwall", "Bond Street", "Borough", "Boston Manor", "Bounds Green", "Bow Church", "Bow Road", "Brent Cross",
            "Brixton", "Bromley-by-Bow", "Buckhurst Hill", "Burnt Oak", "Caledonian Road", "Camden Town", "Canada Water",
            "Canary Wharf", "Canary Wharf", "Canning Town", "Cannon Street", "Canons Park", "Chalfont & Latimer", "Chalk Farm",
            "Chancery Lane", "Charing Cross", "Chesham", "Chigwell", "Chiswick Park", "Chorleywood", "Clapham Common",
            "Clapham North", "Clapham South", "Cockfosters", "Colindale", "Colliers Wood", "Covent Garden", "Crossharbour",
            "Croxley", "Custom House", "Cutty Sark for Maritime Greenwich", "Cyprus", "Dagenham East", "Dagenham Heathway",
            "Debden", "Deptford Bridge", "Devons Road", "Dollis Hill", "Ealing Broadway", "Ealing Common", "Earl's Court",
            "East Acton", "East Finchley", "East Ham", "East India", "East Putney", "Eastcote", "Edgware", "Edgware Road",
            "Edgware Road", "Elephant & Castle", "Elm Park", "Elverson Road", "Embankment", "Epping", "Euston", "Euston Square",
            "Fairlop", "Farringdon", "Finchley Central", "Finchley Road", "Finsbury Park", "Fulham Broadway", "Gallions Reach",
            "Gants Hill", "Gloucester Road", "Golders Green", "Goldhawk Road", "Goodge Street", "Grange Hill",
            "Great Portland Street", "Greenford", "Green Park", "Greenwich", "Gunnersbury", "Hainault", "Hammersmith",
            "Hammersmith", "Hampstead", "Hanger Lane", "Harlesden", "Harrow & Wealdstone", "Harrow-on-the-Hill", "Hatton Cross",
            "Heathrow Terminals 1, 2, 3", "Heathrow Terminal 4", "Heathrow Terminal 5", "Hendon Central", "Heron Quays",
            "High Barnet", "Highbury & Islington", "Highgate", "High Street Kensington", "Hillingdon", "Holborn", "Holland Park",
            "Holloway Road", "Hornchurch", "Hounslow Central", "Hounslow East", "Hounslow West", "Hyde Park Corner", "Ickenham",
            "Island Gardens", "Kennington", "Kensal Green", "Kensington (Olympia)", "Kentish Town", "Kenton", "Kew Gardens",
            "Kilburn", "Kilburn Park", "King George V", "Kingsbury", "King's Cross St. Pancras", "Knightsbridge", "Ladbroke Grove",
            "Lambeth North", "Lancaster Gate", "Langdon Park", "Latimer Road", "Leicester Square", "Lewisham", "Leyton",
            "Leytonstone", "Limehouse", "Liverpool Street", "London Bridge", "London City Airport", "Loughton", "Maida Vale",
            "Manor House", "Mansion House", "Marble Arch", "Marylebone", "Mile End", "Mill Hill East", "Monument", "Moorgate",
            "Moor Park", "Morden", "Mornington Crescent", "Mudchute", "Neasden", "Newbury Park", "North Acton", "North Ealing",
            "North Greenwich", "North Harrow", "North Wembley", "Northfields", "Northolt", "Northwick Park", "Northwood",
            "Northwood Hills", "Notting Hill Gate", "Oakwood", "Old Street", "Osterley", "Oval", "Oxford Circus", "Paddington",
            "Park Royal", "Parsons Green", "Perivale", "Piccadilly Circus", "Pimlico", "Pinner", "Plaistow", "Pontoon Dock",
            "Poplar", "Preston Road", "Prince Regent", "Pudding Mill Lane", "Putney Bridge", "Queen's Park", "Queensbury",
            "Queensway", "Ravenscourt Park", "Rayners Lane", "Redbridge", "Regent's Park", "Richmond", "Rickmansworth",
            "Roding Valley", "Royal Albert", "Royal Oak", "Royal Victoria", "Ruislip", "Ruislip Gardens", "Ruislip Manor",
            "Russell Square", "St. James's Park", "St. John's Wood", "St. Paul's", "Seven Sisters", "Shadwell", "Shepherd's Bush",
            "Shepherd's Bush Market", "Sloane Square", "Snaresbrook", "South Ealing", "South Harrow", "South Kensington",
            "South Kenton", "South Quay", "South Ruislip", "South Wimbledon", "South Woodford", "Southfields", "Southgate",
            "Southwark", "Stamford Brook", "Stanmore", "Stepney Green", "Stockwell", "Stonebridge Park", "Stratford",
            "Sudbury Hill", "Sudbury Town", "Swiss Cottage", "Temple", "Theydon Bois", "Tooting Bec", "Tooting Broadway",
            "Tottenham Court Road", "Tottenham Hale", "Totteridge & Whetstone", "Tower Gateway", "Tower Hill", "Tufnell Park",
            "Turnham Green", "Turnpike Lane", "Upminster", "Upminster Bridge", "Upney", "Upton Park", "Uxbridge", "Vauxhall",
            "Victoria", "Walthamstow Central", "Wanstead", "Warren Street", "Warwick Avenue", "Waterloo", "Watford",
            "Wembley Central", "Wembley Park", "West Acton", "West Brompton", "West Finchley", "West Ham", "West Hampstead",
            "West Harrow", "West India Quay", "West Kensington", "West Ruislip", "West Silvertown", "Westbourne Park", "Westferry",
            "Westminster", "White City", "Whitechapel", "Willesden Green", "Willesden Junction", "Wimbledon", "Wimbledon Park",
            "Wood Green", "Wood Lane", "Woodford", "Woodside Park", "Woolwich Arsenal" };

    static String[] matches = { "renumber digits", "rub ink", "long hiatus", "big redskin", "damp heat", "calming moon",
            "third felon", "neutral pink", "technical flyer", "robs enemy", "sea honour", "saintly chef", "two cents",
            "old enchanter", "sparkle biz", "crocus fiord", "dusty brown", "ruby queen", "rich garcons", "sardine gland",
            "sans broker", "hated seaman", "not tensely", "gloved barker", "noted cavern", "evilest trooper", "scarlet esquire",
            "educators role", "manly beer", "carrot nubs", "thinks bigger", "hawthorn holler", "nocturnal howls", "diurnal gripes",
            "mad realtor", "gas alternate", "civil oratory", "orange press", "english carol", "tallest peahen", "rare lotus",
            "clan idol", "concerning torment", "darn heel" };

    public static void main(String[] args) {
        // Normalize station names to lowercase letters only
        ArrayList<String> stops = new ArrayList<String>();
        for (String s : stations) {
            String sout = "";
            for (int i = 0; i < s.length(); i++) {
                char c = Character.toLowerCase(s.charAt(i));
                if (c >= 'a' && c <= 'z')
                    sout += c;
            }
            stops.add(sout);
        }
        // For each anagram, find a station whose letters cover all but one of
        // the anagram's letters; the single leftover letter is the next letter
        // of the hidden message
        for (String m : matches) {
            for (String s : stops) {
                if (s.length() == m.length()) {
                    StringBuffer buf = new StringBuffer(s);
                    for (int i = 0; i < m.length(); i++) {
                        char c = m.charAt(i);
                        if (c != ' ') {
                            int p = buf.indexOf(c + "");
                            if (p >= 0)
                                buf.setCharAt(p, ' ');
                        }
                    }
                    String bs = buf.toString().trim();
                    if (bs.length() == 1)
                        System.out.print(bs);
                }
            }
        }
        System.out.println();
    }
}

Is the MLK quote fake? "I mourn the loss of thousands of precious lives..."

(or, "On not rejoicing in Osama bin Laden's death"...)

The following quote spread all over the Internet in the last 24 hours:

"I mourn the loss of thousands of precious lives, but I will not rejoice in the death of one, not even an enemy. Returning hate for hate multiplies hate, adding deeper darkness to a night already devoid of stars. Darkness cannot drive out darkness: only light can do that. Hate cannot drive out hate: only love can do that"
— Martin Luther King Jr.

Megan McArdle, an editor for The Atlantic, posted a blog entry saying that MLK "probably" didn't say this.  However, MLK did say everything but the first sentence.

The original source of the mis-quote was finally tracked down.

This was just a copy/paste error: somebody wrote the first sentence -- their own assessment of the situation -- and then pasted MLK's quote below it.  Then somebody else copied the whole thing and posted it to Facebook, joining the two quotes together so it looked like it was all attributed to MLK. From there it spread all over the Internet, followed by the claim that the quote was fake.  Which it is not -- it should just be split into two pieces like this:

"I mourn the loss of thousands of precious lives, but I will not rejoice in the death of one, not even an enemy."
— Jessica Dovey
"Returning hate for hate multiplies hate, adding deeper darkness to a night already devoid of stars. Darkness cannot drive out darkness: only light can do that. Hate cannot drive out hate: only love can do that."
— Martin Luther King Jr. [source, Google Books]

Even though MLK didn't say the first part, Jessica's comments are eloquent and I completely agree with them.

While I'm on the subject, here are a couple of other related quotes:

"No man is an island, entire of itself; every man is a piece of the continent, a part of the main. . . any man's death diminishes me, because I am involved in mankind" —John Donne, Meditation XVII

"Always forgive your enemies. Nothing annoys them more." --Oscar Wilde


Life, Intelligence and the Second Law of Thermodynamics

[On how life locally creates order in a universe that is only supposed to increase in entropy over time]

The Second Law of Thermodynamics states that entropy in a closed system cannot decrease, and that this irreversible increase in entropy is somehow intimately interrelated with the arrow of time. However, the unique, and perhaps the defining, characteristic of intelligence specifically and of life in general is the seeming ability to counter entropy [1] locally or temporarily by creating order in its own environment while it is alive. Furthermore, life possesses the unique ability to counter entropy on an ongoing basis -- “permanently”, except perhaps at the limit [2] -- by reproducing or making multiple copies of itself: every offspring produced by an organism is also then able to counter entropy locally in its own sphere. Therefore, if the hallmark of life is to locally counter entropy, then a branching, tree-like family pedigree structure is the hallmark of all intelligent life, and the continuation of intelligent species through reproduction is itself the guarantee of overcoming entropy beyond the lifetime of any single organism [3][4].

Note that all this is also true of the organism's unconscious construction of its own ordered body: it is a system of high order and low entropy (and the development process produces waste heat through respiration etc.). As soon as the organism dies, it is by definition no longer alive, or its body no longer acts “intelligently” in creating order. Its ability to focus energy to maintain its own ordered structure is lost, and it will immediately begin to decay.

Also note that, from the point of view of an organism, past work is at best a memory and so amounts to a sunk cost, and organisms are not typically aware of the waste heat lost in doing work to increase order. They will therefore typically only trade off the effort involved in doing work today against the apparent reward of the increased order that directly results from that work. When the perceived reward outweighs the cost, the organism will tend to do the work to obtain the reward. Psychology and biochemical reward systems like dopamine have therefore typically evolved to reward organisms for making local decisions that seem rewarding while actually accomplishing a broader purpose, such as the propagation and continuation of the species.


[1] Note that we are using a very loose definition of “countering entropy” here -- in fact, the second law of thermodynamics cannot be violated, even locally, so what is actually happening is that the organism is focusing energy to do work to “pile up order” in one local region. However doing work produces waste heat, which is effectively how the second law of thermodynamics is satisfied even in this case: the waste heat produces an overall increase in entropy. Nevertheless with a constant energy source external to the system (the sun), and an unbroken chain of life across the generations, individual workers can continue to increase order within the system: they effectively channel and focus the energy from the external source to locally increase structure and order in some sort of "meaningful" way.
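
The bookkeeping in this footnote can be written out explicitly (a sketch in standard notation, not from the original post): if the organism lowers the entropy of its local region while dumping waste heat Q into surroundings at temperature T, the second law only requires the total change to be non-negative:

```latex
% Entropy bookkeeping for footnote [1]: local order creation is paid for
% by waste heat Q dumped into the environment at temperature T
\Delta S_{\mathrm{total}}
  = \underbrace{\Delta S_{\mathrm{local}}}_{<\,0 \text{ (order created)}}
  + \underbrace{\frac{Q}{T}}_{\text{waste heat}}
  \;\geq\; 0
\quad\Longrightarrow\quad
Q \;\geq\; -\,T\,\Delta S_{\mathrm{local}}
```

So any local "piling up of order" costs at least T times the local entropy decrease in waste heat, which is exactly the trade-off described above.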

[2] i.e. the heat death of the universe, the extinction of the species due to extinction-level events / resource exhaustion, etc.

[3] (Incidentally, this is a rather glaring indictment against the decidedly unintelligent behavior of every sufficiently developed nation, in letting the birthrate fall below replacement rate of 2.1 children per couple-lifetime -- since this trend, if continued to its only mathematically-possible end with no future change in birthrate, is extinction of the species and the final victory of the Second Law of Thermodynamics.)

[4] This is probably true in a fractal sense right up to the scale of entire universes, if the theory (proposed by Andrei Linde and others) that universes “bud off” in hierarchical tree-like structures to create new universes is to be believed.


PS this is only orthogonally related, but I highly recommend the following article in the latest American Scientist magazine: The Man Behind the Curtain -- physics is not always the seamless subject that it pretends to be.

Also orthogonally related, but I love the following quote: "There is only one thing which is more unreasonable than the unreasonable effectiveness of mathematics in physics, and this is the unreasonable ineffectiveness of mathematics in biology." — Israel Gelfand, cited in the Wikipedia article on the Unreasonable Effectiveness of Mathematics.


Source Code Symmetry and Transcendent Programming Tools

The more experienced you become as a coder, the more you read code by looking at patterns, shapes and symmetry, rather than a character or token at a time.  This is an extremely desirable programming skill, because:
  1. The bandwidth and pattern recognition capabilities of the subconscious layers of the human visual system far exceed those of the conscious reasoning brain.
  2. Cross-checking code symmetries can effectively ensure completeness and correctness of code in many situations.
By analogy, when a beginning chess player is evaluating their next best move, they search the space of moves in the immediate neighborhood of the current board configuration. A really good chess player however does not spend much time thinking of individual moves, but rather thinks at a much higher level of abstraction, dealing with patterns and strategies.  The advanced player is in effect able to compress a huge amount of information into a relatively small number of concepts, and is able to employ much more powerful reasoning tools over these concepts.

A really simple example of source code symmetry is:

a[i].x += a[i - 1].x;
a[i].y += a[i - 1].y;

The visual symmetry here is that 'x' and 'y' run across the rows, while 'i' and 'i - 1' run down the columns. If you know that's the pattern you're expecting, then when those elements don't line up you can spot a probable copy/paste error without even thinking about what the tokens mean.

There are many complex abstract examples of code symmetry, and very few advanced programmers would be able to articulate the subconscious tools they employ daily to analyze visual and logical symmetries while writing code.  Some vague examples include:
  • Symmetries in the wavy line of indentation (indicating symmetries in scope and control flow);
  • Relative differences in the structure of nested function call applications between two expressions;
  • Differences in Boolean operators used between a block of related but different complex Boolean expressions.
In general, the presence of visual symmetry indicates the presence of underlying structural symmetry in the program. The converse is not necessarily true however -- programming language syntax may obfuscate the underlying symmetry of a program if the syntax is not designed to render functional symmetries in a visually symmetric way.

I have believed for a long time that one day we will be able to write really, really good programming tools that alert you to broken symmetries (e.g. copy/paste errors where you copied code for x but forgot to change the x to a y for the second version) -- or even suggest code or write code for you based on predicted symmetries or symmetries that are detected to be incomplete. This sort of power could catch a lot of the sorts of bugs you get from confusing two similarly-named variables etc. And I suspect programming with this level of integration between syntax, logical structure and IDE functionality would take programming to a completely new, transcendent level.
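
As a toy illustration of the kind of tool imagined here (purely my own sketch, not an existing product), the very simplest broken-symmetry detector just checks whether one line is another line with a token substitution applied consistently:

```java
public class SymmetryChecker {
    // Returns true if line2 is exactly line1 with every 'from' character
    // replaced by 'to' -- i.e. the two lines form a consistent symmetry pair.
    // A real tool would compare syntax trees, not raw characters.
    public static boolean consistent(String line1, String line2, char from, char to) {
        return line1.replace(from, to).equals(line2);
    }

    public static void main(String[] args) {
        // Intact symmetry: the 'y' line mirrors the 'x' line exactly
        System.out.println(consistent("a[i].x += a[i - 1].x;",
                                      "a[i].y += a[i - 1].y;", 'x', 'y'));
        // Broken symmetry: the classic copy/paste bug -- one 'x' survived
        System.out.println(consistent("a[i].x += a[i - 1].x;",
                                      "a[i].y += a[i - 1].x;", 'x', 'y'));
    }
}
```

Even this naive character-level check flags the copy/paste bug from the example above; a production version would need to work over identifiers and scopes rather than single characters.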

In the general case, detecting all reasonable symmetries for an arbitrary programming language may be uncomputable or at least intractable. A better approach would be to bend a language's syntax to support symmetry explicitly as a top-level feature of the language. This would involve identifying the types of symmetry you typically find in a program and finding syntactic ways of binding the symmetrical parts together in useful ways. Functions are already a weak form of this, since they allow for the common parts of a symmetry to be abstracted away and parameterized. But I think it could go a lot deeper than that.


Why NDAs are usually pointless and counterproductive

Non-Disclosure Agreements are usually pointless and counterproductive. Forget using an NDA as a crutch unless it's absolutely necessary -- unless, of course, you're not innovative enough to come up with the next big idea after the current one, and the next big idea after that.
  • Most high-level VCs have seen thousands of ideas and will tell you they have seen it all; "there are no new ideas under the sun, only variations on old ideas".  They will generally refuse to sign NDAs; it's not worth their time to entertain your paranoia.
  • Bringing any product to market is 5% the idea and 95% the execution, and execution takes work, dedication and passion.  If you tell me your idea, what am I going to do, drop everything in my life, dedicate myself to your idea, and go bust my gut for 5 years to bring your idea to fruition before even you can?
  • Furthermore, the success of any product is 5% the product and 95% the timing. There were many Facebook-equivalent social networks before Facebook, many Groupon-like collective purchasing initiatives before Groupon, and many tablet computers before the iPad, dating as far back as the Apple Newton or further.  (Of course, ideal timing windows are very narrow, and there are now many Groupon clones riding on Groupon's coattails, but the point is that the idea was around for quite a while before Groupon succeeded.)
  • Most people that are approached about an idea but asked to sign NDAs first are the sorts of people that already have way too many ideas of their own and are so maxed out that there's no way they have time to go run off pursuing yet another idea. Especially your idea.
  • In most cases, those asking others to sign an NDA before they share an idea don't have much more to offer than their one pet idea. They are a liability to a VC or potential future business partner.
  • Asking a resourceful or powerful person to commit to signing an NDA before they even know what the idea is, is laughable: the incentive system is circular. They aren't going to sign until they know what the idea is worth to them, and they can't know what it's worth until they sign. Why would they paint themselves into a corner before they even know what the idea is about?
  • Typically NDAs do not come with a corresponding counter-agreement that says something like, "If you agree not to disclose this, and you become involved in developing this idea in role X, the payoff to you will be Y% of the stock at release."  NDAs typically uniquely benefit the NDA-writer.
  • Forcing somebody to sign an NDA is often an attempt on the part of the idea-holder to reverse the hierarchy of power. If they didn't think the potential NDA-signee would be of critical importance to the success of their project, they wouldn't be asking them to sign an NDA. Getting somebody to sign an NDA is an attempt to maintain an asymmetric relationship of control and wield authority over the very person whose skills you need to bring your idea to fruition, which is a relationship very few resourceful people are willing to enter into.
  • The idea covered by an NDA may overlap with IP the signee is already working on -- but they can't know that until they sign. If there is any overlap, both of you end up in a very sticky situation.


Bringing Open CourseWare to North Korea

I was just interviewed by Voice of America about my trip with Choson Exchange last September to take Open CourseWare to North Korea: 북한 대학, 고도의 자료 공유체계 구축 ("North Korean universities build an advanced material-sharing system") [audio recording]

It's all in Korean, so here is an executive summary of the key points in English:

We took OCW and Wikibooks to North Korea and presented them at the Pyongyang International Science and Technology Book Fair. Choi Thae Bok, one of the highest-ranking leaders in NK, came to inspect our books and said "every student and professor in NK needs access to these materials", so he called the Dean of Kim Chaek University (the "MIT of North Korea") to come meet with us.

The reporter asked about Internet access in NK. Some people have access to the Internet at high levels, and they copy information from the Internet and put it on the national Intranet (human-based content filtering).

Finally, North Korea has an extensive book scanning project in which they digitize thousands of books and make them accessible to anyone with access to the Intranet. Their book scanning project and their computer labs were pretty advanced: they had all the latest Dell and HP computer equipment with big flat-panel monitors, and the computers all had signs on them that read, "Gift from the Great General Kim Jong Il".


Watson's Jeopardy win, and a reality check on the future of AI

So Watson beat the two best Jeopardy champions at their own game. What now?

Call me cynical, but as someone who has been doing machine learning research for 14 years, I find the Jeopardy result really not that surprising -- you would win too if you had the equivalent knowledge base of all of Wikipedia at your fingertips for instant recall, and a huge buzzer advantage from being able to process individual pieces of information in parallel at much faster rates than the human brain!

But there is a much deeper problem with some of the media pontification about the future of AI and machines taking over the world: try asking Watson how "he" feels about winning.

Watson's learning model is currently (only?) really, really good at figuring out what question you were asking given an answer to a general knowledge question. I'm sure there are lots of reusable pieces of the Watson system (some natural language processing (NLP) code, etc.). But what the mainstream media doesn't seem to understand is that it would be an enormous stretch to say that the system could simply and easily be applied to other domains.

The promise of machine learning is that algorithms should in theory be reusable in many situations. The Weka machine learning toolkit, for example, provides a generic ML framework that is used for all sorts of things. But extracting the right features from your data, and deciding how to represent them, is a huge problem on its own, and can be tackled completely separately from the learning issues. (This is all further muddied once you throw in NLP.)

Today, most of the feature selection for any given learning task is done by hand by engineers. An AGI (Artificial General Intelligence) would have to do that itself. We don't yet have much of a clue how to teach an AGI to pick its own reasonable and useful feature sets in a generic or smart way. But it's quite easy to show that, for most complex datasets, your feature selection strategy is as important as, or more important than, the exact machine learning algorithm you apply.

What very few people appreciate is that machine learning has so far amounted to little more than learning arbitrary function approximators. You learn a mapping from a domain to a range, or from an input to an output. Minimizing the classification error is the process of refining that function approximation to reduce error on as-yet unseen data (the test dataset, i.e. data that was not used to train the previous iteration of the function approximation). Because all machine learning algorithms (as they are currently framed) are basically just trying to learn a function, they are all in some deep sense equivalent. (Of course, in practice not all algorithms even work with the same data types, which is why this equivalence only holds in a deep sense, but quite a bit of work has shown that, at the end of the day, most of today's machine learning algorithms are basically doing the same thing, with different strengths and weaknesses.)
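
To make the "learning is function approximation" point concrete, here is a deliberately tiny sketch (my own illustration, not Watson's or any real system's code): a one-parameter model fit to data drawn from y = 3x by stochastic gradient descent on squared error. Every mainstream ML algorithm is, at heart, this same loop with a much fancier function family:

```java
public class FunctionApproximator {
    // Fit y = w * x to the data by stochastic gradient descent on
    // squared error: for each sample, nudge w against the error gradient
    public static double fit(double[] xs, double[] ys) {
        double w = 0.0;
        double lr = 0.01;  // learning rate
        for (int epoch = 0; epoch < 1000; epoch++) {
            for (int i = 0; i < xs.length; i++) {
                double err = w * xs[i] - ys[i];  // prediction error on sample i
                w -= lr * err * xs[i];           // gradient of 0.5 * err^2 w.r.t. w
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4};
        double[] ys = {3, 6, 9, 12};  // generated by the "true" function y = 3x
        System.out.printf("learned w = %.3f%n", fit(xs, ys));
    }
}
```

The learned parameter converges to the underlying function, but nothing in the loop "understands" anything: it is pure input-to-output mapping refinement, which is the distinction drawn above between ML and AI.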

Incidentally, the fact that the whole field of machine learning is about learning arbitrary function approximators is pretty much the whole reason a lot of people in CS learning theory don't really talk about AI anymore, only ML. There's nothing much intelligent about machine learning as it stands. I've heard it said that CSAIL (the CS and AI Lab) here where I work at MIT is only still called CSAIL in deference to Marvin Minsky and the glory days of AI, and that a lot of people don't like the name and want to change it when Marvin finally, fully retires. (That probably won't happen, but the statement alone was illustrative...) We need a complete revolution in learning theory before we can truly claim to be creating AI, even if the behaviors of ML algorithms feel "smart" to us: they only feel smart because they are correctly predicting outputs given inputs. But you could write down a function that does that on paper.

I'm not claiming we can't do it -- "It won't happen overnight, but it will happen" -- I'm just stating that ML and AI are quite different, and we're very good at ML and not at all good at AI.

Efforts to simulate the brain are moving along, and Ray Kurzweil predicts that in just a decade or two we should be able to build a computer as powerful as the brain. While that may be true in terms of total computational throughput of the hardware, there is no way to know if we will be able to create the right software to run on this hardware by that time. The software is everything.

One of the problems is that we don't know exactly how neurons work. People (even many neuroscientists) will tell you, "Of course we know how a neuron works: it's a switching unit; it receives and accumulates signals until a certain potential is reached, then it sends a signal on to the other neurons it is connected to." I suspect that in a few years' time we will realize just how naive that assumption is. Already there are fascinating discoveries showing that things are just not that simple, e.g. (hot off the press yesterday): http://www.eurekalert.org/pub_releases/2011-02/nu-rtt021711.php

From that article:
> "It's not always stimulus in, immediate action potential out."
> "It's very unusual to think that a neuron could fire continually without stimuli"
> "The researchers think that others have seen this persistent firing behavior in neurons but dismissed it as something wrong with the signal recording."
> "...the biggest surprise of all. The researchers found that one axon can talk to another."

This is exactly the sort of thing that makes me think it's going to take a lot longer than Ray predicts to simulate the brain: we don't even know what a single neuron is doing. A cell is an immense, extraordinarily complex machine on the molecular scale, and simplifying it to a transistor or thresholded gate will not necessarily produce the correct emergent behavior when you connect a lot of them together. I'm glad people like the researchers behind the work above are doing more fundamental work on what a neuron actually is and how it really functions. I suspect that years down the line we'll discover much more complicated information-processing capabilities in individual cells -- e.g. the ability of a nerve cell to store information in custom RNA strands based on incoming electrical impulses, in order to encode memories internally [you read it here first], or something funky like that.
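For concreteness, the "thresholded gate" cartoon fits in a few lines of code: a crude leaky integrate-and-fire sketch, with arbitrary illustrative parameters. This is the simplified model under discussion, not a claim about real neurons, which, per the findings above, do considerably more.

```python
def integrate_and_fire(inputs, threshold=1.0, leak=0.9):
    # Caricature of a neuron: leak a little charge each step, accumulate
    # incoming signal, and emit a spike (then reset) on crossing threshold.
    # Parameter values are arbitrary, chosen only for illustration.
    v = 0.0          # membrane potential
    spikes = []
    for i in inputs:
        v = v * leak + i
        if v >= threshold:
            spikes.append(1)  # fire a spike downstream
            v = 0.0           # reset after firing
        else:
            spikes.append(0)
    return spikes

# Weak, steady stimulus: potential builds up and the unit fires periodically.
print(integrate_and_fire([0.3] * 10))  # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```

Notice everything this model cannot do by construction: it never fires without input, and its axon is a one-way wire, which is exactly the behavior the research above contradicts.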

Of course even a simplified model is still valuable: "Essentially, all models are wrong, but some are useful" (--George E. P. Box). However, we have to get the brain model right if we want to recreate intelligence the biologically-inspired way. Simply stated, we can't predict what it will take to build intelligence, or how long it will take, until we understand what it actually is we're trying to build. Just saying "it's an emergent property" is not a sufficient explanation. The emergent properties might only emerge if some very specific parts of our simplified models behave correctly -- but we have no way of knowing in advance which features are salient and must be modeled faithfully, and which can be simplified away.

But a much bigger problem will hold up the arrival of AGI: not only do we not know how single neurons really work, we have NO CLUE what intelligence really is -- and even less of a clue what consciousness really is. The problem with Ray's predictions is that, while we can forecast the progress of a specific quantifiable parameter of a known technology -- even when the underlying technology embodying that parameter changes form (the exponential price-performance curve behind Moore's Law has held for at least 50 years, across the switch from vacuum tubes to discrete transistors to integrated circuits) -- we can't forecast the time of creation of a new technology that is, for all intents and purposes, "magic" right now, because we still don't know how it would work. In fact, we can predict the arrival of a specific magic technology about as well as we can predict the time of discovery of a specific mathematical proof or scientific principle. Nature sometimes chooses simply not to reveal herself to us. Can we even approximately predict when we will prove or disprove P=NP or the Goldbach Conjecture? And how much harder is it to define intelligence (let alone consciousness) than to prove or disprove a mathematical statement?

Finally, and most importantly, somebody needs to get Watson to compete on Jeopardy! against Deep Thought, to guess the correct question to the answer 42...