admin

Weekly links for 10/08/29

joshuaclayton’s blueprint-css at master – GitHub
This is a CSS framework designed to cut down on your CSS development time. It gives you a solid foundation to build your own CSS on
SAX (Symbolic Aggregate approXimation)
Sax: Symbolic Aggregate approXimation — SAX is the first symbolic representation for time series that allows for dimensionality reduction and indexing with a lower-bounding distance measure. In classic data mining tasks such as clustering, classification, index, etc., SAX is as good as well-known representations such as Discrete Wavelet Transform (DWT) and Discrete Fourier Transform (DFT), while requiring less storage space. In addition, the representation allows researchers to avail of the wealth of data structures and algorithms in bioinformatics or text mining, and also provides solutions to many challenges associated with current data mining tasks. One example is motif discovery, a problem which we recently defined for time series data. There is great potential for extending and applying the discrete representation on a wide class of data mining tasks. Source code has “non-commercial” license
jbooktrader – Project Hosting on Google Code
Introduction à Git pour les gens normaux | E-vidence
A very efficient introduction to Git, in French.
Particle Filters
author: Simon Godsill, University of Cambridge

Staged Type programming

The dynamic class is only a truce between the two worlds. There has to be something better.


Finding relevant topics : text analysis with python

What do you talk about ? What are we interested in ?
If you were given access to some network, how would you identify what its members talk about ?

David Blei made a nice video at Google which I already posted on

It turns out that the method described, Latent Dirichlet Allocation (LDA), can be extended in interesting directions, for instance to sentiment analysis, which enables automated news-trading. LDA can also be implemented using Gibbs sampling, which is another general method for finding latent variables. So I decided to give it a crack, and to get started with some Python as well.

As an example, I generate documents that are actually images.
The topics are then groups of pixels. For instance, the first topic ‘talks’ about pixels 1,2,3,4,5, the second topic about pixels 11,12,13,14,15 :

From there I generate a set of documents. For each document, I draw a sample P from a Dirichlet distribution. P is a vector that represents the mixture of topics this document will speak about. Then, for each word of the document, I pick a topic from the mixture, and from that topic I pick a term (aka a pixel) according to the word distribution of that topic.
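The generative process just described can be sketched as follows. This is a minimal illustration with NumPy, not the code from the pyLDA repository: the function names, the number of topics, the vocabulary size, the document length and the Dirichlet parameter `alpha` are all assumptions chosen for the example, and pixels are numbered from 0 rather than 1.

```python
import numpy as np

def make_topics(n_topics=4, pixels_per_topic=5, vocab_size=40):
    """Build toy topics: each one is uniform over its own small group of pixels.

    Hypothetical layout (0-based): topic 0 covers pixels 0..4,
    topic 1 covers pixels 10..14, and so on.
    """
    topics = np.zeros((n_topics, vocab_size))
    for k in range(n_topics):
        start = k * 10
        topics[k, start:start + pixels_per_topic] = 1.0 / pixels_per_topic
    return topics

def generate_documents(topics, n_docs=100, doc_length=50, alpha=0.5, seed=0):
    """Generate documents as arrays of pixel indices."""
    rng = np.random.default_rng(seed)
    n_topics, vocab_size = topics.shape
    docs = []
    for _ in range(n_docs):
        # P: the mixture of topics this document speaks about
        mixture = rng.dirichlet(alpha * np.ones(n_topics))
        # For each word, first pick a topic from the mixture ...
        word_topics = rng.choice(n_topics, size=doc_length, p=mixture)
        # ... then pick a term (a pixel) from that topic's word distribution
        words = np.array([rng.choice(vocab_size, p=topics[k]) for k in word_topics])
        docs.append(words)
    return docs

topics = make_topics()
docs = generate_documents(topics)
```

A small `alpha` makes each document concentrate on a few topics, which is what makes the pixel groups easy to tell apart afterwards.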

If we run the LDA on such a generated set of documents, we find the following “topics”, with increasing likelihood for the data :

Likelihood : -119781, iteration #0

Likelihood : -103233, iteration #2

Likelihood : -89401, iteration #4

Likelihood : -78168, iteration #8

Likelihood : -73634, iteration #12

Likelihood : -71200, iteration #17

Likelihood : -69670, iteration #25

Likelihood : -66950, iteration #35

Likelihood : -65199, iteration #65

So we successfully recover the “topics”.

The crux of Gibbs sampling is this : we have N words, each corresponding to a hidden topic, and a model for their joint distribution. We want access to the distribution of the topics given those words, or at least to be able to take a sample from it. It turns out that we can easily express the distribution of 1 topic given the rest (that is, given the other topics and given the words, cf. the docs below). Gibbs sampling means starting from a random point for the N topics, then resampling 1 topic at a time, for each of the N topics in turn. Doing so many times gives you a sample from the joint distribution of the N topics.
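As a rough illustration of that loop, here is a minimal sketch of the standard collapsed Gibbs sampler for LDA, in which the distribution of one topic given the rest is proportional to (count of topic k in the document + alpha) * (count of word w in topic k + beta) / (total words in topic k + V * beta). It is not necessarily what the pyLDA code does: the function name `gibbs_lda`, the symmetric priors `alpha` and `beta` and their default values are assumptions made for the example.

```python
import numpy as np

def gibbs_lda(docs, n_topics, vocab_size, n_sweeps=50, alpha=0.5, beta=0.1, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are sequences of word indices."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))   # words of doc d assigned to topic k
    topic_word = np.zeros((n_topics, vocab_size)) # occurrences of word w in topic k
    topic_total = np.zeros(n_topics)              # total words assigned to topic k

    # Random starting point: one hidden topic per word
    assignments = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, assignments[d]):
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1

    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the current word out of the counts
                k = assignments[d][i]
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1
                # Distribution of this one topic given all the others and the words
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta)
                p = p / (topic_total + vocab_size * beta)
                # Resample the topic and put the word back
                k = rng.choice(n_topics, p=p / p.sum())
                assignments[d][i] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1
    return assignments, topic_word
```

After enough sweeps, normalizing each row of `topic_word` gives an estimate of the word (pixel) distribution of each topic.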

The 2 detailed papers I found on the subject are Parameter estimation for text analysis and Distributed Gibbs sampling of Latent topic model, and the Python code can be found on my GitHub page (http://github.com/nrolland/pyLDA). Python has a huge number of libraries available and is a big time saver.

Weekly links for 10/02/13

3. An Informal Introduction to Python — Python v2.6.4 documentation
The Python documentation.
Learning Python, Linux, Java, Ruby and more with Videos, Tutorials and Screencasts
Showmedo is about learning and (Free and) Open-source software (FOSS). We were inspired to start Showmedo by watching some very effective web video-tutorials/screencasts. These convinced us that web-videos can be a great way to quickly and efficiently acquire knowledge. It can even be fun, or at least painless. For some things there is no substitute to seeing it done.

notes

  • gnuradio : a set of software routines to process signals. It enables you to sample the whole electromagnetic spectrum and decode it to provide radio, GSM, DECT and GPS functionality with a single piece of hardware. Pretty awesome.
  • palantir : a software company that provides useful software for data mining and flexible correlation studies.

The radical uncertainty of stocks

One would like to think that an investment in stocks can not result in a complete meltdown, except in some strange pathological cases. Yet it can be easily shown that by the mere structure of stocks, such a meltdown (or its bubble counterpart) is largely unpreventable in a systematic way.

If one can hope to have price stability, it can only be found in the form of a “fundamental value” around which market participants could stabilize the price. If prices become too low, an agent would then buy because he knows he can extract some money directly from the company, in the form of dividends. Without such a capacity, buying at a “good price” does not mean anything, as nothing prevents the price from going even lower, to some even ‘better’ level.

So at the heart of price stability lies that fundamental value, which, as for every other investment, can be obtained by adding up the present values of all forecasted future cash flows. Given a set of hypotheses on future profits, this gives the company a fundamental price, against which one can decide whether the stock is over or undervalued. But… beyond that mere number, one can wonder how firmly grounded such a value can be.

Let’s consider a simplistic example : a company whose business yields 1 dollar per year. Let’s assume that interest rates are on average at 5%. For each year i (starting at i = 0), the present value of that year’s dividend would be 1 USD * 0.95^i, and the current value, the sum of those over all years, would then be 20 USD. Out of those 20 USD, 8 come from the next ten years’ dividends, and 12 come from all the years beyond that.
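The arithmetic can be checked with a few lines of Python. This is only a sketch of the example above, under two assumptions: the sum starts at year 0 (which is what makes the total come out at 20), and the discount factor is the simplified 0.95 per year used here rather than 1/1.05. The names `FACTOR`, `CASH_FLOW` and `present_value` are made up for the illustration.

```python
FACTOR = 0.95     # simplified yearly discount factor used in the example
CASH_FLOW = 1.0   # dividend in USD per year

def present_value(years):
    """Sum of the discounted dividends over the given years."""
    return sum(CASH_FLOW * FACTOR ** i for i in years)

first_ten = present_value(range(10))   # years 0 to 9, roughly 8 USD
total = CASH_FLOW / (1 - FACTOR)       # geometric series over all years, 20 USD
beyond = total - first_ten             # everything after the first ten years, roughly 12 USD

print(f"first ten years: {first_ten:.2f}")
print(f"beyond ten years: {beyond:.2f} ({beyond / total:.0%} of the total)")
```

The share beyond year ten is simply 0.95^10, which is where the figure of about 60% comes from.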

Consider a hypothetical investor who has formidable insight into the next 10 years of the company (quite a genius really), and who intends to bring the market value back in line with the fundamental one. Despite his formidable insight, 60% of the value of his investment still rests on the sheer uncertainty of what happens after 10 years, and until then he is exposed to the market’s evaluation of that uncertain period. The inability to evaluate the correct price of a stock is fundamental and cannot be overcome, by the very construction of what a stock financially is. Even though this is a very rough case, it does show that the level of uncertainty contained in stock prices is pretty much irreducible and, for its main component, driven by the market itself. What Keynes called the beauty contest is not a secondary but a primary, essential effect.

In real life though, one can hope that other stabilizing forces come into play, and that through time and diversification, good judgment becomes recognized by the market. Techniques can also be put in place to try to isolate the judgment as much as possible and eliminate other effects (global market or sector movements, etc.). However, there are deep consequences of this fundamental uncertainty :

  • A lot of a corporation’s energy is spent controlling the market’s feelings (averting fear, stirring up enthusiasm). Although necessary when a mere nothing can flip your stock, that diverted energy, in the end, produces nothing.
  • It strengthens the case for investors who have an edge in really understanding a business. An obvious illustration of this is Mr Buffett. (Real) private equity investments have historically been successful as well, it seems.
  • It tends to imply that the stock market should not be a straight mass-investment vehicle for retirement or for the general public. If a successful investment requires a certain craftiness, the current plain vanilla stock investment by armies of zombies is bound to be profited from by unscrupulous financiers.

The market’s legitimacy in classical economics lies in its ability to match investors willing to take some risk with companies in need of financing. The current “stock” does not seem able to fulfill that role, at least in the way it is used today. It may seem easy to say so now that the market has crashed, but I guess I have some credentials there, having blogged about it before the meltdown.. ;)

The anonymous financial room

That would leave only 5.7 per cent of Volkswagen’s ordinary shares
available to be traded on the market. However, hedge funds and other
traders had between them short sold shares equivalent to 12.9 per cent
of the total, and in consequence were obliged to buy and return them.
They understandably panicked, and the resultant frantic efforts to buy
Volkswagen shares caused the price to quadruple

http://www.lrb.co.uk/nl/v30/n23/mack01_.html

This story highlights (among other things) the difficulty of creating a market consensus. Had hedge funds been able to gather and delegate the handling of this short to a common entity, who would then negotiate in their name, the problem would not have been so wide and painful.

However, when faced with such a case, there are strong incentives not to disclose anything to competitors, for many reasons. This is why a platform that guarantees the authenticity of its participants while keeping them completely anonymous could be useful. With it, people with a shameful problem could at least discuss it without fear that the mere discussion will aggravate the situation.

I wonder what infrastructure can guarantee that kind of “anonymous authentication”…

no time to think

An interesting discussion by a Stanford professor

Star trader

Wall Street - Bull

Creative Commons License photo credit: David Paul Ohmer

August 15, 2008 WSJ  : The two-year-old hedge fund founded by former UBS AG star trader JW,  is down about 85% from its inception through July, according to a person familiar with the matter.

August 03, 2006: UBS star trader goes solo with launch of hedge fund. One of the City’s most aggressive traders will soon be snapping even more ferociously at the heels of management as the head of a $5bn (pounds 2.7bn) hedge fund. In his new role, Jon Wood will be set free from potential client conflict at UBS, the Swiss investment bank where he currently heads proprietary trading. Backers have already committed $3bn to his SRM global fund, which will be capped at $5bn and launched on 1 September. UBS, where Mr Wood has worked for the past 17 years, is investing $500m. Investors in SRM Global can choose from two fee structures – a 1 percent management fee to invest for a period of five years and 1.5 percent for three years. SRM Global will retain 25 percent of its profits.

Hey, good thing it was capped at USD 5bn !!

Proof network and knowledge repository

I have been looking for a long time for a way to represent a mathematical corpus on the web. It would be tremendous if one could find on the web the different books written on a subject, in a form that enables the verification of theorems and proofs.

Let’s reflect on what a lesson is : Mostly a lesson is about learning one by one different entities, and the relations that those entities share. It is all about serializing a graph.
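To make the idea of “serializing a graph” concrete, here is a small sketch using the standard library’s `graphlib` (Python 3.9+). The concepts and their prerequisites are hypothetical, chosen only for the illustration : a lesson order is then any ordering of the entities in which each one comes after the entities it relies on.

```python
from graphlib import TopologicalSorter

# Hypothetical entities, each mapped to the entities it relies on
prerequisites = {
    "limit": {"real numbers"},
    "continuity": {"limit"},
    "derivative": {"limit"},
    "mean value theorem": {"continuity", "derivative"},
}

# One possible serialization of the graph: every entity appears after its prerequisites
lesson_order = list(TopologicalSorter(prerequisites).static_order())
print(lesson_order)
```

Several valid orders usually exist for the same graph, which is one way to see why the same material can be presented as many different lessons.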

If we were to improve this process of learning, we should identify what represents the most effort and the least added value in it, and I see 3 big bottlenecks.

  • This process has to occur at a level the audience can understand :

The same lesson can take a day or 5 minutes depending on the knowledge of the audience. One of the big problems that hinders the reuse of lessons on a large scale basis is that every time one wants to adapt the level, a whole rewrite is necessary. Having a formal representation of books would allow for a seamless rewrite of proofs, down to the basic axioms if necessary.

  • Another variability is how one reuses the theorems or the presentation of another.

Dissemination is now informal, slow, and based on individual reading. Few ways exist for clever solutions to emerge. A repository of proof steps, which would allow reuse in other contexts, would enable indirect voting and promote the best practices.

  • Finally, another problem is the sheer size of the domain, which can only be tackled by many specialists.

Sometimes the difficulty lies not in the domain itself but in the many ways there are to show one and the same effect. There is no place for such collaboration to take place in a fruitful way, where people ranging from high school students to PhDs can contribute and enrich each other. Wikipedia exists, but is completely out of scope : what if I wanted to see the 10 different ways to prove an assertion, and how the theorems used apply in the specific case I am looking at ? Wikipedia can’t handle this kind of data explosion, and no one will contribute this down to the details I might need (and that some other person won’t).

So, facing those issues, one might think that the computer science people in universities have found a way ? No, and do you want to know why ? Are they researching ways to share proofs and to have a formal system for proof and representation ? No. They are *waaaaayy* beyond that : they are looking to *automate* the creation of proofs. This really is, for me, the completely wrong way to go, and we should first concentrate on having a formal description system to tackle the 3 points I exposed. *Then*, with such a useful formal description, we will get ammo for automated proof, if we can ever solve it.