Huge amounts of data are now easily and legally available on the Web. This data is generally heterogeneous and merely structured. Machine learning models which have been developed to automatically retrieve, classify or cluster observations on large yet homogeneous data collections have to be rethought. [...]