On reading their article I get the impression that they think this is both a hitherto-unknown phenomenon and one that still baffles web developers. This puzzles me, as even a relative neophyte such as myself knows how to make these documents available to search engines: indexes. All you need is a page, linked to from somewhere on your site, that lists all of the available documents. This page doesn't have to be as prominent as my Set Dance Music Database index; it can be tucked away on a 'site map' page so that it doesn't lead too many people to think it's the intended way to get at the documents. However, don't try to hide it so that only search engines can see it, or you'll fall afoul of the link-farming detection and elimination mechanisms that most modern search engines employ.
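As a rough illustration of how little such an index page needs, here is a minimal Python sketch that turns a list of documents into a plain HTML page of links. It assumes the documents are available as (link text, URL) pairs, for example from a database query; the function name `build_index_page` and the example URLs are hypothetical, not taken from the SDMDB.

```python
from html import escape


def build_index_page(documents, title="Site map"):
    """Return a minimal HTML page with one plain link per document.

    `documents` is an iterable of (link_text, url) pairs; where they come
    from (a database query, a directory listing) is up to you.
    """
    # Escape both the URL (including quotes, since it sits in an attribute)
    # and the link text, so characters like '&' produce valid HTML.
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'
        for text, url in documents
    )
    safe_title = escape(title)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{safe_title}</title></head>\n"
        f"<body>\n  <h1>{safe_title}</h1>\n  <ul>\n{items}\n  </ul>\n</body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical records standing in for a real document database.
    docs = [
        ("Example tune A", "/tune.php?id=1"),
        ("Example tune B", "/tune.php?id=2"),
    ]
    print(build_index_page(docs))
```

Plain `<a href>` links are used deliberately: they are what a crawler follows, so every document reachable only through a search form gets at least one ordinary link pointing at it.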
Of course, being a traditionalist (as you can see from both the content and design of the Set Dance Music Database), I tend to think that lists are still useful, at least if kept small. And I do need to add some way of searching the SDMDB, as well as a few other drill-down methods. So giving your visitors only a search form may not cater to all the ways people go about finding content. Wikis realised this years ago: people like interlinking. And given that these 'deep web' documents are still accessible via a simple URL, if you really need to you can help the search engines yourself by creating your own index page of their documents: basically, script up a search on their website that puts the resulting links into your index, skipping any duplicates.
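The scripted approach might look something like the following Python sketch, which collects document links from a site's search-result pages and drops duplicates. It is only a sketch under stated assumptions: the result pages are passed in as HTML strings, and `marker` is a hypothetical site-specific substring that distinguishes document links from navigation links. The name `collect_document_links` and the example.org URLs are illustrative, not any real site's interface.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_document_links(result_pages, base_url, marker):
    """Return unique absolute document URLs, in first-seen order.

    result_pages: HTML strings of the site's search-result pages.
    base_url: URL the pages were fetched from, used to resolve relative links.
    marker: substring identifying a document link (a site-specific assumption).
    """
    seen = set()
    links = []
    for page in result_pages:
        parser = _LinkCollector()
        parser.feed(page)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so that
            # "/doc?id=1" and "/doc?id=1#top" count as the same document.
            url, _fragment = urldefrag(urljoin(base_url, href))
            if marker in url and url not in seen:
                seen.add(url)
                links.append(url)
    return links


if __name__ == "__main__":
    # Hypothetical result pages; a real script would fetch these.
    pages = [
        '<a href="/doc?id=1">A</a> <a href="/about">About</a>',
        '<a href="/doc?id=1#top">A again</a> <a href="/doc?id=2">B</a>',
    ]
    for url in collect_document_links(pages, "http://example.org/search", "/doc?id="):
        print(url)
```

Fetching is deliberately left out. With the standard library, each results page could be retrieved with `urllib.request.urlopen(url).read().decode("utf-8")`, assuming the site serves UTF-8, and it would be polite to check the site's robots.txt and terms of use first. Preserving first-seen order keeps the generated index stable between runs.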
So the real question is: why are the owners of these websites not doing this? We may just need to suggest it to them if they haven't thought of it themselves. The benefits of having their documents listed on Google are many; what downsides are there? I'm sure the various criticisms of such indexing stem mainly from organisational bias and narrow-mindedness, and can either be solved or routed around.
The other annoyance I find ties in with this: the practice of giving search engines a hidden index, or a privileged level of access, that normal people don't see. I've seen a few computing and engineering websites do this, and Experts Exchange is particularly annoying about it: you can google your query and see an excerpt from the page with the question, but when you go there you find that the answers are only available with membership and/or payment. As far as I'm concerned this is a blatant money-grabbing exercise and should be anathema. Either your results are free to access or they're not; search engines should not be privileged in that respect.
All posts licensed under the CC-BY-NC license. Author Paul Wayper.