Search engines vs. Directories



minstrel
10-17-2003, 01:58 PM
Several threads recently have asked questions about directories like DMOZ/ODP -- one has a question about why Google's directory lags so far behind DMOZ. I came across an article (excerpted here) which makes reference to what I see as the fundamental problem with DMOZ or similar projects, including the "ESP" project mentioned here -- although well-intended, the exponentially increasing size of the internet makes it virtually impossible to keep up by reviewing each new site with "human hands".



Associated Press
October 17, 2003

PITTSBURGH — Carnegie Mellon University researchers are using an Internet game to help improve artificial intelligence, in hopes of making Web searches more powerful. Graduate student Luis von Ahn and his mentor, professor Manuel Blum, believe search engines can one day adopt word labels generated by their ESP Game to help computers see images more as humans do.

(snip)

Search engines use algorithms — mathematical recipes designed to solve problems — to sort, rank and filter pages, text and images on the Internet. The ESP Game tries to improve upon that by asking two players who don't know each other to type in words that describe a series of images. Players win points when they match words — and those matches become labels Mr. von Ahn and Mr. Blum can affix to the image in question. It would take too long for researchers to label the hundreds of millions of images that can be accessed by Google or other search engines. But Mr. von Ahn believes that task might be accomplished in a few months by getting a few thousand people to play the game each day.

Spokesmen for Google and Alta Vista were mum on that prospect, and some industry analysts were skeptical. Sophisticated algorithms can track which sites help the most users with specific questions — and that's generally faster and cheaper than using a phalanx of human editors, said Danny Sullivan, editor of the on-line newsletter SearchEngineWatch.com.
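The matching mechanic the article describes (two players who don't know each other type words for the same image, and a word they both type becomes a label) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the researchers' actual implementation: it assumes the two players' guesses arrive in alternating turns and compares words case-insensitively, and the function names `first_agreed_label` and `collect_labels` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Sequence, Tuple


def first_agreed_label(guesses_a: Sequence[str], guesses_b: Sequence[str]) -> Optional[str]:
    """Return the first word both players have typed, or None if they never agree.

    Hypothetical model: players type one word per turn, alternating A then B,
    and a match happens as soon as one player types a word the other already typed.
    """
    seen_a: set = set()
    seen_b: set = set()
    for i in range(max(len(guesses_a), len(guesses_b))):
        if i < len(guesses_a):
            word = guesses_a[i].strip().lower()
            if word in seen_b:
                return word
            seen_a.add(word)
        if i < len(guesses_b):
            word = guesses_b[i].strip().lower()
            if word in seen_a:
                return word
            seen_b.add(word)
    return None


def collect_labels(games: Iterable[Tuple[str, Sequence[str], Sequence[str]]]) -> Dict[str, Counter]:
    """Tally agreed labels per image across many games (image_id, guesses_a, guesses_b)."""
    labels: Dict[str, Counter] = defaultdict(Counter)
    for image_id, guesses_a, guesses_b in games:
        label = first_agreed_label(guesses_a, guesses_b)
        if label is not None:
            labels[image_id][label] += 1
    return labels
```

For example, `first_agreed_label(["animal", "Dog", "grass"], ["puppy", "dog"])` returns `"dog"`, while two players who never type a common word produce no label. Counting agreements across many games is one plausible way to rank labels for an image; whether the real game weighted them this way is not stated in the article.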

mikmik
10-20-2003, 12:19 PM
I just read one estimate that put it at 50,000 new sites (or maybe it was just pages; anyone? I can't remember where I read it), but either way some obscenely large number of new additions are posted to the WWW monthly!
So how many pictures does that bring the total to now? Let's see, multiply by about 30 different languages for starters (how many charsets, I wonder?)
and several descriptions of each picture, generated and analyzed by humans at the optimistic rate of one per minute = ... Somebody better send out for pizza, we'll be here a while! Anyway, I know the point is just to get patterns that AI bots could use, but I think it would be far easier just to read "alt" tags myself than to get a program to interpret the visual information of graphics the way humans do.
Don't tell Mr. von Ahn and Mr. Blum or it will ruin their day! XML, anyone?

zbatia
10-20-2003, 05:03 PM
I submitted RTEK2000.com to DMOZ a long time ago (more than 6 months ago, I believe) and it's still not there. I don't expect it to be, because inclusion in the directory is not automated but manual (see mikmik's note above).

alienzhavelanded
10-20-2003, 09:09 PM
Although I agree it's time-consuming and tedious, I see the wisdom in having at least one human-edited directory. The future, of course, lies with the spiders as they become more advanced. mikmik pointed out the limitations of the system in the article well, in just a few short sentences.

The Martian

minstrel
10-20-2003, 09:56 PM
yikes... that title rather sounds like surrendering in battle...


Another problem is that, if you explore the DMOZ directory, you'll soon discover that a large number of categories have no editors, at least among the categories relevant to my profession and website. Since DMOZ explicitly tells you that you can't submit to more than one category, what happens if the one you choose doesn't even have an editor?

cbp
10-21-2003, 01:34 AM
Every category has an editor. Editors in categories at a higher level can and do edit those categories that have no editor listed. 200+ editalls can edit any category.

I noted mikmik's comment that 50,000 sites or pages get added to the WWW daily; how many of those is DMOZ expected to add? Between 2,000 and 4,000 sites get added to DMOZ daily (substantially more get submitted).

CBP

Derald
10-21-2003, 08:55 AM
I am listed on DMOZ, and 5 or 6 weeks ago I used the "update URL" link in my category to change the description, since I have expanded the services I offer.

How long should I wait before doing it again? I don't want to irritate the powers that be.

Derald

minstrel
10-21-2003, 09:19 AM
Yes, but as the higher-level categories grow, as they inevitably do, their editors eventually run out of time to properly attend to the subcategories. Then a subcategory like "clinical services" (a hypothetical) ends up with maybe 7 or 8 entries, which isn't even a representative sample...


...and that is precisely the point: assuming your figures and mikmik's are accurate, and assuming the maximum additions for DMOZ, they can index at most 8% of new sites daily (4,000 out of 50,000), so they are doomed to fall ever farther behind. I am not trying to imply that the editors aren't working hard; I'm arguing that no matter how well-intentioned they are or how hard they work at that single task, the numbers are written on the wall: they cannot do what they are trying to do.
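The arithmetic behind the 8% figure, and the backlog it implies, can be checked with a short sketch. It assumes the figures exactly as quoted in this thread (50,000 new sites per day, 2,000 to 4,000 DMOZ additions per day) held constant; note that mikmik's original post said "monthly" rather than "daily", so the conclusion depends on the daily reading. The helper names `daily_coverage` and `unlisted_backlog` are hypothetical.

```python
def daily_coverage(added_per_day: int, new_per_day: int) -> float:
    """Fraction of each day's new sites that the directory can list."""
    return added_per_day / new_per_day


def unlisted_backlog(added_per_day: int, new_per_day: int, days: int) -> int:
    """New sites left unlisted after `days` days, assuming constant daily rates."""
    return max(new_per_day - added_per_day, 0) * days


if __name__ == "__main__":
    NEW_SITES_PER_DAY = 50_000  # figure as quoted in this thread
    for added in (2_000, 4_000):  # cbp's range of daily DMOZ additions
        coverage = daily_coverage(added, NEW_SITES_PER_DAY)
        backlog = unlisted_backlog(added, NEW_SITES_PER_DAY, 365)
        print(f"{added:,} added/day: {coverage:.0%} coverage, {backlog:,} unlisted after a year")
    # prints:
    # 2,000 added/day: 4% coverage, 17,520,000 unlisted after a year
    # 4,000 added/day: 8% coverage, 16,790,000 unlisted after a year
```

Under those assumptions the unlisted pile grows by tens of thousands of sites every day, which is the "ever farther behind" point in numbers.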

cbp
10-21-2003, 04:38 PM
Don't do it again. Every time you submit a site or an update, it overwrites the previous one. If the editor chooses to work through the pool of suggestions by date, you will be putting yourself at the back of the list.
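Why resubmitting pushes a site to the back of the queue can be shown with a toy model. This is a hypothetical sketch, not DMOZ's actual software: it assumes the pool keeps one pending suggestion per URL, that a new submission replaces the old entry along with its date, and that the editor reviews the oldest date first. The class and method names are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Suggestion:
    url: str
    description: str
    submitted: date


class SuggestionPool:
    """Toy model of one category's pending suggestions (hypothetical, not DMOZ code)."""

    def __init__(self) -> None:
        self._by_url: Dict[str, Suggestion] = {}

    def submit(self, url: str, description: str, submitted: date) -> None:
        # One pending suggestion per URL: a resubmission overwrites the old
        # entry, including its submission date.
        self._by_url[url] = Suggestion(url, description, submitted)

    def review_order(self) -> List[str]:
        # An editor working "by date" takes the oldest suggestion first.
        ordered = sorted(self._by_url.values(), key=lambda s: s.submitted)
        return [s.url for s in ordered]
```

With `example.com` submitted on 2003-09-10 and `example.org` on 2003-10-01, `example.com` is reviewed first; resubmitting `example.com` on 2003-10-21 leaves still only two entries but moves it behind `example.org`.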

Also, your suggestion to change the description may have been rejected. In the category I edit, I change almost all the descriptions and reject almost all suggested changes that are submitted, as they do not meet the DMOZ guidelines (they are usually packed with keywords, and that is not acceptable).

The best way is to go to http://www.resource-zone.com/ and ask about the status, but PLEASE read the forum guidelines first.

CBP

Derald
10-23-2003, 05:58 AM
Thank you for your suggestion regarding the forums. I read the guidelines last night and I'll post today.

I would like to change it to correspond with my current meta tag description, which is not packed with keywords. Google seems to keep grabbing my old description from DMOZ when it dances.

Thanks,
Derald

cbp
10-23-2003, 07:07 PM
I have not checked your site, but generally the meta descriptions of a lot of sites submitted to DMOZ contain "marketing hyperbole" rather than what the DMOZ guidelines ask for: a good, clean and simple description of the site.

Will look out for you at resource-zone.

CBP

Derald
10-24-2003, 05:38 AM
I've read the guidelines for submission and they seem reasonable to me. My purpose is to be more accurate about the scope and focus of my company's services. What is your opinion of my new description?

Current:
Atlanta, Georgia area designer specializing in creative communications, file preparation, commercial printing, and web design services to regional and national companies. Services description, portfolio, and bio of designer.

Proposed:
Atlanta, Georgia design firm specializing in graphic design, web design, interactive media, digital prepress, and commercial printing services to regional and national companies.

Thanks,
Derald

atimmins
10-24-2003, 11:45 AM
Has anyone heard whether they have resolved their server issues? There was literally a month when I could not submit a site for review. IE would churn and churn forever and finally return an error page. I haven't tried in a couple of weeks now, which I know is bad. I have kept a list of sites to submit once I know I won't waste an hour (or so it seems) just to get an error message.

cbp
10-24-2003, 06:13 PM
Derald:
Looks fine to me, but it's not up to me; I edit over in a health category :-( It will depend on whether the editor sees a real difference between the two descriptions and agrees that the new one reflects what the site is about. Suggestions for URL and description changes may or may not take a while; it depends on the priority the editor gives them (i.e. add new sites; delete dud sites; change URLs; change descriptions; have a life).

Atimmins:
The server issues have been resolved for a while now. The error message does not necessarily mean that your suggestion failed to go through, so resubmitting will not help (and may hinder). Go to http://www.resource-zone.com/ and ask whether your suggestion was received (read the guidelines first); they will tell you if you need to resubmit.

CBP

Derald
10-27-2003, 09:02 AM
Thanks for your input and response. I realize all of you are volunteers and personally, I appreciate all of your efforts.*

This message is not intended to be a "suck-up" to the editorial team at DMOZ. No urls were harmed in the creation of this post.

Derald