March 7, 2000 4:00 AM PST

Researchers work to eradicate broken hyperlinks

Researchers at the University of California at Berkeley say they have come a step closer to solving a frustrating problem familiar to most Web surfers: the broken hyperlink.

In a recent academic paper, computer scientists Thomas A. Phelps and Robert Wilensky outlined a way to create links among Web pages that will work even if documents are moved elsewhere. Although researchers have tried to tackle the issue before, Internet search experts said the paper describes a potentially elegant solution to a widespread and long-recognized puzzle.

"It's a pretty clever way of dealing with a very difficult problem," said Ron Daniel, who once worked on an alternative solution that has been submitted to the Internet Engineering Task Force, an online standards body.

A key feature of the Web is its ability to take readers instantly to related documents through hyperlinks. Some consider it the soul of the medium. But as many as one in five Web links that are more than a year old may be out of date, according to Andrei Broder, vice president of research at search engine AltaVista. When surfers click on such links, they get a "404 error" message.

"The rate of change on the Web is very fast," he said. "And the more active a Web site is, the quicker it changes."

In their paper, Phelps and Wilensky say the preliminary results of their research indicate that the vast majority of documents on the Web can be uniquely identified based on a small set of words that no other document shares. This set of words can be used to augment the standard URL (Uniform Resource Locator), or Web address, and turn up the page if it goes missing.

One of the things that makes the research interesting, Wilensky said, is the low number of terms required.

"It takes about five words to uniquely identify a page if you pick the words cleverly and the page is still out there somewhere," he said.

If a document's URL changes, a search engine could be employed to automatically locate the missing page based on the five terms.
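The fallback described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' implementation: the query parameter name `lexical-signature` and the `fetch` and `search` callables are hypothetical stand-ins for a real page fetcher and a real search engine.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Assumed parameter name for carrying the identifying words in the URL.
SIGNATURE_PARAM = "lexical-signature"


def augment_url(url, terms):
    """Append the identifying words to a URL as an extra query parameter."""
    parts = urlsplit(url)
    extra = urlencode({SIGNATURE_PARAM: " ".join(terms)})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def split_augmented_url(url):
    """Return (plain_url, terms) recovered from an augmented URL."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    terms = params.pop(SIGNATURE_PARAM, [""])[0].split()
    query = urlencode(params, doseq=True)
    plain = urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
    return plain, terms


def resolve(url, fetch, search):
    """Try the address first; if the page is gone, search on the words.

    fetch(url) is assumed to return None for a missing page.
    search(terms) is assumed to return a list of candidate URLs, best first.
    """
    plain, terms = split_augmented_url(url)
    if fetch(plain) is not None:
        return plain
    if not terms:
        return None
    hits = search(terms)
    return hits[0] if hits else None
```

A link built with `augment_url("http://example.com/a/b.html", [...five words...])` still works as an ordinary address; the extra words only come into play when `fetch` reports the page missing.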

"What makes this possible is that you already have a search engine infrastructure," said Wilensky, a professor of computer science at UC Berkeley, who gave most of the credit for the work to Phelps, a postdoctoral student. "You're 'bootstrapping' onto something that's already been built."

Wilensky also noted that the system would rely primarily on Web publishers, rather than on a third-party administrator, an issue that had become a hurdle for some other plans.

AltaVista's Broder agreed that the results of the research were promising and said they echo his own research on "strong queries"--or complex searches--in which he found that any document can be uniquely identified using eight carefully selected terms.

"The trick is to find the right formula of rare words that are also important to the meaning of the document," he said.
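One common way to favor words that are both rare across documents and prominent within one is TF-IDF weighting. The sketch below uses that heuristic as an assumption; the article does not say which formula either research group actually uses, and the function names and the tiny three-document corpus in the usage note are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def signature(doc, corpus, k=5):
    """Return up to k words of doc, ranked by a TF-IDF score against corpus.

    Words frequent in doc but present in few corpus documents rank highest.
    Ties are broken alphabetically so the result is deterministic.
    """
    tf = Counter(tokenize(doc))
    doc_sets = [set(tokenize(d)) for d in corpus]
    n = len(doc_sets)

    def score(term):
        df = sum(term in s for s in doc_sets)
        return tf[term] * math.log((n + 1) / (df + 1))

    return sorted(tf, key=lambda t: (-score(t), t))[:k]


def matches(terms, corpus):
    """Return indexes of corpus documents that contain every term."""
    wanted = set(terms)
    return [i for i, d in enumerate(corpus) if wanted <= set(tokenize(d))]
```

For example, with a corpus of three short documents, `signature(corpus[0], corpus, k=3)` picks the words that only the first document contains, and `matches` on those words returns just that document. The same check also shows the weakness Broder describes: if a chosen word is later edited out of the page, `matches` no longer finds it.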

But Broder warned that the procedure carries the risk that selected words may later be edited out of the document, rendering the identifier moot. For example, he said, in the Phelps and Wilensky paper, the authors used a misspelling, "peroperties," as an identifying term for their paper.

He said the most promising element of the work was the fact that it is compatible with existing systems.

"There is a chicken-and-egg problem involved," he said. "None of the big players will adopt (this kind of system) until a lot of people start using it."

Daniel, who said he has given up active research on the problem in part because of a lack of commercial interest in his work, said Phelps and Wilensky may have hit on a way to solve two parts of a three-part problem: determining an identifier and establishing how the identifier will be linked to a document over the long haul.

But Daniel said they haven't figured out what to do with pages that are deleted from the Web altogether.

"Storage is an interesting issue," he said, adding that intellectual property concerns and rights management could become a problem down the road. "At some point perhaps libraries will evolve into taking an active role in indexing pages. But that will depend on publishers giving out the necessary licensing."
