Thesis (Master's)

University of Iceland (Háskóli Íslands) > School of Engineering and Natural Sciences > Master's theses - School of Engineering and Natural Sciences

  • Title: Adaptive Revisiting with Heritrix
  • Degree: Master's
  • Abstract:

    The World Wide Web contains an increasingly significant amount of the
    world’s knowledge and heritage. Since the Web is also in a constant state
    of change, significant efforts are now underway to capture and preserve its
    contents. These efforts extend the traditional legal deposit laws that have
    been aimed at preserving printed material over the last centuries.
    The first three chapters outline the fundamental challenges for collecting
    the Web and present the software, Heritrix, which has been designed to
    perform this task. The first chapter focuses on the reasons and history
    behind this endeavour, with chapters two and three focusing on more
    technical aspects.
    The goal of this project was to develop a new way of collecting parts of
    the Web that are believed to change very rapidly and are considered of
    significant interest. The later chapters focus on defining such an
    incremental strategy, which we call an ‘adaptive revisiting strategy’, and
    how it was implemented as a part of Heritrix. A part of this discussion is
    how to detect change in documents.
    Finally, we discuss initial impressions of the new software and highlight
    areas that require further work or attention. As the goal of the project was
    primarily to establish the foundation for such incremental crawling and
    provide a simple and sturdy implementation, this section contains many
    thoughts on issues that could be improved on in the future.

  • 16 March 2009

File name                                       Size     Access  Description  File type
Adaptive Revisiting with Heritrix - Thesis.pdf  1.12 MB  Open    Thesis       PDF