
Article


Please use this identifier to cite or link to this item: http://hdl.handle.net/1946/6073

Title: 
  • Incremental Crawling with Heritrix
Keywords: 
Published: 
  • September 2005
Abstract: 

    The Heritrix web crawler aims to be the world's first open source, extensible, web-scale, archival-quality web crawler. Its crawling strategies have, however, been limited to snapshot crawling. This paper reports on work to add the ability to conduct incremental crawls. We first discuss the concept of incremental crawling, as opposed to snapshot crawling, and then possible ways to design an effective incremental strategy. We give an overview of our implementation and discuss its limits and strengths. We then report on the results of initial experiments with the new software, which have gone well. Finally, we discuss issues that remain unresolved and possible future improvements.

Related URL: 
Accepted: 
  • 27.8.2010
URI: 
  • http://hdl.handle.net/1946/6073


Files
Filename               Size       Access  Description  File type
iwaw05-sigurdsson.pdf  166.15 kB  Open    Full text    PDF