Facilitating Diverse Collection and Curation in Web Crawling and Indexing and Blockchain: What's Not To Like?
Seminar | November 2 | 3:10-5 p.m. | 107 South Hall
Matt Bayley, Mark Graham, and David S. H. Rosenthal
Facilitating Diverse Collection and Curation in Web Crawling and Indexing
(Matt Bayley & Mark Graham)
We propose to create an open and publicly available index of the public web. Building on the 22-year history of the Internet Archive's effort to archive, and make available, web pages (URLs), we will construct a publicly accessible list of web sites (hosts). We will provide a variety of ways for people to interact with the data, with two key areas of focus: supporting more and better web archiving, and enabling general research about the Web. In addition to indexing about 2 billion URLs for web hosts, we plan to create and associate various metadata, including language, genre, and last observed HTTP status codes. We consider this project to be foundational to an ongoing and expanding effort to map resources available via HTTP. Obvious additional enhancements (beyond the scope of this initial project phase) might include adding link graph data and user-generated metadata.
Blockchain: What's Not To Like?
(David S. H. Rosenthal)
We're in a period when blockchain or "distributed ledger technology" is the Solution to Everything™, so it is inevitable that it will be proposed as the solution to problems in academic communication and digital preservation. These proposals typically assume, despite the evidence, that real-world blockchain implementations actually deliver the theoretical attributes of decentralization, immutability, security, anonymity, lack of trust, etc. The proposers appear to believe that Satoshi Nakamoto revealed the infallible Bitcoin protocol to the world on golden tablets; they typically don't appreciate or cite the nearly three decades of research and implementation that led up to it. This talk will discuss the mismatch between theory and practice in blockchain technology.