History AI - Part I: AI Making History

In the summer of 2023, I set out to collect an archive of close to 2 million historical images from a public online repository. The images capture typed or handwritten accounts of war experiences, predominantly in a single non-English language, though other languages appear as well. I won’t disclose the name of the archive or share any documents here, though I may provide more information in the future.


History AI - Part II: System Design

Assumptions / Constraints

We will operate on a dataset of ~2,000,000 JPEG images / ~500GB.
The initial budget is $1000. It is expected that this will increase, but the goal is to re-evaluate the budget prior to spending.
We will operate using the Google Cloud Platform (GCP) but might explore other cloud offerings when performance or cost becomes a concern.

System Design

Scraping

I’ve implemented scrapers using various languages including PowerShell, Node.
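To put the dataset and budget assumptions in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal gigabytes (1 GB = 10^9 bytes) and that the whole initial budget is spread evenly across every image. Neither assumption is stated above, and the function name is only illustrative.

```python
def per_image_estimates(total_images, total_bytes, budget_usd):
    """Return (average image size in KB, budget in USD per image)."""
    avg_kb = total_bytes / total_images / 1000  # decimal kilobytes (assumption)
    usd_per_image = budget_usd / total_images   # even split of the budget (assumption)
    return avg_kb, usd_per_image


# Figures taken from the assumptions: ~2,000,000 images, ~500GB, $1000.
avg_kb, usd_per_image = per_image_estimates(2_000_000, 500 * 10**9, 1000)

print(f"average image size: ~{avg_kb:.0f} KB")
print(f"budget per image: ${usd_per_image:.4f}")
print(f"budget per 1,000 images: ${usd_per_image * 1000:.2f}")
```

Under these assumptions, any per-image processing step has to cost well under a tenth of a cent on average, which is why the budget is worth re-evaluating before each spend.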

