For time immemorial, Man has been afflicted with a terrible longing. This longing, this desire, lies at the heart of all human endeavors, from the construction of great empires, the domestication of the earth, the advancement of knowledge, and the beginning of our exploration of the stars.
What is this longing?
I am, of course, referring to the desire to do cool shit.
In the spirit of this venerable tradition, I have embarked on a challenge presented by OpenAI: the OpenAI to Z Challenge. Our task? To utilize new advances in Artificial Intelligence to scan open-source datasets of the Amazonian Rainforest for as-yet-undiscovered ancient civilizations.
This challenge still ongoing, and will not conclude for another month. Together with fellow NS member Bjorn, I have spent the past week researching a mountain of datasets and going down research rabbit holes. Now, as we move from the research phase to the building stage, I want to take the time to share what I’ve learned.
The Amazonian Rainforest stretches over six million square kilometers. Only a small percentage of that has been meticulously mapped and studied. Conventional, in-person archaeological expeditions have only reached the more accessible exterior. The interior is an endless labyrinth of the most aggressive flora and fauna on Earth.
Fig. 1: Satellite image of the Amazonian Rainforest, via Google Earth
The lighter green areas have been somewhat cleared and contain human human habitation. The darker green area is largely untouched - the only modern human settlements are located along the river banks.
For centuries, the interior was all but impenetrable to outsiders - exploration was slow, expensive, and unbelievably dangerous. Many explorers who made the voyage never returned home.
Today, decades of data collection has granted archaeologists the tools to source leads far more easily and quickly than they ever could have done with conventional methods. The Amazon has long been known to contain countless geoglyphs - large artworks carved into the face of the Earth. Most known examples were discovered using traditional archaeological methods, but a recent discovery made headlines when researchers successfully used AI to discover a new, never-before-seen geoglyph in Peru.
Fig. 2: Nazca geoglyph detected by AI, located in Peru, via BBC
This week, I decided I’d try to use AI to located geoglyphs for myself.
The OpenAI post detailing the challenge included a number of helpful links, including one to the personal website of archaeologist James Q. Jacobs. From his blog, I was able to download a file containing a map of the known geoglyphs in the rainforest. As it turns out, there are… a lot of them.
Fig. 3: Geoglyph map
This image is a small slice of Brazil’s southern border. Each dot is a known geoglyph. They number well into the hundreds, at a density that suggests a massive human population. Most geoglyphs are not artistic works - they are instead the outlines of ancient buildings and settlements
Fig. 4: Close-up of geoglyph, with outlines
Fig. 5: Close-up of geoglyph, without outlines
Even after thousands of years, many of these glyphs are still clearly visible. However, only a small percentage of the rainforest has been examined for geoglyphs. I had a theory: if I cross-referenced this map with maps of recently-deforested land, I might be able to find new, never-before-seen geoglyphs.
One of the data sets we have to work with is called TerraBrasilis. It’s an open-source database provided by the Brazilian government that contains data on Amazonian deforestation. It covers every year from 2007 up to the present. The initial file was 1.75 GB in size and couldn’t be visualized directly, so I created a Jupyter Notebook to slice up the data into manageable pieces.
Fig. 6: Jupyter Notebook
From here, I was able to convert the data into a format that Google Earth could understand:
Fig. 7: Recently deforested zones
Recently deforested areas are less likely to have been examined by archaeologists. My idea was this: start at an area known to possess a large number of geoglyphs, then travel in the direction of recent deforestation. There’s a good chance that new geoglyphs could be found there.
Within five minutes of looking, I found a strangely-circular depression in the ground. I asked GPT-o3 what it thought and it believed that we were looking at a geoglyph.
Fig. 8: Potential geoglyph site
In order to verify that o3 wasn’t hallucinating, I asked it to examine a number of images - some containing verified geoglyphs, some not. It was extremely accurate in identifying geoglyphs - something I which surprised me.
So, what did we learn?
We learned that:
There are massive amounts of land that have not been examined for geoglyphs.
GPT-o3 can identify geoglyphs.
It is theoretically possible that we could just unleash GPT-o3 on a massive amount of satellite imaging data and identify large numbers of new geoglyphs.
We could drastically reduce the amount of compute needed by cross-referencing other data sources, like TerraBrasilis, to identify areas more likely to contain novel geoglyphs.
What’s next?
There are many other datasets and data types that we can learn from. For instance, Global Ecosystem Dynamics Investigation (GEDI) data provides insight into topography that isn’t obvious from satellite imaging. LiDAR data, such as Sentinel-2, is particularly valuable: because these images penetrate the canopy, LiDAR data allows researchers to examine the forest floor in high-definition for any as-yet-unseen architecture. By leveraging, these tools, we can make more educated guesses about where geoglyphs and other ancient architectures are likely to be.