In a previous post, I discussed a graph suggesting that the CO2 and CH4 levels in the atmosphere are unprecedented in the last 800,000 years, and I argued that it is misleading to compare high resolution data with low resolution data. After I published that post, I wondered whether I could illustrate this with an example. It should be possible if I had a detailed dataset. I could then make a detailed graph, see what that looks like, sample this dataset in the same way as a proxy dataset, and make a second graph. Comparing both graphs should make the effect clear.
Initially, I considered using white noise, but after a while I thought it would be better to use real-world, high resolution data with some trend in it. I remembered that the Dutch meteorological service (KNMI) publishes its measurements for different stations, for example the daily temperature data from De Bilt. This is detailed information and I know what it looks like, which is a good thing when I start comparing.
The data is provided as 1/10th of a degree Celsius (for example, 150 is the representation for 15 °C). In what follows, I will keep it in that format.
I then needed the resolution of a proxy. At first, I wanted to sample in a regular pattern, but when looking at some proxy datasets, I noticed that they are not sampled that way. They mostly have intervals of 100+ years, but not in a regular pattern. I then settled on the Dome C proxy dataset. This dataset is for CO2, but that is not important because I only want to use its sample rate in order to see what it does to the known example dataset.
The Dome C proxy dataset is 12,609 years long (from 9,067 to 21,676 years) and has a sampling rate of about 175 years. To make things a bit easier for myself, I took the last 12,609 data points of my daily temperature dataset (from July 25, 1983 until the end of 2017). This is the result:
The datapoints are in the range of -132 to +271 tenths of a degree. There is some trend in the data. At the beginning it seems a bit cooler, then temperatures go up around datapoint 2000, then back up around 5000 and again between 10,000 and 11,000. Generally, temperatures are going up slightly.
I then took the intervals from the example proxy dataset and sampled the temperature values accordingly. This is the result:
Look at the data range. It is now between -10 and +239 tenths of a degree. It has become narrower, which is quite logical, since it would be quite a coincidence if all extremes were sampled.
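The point-sampling step can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet used for the graphs: the series is synthetic (a seasonal sine wave plus noise standing in for the daily record), and the sample positions are drawn at random around a mean gap of 175 rather than taken from the Dome C dataset. The names `synthetic_daily_series` and `irregular_sample_positions` are made up for this sketch.

```python
import math
import random


def synthetic_daily_series(n_days=12609, seed=0):
    """Synthetic stand-in for a daily record, in tenths of a degree Celsius."""
    rng = random.Random(seed)
    series = []
    for day in range(n_days):
        seasonal = 100 + 80 * math.sin(2 * math.pi * day / 365.25)
        noise = rng.gauss(0, 35)
        series.append(round(seasonal + noise))
    return series


def irregular_sample_positions(n_points, mean_gap=175, seed=1):
    """Irregularly spaced positions; an assumption, not the Dome C intervals."""
    rng = random.Random(seed)
    positions = []
    pos = rng.randint(0, mean_gap)
    while pos < n_points:
        positions.append(pos)
        # Each gap varies between roughly half and one and a half times the mean.
        pos += rng.randint(mean_gap // 2, mean_gap * 3 // 2)
    return positions


daily = synthetic_daily_series()
positions = irregular_sample_positions(len(daily))
sampled = [daily[p] for p in positions]

print("daily range:  ", min(daily), max(daily))
print("sampled range:", min(sampled), max(sampled))
```

With only a few dozen points drawn from more than twelve thousand, the sampled range can never be wider than the daily range, and it would take a lucky draw to hit both extremes.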
Okay, but this is not exactly what happens with proxy data. The proxy samples are not from a specific year, but from a range of years. If I average the values in those ranges, then I get this:
Again, look at what happens with the range of the data. Now we are left with a range between +12 and +170 tenths of a degree (compared to -132 and +271 tenths of a degree in the original data). This is also logical, since averaging the values in those ranges basically smooths the graph. It also removes the subtle trend from the original data. The first lower trend around datapoint 2,000 is now gone, the second one is still somewhat visible. The third is not visible, but there is another dip before 10,000 that wasn’t in the original data. It was created by the short sampling interval at that point, which averaged two low values and therefore resulted in a low value for that range.
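Averaging over ranges instead of picking single points can be sketched the same way. Again, this is a sketch under assumptions: the daily series is synthetic, the range boundaries are random with a mean width of 175, and `irregular_edges` and `bin_averages` are hypothetical helper names, not anything from the original spreadsheet.

```python
import math
import random


def synthetic_daily_series(n_days=12609, seed=0):
    """Synthetic stand-in for a daily record, in tenths of a degree Celsius."""
    rng = random.Random(seed)
    return [
        round(100 + 80 * math.sin(2 * math.pi * day / 365.25) + rng.gauss(0, 35))
        for day in range(n_days)
    ]


def irregular_edges(n_points, mean_gap=175, seed=1):
    """Boundaries of irregular ranges covering 0..n_points (assumed, not Dome C)."""
    rng = random.Random(seed)
    edges = [0]
    while True:
        nxt = edges[-1] + rng.randint(mean_gap // 2, mean_gap * 3 // 2)
        if nxt >= n_points:
            break
        edges.append(nxt)
    edges.append(n_points)
    return edges


def bin_averages(series, edges):
    """Mean of the values in each range [edges[i], edges[i+1])."""
    means = []
    for start, end in zip(edges, edges[1:]):
        chunk = series[start:end]
        means.append(sum(chunk) / len(chunk))
    return means


daily = synthetic_daily_series()
edges = irregular_edges(len(daily))
means = bin_averages(daily, edges)

print("daily range:   ", min(daily), max(daily))
print("averaged range:", round(min(means), 1), round(max(means), 1))
```

Each average sits between the lowest and highest value of its own range, so the averaged series is squeezed even further than the point-sampled one.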
What will happen when I delete the 2017 data (3 data points) and replace it with the actual value of one of those datapoints? I take the value of 23.5 °C (to show the issue with the low resolution data) and get this:
Now the big question: does this somehow mean that a daily temperature of 23.5 °C is unprecedented in our dataset of roughly 34 years?
Not necessarily. It might be unprecedented in the smoothed, low resolution data (there was no range in the daily temperature dataset with an average of at least 23.5 °C), but it is not unprecedented in the underlying higher resolution data. There are 102 measurements of at least 23.5 °C in the daily temperature record since 1983. When we look at the original data, there are 1,873 measurements that exceed the highest value of the averaged data (more than 170) and 2,554 measurements that are outside the range of the averaged data (less than 12 or more than 170).
Why couldn’t we see all this in the previous graph? Well, these were hidden in the lower resolution data. Their absence was an artifact of the low resolution sampling of the dataset.
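The counting itself is simple. Here is a small sketch with toy numbers (eight made-up values in tenths of a degree, not the De Bilt measurements) that averages two ranges, counts the raw values outside the averaged range, and checks whether a single spliced-on raw reading looks unprecedented. The helper names are hypothetical.

```python
def bin_averages(series, edges):
    """Mean of the values in each range [edges[i], edges[i+1])."""
    return [
        sum(series[start:end]) / (end - start)
        for start, end in zip(edges, edges[1:])
    ]


def count_outside(series, low, high):
    """Count raw values above `high` and below `low`."""
    above = sum(1 for value in series if value > high)
    below = sum(1 for value in series if value < low)
    return above, below


# Toy values, chosen only to keep the arithmetic easy to follow.
raw = [10, 30, 50, 30, 10, 20, 40, 20]
edges = [0, 4, 8]

means = bin_averages(raw, edges)
above, below = count_outside(raw, min(means), max(means))

# A single raw reading placed next to the averaged values.
recent = 45
unprecedented_in_means = all(recent > m for m in means)
unprecedented_in_raw = all(recent > value for value in raw)

print(means)                   # [30.0, 22.5]
print(above, below)            # 2 4
print(unprecedented_in_means)  # True
print(unprecedented_in_raw)    # False
```

The reading of 45 beats every averaged value, yet the raw series already contains a 50. Six of the eight raw values fall outside the averaged range.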
Okay, this example doesn’t say anything about whether the current CO2 and CH4 levels are unprecedented in the last 800,000 years. It is just a practical example to show that it is not possible to prove that a certain value from a high resolution dataset is unprecedented using a low resolution dataset. The same goes for the proxy datasets, which have a very low resolution of on average 170 years and never had a period with an average of 386 ppm CO2 or 1790 ppb CH4. Even if such a 60-year period existed, it would not show up in this low resolution dataset, so it is misleading to suggest that our current (high resolution) values are unprecedented on the basis of such low resolution data. There is no way to tell how many similar warmings in the past may have been hidden in the lower resolution proxies.
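The arithmetic behind a short period disappearing in a long average can be sketched as follows. The 386 ppm, the 60 years and the 170 years come from the text above; the baseline of 280 ppm is a purely hypothetical value picked for illustration, and treating a proxy sample as a plain average over its window is also an assumption of this sketch.

```python
def window_mean(baseline, peak, spike_years, window_years):
    """Mean over a window that sits at `peak` for `spike_years`, else at `baseline`."""
    if not 0 <= spike_years <= window_years:
        raise ValueError("spike_years must be between 0 and window_years")
    total = peak * spike_years + baseline * (window_years - spike_years)
    return total / window_years


# baseline=280.0 is a hypothetical value, not taken from any dataset.
mean_ppm = window_mean(baseline=280.0, peak=386.0, spike_years=60, window_years=170)
print(round(mean_ppm, 1))  # 317.4
```

Under these assumptions, the single averaged value for that window would be about 317 ppm, well below the 386 ppm that was actually reached during the 60 years.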
What amazes me most is that nobody is taken seriously when they challenge this absolute nonsense. Statisticians and various other fields should be interviewed on the news to point out that this practice is effectively lying to the public. It cannot be taken for anything other than a blatant lie.
What amazes me is probably related: that even experts seem to suggest in their communication that a high resolution dataset implies something about a low resolution dataset, and that there is so little nuance that this is a comparison between two distinctly different things. It is not even a comparison between, for example, two temperature datasets, but a temperature dataset compared to a dataset that has some relation to temperature AND at a radically different resolution. It is not even an apple-to-orange comparison, but more an apple-to-car comparison or something alike.
very nice “experiment” .. and: you even get a hockey stick in the end !! .. THAT cannot be “bad science” ..
it is always interesting to revisit here, and spend an hour or so reading …
It is very easy to find hockey sticks if you know where to look. I am always puzzled when I hear the claim from scientists in the media that the smaller range of temperatures/CO2/CH4 in the proxy data means that those parameters were incredibly stable in the past, while the same smaller range can be achieved by the sampling rate of such a low resolution dataset…
Always nice to hear that visitors find this blog interesting. Thanks for sharing!
I used a link to this article a few days ago on twitter .. No reaction from the original poster, but an “interesting” reaction from ULG prof. Damien Ernst ..
Now, since you have the script or the excel sheet at hand, wouldn’t it be interesting, if time permits, to do a few more simulations ?
– introduce a small known longtime trend into the original data and see how the proxy reacts (with ev. introduction of “the measured last 3 datapoints” in the proxy
sorry, that was gone before it was finished … 🙂
– introduce a bigger known longtime trend, when little is seen in the proxy with the small longtime trend
– introduce 2 or 3 shorttime trends in the data, and see how the proxy reacts ..
other ? …
That should be possible. I agree that the trend of the KNMI data was not very pronounced. I could certainly introduce increasing and/or decreasing slopes or some other trend to see what happens.
The introduction of shorttime trends was something that I was contemplating (until I spotted the Energy in Australia article…).
I am traveling right now and don’t have the Calc sheet at hand. Will look at it after I return home. I am also working on another project that is more urgent, so it could take some time before I put myself to it.
Thanks for the notification, Duc. I am more focused on studying and writing posts than on checking whether these posts generate discussion on Twitter (I should probably make more of an effort on Twitter), so it is nice that someone else is paying attention.
I found the discussion you mentioned. Not sure whether the “Interesting” reaction of Damien Ernst refers to the original poster, your reaction to it, the link in your reaction or the clashing opinions of you and Kees van der Leun. I am however rather intrigued by Damien Ernst’s like of your tweet.
I did the processing in LibreOffice Calc. I am more used to Excel (at work), but it doesn’t work on Linux (OS of my home computer and laptop). Looking for an alternative though (Calc as well as Excel do not handle larger datasets well). Looking into R at the moment, seems promising.
It is an interesting idea to do some more simulations. I was playing with that idea when writing that post, but then suddenly another interesting story emerged and I forgot about it.
I’m pretty sure that the “interesting” was about your blog …