ForecastWatch December 2010 Customer Newsletter
The November aggregations have run and are now in ForecastWatch. There has been a lot of activity this month. The new server has enabled analyses (like the drilldown feature announced last month) that were previously unavailable due to resource constraints.
The most important change this month is in how aggregation days are selected. Beginning this month, days are selected by actual observation date; previously, they were selected by forecast date. Under the old rule, the November aggregations would collect statistics on all forecasts with forecast dates from 11/1/2010 through 11/30/2010. That works fine for every aggregation type except low temperature. For low temperature forecasts, there are two types of forecast providers: those that forecast a morning or 24-hour low, and those that forecast an overnight low. The forecast date is the date associated with the forecast, whether it appears in the forecast "tombstone", a column header, an XML element, or elsewhere. Depending on the provider, however, the low temperature may verify against the actual observation on the forecast date itself (for morning-low and 24-hour-low providers) or against the observation on the day after the forecast date (for overnight-low providers).
What this means is that prior to November, the low temperature aggregations were shifted one day for overnight-low providers. For example, in October and earlier months, the low temperature aggregations for CustomWeather, Environment Canada, Persistence, and Climate Normal forecasts, as well as a number of private feeds, were correctly drawn from 10/1/2010 through 10/31/2010. The other providers' aggregations, however, covered 10/2/2010 through 11/1/2010: while their forecast dates ran from 10/1/2010 through 10/31/2010, the actual observation dates for their low temperature forecasts were shifted one day later.
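The date-shift logic described above can be sketched in a few lines. This is an illustrative sketch only, not the actual ForecastWatch code; the provider grouping and function names are assumptions based on the providers named in this letter.

```python
from datetime import date, timedelta

# Assumed grouping, inferred from this newsletter: these providers issue
# overnight lows, which verify against the next day's observation.
OVERNIGHT_LOW_PROVIDERS = {
    "Accuweather", "The Weather Channel", "Intellicast",
    "NWS", "NDFD", "Weather Underground",
}

def observation_date(provider, forecast_date):
    """Map a low-temperature forecast date to the date of its matching observation."""
    if provider in OVERNIGHT_LOW_PROVIDERS:
        # An overnight low dated 10/31 verifies against the 11/1 observation.
        return forecast_date + timedelta(days=1)
    # Morning-low and 24-hour-low providers verify on the forecast date itself.
    return forecast_date

def in_november_2010_aggregation(provider, forecast_date):
    """New rule: select aggregation days by actual observation date."""
    obs = observation_date(provider, forecast_date)
    return date(2010, 11, 1) <= obs <= date(2010, 11, 30)
```

Under the old forecast-date rule, a 10/31 overnight-low forecast would have landed in October's aggregation even though it verified against the 11/1 observation; the new rule places it in November.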
This was uncovered, and corrected, thanks to a subscriber using the new drilldown-to-individual-forecasts functionality. The net result is that for low temperature aggregations prior to this month, you can appropriately compare CustomWeather, Environment Canada, Persistence, and Climate Normal with each other, and Accuweather, The Weather Channel, Intellicast, NWS, NDFD, and Weather Underground with each other. But in months where the first day of the month, or the first day of the next month, was significantly different accuracy-wise from the rest of the month, you cannot rightfully compare low temperature statistics between the two groups. All but one day will be the same (10/2/2010 through 10/31/2010 in the example), but the first group's aggregation will include 10/1/2010, while the second group's will include 11/1/2010 instead.
Another major effort this month has been improving and auditing the canonical forecast descriptions assigned to forecast icons and text forecasts. A canonical forecast is a normalized forecast: for example, "partly cloudy", "partly sunny", and "times of clouds and sun" are all "partly cloudy/partly sunny" forecasts on the canonical "sunny, mostly sunny, PS/PC, mostly cloudy, cloudy" scale. Currently, this doesn't matter much to ForecastWatch statistics, because any icon or text forecast that mentions precipitation is simply considered a precipitation forecast. In the future, though, I may break forecasts down further, and canonical descriptions do matter for sky condition forecasts and the like.
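The normalization idea amounts to a lookup from raw phrases onto the canonical scale. The sketch below is illustrative only; the phrase list is built from the examples in this letter and is not the actual ForecastWatch mapping table.

```python
# Hypothetical phrase-to-canonical mapping, using only examples from this letter.
CANONICAL_SKY = {
    "sunny": "sunny",
    "mostly sunny": "mostly sunny",
    "partly cloudy": "partly cloudy/partly sunny",
    "partly sunny": "partly cloudy/partly sunny",
    "times of clouds and sun": "partly cloudy/partly sunny",
    "mostly cloudy": "mostly cloudy",
    "cloudy": "cloudy",
}

def canonicalize(phrase):
    """Return the canonical sky-condition category, or None if unrecognized."""
    return CANONICAL_SKY.get(phrase.strip().lower())
```

The point of the canonical scale is that three providers using three different phrasings all land in the same bucket, so their sky-condition forecasts can be compared head to head.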
There are currently 2,052 unique forecast icons in the ForecastWatch database. Every unique URL counts as a unique forecast icon, so that number is larger than the actual number of distinct images. The NWS is the biggest offender: for a time, every regional office had its own web server hosting icons. These were all previously classified by hand, and the problem with hand classification over time is that inconsistencies creep in. I've gone through and classified each distinct icon (no matter how many URLs point to it) and cleaned things up. This changes nothing in ForecastWatch today, but it will keep things consistent going forward.
As for text forecasts, there are currently 1,289,805 unique text forecast strings, with Accuweather being by far the most expressive provider. Part of the reason is that some forecasters put much more information into their text forecasts than others. There is no practical way to categorize them manually, so I created a recursive descent parser to read and categorize the forecasts. That program has evolved over time as new phrases have been found. I've improved it greatly over the last month and will be rerunning it against the entire text forecast collection. Again, this won't change anything in ForecastWatch, but it might slightly adjust some of the assigned likelihood categories or sky conditions.
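To give a flavor of the approach, here is a toy recursive descent sketch over a tiny invented grammar: forecast := [likelihood] condition. It is nowhere near the real ForecastWatch parser; the grammar, phrase lists, and category labels are all made-up examples.

```python
# Invented example vocabulary; the real parser handles vastly more phrases.
PRECIP_WORDS = {"rain", "snow", "showers", "sleet", "thunderstorms", "drizzle"}

def parse(text):
    """Categorize one text forecast string: forecast := [likelihood] condition."""
    tokens = text.lower().replace(",", " ").split()
    likelihood, rest = parse_likelihood(tokens)
    return {"likelihood": likelihood, "condition": parse_condition(rest)}

def parse_likelihood(tokens):
    # likelihood := ["slight"] "chance" "of" -- consume it if present,
    # and hand the remaining tokens down to the next rule.
    if tokens[:3] == ["slight", "chance", "of"]:
        return "slight chance", tokens[3:]
    if tokens[:2] == ["chance", "of"]:
        return "chance", tokens[2:]
    return "definite", tokens

def parse_condition(tokens):
    # condition := precipitation if any precip word remains, else a sky forecast.
    if any(t in PRECIP_WORDS for t in tokens):
        return "precipitation"
    return "sky"
```

Each rule consumes the phrase it recognizes and passes the remainder along, which is the essence of recursive descent; the real program simply has far more rules and a much larger vocabulary.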
More things are definitely afoot here to continually improve the data that ForecastWatch provides, as well as its usefulness, but that will have to wait until next month's letter. As always, if you have any questions, concerns, or product ideas, I am always available.
Have a wonderful Holiday and a joyous New Year, filled with family, friends, and loved ones!