Introduction
Unlocking the mysteries of the cosmos is no longer confined to science fiction; it is now a job for data science, too. In this blog post, we are going to use DataChat to analyze UFO sightings in the United States. By looking at trends in location, seasonality, time of day, and shape, we can predict where you should be if you want to see the next UFO flying over the United States. Join me on a data-driven journey as we not only unravel the past but also offer insights into where and when UFO sightings are most likely to occur in the future.
Location
Our dataset records the location of each sighting (as latitude and longitude), the date and time of the sighting, and the shape of the object. My hypothesis is that seasonality will have a large effect on the number of sightings: as the seasons and the weather change, the number of sightings should fluctuate too, depending on when people are outdoors and looking at the sky. DataChat makes it easy to plot the sightings by location.
So far, the breakdown looks much as you'd expect. The number of sightings tracks the population of the United States: more on the coasts, fewer in the middle. More people means more chances that someone will see a UFO.
Instead of looking at large swaths of latitude and longitude, I am going to bin those columns into quadrants, much like the grid lines on a map.
DataChat makes it easy to bin columns of similar values. Here, I binned both the latitude and longitude columns and labeled the bins with compass directions (S and N for latitude, W and E for longitude). This way we can read a heatmap like a compass. Each quadrant represents a portion of the US, with lower numbers to the south and west and higher numbers to the north and east. So quadrant S1,W1 is around San Diego, and quadrant N1,E1 is around Maine.
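DataChat does the binning through its own interface, so the post does not show how the bins are computed. For readers who want to try the same idea outside DataChat, here is a minimal Python sketch under stated assumptions: equal-width bins over an approximate bounding box for the contiguous US, and hypothetical function names (`bin_index`, `quadrant_counts`, `print_heatmap`). The bin numbers (0 = southernmost or westernmost) are a stand-in for the S/N and W/E labels in the heatmap, not DataChat's exact scheme.

```python
from collections import Counter

# Assumption: approximate bounding box of the contiguous United States, in degrees.
LAT_MIN, LAT_MAX = 24.0, 50.0
LON_MIN, LON_MAX = -125.0, -66.0


def bin_index(value, lo, hi, n_bins):
    """Map a value to an equal-width bin index in [0, n_bins - 1]; 0 is the lowest bin."""
    if value <= lo:
        return 0
    if value >= hi:
        return n_bins - 1
    width = (hi - lo) / n_bins
    # Clamp in case floating-point rounding lands exactly on the upper edge.
    return min(int((value - lo) // width), n_bins - 1)


def quadrant_counts(sightings, n_bins=4):
    """Count sightings per (latitude bin, longitude bin) cell.

    Latitude bins grow from south (0) to north, longitude bins from west (0)
    to east, so the resulting table reads like a compass.
    """
    counts = Counter()
    for lat, lon in sightings:
        cell = (bin_index(lat, LAT_MIN, LAT_MAX, n_bins),
                bin_index(lon, LON_MIN, LON_MAX, n_bins))
        counts[cell] += 1
    return counts


def print_heatmap(counts, n_bins=4):
    """Print the counts as a grid with north at the top and west on the left."""
    for lat_bin in reversed(range(n_bins)):
        print(" ".join(f"{counts.get((lat_bin, lon_bin), 0):4d}" for lon_bin in range(n_bins)))
```

With four bins per axis, a sighting near San Diego (32.7, -117.2) lands in cell (1, 0) and one near Bangor, Maine (44.8, -68.8) lands in cell (3, 3). Equal-width bins are the simplest choice; bins based on quantiles would instead put roughly the same number of sightings in each band.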
By breaking the country down into quadrants, you can start to see where you should be standing if you want to see a UFO.
Time
Alright, now that we know where to be for the best chance of seeing a UFO, let's figure out when to be there.
DataChat makes it easy to extract information from date and time columns. Our data has a timestamp and a date column, precise to the day and minute, but for our analysis it is more useful to look at the trends in this data. So instead of the full date and time, I want to see only the hour and the month.
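In DataChat this extraction is a point-and-click step. As a rough equivalent outside the tool, the sketch below parses each timestamp with Python's standard `datetime` module, keeps only the hour and month, and tallies sightings for each. The timestamp format string is an assumption (something like "10/10/1949 20:30"), and the function names are hypothetical; adjust the format to match your own export.

```python
from collections import Counter
from datetime import datetime

# Assumption: timestamps look like "10/10/1949 20:30" (month/day/year hour:minute).
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M"


def hour_and_month(timestamp):
    """Parse one timestamp and keep only the hour (0-23) and the month (1-12)."""
    dt = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    return dt.hour, dt.month


def sightings_by_hour_and_month(timestamps):
    """Return two Counters: sightings per hour of day and sightings per month."""
    by_hour, by_month = Counter(), Counter()
    for ts in timestamps:
        hour, month = hour_and_month(ts)
        by_hour[hour] += 1
        by_month[month] += 1
    return by_hour, by_month


hours, months = sightings_by_hour_and_month(
    ["07/04/2010 21:15", "07/20/2011 22:40", "12/01/2009 08:05"]
)
print(months.most_common(1))  # [(7, 2)]: July is the busiest month in this toy sample
```

Dropping the year and day on purpose is what turns individual events into a seasonal and daily pattern: every July sighting, whatever the year, lands in the same bucket.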
Here we have a trend chart across every month, showing far more sightings in July.
A similar chart by hour shows the total number of sightings rising sharply after 21:00. Taken together, these trends suggest that being outside in July after 9 PM substantially increases your chances of seeing a UFO.
Descriptive Analytics
So you’re in the Great Lakes region at 9 PM in July. We used the Descriptive Analytics tools of DataChat to find the most likely place and time to see a UFO. But that’s not all DataChat can do. It also includes a powerful modeling and predictive analytics engine that can make predictions as well.
Predictive Analytics
Using DataChat, you can Train a Model to produce, in a single step, the same analysis we did in five. Training a model in DataChat gives us an analysis that examines a dataset and ranks the columns by how much they impact your KPI, in our case the number of UFO sightings.
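The post does not say how DataChat computes its column ranking, so the following is not DataChat's method. It is a deliberately simple, swapped-in technique, the correlation ratio (eta squared), shown only to illustrate what "ranking columns by impact on a KPI" can mean. The sketch assumes each sighting is a dict of already-binned categorical columns such as `hour` and `month` (hypothetical names). It counts sightings in every combination of column values, including empty ones, then asks what share of the variation in those counts each column explains on its own.

```python
from collections import Counter
from itertools import product
from statistics import fmean


def cell_counts(rows, columns):
    """Count sightings for every combination of column values, including empty cells."""
    observed = Counter(tuple(row[c] for c in columns) for row in rows)
    levels = [sorted({row[c] for row in rows}) for c in columns]
    return {cell: observed.get(cell, 0) for cell in product(*levels)}


def eta_squared(cells, position):
    """Share of the variance in cell counts explained by the column at `position`."""
    counts = list(cells.values())
    grand_mean = fmean(counts)
    total_ss = sum((y - grand_mean) ** 2 for y in counts)
    if total_ss == 0:
        return 0.0  # every cell has the same count, so no column explains anything
    groups = {}
    for cell, y in cells.items():
        groups.setdefault(cell[position], []).append(y)
    between_ss = sum(len(ys) * (fmean(ys) - grand_mean) ** 2 for ys in groups.values())
    return between_ss / total_ss


def rank_columns(rows, columns):
    """Return (column, score) pairs sorted from most to least explanatory."""
    cells = cell_counts(rows, columns)
    scores = {c: eta_squared(cells, i) for i, c in enumerate(columns)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: sightings cluster at 21:00 regardless of month.
rows = ([{"hour": 21, "month": 7}] * 5 + [{"hour": 21, "month": 12}] * 5
        + [{"hour": 9, "month": 7}, {"hour": 9, "month": 12}])
print(rank_columns(rows, ["hour", "month"]))  # [('hour', 1.0), ('month', 0.0)]
```

Each column is scored in isolation, so interactions between columns are ignored, and the grid of cells grows multiplicatively with the number of columns. A trained model such as the one DataChat builds can capture more than this, but the one-column-at-a-time view is easy to check by hand.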
This very quickly shows us the importance of the hour, and the model generates its own visualizations.
Here it again shows us which latitudes have the most impact on where sightings happen.