The Essence of Data Science
by Stephen Chao
Since this is our inaugural blog post, I wanted to talk about a topic that is relevant but that also gives you a sense of who we are and what we hope will come out of this new interest in utilizing data.
Big Data and Data Science have become a huge phenomenon in an extremely short amount of time, and it shows. Every one of the big New York universities (NYU, Columbia, Cornell, etc.) has a Data Science center in the works, as do major universities in other parts of the country. As a whole, this is motivating new graduates in technology, business and statistics to add data science to their resumes to stay competitive and juice up their professional profiles.
But if you ask them, new grads and universities alike, “What is Data Science?” the answers are often dubious and complicated. Half of them may not even believe that data science is a real field.
And admittedly, it is confusing. The study of data as it exists today is VERY new. With Big Data on the rise and the need for innovative talent such a hot topic among large businesses and organizations, it makes sense that a new field would rise to meet the demand for new analyses and for new storage, processing and analytic tools to manage ever-growing data sets.
But I think part of that confusion stems from our perception of what it means to “study” this field and from questions about who it is that works with these data sets. I suspect that when people think about the areas that deal with Big Data and data science, they associate them with commercial industries like computers, IT, retail/sales and finance. And while these industries may be the main source of the demand for data science, they don’t define its essence. There is, in fact, an important part that is often ignored: SCIENCE!
Science is the systematic process of obtaining, organizing and growing knowledge about anything in our universe through testable explanations and predictions. We develop a hypothesis about something we observe happening (or, maybe, something we think should happen that causes other stuff to happen) in an effort to explain why it happens. And we rely on the scientific method to do it. We have an idea. We devise a process for testing it and (repeatedly) check whether that idea is true. And if the tests confirm the idea, we can consider it correct and knowledge is obtained.
It seems really simple, and as an overall process it is, but the subtleties and complexities can make it a complicated, painstaking yet beautiful endeavor. Through science, we can start to understand much about the forces that mold our existence, and that understanding enriches us as a society: an enrichment that is as much spiritual as it is monetary. I hope that Data Science can add to this enrichment.
For this to happen, we need to appreciate what combining “science” and “data” really means and what it has the potential to achieve. At present, data science is generally described as an amalgamation of advanced mathematical, statistical and technological skills used by a subset of highly skilled and ingenious professionals to make business and commerce work better. To limit the benefit of data science primarily to revenue and profits is extremely short-sighted and, speaking as a data scientist, a little reductive.
I believe that in the practice of data science, the science must come first. Huge data sets offer huge possibilities. We now have an unprecedented ability to look for explanations for any number of social, numerical, mathematical, or statistical phenomena, to name just a few. Being able to analyze data sets in the millions and billions can provide us with new insights into the behavior of trends and numerical phenomena. A new understanding of numerical patterns, for example, can be found by examining large sets of numbers, and that can lead to new models for encryption, for processing images, and even for understanding numbers themselves, as explored in mathematical number theory.
And it’s not just numbers and figures. We can aggregate decades’ worth of collected information and find relationships that we previously could not because the effort required made it infeasible. In astrophysics, for example, we have tremendous records of space radiation and energy levels. A data scientist can create a tool that allows all of these readings to be compiled and analyzed, opening new possibilities for how this information can be considered. Social sciences can also get a boost. Modeling millions of data points pertaining to trends, purchasing behavior and advertising can provide new insight into the way people choose, make decisions, or interpret messages in society. Anthropology, sociology, behavioral psychology: the list goes on and on of disciplines whose innovation is limited simply because they can’t establish trends to support their ideas. Someday, we hope to be the data scientists who tackle these fields.
Right now, data science has gained popularity but stands at a crossroads: it can go on being a buzzword, or it can be a real discipline that serves to further the collective knowledge and analytical capabilities of every other scientific discipline.
Think about it! Every science has data that needs analysis, just as every commercial industry does. Our specialty is data: how it moves, how it is stored, how it can be utilized, and how it can be wielded to inform our decisions, choices, and understanding of events or phenomena. While data science draws upon techniques commonly associated with math, statistics and computing, the lens through which it considers and explores data is entirely its own.
A scientific approach to understanding is not, and should not be, specific to any single industry, just as data science is not, and should not be, considered merely a subset talent or a laundry list of skills. I hope we can push these ideas forward and make data science a bold new specialty.
Image courtesy of fotographic1980 / FreeDigitalPhotos.net