Dr. Angela Berardinelli

Assistant Professor - Department of Mathematics and Information Technology

Mercyhurst University

What is data science?

Data science is a relatively new field of study that involves making sense of data. Often this data is unstructured - it is not formatted nicely (it might just be a list of words and numbers). These days there is a LOT of it - a data science project in a "real world setting" might involve taking terabytes or petabytes of data and trying to put it into context. This is where the term "big data" comes from.

For comparison: MB stands for "megabyte," GB stands for "gigabyte," TB stands for "terabyte," and PB stands for "petabyte." One song file is about 3 MB = 0.003 GB. Most smartphones these days have 16 GB or 32 GB of internal storage (the Samsung Galaxy S6 has a minimum of 32 GB storage, and the iPhone 6S has a minimum of 16 GB storage). Your laptop's hard drive is probably 100-500 GB (the current MacBook Air models come with 512 GB of storage). 1 TB = 10^12 bytes = 1,000 GB and 1 PB = 10^15 bytes = 1 million GB. So one petabyte of data is what can be held by the hard drives of about 2,000 MacBook Airs (1,000,000 GB divided by 512 GB is roughly 1,950).

In 2012, Facebook revealed that it was accumulating more than 500 TB of data each day (about 1,000 full MacBook Air hard drives every day). For Facebook, "data" would be photos, status updates, likes, shares, links, friendships, videos, etc. It is likely that Facebook is accumulating a lot more data than that right now; their active user base grew from 901 million users at the beginning of 2012 to over 1.5 billion users by the end of 2015.
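The size comparisons above can be checked with a few lines of arithmetic. The following is only a minimal sketch in Python: the names `BYTES_PER_UNIT`, `convert`, and `devices_needed` are made up for illustration, and it assumes the decimal units used above (1 GB = 10^9 bytes) and a 512 GB laptop drive.

```python
# Decimal (SI) storage units expressed in bytes, as used in the text above.
BYTES_PER_UNIT = {
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
}


def convert(amount, from_unit, to_unit):
    """Convert a storage amount from one decimal unit to another."""
    return amount * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


def devices_needed(amount, unit, device_capacity_gb):
    """Return how many devices of the given capacity (in GB) would hold the data."""
    return convert(amount, unit, "GB") / device_capacity_gb


# One 3 MB song expressed in GB.
song_gb = convert(3, "MB", "GB")

# Facebook's reported 500 TB per day, measured in 512 GB laptop drives
# (the 512 GB capacity is the assumption stated above).
facebook_drives_per_day = devices_needed(500, "TB", 512)

print(f"One song is about {song_gb} GB")
print(f"500 TB fills about {facebook_drives_per_day:.0f} drives of 512 GB")
```

Changing the arguments to `devices_needed` lets you try the same comparison with other amounts of data or other device sizes, such as a 16 GB phone.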

Back to data science... Data science takes data (collected by companies, governments, non-profit organizations, individuals, etc.), analyzes it (using various algorithms), and presents results. A data scientist interprets these results and gives new insight to decision makers (in the business/government/organization) to help guide them to the next steps for their organization based on the evidence provided by the data. Data science involves a unique blend of technical skills, statistical models and other quantitative analysis techniques, and knowledge of the target market (for example, the specific business market that is being analyzed).

Expertise in all three of these areas (computer programming, statistics, and "the target market" - whatever that means in your context) makes for a great data scientist. Computer programming and statistics come into play during the preparation of the data ("cleaning the data" - removing the "noise" and the nonsense while preserving and organizing only the "useful" parts of the data that will actually be part of the analysis) and during the implementation of machine learning algorithms, which are often based heavily on an understanding of statistics. Knowledge of the target market is important in all phases of the project. It is essential to understand the context of the question in order to find a useful answer. A good data scientist should know the market well enough to tell the "useful" parts of the data from the "noise," to identify which algorithm will be able to answer the questions of interest to the client, and to turn the results into actionable insights for the client.
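To make "cleaning the data" a little more concrete, here is a small sketch in Python. Everything in it is hypothetical: the records, the field names `customer` and `purchase`, and the rules for what counts as "noise" are invented for illustration. In a real project, those rules come from knowing the target market (for example, a repeated record might be a data-entry error in one setting and a genuine second purchase in another).

```python
# Hypothetical raw records, as they might arrive from a web form or spreadsheet.
raw_rows = [
    {"customer": " Alice ", "purchase": "19.99"},
    {"customer": "Bob", "purchase": "n/a"},      # purchase is not a number
    {"customer": "", "purchase": "5.00"},        # customer name is missing
    {"customer": "alice", "purchase": "19.99"},  # same record, different formatting
    {"customer": "Carol", "purchase": "42"},
]


def clean(rows):
    """Keep only rows with a name and a numeric purchase, without repeats."""
    seen = set()
    cleaned = []
    for row in rows:
        # Normalize formatting so " Alice " and "alice" compare as equal.
        name = row["customer"].strip().title()
        try:
            amount = float(row["purchase"])
        except ValueError:
            continue  # treat non-numeric purchases as noise (an assumption)
        if not name:
            continue  # treat rows without a customer name as noise
        key = (name, amount)
        if key in seen:
            continue  # treat exact repeats as duplicates (an assumption)
        seen.add(key)
        cleaned.append({"customer": name, "purchase": amount})
    return cleaned


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Of the five hypothetical rows, only the records for Alice and Carol survive. The remaining rows are the "useful" part that an analysis would actually use.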

About the Profession of "Data Scientist"

  • This short book (Kindle edition is free) describes four totally different types of data scientists and how each can be useful to an organization. Based on their research, the authors even created a web survey (found here) that can tell you what kind of data scientist you are.
  • This article tries to explain to business professionals (or other people who may not understand the importance of data professionals) why/how the skills a data scientist has are useful to their business.
  • A 2013 blog post about the demand for data scientists, what they are, and how to become one.
  • "Data scientist" was rated #1 career for work-life balance in 2015 by Glassdoor, #9 best job in America in 2015 by Glassdoor, and #8 best paying job in 2015 by Forbes.
  • A 2015 article listing the 10 cities with the most growth in available data science jobs. California and Washington DC are the top two locations for sheer number of data professionals, but after the initial BOOM in data professional employment, companies all across the country in all different kinds of industries are still catching on.

How can Mercyhurst University prepare you for a possible career in data science?

MIS 150 (Introduction to Data Science) is a course open to students from all majors. In that course, we work with smaller data sets to learn about several common data science algorithms, and we do all of our analysis in Microsoft Excel to help bolster students' quantitative analysis techniques in a way that will be useful for any career after graduation.

MIS 150 alone is not enough to prepare a student for a career in data science, but it can be a great way to gain exposure to help you decide if it is something you are interested in exploring further. If your curiosity for data science is not satiated by MIS 150, you have two options after that (although MIS 150 is not a prerequisite for the programs described below).

If you already have a bachelor's degree or if you are a graduating senior (from Mercyhurst or from another institution, with any major), you can apply for our two-year master's degree (MS) program in Data Science.

As an undergraduate student (at Mercyhurst, with any major), you can apply for the Data Science Scholars program in your sophomore year. If accepted, you would take 12-15 credits total of graduate data science courses in your junior and senior years. You then would have the option to complete a master's degree in data science with just one additional year (typically called a 4+1 program, where you receive an undergraduate degree in 4 years, then add just one more year of school to get a graduate degree).

Click the links provided in the paragraphs above for more information about our graduate data science curriculum or to apply for one of the programs.