Data Science is rapidly growing in popularity, both in companies and in academia. So what exactly is Data Science, and how does Codio take Data Science out of the realm of the weird and wonderful and help students get their hands dirty?
What is Data Science?
Data Science is a combination of techniques and tools that allow you to take data sets of any size and any kind and extract meaning from them.
It uses techniques and methods from many different fields, including mathematics, statistics, computer science, data mining, databases and data warehouses.
According to Wikipedia, “data scientists use their data and analytical ability to find and interpret rich data sources; manage large amounts of data despite hardware, software, and bandwidth constraints; merge data sources; ensure consistency of datasets; create visualizations to aid in understanding data; build mathematical models using the data; and present and communicate the data insights/findings. They are often expected to produce answers in days rather than months, work by exploratory analysis and rapid iteration, and to produce and present results with dashboards (displays of current values) rather than papers/reports, as statisticians normally do.”
What tools are widely used?
There is a wide variety of systems, platforms and tools that can be used to address Data Science requirements. Which tools you use depends very much on what format your data is in and where it is stored, what you already know and love, and what you don’t yet know but ought to.
- Databases : data is often stored in SQL and NoSQL databases, so some familiarity with database queries is important, both to gain access to the source data in the first place and to massage it into the correct format for later-stage analysis tools.
- R : the R programming language is one of the most popular languages for statistical data processing and for visualising the results.
- Python : the Python language has a very rich set of tools and modules aimed specifically at data science. IPython/Jupyter Notebook is widely used for teaching, and libraries such as numpy, scipy, matplotlib and many others provide a rich ecosystem.
- Matlab, SPSS and SAS : these and similar tools are high-end, expensive commercial packages built for numerical and statistical analysis. They are widely used in the corporate environment and also in academia, where academic discounts are available.
- Hadoop and Spark : these are open source frameworks for distributed processing of large data sets. They can scale from a single machine to clusters containing thousands of servers. Their programming models are simple, but they are non-trivial to configure and operate.
- Excel : you would be surprised just how much can be accomplished with Excel for smaller data sets and tasks. However, we would not consider it a full component in the Data Science tool chest.
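To make the database point concrete, here is a minimal sketch in Python using only the standard library's sqlite3 and statistics modules. The in-memory database, the `sales` table and its `region`/`amount` columns are hypothetical stand-ins for a real data source. The sketch shows the general pattern of querying source data and reshaping it for a later analysis step; it is not a recommended production pipeline.

```python
import sqlite3
import statistics

# Hypothetical data source: an in-memory SQLite database with a made-up
# "sales" table, standing in for a real SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("north", 80.0),
     ("south", 200.0), ("south", 150.0), ("south", 100.0)],
)

# Use SQL to pull out only the columns we need, sorted by region.
rows = conn.execute("SELECT region, amount FROM sales ORDER BY region").fetchall()
conn.close()

# "Massage" the result into a dict mapping each region to its amounts,
# a shape that later analysis code can consume directly.
by_region = {}
for region, amount in rows:
    by_region.setdefault(region, []).append(amount)

# A simple analysis step: the mean amount per region.
summary = {region: statistics.mean(values) for region, values in by_region.items()}
print(summary)  # {'north': 100.0, 'south': 150.0}
```

In practice the query would run against your real database, and the reshaped data would usually be handed to one of the analysis tools listed above rather than to the statistics module.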
So, how does Codio help?
Codio provides considerable assistance for both teachers and students. We help not only with a massively scalable cloud infrastructure but also with teacher support features, a browser-based IDE, class management, LMS integration and many other features designed to accelerate the pace of teaching and learning.
Codio gives students any number of cloud-based servers, each with an automatically attached IDE. Each server is a full Ubuntu server with sudo-level access. Whether you install components or complete Data Science environments manually or instantly grab them from a Codio template, you can do whatever you like with a Codio project.
You can install any database, programming language, platform or component that can be installed on a regular Ubuntu server. Each server also gets its own domain name and is fully web-facing, so applications like Jupyter Notebook and RStudio Server run without any problems.
You can even install Hadoop and Spark and teach them for distributed processing applications and Big Data scenarios.
Teachers of data science will find Codio seriously streamlines their workflow and student interactions, allowing them to spend a lot more time with the students and a lot less on setup, configuration and administration.
- A teacher can create and snapshot custom configurations for commonly used Data Science tools.
- Snapshots can be taken off the shelf by students, avoiding the considerable time associated with environment setup.
- Each project created by a student is a full-fledged Ubuntu server in the Cloud with a Web IDE attached.
- There is no limit to the number of configurations a student can use. Each project is fully independent of others, avoiding the many types of conflict that can occur when trying to install and maintain multiple configurations on a single machine.
- There is no need to prepare environments within a CS lab. Because everything is set up by the lecturer, runs in the cloud and is browser-based, your CS lab machines only need a browser; nothing has to be installed on your own hardware.
- Students can use their own PCs, reducing the need to invest so heavily in CS computers and also allowing students to work from home.
- Teachers can assign any configuration at any time to an entire class so students get their own machine to work on.
- From Codio’s LMS area, teachers are able to instantly access any student’s environment to assess, grade or assist.
There are many advantages for students, too.
- Access course materials or their own projects from any machine, not just CS lab computers.
- Instantly create Data Science environments by using Data Science stack templates or by creating their own custom Stack.
- Create any number of Data Science projects without worrying about cost. Codio’s container technology means that whether you have one project or ten, the cost is the same. That is not something you can achieve with VMs.
A Big Data Case Study
If you would like to hear how Codio is used to teach Big Data using Hadoop and Spark, please read the Kent State Big Data case study.
If you are teaching Data Science or Big Data, Codio offers many features for both students and teachers that allow you to concentrate on doing rather than configuring.
Our free trial allows up to 100 students and teachers to experience the Codio platform without restriction.