Important collection of Data Science, Big Data, R Programming Books
Pulled from the web, here is a great collection of eBooks. While every book in this list is provided for free, if you find any of them particularly helpful, consider purchasing the printed version. The authors spent a great deal of time putting these resources together, and I’m sure they would all appreciate the support!
Data Science in General
- An Introduction to Data Science
  Jeffrey Stanton, 2013
- School of Data Handbook
  School of Data, 2015
- Data Jujitsu: The Art of Turning Data into Product
  DJ Patil, 2012
Interviews with Data Scientists
- The Data Science Handbook
  Carl Shan, Henry Wang, William Chen, & Max Song, 2015
- The Data Analytics Handbook
  Brian Liou, Tristan Tao, & Declan Shener, 2015
Forming Data Science Teams
- Data Driven: Creating a Data Culture
  Hilary Mason & DJ Patil, 2015
- Building Data Science Teams
  DJ Patil, 2011
- Understanding the Chief Data Officer
  Julie Steele, 2015
Data Analysis
- The Elements of Data Analytic Style
  Jeff Leek, 2015
Distributed Computing Tools
- Hadoop: The Definitive Guide
  Tom White, 2011
- Data-Intensive Text Processing with MapReduce
  Jimmy Lin & Chris Dyer, 2010
Learning Languages
Python
- Think Python: How to Think Like a Computer Scientist
  Allen Downey, 2012
- Python Programming
  Wikibooks, 2015
- Automate the Boring Stuff with Python: Practical Programming for Total Beginners
  Al Sweigart, 2015
- Learn Python the Hard Way
  Zed A. Shaw, 2013
R
- R Programming for Data Science
  Roger D. Peng
- R Programming
  Wikibooks, 2014
- Advanced R
  Hadley Wickham, 2014
SQL
- Learn SQL The Hard Way
  Zed A. Shaw, 2010
- SQL Tutorial
  Tutorials Point
Data Mining and Machine Learning
- Introduction to Machine Learning
  Amnon Shashua, 2008
- Machine Learning
  Abdelhamid Mellouk & Abdennacer Chebira, 450
- Machine Learning – The Complete Guide
  Wikipedia
- Social Media Mining: An Introduction
  Reza Zafarani, Mohammad Ali Abbasi, & Huan Liu, 2014
- Data Mining: Practical Machine Learning Tools and Techniques
  Ian H. Witten & Eibe Frank, 2005
- Mining of Massive Datasets
  Jure Leskovec, Anand Rajaraman, & Jeff Ullman, 2014
- A Programmer’s Guide to Data Mining
  Ron Zacharski, 2015
- Data Mining with Rattle and R
  Graham Williams, 2011
- Data Mining and Analysis: Fundamental Concepts and Algorithms
  Mohammed J. Zaki & Wagner Meira Jr., 2014
- Probabilistic Programming & Bayesian Methods for Hackers
  Cam Davidson-Pilon, 2015
- Data Mining Techniques For Marketing, Sales, and Customer Relationship Management
  Michael J.A. Berry & Gordon S. Linoff, 2004
- Inductive Logic Programming: Techniques and Applications
  Nada Lavrac & Saso Dzeroski, 1994
- Pattern Recognition and Machine Learning
  Christopher M. Bishop, 2006
- Machine Learning, Neural and Statistical Classification
  D. Michie, D.J. Spiegelhalter, & C.C. Taylor, 1999
- Information Theory, Inference, and Learning Algorithms
  David J.C. MacKay, 2005
- Data Mining and Business Analytics with R
  Johannes Ledolter, 2013
- Bayesian Reasoning and Machine Learning
  David Barber, 2014
- Gaussian Processes for Machine Learning
  C. E. Rasmussen & C. K. I. Williams, 2006
- Reinforcement Learning: An Introduction
  Richard S. Sutton & Andrew G. Barto, 2012
- Algorithms for Reinforcement Learning
  Csaba Szepesvari, 2009
- Big Data, Data Mining, and Machine Learning
  Jared Dean, 2014
- Modeling With Data
  Ben Klemens, 2008
- KB – Neural Data Mining with Python Sources
  Roberto Bello, 2013
- Deep Learning
  Yoshua Bengio, Ian J. Goodfellow, & Aaron Courville, 2015
- Neural Networks and Deep Learning
  Michael Nielsen, 2015
- Data Mining Algorithms In R
  Wikibooks, 2014
- Theory and Applications for Advanced Text Mining
  Shigeaki Sakurai, 2012
Statistics and Statistical Learning
- Think Stats: Exploratory Data Analysis in Python
  Allen B. Downey, 2014
- Think Bayes: Bayesian Statistics Made Simple
  Allen B. Downey, 2012
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction
  Trevor Hastie, Robert Tibshirani, & Jerome Friedman, 2008
- An Introduction to Statistical Learning with Applications in R
  Gareth James, Daniela Witten, Trevor Hastie, & Robert Tibshirani, 2013
- A First Course in Design and Analysis of Experiments
  Gary W. Oehlert, 2010
Data Visualization
- D3 Tips and Tricks
  Malcolm Maclean, 2015
- Interactive Data Visualization for the Web
  Scott Murray, 2013
Big Data
- Disruptive Possibilities: How Big Data Changes Everything
  Jeffrey Needham, 2013
- Real-Time Big Data Analytics: Emerging Architecture
  Mike Barlow, 2013
- Big Data Now: 2012 Edition
  O’Reilly Media, Inc., 2012
Computer Science Topics
- Natural Language Processing with Python
  Steven Bird, 2009
- Computer Vision
  Richard Szeliski, 2010
- Concise Computer Vision
  Reinhard Klette, 2010
- Artificial Intelligence: A Modern Approach, 1st Edition
  Stuart Russell, 1995
Well, there you have it. Thousands of e-pages to read through. We hope there’s something there for everyone, no matter what level you’re starting at. If you have any suggestions of free books to include or want to review a book mentioned, please comment below and let us know!
Data science and machine learning involve a complex set of interconnected concepts. To keep up with the times, you need to spend time not only on research but also on revising concepts. Even if you are a seasoned professional, you will still want to catch up with current trends and refresh knowledge you acquired earlier. Books have always been one of the best sources of information and a good way to stay in touch with the basic concepts while working. Here is a list of vital books for data science that you will want to refer to despite the plethora of resources available on the internet.
- Understanding Machine Learning: From Theory to Algorithms – By Shai Shalev-Shwartz and Shai Ben-David
Machine learning is one of the fastest-growing areas of computer science, with far-reaching applications. This book aims to introduce machine learning and its algorithmic paradigms in a principled manner. It provides a theoretical account of the fundamentals of machine learning, along with the mathematical derivations that turn these principles into practical algorithms.
After the initial chapters covering the basics, the book covers a wide range of important topics that earlier textbooks left unaddressed. Some of the topics covered are:
- The computational complexity of learning and the concept of stability
- Convexity and important algorithmic paradigms, including neural networks
- Stochastic gradient descent
- Structured output learning
- Emerging theoretical concepts, for example, the PAC-Bayes approach
- Compression-based bounds
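As a rough illustration of one topic in that list, stochastic gradient descent, the sketch below fits a one-variable linear model by updating the parameters after each individual sample rather than after a full pass over the data. It is not taken from the book: the function name `sgd_linear_fit`, the learning rate, the epoch count, and the toy data are all assumptions chosen for illustration.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.02, epochs=1000, seed=0):
    """Fit y ~ w * x + b with plain stochastic gradient descent (illustrative sketch)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order each epoch
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b, for this one sample
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b


# Hypothetical toy data generated from y = 2x + 1, with no noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Because the toy data are noise-free, the estimates should settle close to the generating slope and intercept. With noisy data, a fixed learning rate would leave the parameters fluctuating around the best fit, which is one reason decaying step sizes are often used in practice.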
- Foundations of Data Science By Avrim Blum, John Hopcroft, and Ravindran Kannan
This book introduces the various statistical learning methods and is meant for upper-level undergraduates, master's students, and PhD students in the non-mathematical sciences. It contains a large number of R labs, with detailed explanations of how to implement the various methods in practice. These resources make it valuable for a practising data scientist as well.
- A Programmer’s Guide to Data Mining: The Ancient Art of the Numerati – By Ron Zacharski
This book follows a learn-by-doing approach. Because passive reading is often less fruitful, the book lets readers work their way through experiments and exercises using the Python code it provides. In the exercises, the reader has to actively program the data mining techniques, which helps them get a better grasp of the material. The textbook is divided into a series of learning modules, each leading into the next. By the end of the book, a strong foundation for understanding data mining methods has been laid.
- Mining of Massive Datasets By Jure Leskovec, Anand Rajaraman and Jeff Ullman
To read and understand this book, one does not require any particular background. It is designed to serve learners at the undergraduate computer science level. To encourage a deeper understanding of the subject, each chapter comes with reading references that one can use to learn further.
- Storytelling With Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic
This is one of the most important books to read for anyone in the data science industry, even if they are not directly involved with the business side. Simply put, the book deals with the extraction and organization of vast quantities of data. This includes removing data that is excessive or unclear, improving data collection procedures, and then arriving at the most practical and relevant visualizations of the data. It is one of the most definitive books on what to do with useful data once it has been collected. Many of its insights apply to tech in general and would benefit even those who do not work in this particular sphere.
The above-mentioned data science books are good reads, but your career would definitely get a boost from relevant training.
The course provides an introduction to Data Analytics and detailed, hands-on training based on real business cases. It offers exposure to a wide array of tools, techniques, and business case studies, explained in a clear and lucid manner that makes them easy for everyone to understand. For those who aspire to be a successful Data Scientist, there is ample exposure to the databases used for data storage, from the traditional RDBMS to the latest NoSQL. For complete support and guidance, our top-notch faculty are always open to queries. Once enrolled, you will pick up skills like SQL and learn Data Analytics using SAS, R, Python, and Excel. To complement all of this, you will be taught Tableau as well, to master the art of data visualization. The course is designed to be comprehensive, so that by the end of it you are suitably equipped to enter the field of Data Science.
For a firm hold on the subject, you need to brush up on the fundamentals as well. Here are a few more books that you should read along with the 5 vital books on data science. The following list also works as a list of data science books for beginners.
Python data science books
Mastering Python for Data Science is written by Samir Madhavan. It introduces data structures in NumPy and pandas and shows how to import data into these structures. You will learn to perform linear algebra in Python and carry out analysis using inferential statistics. Later, the book deals with advanced concepts like building a recommendation engine, ensemble modelling, and high-end visualization using Python.
Python for Data Analysis is written by Wes McKinney, the author of the pandas library. It is considered to be one of the most comprehensive books covering data manipulation, cleaning, processing, visualization, and crunching in Python.
Introduction to Machine Learning with Python is written by Andreas Müller and Sarah Guido. It helps beginners get started with machine learning and covers building ML models in Python, advanced methods for model evaluation and parameter tuning, and techniques for working with text data.
Best statistics book for data science
Data science and statistics go hand in hand. Therefore books on statistics are equally required for aspiring data scientists.
Introduction to Statistical Learning is recommended for practising data scientists. It focuses on connecting statistics with machine learning and emphasises using ML algorithms in real life.
Elements of Statistical Learning is written by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. It introduces readers to higher-level algorithms like bagging and boosting, neural networks, and kernel methods.
Think Stats is written by Allen B. Downey and deals with performing statistical analysis in Python. It focuses on understanding statistics in real life through popular case studies. It also deals with Bayesian estimation.
Books for data scientists
R Cookbook is written by Paul Teetor and is a good read because of its many tips and recipes that help students get past the daily struggles of data manipulation and pre-processing. It does not explain the theory behind the various concepts; the focus is on how to use them to solve problems. Topics covered in this book include statistics, probability, data pre-processing, and time series analysis.
R Graphics Cookbook is written by Winston Chang. Data visualization makes data more interesting and analyses easier. Customizing a table and making it more engaging through the use of colors is considered a key skill for a data scientist. This book helps with that by working through examples in R with sample data. It emphasises the ggplot2 package for understanding and managing all visualization activities.
Applied Predictive Modeling is written by Max Kuhn and Kjell Johnson. This book combines theoretical and practical knowledge, neatly handling critical topics like over-fitting, linear and non-linear models, tree methods, and feature selection. It also demonstrates these algorithms using the caret package. Caret is considered to be one of the most powerful ML packages contributed to CRAN.
It is very easy to solve problems by going online and finding something to read. But books tend to be a more reliable source of information, and they enrich your experience by offering more than one viewpoint. Their varied perspectives will also broaden your horizons. The books mentioned above have been shortlisted based on their content, the variety of their case studies, and their examples, so that whether you are an established data scientist or a beginner, they will be useful in times of need. The list should also help you pick the next book you need for data science.