R, Tidyverse, and their beauty

A good developer and a good statistician both know how to use the different tools that are best suited for the job at hand.

In the interest of clarity, I'm going to be up front about my biases: I adore R. Don't get me wrong, I also wholeheartedly support Python and all of the beautiful packages and features that come with it being developed by developers. I'd also argue that when you wish to write an object-oriented, functional, and clean program that also fits with regular programming principles (interconnected multi-file projects come to mind), Python is a fantastic choice. In fact, in my line of work today, I use Python approximately 90 percent of the time. However, today I'd like to address the hate that R receives and why I've come to embrace it.

R isn't a language designed for building applications. It wasn't designed for writing lengthy programs, nor was it designed with things like programming efficiency in mind.

R is a programming language designed by statisticians, for statisticians.

Yes, you can create really cool interactive data visualizations. Yes, you can make fully functioning web applications. But you can't do it as cleanly or as efficiently as you can in other languages. And when you do, the final product likely isn't going to look as suave as products built in other languages (not to mention that it will likely be slower). This is starting to sound like an R hate-post. So why use R?

R is, hands down, the single best programming language when it comes to collecting, organizing, transforming, and gaining insights from tabular data. Working with tabular data is so core to what R is as a tool that it isn't even a package you import. You download R (and hopefully RStudio, if you are a sane person and wish to remain that way) and you can get started right away. You don't even need to import a dataset! There are already some simple ones built in that you can take and just run with.
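To make "get started right away" concrete, here is a minimal sketch in a fresh R session with no packages loaded. The choice of `mtcars` is mine, purely for illustration; it is one of the small datasets that ships with base R.

```r
# mtcars ships with base R (the datasets package is attached by default),
# so there is no import or download step.
head(mtcars, 3)        # first three rows of the data frame
str(mtcars)            # column names and types at a glance
summary(mtcars$mpg)    # min, quartiles, median, mean, and max of miles per gallon

# Mean miles per gallon for each cylinder count, using only base R.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)
```

Running `data()` in the console lists the other bundled datasets if you would rather explore a different one.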

But then there are the packages - the most beautiful thing about R. CRAN, an open-source repository for R packages, is quite possibly the most robust and plentiful resource for data analysis libraries. The population that most heavily utilizes R (academic scholars, R&D researchers, and applied statisticians) frequently uploads bleeding-edge tools that come with great documentation on practical use, assumptions made, and theoretical optimality. On top of that, with the introduction of the Tidyverse set of packages and its adoption by many data scientists, data wrangling is easy to apply, clear to communicate, and simple to understand thanks to its pipeline-style formatting. There is no need to expend much effort learning the syntax or writing a lengthy data query.
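As a rough illustration of that pipeline style, here is a small sketch using dplyr, one of the Tidyverse packages. The dataset (`mtcars`), the horsepower cutoff, and the column names `n_cars` and `mean_mpg` are assumptions of mine for the example, not anything prescribed by the Tidyverse.

```r
library(dplyr)  # part of the Tidyverse; run install.packages("dplyr") if it is missing

cyl_summary <- mtcars %>%
  filter(hp > 100) %>%          # keep cars with more than 100 horsepower (arbitrary cutoff)
  group_by(cyl) %>%             # one group per cylinder count
  summarise(
    n_cars   = n(),             # number of cars in each group
    mean_mpg = mean(mpg)        # average miles per gallon in each group
  ) %>%
  arrange(desc(mean_mpg))       # most fuel-efficient group first

print(cyl_summary)
```

Each step reads top to bottom as a verb applied to the result of the previous one, which is most of what makes the code easy to explain to someone who has never seen it.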

Finally, the defined scope and application of R means that there is a tight-knit community collaborating and sharing its work both online and in person. There are many communities that now host friendly R competitions, workshops to help beginners, and networking events for meeting others in the field. Online, look no further than the data science community on Twitter. Maybe get your feet wet with some EDA in the weekly Tidy Tuesday challenge, and get to know others.

At the end of the day, I'm stuck arguing the same thing that was said in the title, however ineloquently. R isn't better than Python; it does similar things with a different goal in mind. A good developer and a good statistician should both know how to use the different tools that are suited for the job at hand. I've detailed some of the things that I feel truly differentiate R from other languages like Python. I may be wrong, and more likely still, I may be uninformed. I believe that serious effort in mastering R brings great benefit - no time can really be considered wasted if it was spent learning R. Take what I said into consideration when deciding on a language.