Programmer Hadley Wickham praises the diversity of the R community

His packages are used by The New York Times, the FDA, Facebook and other companies to organize and display data in a neat way.

Hadley Wickham is part of a growing movement of statisticians and data scientists who champion R as a handy, easy-to-use tool for data analysis.

Wickham is the brain behind the popular dplyr package, which makes it easier to manipulate data. He has developed or co-developed others, including tibble, ggplot2, glue and pillar.

Many are widely used by companies such as The New York Times, Facebook and Google. His fans have even dubbed his creations.

“Initially it was a language used primarily by statisticians, so the assumption was that people who used R had a PhD in statistics,” Wickham said. “With the rise of data science, the popularity of R has increased enormously. Many people from many different backgrounds and domains now use it to find out what is going on with their data.”

“The thing that really drew me to R was the flexibility and power it gives you to really wrestle with your data, ask questions and find out what is happening in a very fluid and interactive way,” he added.

Programming is in Wickham’s blood: his father and sister both hold Ph.D.s in statistics. He started using the R language 15 years ago when he was a student at the University of Auckland, where R was created in 1993 by statisticians Ross Ihaka and Robert Gentleman.

Wickham is now chief scientist at RStudio and serves as an adjunct professor of statistics at the University of Auckland, Stanford University and Rice University. His work with R has made him something of a celebrity in data science, with many fans flooding forums with gratitude for his packages.

His tools have simplified the somewhat arcane code needed to handle tasks such as data aggregation and plotting. This has made R applicable to almost every industry that needs a way to organize data.

Wickham said he was honored to see people at government agencies such as the Food and Drug Administration and companies like FiveThirtyEight and Twitter using his packages. He emphasized the adoption of R by pharmaceutical companies, which use it to design clinical trials, analyze their results and support other parts of the drug discovery pipeline.

“A lot of people in the financial world use it, as well as in insurance and academia. If you are involved in a discipline that collects data, it works. It is becoming more popular in economics, and many biologists and ecologists use it. It is useful for people who don’t have a traditional quantitative background, but now have to wrestle with data. Journalists are a good example,” he said.

“Part of it is that it was designed by statisticians. The heart of the language is specifically designed for the types of problems you encounter when analyzing data.”

Wickham, from Hamilton, New Zealand, has been working with databases since he was 15, when he developed Microsoft Access databases.

His ggplot2 package, one of the most popular, has been downloaded by millions of people who find that it helps with data visualization. The purpose of many of his packages is to remove the hard parts and give more people access to tools that simplify working with their data.

His goal for the future is to continue the expansion of R around the world and to diversify the pool of people who use it. One barrier, he said, is that it can be difficult to use R without speaking English.

Groups are now translating some of his books about R into Spanish and other languages, so that more people can learn from them.

“One of the things I am interested in is making sure that anyone who wants to use R can use R. I went to the Latin-R conference in Chile and I wondered, ‘How can we help people whose first language isn’t English use R?’” he said.

“So a community in Latin America recently translated my book ‘R for Data Science’ into Spanish, and one of the great things they did is that they also translated some of the datasets, so the names of the datasets and the variable names are also in Spanish.”

He hopes that there can be more interaction and exchange between R and other, sometimes competing, languages such as SQL and Python. According to him, the aim should be to simplify things so that everyone can use these tools for all types of data. He joked that he even scraped data from his yoga class website so he could play with it in R.

There are many people who are not programmers, statisticians, or mathematicians, but who are forced to handle data.

“How can we help those people learn R through a combination of better tools that are easier to understand and easier to learn, better teaching and better resources?” he said.

The fairly recent popularity of the R language has made its user base one of the most diverse, with communities around the world and a particularly large community of women who call themselves R-Ladies.

“What is special about the R community is the R-Ladies community, which is relatively recent. A whole series of meetups around the world are now focused on women and other gender minorities,” he said.

“It has really influenced the gender diversity of the R community.”

Image: monsitj, Getty Images / iStockphoto