Hacker News

What is the strategy behind the dizzying pace of product changes in the R/Python space? I heard that RStudio was a great product, yet they have to redo everything again, of course also renaming the company as is also standard practice in the Python "scientific" market.

Is it Jupyter envy? Why is it not possible to keep one good product and stay with it?

I wish MATLAB licenses weren't so expensive; at this point I'd just buy one and sit all this churn out.



As a regular R/bash/python user, I think the Posit philosophy is increasingly to break down barriers between the languages. It's no longer an either/or relationship, you can use both. And some packages or features that have been ported to python are just objectively worse in that language, so the want for R is definitely there.

Also, a lot of the Posit team are fully "bilingual", it's not like the old guard of academic R contributors. My impression is they appreciate both languages for what they have to offer.

For me, I'm apathetic to the languages, the only thing I care about is the output.


>ported to python are just objectively worse in that language

This is absolutely the case. dplyr syntax is much more intuitive for many use cases than the pandas or Polars equivalents.

One thing I miss from RStudio is the Rmarkdown documents with inline outputs. Jupyter notebooks, even in VSCode, are so needlessly over-engineered and under-featured compared to the elegance of RMarkdown. So I am excited to see what Posit can do to bring that experience to Python. My git repos will be thankful anyway.


I don't even know why people need to use dplyr and the tidyverse; in my opinion R is very comfortable for data wrangling and making all kinds of plots out of the box. It's able to handle huge amounts of data as well, especially if you adopt a functional programming approach vs. an object-oriented one (what I see with a lot of the classic "academic" brittle hardcoded slop that R gets a bad rap for). It's very fast if you keep in mind that it is a vectorized language and write your scripts with that perspective. Tidyverse seems like a new, unrelated syntax to learn on top of it all, whereas the base graphics packages work very much like the base statistical functions, the base data wrangling, and everything else in the base R package.


My feeling is that people with a more mathematical background tend to like developing DSLs that look more like math than code, and their code is typically written once and then thrown away; whereas people with a more software engineering background tend to prefer code that is more explicit about what it does, and have a better understanding of the long-term implications for maintainability/extensibility. Which for me is the summary of the R versus Python debate in general.

One can see that in the JVM world with Java vs Scala: people attracted to Scala tend to like "cute" DSLs, Java people tend to be more careful with shiny new features. (This is an oversimplification, of course)

Specifically for dplyr: it looks cute and tends to be easier to use in a REPL setting (you can build your pipeline step by step by running your command, looking at the output, get the command from history, add a step, run again; and at the end you get a single line to copy paste in your script). But if you want to wrap it in a function, it tends to create issues.


The base graphics packages make the plots as ugly as the ones generated by gnuplot though. ggplot2 on the other hand has very pretty output. And the concept of grammar of plots just makes so much sense to me.


You can make plots look however you want with base graphics. ggplot2 users mainly use the default settings honestly, you get that classic grey background plot I personally find more ugly than the cleaner white background defaults of the base package.


That's only true IMO of the in-IDE plots, but actual exported PNG or vector graphics I think base R plots are pretty perfect, other than perhaps the default colour palette


Beauty is in the eyes of the beholder. I much prefer the aesthetics of plots made with the lattice package or even base R over ggplot's.


Base graphics are also _massively_ faster than ggplot when data sizes get larger. To the extent that ggplot essentially becomes unusable.


Maybe it is for you, but the success of dplyr and ggplot suggests a lot of others disagree.


I wonder how much of this is just a feedback loop; were people taught both tools and then chose the one that works best, or was one more heavily promoted than the other, so people went with what was easiest to get started?


Once you are using the tidy paradigm, it lends itself to efficient plotting with ggplot2. Plotting with base R would require reshaping your data. So I think insofar as dplyr becomes a popular default, it makes sense that ggplot2 would be in lock step.


It's definitely a feedback loop. Every time you look up an R question on Stack Overflow, people give you a ggplot or dplyr answer and usually not a base package implementation. It's almost as bad as Ole Tange spamming GNU parallel on every xargs thread.


I'm sure that's part of it. But you could say the same for using Python or R over another language. Besides, someone who knows R well enough to write dplyr thought the situation was dire enough to write it. And there's also data.table, but that is inscrutable to most folks and I have only ever used it for fread, which is 10x faster than any other method of loading CSVs into R.


Hardly. Hand-holding tools are popular, but popularity doesn't change the fact that they are hand-holding tools that give you no functionality you didn't already have. Jupyter notebooks are probably more popular than Python scripts among new data scientists too; that doesn't mean anything, though, or take away the advantages you get from writing properly packaged scripts instead of a big old notebook in which you iterate on a pipeline line by line and figure by figure.


I learned R long ago, so I am pretty fluent at writing readable data wrangling code in base R. But I'm a biologist first, and in my community I see the value dplyr adds in making R approachable for people who need to do some basic stats but will probably never need to really understand the language or do any development.

It also provides guardrails and encourages best practices, which I find a bit too paternalistic and annoying, but again I can see the value.

I think most R users would be surprised at just how much tidyverse functionality is hidden in base R, but the majority of the dplyr versions of functions have at least some intended improvement over the base R versions, and some are a massive improvement in functionality.

For example, in a typical script the only tidyverse package I may load besides ggplot2 is tidyr, because the pivot_wider() and pivot_longer() functions really do solve a problem that was not fun in base R.
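For anyone who hasn't met that reshaping problem, here is a rough sketch of the idea in plain Python. It is only a conceptual illustration, not how tidyr implements it: the two helper functions borrow the tidyr names but are hypothetical, take far fewer options than the real ones, and the gene table is made-up data.

```python
# Hypothetical wide-format table: one row per sample, one column per gene.
wide = [
    {"sample": "s1", "geneA": 1.2, "geneB": 3.4},
    {"sample": "s2", "geneA": 0.7, "geneB": 2.9},
]


def pivot_longer(rows, id_col, names_to="name", values_to="value"):
    """Turn every non-id column into its own (name, value) row."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col == id_col:
                continue
            long_rows.append({id_col: row[id_col], names_to: col, values_to: val})
    return long_rows


def pivot_wider(rows, id_col, names_from="name", values_from="value"):
    """Inverse operation: gather (name, value) rows back into one row per id."""
    by_id = {}
    for row in rows:
        wide_row = by_id.setdefault(row[id_col], {id_col: row[id_col]})
        wide_row[row[names_from]] = row[values_from]
    return list(by_id.values())


long = pivot_longer(wide, "sample")        # one row per (sample, gene) pair
round_trip = pivot_wider(long, "sample")   # back to one row per sample
```

The long form is what the tidy paradigm expects (one observation per row), and going back and forth between the two shapes is the part that was tedious to write by hand.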


> One thing I miss from RStudio is the Rmarkdown documents with inline outputs

This is already in Quarto! https://quarto.org/docs/computations/inline-code.html#:~:tex....


I can't stand jupyter notebooks for several reasons. I've been using https://quarto.org/ and writing .qmd docs and really enjoy it.


> One thing I miss from RStudio is the Rmarkdown documents with inline outputs

RMarkdown (Rmd) was recently developed into “Quarto” (Qmd), precisely because they now support Python as well. I’ve used it a bit and it’s excellent.


But Rmd already supported Python chunks (as well as other languages such as Ruby, which seems to be missing from Qmd).


Quarto supports any language and works just fine. I have quarto blog posts for using APL as an example of a somewhat niche language.


I guess you have to add some plugin? I mean in Rmarkdown in Rstudio I just go ```{ruby} and I have a ruby block with nothing special installed. That doesn't work by default with Quarto.


In the Quarto front matter, you can choose to use a Jupyter backend, in which case any Jupyter kernel can be used to interpret code blocks. Many languages, including APL, have Jupyter kernels you can install.


RMarkdown really is great. I used RStudio/RMarkdown for almost all my homeworks, projects, and even papers with no code during my MS program. I now realize that it was Pandoc's Markdown mixed with LaTeX that I appreciated so much for scientific writing, but with RMarkdown you can easily call R and Julia.

I don't remember invoking Python from RMarkdown (maybe you already could in RStudio but I never did), so this will be a welcome addition in this new Posit program.


You might be interested in what we’re building over at Evidence.dev

It’s basically RMarkdown for SQL


Neat! I have to deal with Splunk at work and every time I do I'm annoyed that they decided to create their own query language.


A friend and I have been having a blast with Observable Framework. It seems really well put together: markdown + code blocks. See https://m.youtube.com/watch?v=Urf_bPFyhIk for a short demo.


Just curious, what Rmarkdown features do you find lacking in VSCode notebooks? I have the opposite impression, but I am probably missing something because I don't use RStudio as much as VSCode.


The creator of pandas, Wes McKinney, is on the Posit team and working on this project too. That gives some additional credence to the idea of a convergence of tooling.


One of those is not like the others


> Is it Jupyter envy?

The funny thing is that the R in Jupyter actually stands for R (the language). It's Julia, Python and R. No need for envy.

Of course, RStudio/Posit != R (at least in theory)


I think the R people see the writing on the wall. Python has sucked all the air out of the room, and it is becoming increasingly difficult for them to target the R space exclusively.

If you had no legacy or compliance requirements, are you going to start a new data project in SAS, R, or Python? Where are you going to find the most talent?


I'm not sure what would lead you to believe this. I've worked in the data science/ML space for over a decade now and I see the majority of pure analytics projects started in R, including at big tech companies I've worked at recently.

Of course, ML projects and other things that need to result in production-grade models are almost always done in Python. This is currently the most visible form of "data project" due to all the ML/AI hype, but it is far from the only data work going on.


I'm curious about the people who use R at the big tech companies you've worked at. Were the R users people who had just come out of school and were still working in their academic dev environment before weaning off it?

I always found that was the group who used R - kind of a use what you are used to until it gets out of step with the remaining workflow.

I would also say that the amount of R I see is far less than Python.


So, (speaking as someone who started with R and now predominantly writes Python), I think there's a bunch of things going on here.

1. R is 100% better for analytics work and statistical modelling. There's just no contest.

2. Python is much, much better for data getting (APIs/scraping etc) and dealing with non table-like data. Again, there's basically no contest here.

3. Software engineers hate R (in most cases), which means that it's easier to hand over work for production in Python.

This leads to a situation where it looks like most of the prod-level work is being done in Python, but if you look under the covers you'll discover that most prototyping/analysis/exploration is done in R and then ported to Python if it works.

Like, Python is a great language for lots of things, but it's pretty terrible for exploratory DS work (pandas is like the worst features of base R and base Python mashed together in an unholy hybrid).

There's also the fact that all the NN stuff is predominantly Python, so lots of companies believe that they need Python people, which reinforces the stereotype.

And finally, while I love R, Python has more guardrails, and it's harder to make an unmaintainable mess with it (relative to R). Particularly when people use all the various lazy evaluation packages that the tidyverse has used over the past decade (I once maintained a codebase that used all of these in different places, it was not a fun experience).


One of the better comments in this thread, I would only qualify that different levels of ability mediate much of the "how hard is it to make an unmaintainable mess" dimension. Dplyr/tidy code can be pasta, as can pandas, and there is really a whole new level of that given LLM-generated nonsense edited/tweaked by novices masquerading as seniors.

Apropos this idea of a VS Code competitor, I wish they would spend more effort on existing products. I find Quarto frustratingly buggy and meanwhile see no reason to move my workflow from VS Code to this new thing. YMMV


> I would only qualify that different levels of ability mediate much of the "how hard is it to make an unmaintainable mess" dimension

Oh definitely, but at least Python's stdlib is relatively consistent, which helps packages be a little more so.

My favourite example is t.test, which is not a t method for the test class, unlike summary.lm which is.

And there's like 4 different styles of function naming in base & stats alone.

Python has problems (for god's sake, why isn't len a method?) but it's a little more consistent.

I used to think that R was responsible for a lot more of the mess than I now do, having seen the same kind of DS code (and I am a DS) written in both Python and R.

And it would be sweet if R had a pytest equivalent, if I never have to write self.assertEqual again, it'll be too soon.
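To make the self.assertEqual complaint concrete, here is a small sketch of the same check written both ways on the Python side. The `mean` function is a made-up stand-in for whatever is under test; the second test is an ordinary function with a bare `assert`, which is the shape pytest collects and runs.

```python
import unittest


def mean(xs):
    """Hypothetical function under test."""
    return sum(xs) / len(xs)


# unittest style: a TestCase subclass plus assertion methods on self.
class MeanTest(unittest.TestCase):
    def test_mean(self):
        self.assertEqual(mean([1, 2, 3]), 2)


# pytest style: a plain function and a plain assert statement.
def test_mean():
    assert mean([1, 2, 3]) == 2
```

Both express the same expectation; the second just needs no class and no assertion-method vocabulary.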


You're wrong. Python is outpacing R in usage. Every metric you can find proves it. R also has fundamental issues and lacks serious development.


Not to dispute because I have no idea so I'll assume you're correct. But how many metrics did you find and how were they obtained? And how would you know they are representative of all R users?


For whatever it is worth, the TIOBE index lists Python as #1, R at #21.

Python is the first language many people are exposed to today. It has a library and tooling for every use case.

https://www.tiobe.com/tiobe-index/


R has a pretty particular use case though, Python use for statistical programming/data analysis would be an apples to apples comparison. People doing a coding 101 course in Python don't really count against the R user base.


No one is disputing that R has usage in niche arenas.


s/serious/hyped


No. R fundamentally has not really improved in the past ~10 years. Do you know much about how R works?

Also try:

gsub('serious', 'hyped', x)
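For the Python readers in the thread, a rough equivalent using the standard `re` module. The string `x` here is a made-up example, since the original never defines it. Note that a sed-style `s/serious/hyped` with no `g` flag replaces only the first match, while `gsub` replaces all of them, so both variants are shown.

```python
import re

# Hypothetical input; the original one-liner leaves x undefined.
x = "serious issues, serious development"

# Like sed's s/serious/hyped (no g flag) or R's sub(): first match only.
first_only = re.sub("serious", "hyped", x, count=1)

# Like R's gsub(): every match.
every_match = re.sub("serious", "hyped", x)
```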


Maybe because it already does what it intends to do reasonably well? I mean, what do you think needs to be improved?


Here are 14 years of HN discussions/criticisms of R: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

"Does what it intends to do reasonably well" is going to be widely subjective, depending on whether the user's use-case is statistical/life-sciences vs more general purpose coding and relying on many packages; prototyping/experimentation vs production code; whether the user uses base-R, or tidyverse/data.table, etc.

Here are two of those many posts:

* An opinionated view of the Tidyverse “dialect” of the R language (July 5, 2019) https://news.ycombinator.com/item?id=20362626

* The R programming language: The good, the bad, and the ugly (epatters.org, 2018) https://news.ycombinator.com/item?id=35571659 -> https://www.epatters.org/post/r-lang/


If you're unironically asserting that R already does everything well enough, I'm not going to take you seriously.


Yet you haven't provided any substantial points against it and assume that others will take you seriously...


I'm asking what needs to be improved, in your opinion.

That's a normal follow-up question that you should be able to answer. Otherwise, why are you even commenting?


No you aren't, you're clearly asking in a rhetorical way. Re-read your post.

Any criticism brought up you'd dismiss. Here's one: lack of native 64-bit integers.
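Some context on why that matters, sketched in Python since its integers are arbitrary precision. As far as I know, R's built-in integer type is 32-bit and larger whole numbers fall back to doubles, which only represent every integer exactly up to 2^53. The helper names and the `row_id` value below are made up for illustration.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def fits_int32(n):
    """True if n can be stored in a signed 32-bit integer."""
    try:
        struct.pack("<i", n)
        return True
    except struct.error:
        return False


def survives_double(n):
    """True if n comes back unchanged after a round trip through a 64-bit float."""
    return int(float(n)) == n


# A hypothetical 64-bit identifier, e.g. a database key.
row_id = 2**53 + 1

too_big_for_int32 = not fits_int32(row_id)
corrupted_as_double = not survives_double(row_id)
fits_int64 = len(struct.pack("<q", row_id)) == 8
```

So an ID like this is perfectly valid as a 64-bit integer, but it neither fits in 32 bits nor survives being stored as a double, which is where silent corruption of large keys comes from.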


No, I was just asking out of curiosity. You're overthinking too much.


No, you were clearly asserting that R already does everything needed. Stop trying to gaslight.


Please don't cross into personal attack or name-calling, and please especially avoid this sort of tit-for-tat spat with another user. It's way against the site guidelines and makes for boring reading.

I know it's not always easy to extricate oneself but it's helpful to remember that the only way to 'win' is to stop.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


You're crazy.


Please don't cross into personal attack or name-calling, and please especially avoid this sort of tit-for-tat spat with another user. It's way against the site guidelines and makes for boring reading.

I know it's not always easy to extricate oneself but it's helpful to remember that the only way to 'win' is to stop.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


The issue is that even if you peel away the hype (which is a fact), Python is still far larger.

If you check e.g. the Journal of Open Source Software (which does not have much ML/AI bias), most of the papers are Python, with an occasional R or Julia submission.


I have the impression that most SotA algorithms in many fields that are not deep learning are made available first as an R (or even MATLAB) package. Obviously they do get ported to Python once they gain enough traction.


This has been my impression in social science academia: a lot of ready to run methods are released as R packages.


The problem is that life scientists and statisticians are not going to learn Python, though. And that's the talent.


On the ground, however, they learn both. Or I should say, if you find someone who is exclusive to one language or the other (which happens), they aren't much of a programmer to begin with. There is essentially a whole food chain of computer knowledge in academic life sciences. There are those who write the tooling, who are perhaps so abstracted that they don't know the underlying biology and vet their tooling on simulated data and comparisons with existing tooling, toiling in their own castles on tooling that might never see a real dataset. There are those who use the tooling to create novel pipelines to analyze data and draw conclusions based on their own or their collaborators' literature research or life science perspectives; they might not care whether the finding is truly novel or merely proves an existing assumption with a tool that hasn't yet been applied to an existing dataset. Then there are those who run pipelines created by others within their research group, sometimes by people who have long since left that group, with brittle hardcoded paths and other "DO NOT TOUCH" segments in a massive single 2500-line file that regurgitates some plots from a standard sort of CSV file as input.


They are though. I work in computational biology, and I would say the majority of the work I see now is in Python.


That's a much, much smaller subset than the vast number of R users in academia/research.


Students I spoke to were forced to use SAS. They looked at me as if I was from Mars for suggesting alternatives.

For a "big data" project, people will probably use Python (though Google apparently retreats from it except in machine learning).

Why could you not use R and C++ or Java though? For example, Arrow has bindings for both, so the argument that Python is needed to shovel/scrape/steal data becomes less and less valid.


The issue is that R is too small compared to Python and the gap is getting bigger. They're trying to grow the company, and realistically the only way is to support Python.


VS Code has been a pillar all along? MATLAB is utter trash on macOS: slow, with a dated UI. Also, why support proprietary languages with predatory marketing tactics?


Modern software can't work, that's it, and no, I'm serious and have no intention of starting a flame war. The original desktops were a single OS-environment-framework, which was "fragile" to a certain extent, but everything had to evolve inside a fully integrated environment, which means FAR LESS code and FAR FEWER deps for anything. That makes it easy for users to bend the environment to their wishes, and for devs to build "on the shoulders of giants", alongside them, instead of ending up with something like https://xkcd.com/2347/. Modern projects are typically written in Silicon Valley mode, anchoring the project to some deps without any reasoning about their future, scalability, maintainability and so on. Something changes, and everything on top collapses.

In practice we have nearly ZERO development of desktop apps, simply because modern desktops are still widget-based systems that were sold as "what we need" in contrast to the complexity of classic DocUIs, and then we migrated to the modern web, which is a bad DocUI, so developing for the desktop is simply a nightmare. To add features we can't really "use the environment", so every app tends to try to do everything internally, evolving into an unmaintainable monster whose codebase no one can handle. At a certain point it becomes a kind of framework where "features" are "someone's ideas" without a coherent vision or a goal "for the application", like Eclipse going from an IDE to a platform that some use to code and others to pay taxes (yes, the Italian government made Desktop Telematico, which is a custom Eclipse for filing taxes).

As with the classic Greenspun's tenth rule, we are witnessing the same thing: we damn well need an OS that is a single user-programmable application, like Emacs, or doing ANYTHING is a nightmare and there is no long-lasting solution.


RStudio came out a long time ago at this point. I actually think it feels quite dated personally.

A data science focused version of VS Code with some kind of notebook sounds rather awesome to me.


I agree, RStudio is really dated. Having used R and Python at work in parallel, I find RStudio cannot do obvious things that the R extension for VS Code can, like stopping at a breakpoint in the middle or knitting an Rmd file that has embedded code.


There is already Rmarkdown and great tutorials to go along with that for the budding R data scientist


Does it still bundle an entire, old version of libclang?



