13
$\begingroup$

Some background: I am not a developer, and until now all my scripts have been written in Octave (an open-source language largely compatible with Matlab). However, it seems that Python is the way to go.

As I am not a developer and do not like change for the sake of change when the end result is the same, I'd genuinely like to know why everything seems to be written in Python now. What makes Python "better" than Octave, Julia, etc.?

Is it scalability, the availability of more packages, or is it (unpopular idea) a "self-fulfilling prophecy" that has therefore gained a momentum of its own? Other reasons?

$\endgroup$
8
  • 2
    $\begingroup$ No idea, I even prefer R over Python. Controlling the data seems much more intuitive in the former. $\endgroup$
    – KaiSqDist
    Commented Jul 4 at 13:01
  • 1
    $\begingroup$ I don't think this question is opinion based btw. In fact that's precisely the question: surely there must be metrics which makes Python the language of choice. Right? $\endgroup$
    – Frido
    Commented Jul 4 at 13:02
  • 1
    $\begingroup$ Not a Matlab or Julia expert, but the advantages that I see in Python are: 1) Vectorization (whether it's Numpy, some Pandas operations or even other packages that have optimized array methods that do not require the coder to utilize loops) 2) Quite easy to learn. $\endgroup$ Commented Jul 4 at 14:27
  • 3
    $\begingroup$ @JanStuller I'm not an expert on Python but Matlab/Octave is actually optimized for matrix and array operations (vectorization). $\endgroup$
    – Frido
    Commented Jul 4 at 14:36
  • 2
    $\begingroup$ Mainly a network effect but, to be fair, there are some good reasons for that. One being that Python has one of the most (if not the most) friendly and helpful communities of any major programming language. It's also designed to work well with libraries written in other languages. It's a really good control language with a (mostly) clean minimal syntax that belies its sophisticated design. $\endgroup$
    – JimmyJames
    Commented Jul 5 at 21:36

4 Answers

15
$\begingroup$

Corporate perspective:

Although I am not an expert on the technical differences between Python, Julia, R, and Matlab, I can provide some corporate "observations".

At my company, nearly all back-end developers know Python and can easily understand its syntax, whereas few are familiar with Matlab or Julia (although some have experience with R). This makes it significantly more efficient for the entire business when front-office algorithms are written in Python, as back-end developers can seamlessly translate them into optimized C++ or C# code for the production pipeline.

Our back-end developers work in sprints as part of the SCRUM methodology, with each task allocated a specific number of days based on its complexity. If an algorithm is written in Julia or R, additional time and effort are required for back-end developers to understand the code, leading to potential delays and frustration. This might also translate into more meetings with the front-office trader who developed the algorithm, ultimately costing the company money due to inefficiency.

With regard to the discussion of using Julia (or Matlab) over Python for performance: for any algorithm where performance matters, we employ a team of experienced C++ developers. C++ is renowned for its speed and has long been a popular choice for high-performance applications, so the job market of skilled C++ developers is also larger.


To summarize:

  1. Popularity and Simple Syntax: Python's popularity and simple syntax make it accessible to nearly all programmers within the business. It further facilitates smoother integration into a production-based system. Moreover, due to the simple syntax, it also makes for an excellent scripting language.

  2. Larger Job Market of Skilled Developers: There is a significantly larger pool of developers with a strong understanding of Python compared to Julia, R, or Matlab. This makes recruitment and replacement easier and more efficient.

  3. Reduced Corporate Costs: The above factors contribute to lower corporate costs. Python's popularity and simplicity enhance efficiency, while the larger job market ensures that finding or replacing developers is more straightforward and cost-effective. The same can be said about C++ developers vs any other high-performance language.

In summary, unless a new programming language emerges that matches Python's popularity and offers additional benefits, Python will likely remain the preferred choice in the long run.

$\endgroup$
11
  • 2
    $\begingroup$ +1. It could be a chicken and egg thing. I think I agree with you regarding popularity (which triggered my question) larger pool of skilled developers and reduced corporate costs, and they all have a kind of feedback effect on each other. Not sure I agree regarding simple syntax though. From what little I've seen Matlab/Octave is not that much easier or harder than Python $\endgroup$
    – Frido
    Commented Jul 4 at 13:08
  • 3
    $\begingroup$ Regarding the syntax: as Python is a general-purpose language, I find that its syntax correlates better with other programming languages than that of "statistical" programming languages such as Matlab or R. Hence, a developer with experience in e.g. JavaScript or C would have fewer issues understanding a Python script than a Matlab script. $\endgroup$
    – Pleb
    Commented Jul 4 at 13:28
  • 2
    $\begingroup$ Thank you @Pleb for the interesting answer. It seems like the development of the Python language was actually made with the syntax of older programming languages in mind such as JavaScript or C. $\endgroup$
    – KaiSqDist
    Commented Jul 5 at 7:43
  • 2
    $\begingroup$ @Pleb Sorry for not being specific. I was referring to this part in particular: "with each task allocated a specific number of days". $\endgroup$ Commented Jul 5 at 18:15
  • 2
    $\begingroup$ @KaiSqDist Python is a reasonably old language: it goes back to the late 1980s. I wonder how many Python developers are under 34 years old, as these folk would be younger than Python. $\endgroup$ Commented Jul 6 at 0:04
8
$\begingroup$

The claim that everything in finance is written in Python is a vast oversimplification. My experience is largely on the sell-side. However, @Pleb seems to agree with my opinion on the buy-side as well.

Low-level performance is crucial in real-world infrastructure. That's why C/C++ reigns supreme for core functionality in firms: raw speed. Julia may be quick, but it's still a young, niche language. Julia is a lot faster to deploy and easier to write, but if you need a larger pool of experienced programmers, C++ offers a lot more choice.

Bloomberg:

  • most is written in C/C++ and some legacy Fortran

C++ is central to how we work at Bloomberg.

  • the API is written in C++

  • The derivatives pricing engine DLIB is written in DLIB, which is OCaml based.

  • The GUI is written in JavaScript

  • Some stuff is apparently written in Perl

  • Python usage seems to have increased in the last decade, but I do think the statement above says it all. Also, even the Python API only works when bundled with the required C++ API.

Blackrock:

  • Mostly C++ source
  • Aladdin was originally written in C++, Java and Perl and has now been updated to Julia.

Jane Street:

  • strong believer in OCaml

Voladynamics:

  • uses C++

... should be packaged into an efficient and easy-to-use C++ library (with higher-level language wrappers on top).

OneSumX from Wolters Kluwer:

  • is written in Java, and any custom function you write needs to be in Java.

QuantLib:

  • written in C++




Overall, I do not think the language of choice depends mostly on which one is better. There may be specific topics where one is more suitable than another, but that is usually an exception, and most casual programmers and users never reach that level anyway. The largest group of Python users in finance is not developers but people like me, who are not good at programming but need something that is quick to write and useful. It's also a big plus that Python is accepted by developers.

Generally, I think you mostly need to use what your boss tells you to use. So far I have had the "privilege" of having to use (in no particular order) Python, Java, VBA, Julia, Matlab/Octave, OCaml, BLAN, SQL (DB2 and TSQL, Microsoft SQL Server and Oracle), R, Stata, EViews, SPSS, C, C++ and JavaScript for work. Though they are not programming languages, I also needed HTML, CSS, MathJax and LaTeX.

That said, what your boss tells you to use is similar to actual languages (e.g. all questions must be written in English on Stack Overflow), or social media and the like. A language's success is driven by direct positive externalities, meaning a language's value improves as more users engage and participate with one another. That's why everyone uses WhatsApp.

The community behind Python is honestly second to none. I frequently find myself just blindly typing a question into Google before thinking (e.g. how to define the font in Matplotlib) because SO almost always has the solution readily available anyways. If you look at places like Quora, Reddit and the like you will always find people promoting Python.

Let's look at some data:

  • The most loved programming languages in general are supposedly Rust, Elixir, Clojure, TypeScript and Julia, according to tag trends on SO.
  • Matlab is the least admired language, and only 20% of developers who used this language want to use it again next year, according to a SO survey.
  • Julia, according to the same survey, is only desired by 2.5% but admired by 62.77%. It's no secret that I love to use Julia, as most of my answers here utilize Julia. However, I cannot use it much at work because no one else I work with at the moment has ever used it. Forcing people to use it would just slow down work significantly for some time. Also, it is hard to hire new people with Julia expertise.
  • R is only admired by ~ 39%.

I only used Matlab / Octave before I started my career in finance (which is a long time ago). My sister uses it in the automotive industry, because the Simulink offerings are apparently great in this context. It's mostly network effects again though, with people using it because others use it. I personally never came across someone using Matlab in finance.

It took me 10 years to see a place that uses R. This place used R for some work because the guy building the quant department had a statistics background. They mostly decided to switch to Python (for the stuff they did in R) because it was hard to find people to apply when R was mentioned. Now, whenever some R code needs to be modified, there are only 2 or 3 people left that are willing to do it (and two studied statistics, the other math).

Some more or less plausible arguments against Matlab and R:

  • Some programmers might say using 1 indexing is why R and Matlab do not work (it's the same argument for Julia).

  • Neither LSEG (Refinitiv) nor Bloomberg offers direct usage of R or Matlab in its APIs. Given that almost all institutions use either LSEG, Bloomberg, or both, you have a massive problem trying to promote R or Matlab at work.

  • R is licensed with GPL, as are most packages. As docs.python states,

All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source.

This can actually make a difference in a corporate setting, as explained in this R-bloggers comment.

  • R syntax is, for most if not all programmers, quirky.

Examples where programmers scratch their head (I am by no means a programmer, or a sophisticated R user, it's just stuff that bothers me and I have heard others complain about as well):

  • In almost all languages, the assignment operator is =; in R it's primarily <- (I think technically there are 5 assignment options in R).

  • x = "Hello" + "World" in Python vs x <- c("Hello", "World") in R makes you wonder whether <- and c() are designed to make you and your code collapse (and c() only builds a two-element vector; actually joining the strings takes paste0("Hello", "World")). Julia actually uses * instead of + for string concatenation. This choice is based on mathematics, because + is usually commutative, meaning A + B == B + A for all matrices A and B of the same shape. * is typically noncommutative and A * B != B * A, just like Hello * World != World * Hello. While this is mathematically sound, most languages, including C++, use +, which still makes it an interesting choice.

  • Why does a for loop leave a variable in your environment in R? You usually never need an iterator outside your loop.

  • R tries hard to make your code work, which is really not useful. How can you column bind vectors of unequal length?

    x <- c(1, 2)
    y <- c(3, 4, 5, 6)
    cbind(x, y)

Or it hands you a NULL in this case, where the name is misspelled (firstName instead of firstname):

    x <- list(firstname = "AK", lastname = "demy")
    x$firstName

I'd much rather get an error than unknowingly use this value elsewhere.
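
For comparison, here is a minimal Python sketch of three of the points above: string concatenation order, what happens to a loop variable, and looking up a misspelled key. The function names are hypothetical and made up for illustration; this only shows standard Python 3 behaviour and is not meant as a benchmark of either language.

```python
# Illustrative sketch only; all function names are invented for this example.

def concatenation_is_order_dependent():
    # "+" on strings is not commutative: the order of the operands matters.
    return "Hello" + "World" != "World" + "Hello"


def loop_variable_after_loop():
    # A plain for-loop variable stays bound after the loop finishes.
    for i in range(3):
        pass
    return i


def comprehension_variable_leaks():
    # In Python 3 a list-comprehension variable is not visible afterwards.
    squares = [comp_idx * comp_idx for comp_idx in range(3)]
    try:
        comp_idx  # lookup fails because the name was never bound here
    except NameError:
        return False
    return True


def missing_key_raises():
    # A misspelled dictionary key raises KeyError instead of returning a null.
    person = {"firstname": "AK", "lastname": "demy"}
    try:
        person["firstName"]
    except KeyError:
        return True
    return False
```

So a plain Python for loop also leaves its variable behind, while the misspelled key fails loudly rather than silently passing a missing value along.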

$\endgroup$
4
  • 2
    $\begingroup$ As a historical side note, in Python 2, the loop variable is leaked into the environment like you mention for R, but that is not the case in Python 3. $\endgroup$ Commented Jul 5 at 2:46
  • 2
    $\begingroup$ +1. Thanks for the detailed answer. "The claim that everything in finance is written in Python is a vast oversimplification." Yes, that may be true. But to put it another way, if I look at the average quant job description, sell side or buy side, it asks for Python. $\endgroup$
    – Frido
    Commented Jul 5 at 4:17
  • $\begingroup$ @fyrepenguin what do you mean? Python 3 loop variables are accessible outside the loop too. $\endgroup$
    – justhalf
    Commented 2 days ago
  • 1
    $\begingroup$ @justhalf sorry, more specifically for list comprehensions: stackoverflow.com/a/4199355 $\endgroup$ Commented 2 days ago
7
$\begingroup$

From Quantitative Trading: How to Build Your Own Algorithmic Trading Business, Ernest Chan, pp. 34-39:

MATLAB used to be one of the most common backtesting platforms used by quantitative analysts and traders in large institutions. It has been taken over by Python (which I will describe as follows), but I still find it to be the most productive language for quants (as opposed to professional software developers). It is easier to use than Python, it is faster, and it has full customer support from the vendor. It is ideal for testing strategies that involve a large portfolio of stocks. It has numerous advanced statistical and mathematical modules built in, so traders do not have to reinvent the wheel if their trading algorithms involve some sophisticated but common mathematical concepts. There is also a large number of third-party freeware available for download from the internet, many of them very useful for quantitative trading purposes. Finally, MATLAB is very useful in retrieving financial information from various websites.

PYTHON Python has now taken over MATLAB to become the de facto backtesting language, especially after the numpy and pandas packages became available. With these packages, you can manipulate arrays and time series data just like you do in MATLAB. Python benefits from a large number of third-party packages for specific applications. There is Scikit-learn for machine learning, plotly for interactive data visualization, and seaborn for plotting, just to name a few most commonly used ones. While Python has almost any packages that you need for finance and trading, it is not without flaws.

R I have used R as the language for teaching my Financial Risk Analytics course at Northwestern University’s Master’s in Data Science program, which covered everything from time series analysis to copulas. It is a great language if you want to use classical statistical and econometric analyses for your trading (and there is nothing wrong with that!). That is because many academic statisticians and econometricians have implemented their algorithms in R. There aren’t as many implementations of machine-learning algorithms as Python or MATLAB. However, I recommend you do not use machine learning if you are creating a new trading strategy, and use ML only for improving your strategy (for reasons explained in Chapter 2 and in more depth at predictnow.ai/finml). Hence, R is a good language for this initial step of strategy exploration, though not as good as MATLAB since its IDE (RStudio) is also free, and, dare I say, primitive. Naturally, like Python, it also comes with zero support.

$\endgroup$
4
  • 1
    $\begingroup$ Exactly my thoughts (re Matlab). Octave is its open source clone and has less support obviously, but a great set of packages and the same functionality. I'm still a bit perplexed why it's under-utilized. $\endgroup$
    – Frido
    Commented Jul 4 at 14:26
  • 4
    $\begingroup$ I fully agree. I would just add that one of the strengths of Python is that it interacts very naturally with the software environment, which is crucial for MLOps or DevOps. It is probably one of the reasons for its success. $\endgroup$
    – lehalle
    Commented Jul 4 at 16:45
  • $\begingroup$ Exactly this. The "not an island" factor + that it nearly is as user-friendly and feature-rich as purpose-built languages just makes it an outright win. Then add some years and you end up where we're now. $\endgroup$
    – ojdo
    Commented Jul 5 at 9:07
  • 1
    $\begingroup$ I have added citation, please check above. $\endgroup$
    – Sane
    Commented yesterday
2
$\begingroup$

People write software in Python because there are already a lot of libraries that will help them. Later they may turn their software into a library, so there is now at least one more library out there. IMHO Python is an OK language which has built up critical mass.

If you are doing quantitative finance, I expect that you might want to do machine learning. There are several big libraries out there, including PyTorch and TensorFlow, which you can use from Python and which let you speed up processing by running code on the GPU without learning how to write GPU code yourself. Even when I'm not using PyTorch, I'd be lost without NumPy and Matplotlib.

$\endgroup$
