Python
From Java to Python
Several years ago, while I was primarily focused on Java development, our team needed a tool for data manipulation and analysis. After we evaluated our options, the Python library Pandas emerged as the clear winner. Pandas is practically a world of its own, and learning it arguably requires more effort than learning Python itself. I ended up using Pandas extensively and even contributed to Pandas (v0.21.0) and pandas-datareader (v0.5.0). You can find more details about these contributions here.
Before this, I had used Python occasionally, but my knowledge of the language was fairly superficial. That didn’t stop me from contributing to a Python-based open source project many years ago. My contribution focused on interoperability with a C++ library, so it was arguably more C++ than Python.
Over time, my exposure to Python deepened. I’ve worked with various Python modules, packages, and tools, including those for multithreading and concurrency. My first multithreaded Python program ran slower than the single-threaded version (this was well before Python 3.13). Coming from Java, I did not expect this.
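It turned out that CPython, the reference Python interpreter, uses a Global Interpreter Lock (GIL) because its internals, notably its memory management, are not thread-safe. As a result, only one thread executes Python bytecode at a time. (Python 3.13 added an optional, experimental free-threaded build that can run without the GIL.) With the GIL in place, you need an I/O-bound program to see the benefits of multithreading in Python.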
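The I/O-bound case can be sketched as follows. This is a minimal illustration, not code from our system: `fake_download` is a hypothetical stand-in that uses `time.sleep` to imitate waiting on a network or disk call, during which the GIL is released.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(task_id, delay=0.2):
    """Hypothetical I/O-bound task: sleeping releases the GIL, as real I/O waits do."""
    time.sleep(delay)
    return task_id


def run_sequential(task_ids):
    """Run the tasks one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_download(t) for t in task_ids]
    return results, time.perf_counter() - start


def run_threaded(task_ids, workers=4):
    """Run the tasks in a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the inputs.
        results = list(pool.map(fake_download, task_ids))
    return results, time.perf_counter() - start
```

With four tasks and four workers, the threaded version should finish in roughly the time of a single task, while the sequential version takes about four times as long. Replacing the sleep with pure Python computation would remove most of that gain on a GIL-enabled interpreter.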
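In our case, the solution was to use multiprocessing, which scales well when work is distributed across processes because each process has its own interpreter and its own GIL. Another option would be to implement computationally intensive tasks in C or C++ and call them from Python through Cython.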
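A minimal sketch of the multiprocessing approach for CPU-bound work, using the standard library's `multiprocessing.Pool`. The prime-counting function and the input sizes are illustrative assumptions, chosen only because the work is pure computation.

```python
import multiprocessing


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_in_parallel(limits, workers=4):
    """Evaluate count_primes for each limit in a separate worker process."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on
    # platforms that start workers by importing the main module.
    print(count_in_parallel([5_000, 10_000, 15_000, 20_000]))
```

The worker function has to live at module level so that it can be pickled and sent to the worker processes.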
Because some of our tasks turned out to be I/O-bound, we added a task execution section to the system configuration. The user can configure the task execution engine, for example the number of processes and/or threads, and run experiments to find the optimal number of each. Initially, we built a custom solution using Python’s standard library, and later added Redis and RQ. We’re also evaluating distributed frameworks such as Spark and Dask.
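A configurable execution engine of this kind could look roughly like the sketch below. It is not our actual implementation: the configuration keys (`engine`, `workers`) and the helper names are hypothetical, and the executors come from the standard library's `concurrent.futures`.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical configuration section; the key names are illustrative only.
TASK_EXECUTION = {"engine": "thread", "workers": 4}

_ENGINES = {"thread": ThreadPoolExecutor, "process": ProcessPoolExecutor}


def make_executor(config):
    """Build the executor named in the configuration."""
    try:
        executor_cls = _ENGINES[config["engine"]]
    except KeyError:
        raise ValueError(f"unknown engine: {config.get('engine')!r}") from None
    return executor_cls(max_workers=config["workers"])


def run_tasks(func, items, config):
    """Apply func to every item using the configured engine, preserving order."""
    with make_executor(config) as pool:
        return list(pool.map(func, items))
```

Switching between threads and processes then becomes a configuration change, for example `run_tasks(len, ["a", "bb"], TASK_EXECUTION)`, which makes it easy to time the same workload under different settings. With the process engine, `func` must be a picklable module-level function and the call should sit under an `if __name__ == "__main__":` guard.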
We also use Python as a portable scripting language, with scripts running on both Linux and Windows.
Transitioning from Python 2 to Python 3
The migration from Python 2 to Python 3 was a rather big project. Our first step was to ensure the codebase was compatible with both versions, eliminating the need to maintain two separate codebases. This process also gave us the opportunity to review, clean, and refactor the code, especially sections that had been written when we were new to Python.
The transition was made smoother because we had been preparing for it, and all new code was written with Python 3 compatibility in mind whenever possible.
A more detailed description of our transition experience deserves its own entry. Here I just want to mention one side effect of the transition: code readability. After large-scale code conversion, especially when using automated tools, the resulting code is often less readable. Code readability is, after all, one of the main reasons Python has become so popular.
A significant part of the conversion was an intermediate step, making the code compatible with both Python 2 and Python 3. This step introduced some extra code that is no longer necessary. We are very happy with the results and plan to clean up the remaining transitional code when time permits.
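To give a sense of what such transitional code tends to look like, here is a small, generic sketch of dual-version patterns. It is not taken from our codebase, and the names `PY2`, `text_type`, and `to_text` are illustrative.

```python
import sys

try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse  # Python 2 fallback, removable after migration

PY2 = sys.version_info[0] == 2

# Only the selected branch is evaluated, so 'unicode' is never looked up on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Return value as text on either Python version."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)
```

Once Python 2 support is dropped, the `try`/`except` import collapses to a single line and the `PY2` and `text_type` shims can be deleted, which is the kind of cleanup that restores readability.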
What's Next
In future posts, I’ll illustrate different Python features with examples from my projects. A significant portion of my Python development involves data processing, web development, and scripting. Some of my data processing projects might require a separate entry, such as extracting financial data from EDGAR regulatory filings (XBRL format, unstructured text).
First Text Analysis Python Project was my first unstructured text analysis project.
Some samples of Python code are on GitHub Gist.
This program calculates projections for when hedge fund investors can expect to receive their investments back over time, with most calculations performed using Pandas.
The program reads data from an Excel file containing at least two worksheets: Liquidity Terms and Tranche Investments.
The program includes three scripts that generate reports and visualizations based on the data.
The emphasis is on the most common hedge fund withdrawal restrictions.
A more detailed description is in the HedgeFundsRedemption.md file.
This is a fork of jckantor's Python dateutil rule sets for NYSE trading days and holiday observances. The original rules are valid from the present onward. However, for backtesting or pattern recognition, there is often a need to access NYSE trading days from the past several years. The rules have been modified to provide NYSE trading days and holiday observances starting from 1986.
This website was created using Pelican, a static site generator written in Python.
Here are some of the Python libraries and tools that I use:
Pandas, NumPy, Beautiful Soup, Requests, Selenium, Cython, SciPy, NLTK, Matplotlib, pytest, unittest, Django, Flask, Redis, RQ, Pelican, reStructuredText, Sphinx.