Open Source

GitHub

First Text Analysis Python Project was my first unstructured text analysis project.

GitHub Gist


My open-source contributions were in different languages. Tools and languages that were used are listed at the end of each contribution description.


              Pandas and pandas-datareader
While using Pandas in a stock screening application, encountered problems that were present only for certain data values. The first solution was a workaround. It worked, but the solution was rather cumbersome and ugly. And what about existing code that might work with some data but not with other. Modifying Pandas code looked like a better solution. The changes actually affected two projects: Pandas and pandas-datareader, as pandas-datareader is now a stand alone project.
Below are the changes:
Made improvements to missing values handling, testing, documentation.
A signature-preserving decorator for Python 2.
Some changes were an API changes, which also included changes to what's new and documentation.
Provided a workaround for one of the issues as a temporary solution while the issue is being resolved, participated in discussions.
Python, Cython, pytest, Sphinx, reStructuredText.



    pandas-datareader pull requests:
https://github.com/pydata/pandas-datareader/pull/364 (merged from a command line and GitHub GUI incorrectly displays it as not merged but correctly shows it as merged in the source code https://github.com/pydata/pandas-datareader/commit/6cce5f18d52be802c7245c8a28d534236a9e2b24 )


              Arelle
    Arelle pull requests


              geWorkbench
At the time I was at The Center for Computational Biology and Bioinformatics(C2B2), Columbia University.
Most of my work was on geWorkbench, a Java-based open-source desktop application for integrated genomics. While on geWorkbench I was involved in new features, design, enhancements, and bug fixes.
There was some server side work too, mostly on a grid based application CaArray, which was an open-source web and programmatically accessible array data management system. Part of my CaArray responsibilities included installation and administration.
Some of the tasks required advanced calculus and mathematical modeling.
Java, C++/C, Swing, JBoss, Tomcat, Ant.


                GitHub Gist has Python and Java code.

This program calculates projections for when hedge fund investors will receive their investments over time, with most calculations performed using Pandas.

The program reads data from an Excel file containing at least two worksheets: Liquidity Terms and Tranche Investments.

The program includes three scripts that generate reports and visualizations based on the data.

The emphasis is on the most common hedge fund withdrawal restrictions.

A more detailed description is in the HedgeFundsRedemption.md file.

This is a fork of jckantor's Python dateutil rule sets for NYSE trading days and holiday observances. The original rules are valid from the present onward. However, for backtesting or pattern recognition, there is often a need to access NYSE trading days from the past several years. The rules have been modified to provide NYSE trading days and holiday observances starting from 1986.

This Java program performs basic operations on datasets stored in CSV (comma-separated) files. It reads the dataset into a dataframe to perform various operations.

The program can be used as a library or directly from the command line. Users can define operations using a simple language when running from the command line.

The main purpose of this project is to illustrate that in Java, the absence of a comprehensive library like Pandas makes advanced data processing quite time-consuming. In many cases, you may find it more efficient to use Python and Pandas, even if it requires learning a new language.

That said, if you are a Java developer who doesn't know Python and only needs to perform relatively simple column-based dataset operations, this tool could be a practical option.

For more details, please refer to the project’s README file.

    Other contributions
Submitted a bug fix for integration with aspell, spellchecking C++ library. Several years ago Leo switched from aspell to PyEnchant.
Python, C++.

Submitted code patches to Apache Axis to improve AxisFault logging and provided a workaround.
Java.

Filed bug reports against Java Swing, JBoss, and several others Java tools and libraries.

View comments.

more ...

Ant Script to Update/Install Eclipse


This Ant script, originally written several years ago and previously hosted on this site, is now available on GitHub.

At the time the script was created, Eclipse was still relatively new. While it offered extensibility, many essential features were left to plugin developers. As a result, Eclipse users often found themselves waiting anxiously for new builds or updated versions of key plugins.

However, updating Eclipse — or its plugins — was anything but straightforward. There were many undocumented or poorly documented rules, and plugin packaging conventions varied widely. Manual installation was common, and sometimes updates introduced compatibility issues that could render a workspace unusable. The script addressed all these issues and was easily customizable; sometimes you just need a command line tool.

It's hard to estimate how widely the script was used, but there were signs of the script usage: user emails with questions, feature requests, suggestions, and code contributions. The script was reviewed in several blogs and newsgroups; and, for a time, it ranked at the top of Google search results for "Ant script." At the time Apache Ant itself was gaining popularity as a build tool - an alternative to UNIX make utility - and this script extended Ant’s utility.

The following text was written alongside the script’s initial release. While some statements may now seem dated, they’ve been left untouched to preserve the context and spirit of the time:

Eclipse is a great IDE. It is relatively new and the speed with which Eclipse team introduces new features is amazing. However, my first update to a new Eclipse build was rather time consuming; and from reading news I realized that other developers were struggling with updates too. The script simplified updates and, over time, evolved to incorporate some other related activities. All the documentation is in the Readme file. One chapter in the Readme file, named What is the Right Way to Update?, is different from the other chapters as this chapter is not about how to use the script but rather why and when to use it.

You can download a zip file from this site or just browse Readme file online. Latest changes are in Release Notes. It is worth noting that Eclipse comes with the Update Manager but it seems to be for major releases and not for builds. The primary audience for this script is developers who update Eclipse quite often.

Read What Others Have Said:

http://youarenumber6.blogspot.com/2004/08/departmental-eclipse-with-ant.html

http://www.jroller.com/page/dorodok/20030106


View comments.

more ...