GitHub

First Text Analysis Python Project was my first unstructured text analysis project.

GitHub Gist


My open-source contributions span several languages. The tools and languages used are listed at the end of each contribution description.


              Pandas and pandas-datareader
While using Pandas in a stock screening application, I encountered problems that appeared only for certain data values. My first solution was a workaround. It worked, but it was cumbersome and ugly, and it did nothing for existing code that might work with some data but not with other data. Modifying the Pandas code looked like a better solution. The changes affected two projects, Pandas and pandas-datareader, because pandas-datareader is now a standalone project.
Below are the changes:
Improved missing-value handling, testing, and documentation.
A signature-preserving decorator for Python 2.
Some changes were API changes, which also required updates to the What's New notes and the documentation.
Provided a workaround for one of the issues as a temporary solution while the issue was being resolved, and participated in discussions.
Python, Cython, pytest, Sphinx, reStructuredText.
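
To illustrate what "signature-preserving" means, here is a minimal sketch. It is not the code contributed to Pandas, and it is written for Python 3, where `functools.wraps` and `inspect.signature` already provide what needed extra work in Python 2. The names `log_calls` and `fill_missing` are hypothetical and exist only for this example.

```python
import functools
import inspect


def log_calls(func):
    """Hypothetical decorator: count calls while keeping func's metadata."""
    @functools.wraps(func)  # copies __name__ and __doc__, sets __wrapped__
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@log_calls
def fill_missing(values, default=0):
    """Replace None entries with a default value."""
    return [default if v is None else v for v in values]


result = fill_missing([1, None, 3])

# inspect.signature follows __wrapped__, so it reports the original
# parameters rather than the wrapper's (*args, **kwargs).
signature_text = str(inspect.signature(fill_missing))
```

Without `functools.wraps`, the decorated function would report the wrapper's name and a generic `(*args, **kwargs)` signature, which hurts generated documentation and introspection.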



    pandas-datareader pull requests:
https://github.com/pydata/pandas-datareader/pull/364 (merged from the command line, so the GitHub GUI incorrectly displays it as not merged; the source history correctly shows it as merged: https://github.com/pydata/pandas-datareader/commit/6cce5f18d52be802c7245c8a28d534236a9e2b24 )


              Arelle
    Arelle pull requests


              geWorkbench
At the time I was at The Center for Computational Biology and Bioinformatics (C2B2), Columbia University.
Most of my work was on geWorkbench, a Java-based open-source desktop application for integrated genomics. I worked on new features, design, enhancements, and bug fixes.
There was some server-side work too, mostly on CaArray, a grid-based, open-source array data management system accessible through the web and programmatically. My CaArray responsibilities included installation and administration.
Some of the tasks required advanced calculus and mathematical modeling.
Java, C++/C, Swing, JBoss, Tomcat, Ant.


                GitHub Gist has Python and Java code.
The program reads data from an Excel file, which should have at least two worksheets: one for the funds' liquidity terms and the other for the tranche investments. There are three scripts that produce reports and graphs. The program focuses on the most common hedge fund withdrawal restrictions. A more detailed description is in the HedgeFundsRedemption.md file.
This is a fork from jckantor of Python dateutil rule sets for NYSE trading days and holiday observances. The original rules are valid only from the present onward. Backtesting or pattern recognition sometimes needs NYSE trading days for the past several years, so the rules were modified to produce NYSE trading days and holiday observances from 1986 onward.
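
The idea of producing trading days for a past date range can be sketched as follows. This is a simplified, standard-library-only illustration, not the dateutil rule sets from the gist: it assumes the caller supplies the holiday dates and does not model observance rules. The function name `trading_days` is hypothetical.

```python
from datetime import date, timedelta


def trading_days(start, end, holidays):
    """Return weekdays from start to end (inclusive) that are not in holidays."""
    closed = set(holidays)
    days = []
    current = start
    while current <= end:
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6
        if current.weekday() < 5 and current not in closed:
            days.append(current)
        current += timedelta(days=1)
    return days


# Assumed input: Independence Day 2019 as the only holiday in the range.
week = trading_days(date(2019, 7, 1), date(2019, 7, 7), [date(2019, 7, 4)])
```

For backtesting, the hard part is the holiday list itself, which changes over the years; that is what the modified rule sets in the gist are meant to cover.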
The dataset is in a CSV (comma-separated) file. The program reads the dataset into a dataframe and performs operations on it. The program can be used as a library or from the command line, where the user can test the code using a simple language to define the operations. The purpose of this exercise is to demonstrate that, in the absence of a Pandas-like library in Java, you are better off using Python and Pandas for any advanced data processing job, even if that means learning a new language. However, for a Java programmer who doesn't know Python and needs only relatively simple column operations on a dataset, something like this might make sense. The README file goes into more detail about the implementation and its limitations, even for this tiny subset of Pandas functionality.
    Other contributions
Submitted a bug fix for integration with aspell, a C++ spell-checking library. Several years ago Leo switched from aspell to PyEnchant.
Python, C++.

Submitted code patches to Apache Axis to improve AxisFault logging and provided a workaround.
Java.

Filed bug reports against Java Swing, JBoss, and several other Java tools and libraries.

