Element-wise mathematical operators and iterator slides

I recently did engage in a quite elaborate discussion on the julia-stats mailing list about mathematical operators for DataFrames in Julia. Although I still do not agree with all of the arguments that were stated (at least not yet), I did get a very comforting feeling about the lively and engaged Julia community once again. Even one of the most active and busiest community members, John Myles White, did take the time to elaborately explain his point of view in the discussion – and this just might be the even higher good to me. Different opinions will always be part of any community. But it is the transparency of the discussions that tell you how strong a community is.

Still, however, mathematical operators are important to me, as I am quite frequently working with strictly real numeric data: no Strings, and no columns of categorical IDs. Given Julia’s expressive language, it would be quite easy to implement any desired mathematical operators for DataFrames on my own. However, I decided to follow what seems to be the consensus of the DataFrame developers, and hence refrain from any individual deviations in this direction. Alternatively, I decided to simply relate any element-wise operators of multi-column DataFrames to DataArray arithmetic, which allow most mathematical operators for individual columns. Viewed from this perspective, element-wise DataFrame operators are nothing else than operators that are successively applied to individual columns of a DataFrame, which are DataArrays.

As a consequence of this, I had to deepen my understanding of iterators, comprehensions and functions like vcat, map and reduce. For future reference, I did sum up my insights in a slide deck, which anybody who is interested could find here, or as part of my IJulia notebook collection here.

For those of you who are using the TimeData package, the current road-map regarding mathematical operators will be the following: any types that are constrained to numeric values only (including the extension to NA values) will carry on providing mathematical operators. These operators do perform some minimal checks upfront, in order to minimize risk of meaningless applications (for example, only adding up columns with equal names, equal dates,…). Furthermore, for any type that allows values other than numeric data these mathematical operators will not be defined. Hence, anybody in need of element-wise arithmetic for numeric data could easily make use of either Timematr or Timenum types (even if you do not need any time index). If you do, however, make sure to not mix up real numeric data and categorical data: applying mathematical operators or statistical functions like mean to something like customer IDs most likely will lead to meaningless results.

Advertisements

Posted on 2014/08/06, in Julia and tagged , , . Bookmark the permalink. 2 Comments.

  1. Thanks for these slides, lots of very helpful stuff compiled in one place.

    One thing that might be nice to know is that you can spread a tuple into multiple named variables inside a comprehension, which can make for slightly clearer access to the contents of a column in a DataFrame. So you can do something like

    [col for (name, col) in eachcol(df)]

    instead of

    [col[2] for col in eachcol(df)]

  2. I didn’t know that yet – thanks a lot for pointing it out!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: