• gigachad@feddit.de
    8 months ago

    I think it’s a good thing that the polars developers are heading toward interoperability. The DataFrame Interchange Protocol the article mentions sounds interesting.

    For example, if you read the documentation for Plotly Express

    I know this seems to be an important topic in the community. But honestly, I rarely use the plotting backends at all. They are nice for quick visualizations, but most of the time I prefer to throw my data into matplotlib myself, just for the sake of customization.

    polars.DataFrame.to_pandas() by default uses NumPy arrays, so it will have to convert all your data from Arrow to Numpy; this will double your memory usage at least, and take some computation too. If you use Arrow, however, the conversion will take essentially no time and no extra memory is needed (“zero copy”)

    I don’t want to complain; it is definitely a good thing that the polars developers are addressing this. pandas is the standard, and as long as full interoperability between polars and the pandas ecosystem is lacking, this “hack” is needed. However, data transformation can be an incredibly sensitive topic. I do not even trust pandas or TensorFlow to always do the right thing when converting data. Processing data in polars, converting it to pandas, and then processing it further makes me sceptical. And I am not even talking about performance here.

    If you’re doing heavy geographical work, there will likely someday be a replacement for GeoPandas, but for now you’re probably going to spend a lot of time using Pandas

    This is important. GeoPandas is one of the most important libraries derived from pandas and is widely used in the geoscience community. The idea of an equivalent like “geopolars” is insane in my eyes. I am biased as a data scientist mostly working on spatial data, but this is the main reason that I watch the development of polars only from the sidelines. Even if I didn’t work with geographic data, GeoAI is such an important topic that you can’t just ignore it. And that’s only the perspective from my field; who knows what other important communities out there rely on pandas.