
Even though the new PyArrow backend for Pandas is bringing exciting features, it still looks disappointing in terms of speed.
People have been complaining about Pandas’ speed ever since they tried reading their first gigabyte-sized dataset with read_csv and realized they had to wait for (gasp) five seconds. And yes, I was one of those complainers.
Five seconds might not sound like a lot, but when merely loading the dataset takes that long, subsequent operations will usually take just as long. And since speed is one of the most essential things in quick-and-dirty data exploration, that can get very frustrating.
For this reason, folks at PyData recently announced the planned release of Pandas 2.0 with the freshly minted PyArrow backend. For those totally unaware, PyArrow, on its own, is a nifty little library designed for high-performance, memory-efficient manipulation of arrays.
People sincerely hope the new backend will bring considerable speed-ups…
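
To make the complaint concrete, here is a minimal sketch of how one might time read_csv with the default settings and with PyArrow. It assumes pandas 2.0 or later with pyarrow installed; the helper names best_time and compare_read_csv and the path argument are hypothetical, introduced only for illustration, and this is not the benchmark the article itself runs.

```python
import time


def best_time(fn, *args, repeats=3, **kwargs):
    """Return the fastest wall-clock time (in seconds) over `repeats` calls of fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_read_csv(path, repeats=3):
    """Time pandas.read_csv with the default settings and with PyArrow.

    Assumption: pandas >= 2.0 and pyarrow are installed, and `path`
    points to a CSV file supplied by the caller (hypothetical here).
    """
    import pandas as pd  # imported lazily so best_time works without pandas

    return {
        # Default C parser producing NumPy-backed columns.
        "default": best_time(pd.read_csv, path, repeats=repeats),
        # PyArrow parser producing Arrow-backed columns.
        "pyarrow": best_time(
            pd.read_csv,
            path,
            engine="pyarrow",
            dtype_backend="pyarrow",
            repeats=repeats,
        ),
    }
```

Calling compare_read_csv on a large CSV of your own returns a dict of best-of-N timings in seconds. Taking the minimum rather than the mean is a deliberate choice to reduce noise from disk caching and background processes.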
…
Continue reading this article at:
https://towardsdatascience.com/measuring-the-speed-of-new-pandas-2-0-against-polars-and-datatable-still-not-good-enough-e44dc78f6585?source=rss—-7f60cf5620c9—4
Feed Name : Towards Data Science – Medium