How to keep dtypes when reading a parquet file (read_parquet()) in pandas?

Question

Code:

In [31]: df = pd.DataFrame({"a": [[{"b": 1}], [{"b": np.nan}]]})

In [32]: df
Out[32]:
              a
0    [{'b': 1}]
1  [{'b': nan}]

In [33]: df.dtypes
Out[33]:
a    object
dtype: object

In [34]: df.to_parquet("a.parquet")

In [35]: pd.read_parquet("a.parquet")
Out[35]:
               a
0   [{'b': 1.0}]
1  [{'b': None}]

As you can see here, [{'b': 1}] becomes [{'b': 1.0}].

How can I preserve the original dtypes when reading the parquet file back?

Are you sure the dtypes have changed, or is it merely a display issue? – Celius Stingher, Commented Aug 15, 2022 at 23:58
I think the dtypes have changed: pd.read_parquet("a.parquet")["a"].values.tolist() gives [array([{'b': 1.0}], dtype=object), array([{'b': None}], dtype=object)], so each value is now an array, which it originally was not. – user3595632, Commented Aug 16, 2022 at 0:03

0x26res · Accepted Answer · 2022-08-17 14:38:25Z


You can try pyarrow.parquet.read_table followed by pyarrow.Table.to_pandas with integer_object_nulls=True (see the pyarrow documentation for Table.to_pandas):

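import pyarrow.parquet as pq

# integer_object_nulls=True: integers in a column that also contains nulls come back as Python ints (nulls as None) instead of being cast to float
pq.read_table("a.parquet").to_pandas(integer_object_nulls=True)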

               a
0     [{'b': 1}]
1  [{'b': None}]

On the other hand, passing use_nullable_dtypes=True to pandas.read_parquet does not seem to help here; the nested integer still comes back as a float:

import pandas as pd

df = pd.DataFrame({"a": [[{"b": 1}], [{"b": None}]]})

df.to_parquet("a.parquet")
pd.read_parquet("a.parquet", use_nullable_dtypes=True)

               a
0   [{'b': 1.0}]
1  [{'b': None}]

Answered Aug 16, 2022 at 13:44 by 0x26res; edited Aug 17, 2022 at 14:38.

It is still {'b': 1.0} instead of {'b': 1}... – user3595632, Commented Aug 17, 2022 at 14:11
Oops, my bad. I just edited the answer. – 0x26res, Commented Aug 17, 2022 at 14:33
