Code:
In [31]: df = pd.DataFrame({"a": [[{"b": 1}], [{"b": np.nan}]]})
In [32]: df
Out[32]:
              a
0    [{'b': 1}]
1  [{'b': nan}]
In [33]: df.dtypes
Out[33]:
a    object
dtype: object
In [34]: df.to_parquet("a.parquet")
In [35]: pd.read_parquet("a.parquet")
Out[35]:
               a
0   [{'b': 1.0}]
1  [{'b': None}]
As you can see here, [{'b': 1}] becomes [{'b': 1.0}].
How can I preserve the original dtypes when reading the Parquet file back?
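One possible explanation, stated as an assumption rather than a confirmed diagnosis: a Parquet column stores a single type per field, and `np.nan` is a float, so mixing it with `1` may push field `b` to a floating-point type. A minimal sketch of a workaround is to replace NaN with `None` before writing, so the missing value can be stored as a null. The helper name `nan_to_none` is hypothetical, and whether the integers then survive the round trip depends on the Parquet engine in use.

```python
import math


def nan_to_none(obj):
    """Recursively replace float NaN with None inside nested lists and dicts."""
    if isinstance(obj, dict):
        return {key: nan_to_none(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [nan_to_none(item) for item in obj]
    if isinstance(obj, float) and math.isnan(obj):
        return None  # a null, so the field is not forced to be a float
    return obj


# Same nested shape as column "a" in the question; float("nan") stands in for np.nan.
rows = [[{"b": 1}], [{"b": float("nan")}]]
cleaned = [nan_to_none(row) for row in rows]
print(cleaned)  # [[{'b': 1}], [{'b': None}]]
```

With pandas, this could be applied as `df["a"] = df["a"].map(nan_to_none)` before calling `df.to_parquet("a.parquet")`.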
pd.read_parquet("a.parquet")["a"].values.tolist()
-->[array([{'b': 1.0}], dtype=object), array([{'b': None}], dtype=object)]
The values are now NumPy arrays, whereas they were originally plain lists.
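For the array values, a minimal sketch of a post-processing step is to convert each cell back into plain lists after reading. The helper name `to_plain` is hypothetical, and it relies on the assumption that the array-like objects returned by the reader expose a `tolist()` method, as NumPy arrays do. It only restores the container type and does not undo the change from `1` to `1.0`.

```python
def to_plain(obj):
    """Recursively turn array-like containers back into plain lists."""
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [to_plain(item) for item in obj]
    if hasattr(obj, "tolist"):
        # NumPy arrays (and NumPy scalars) expose tolist(); recurse on the
        # result because object arrays can hold nested arrays or dicts.
        return to_plain(obj.tolist())
    return obj
```

A possible usage is `pd.read_parquet("a.parquet")["a"].map(to_plain).tolist()`, which should yield nested lists of dicts instead of arrays.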