
Questions tagged [parquet]

Apache Parquet is a columnar storage format for Hadoop.

1 vote
0 answers
27 views

Issue converting struct to parquet using xitongsys/parquet-go (runtime error: invalid memory address or nil pointer dereference)

The timestamp in my struct is causing the following error: runtime error: invalid memory address or nil pointer dereference. The struct is: type MyTs struct { time.Time } func (myts *MyTs) ...
user1898662
0 votes
0 answers
31 views

Spark SQL query returns a column that has 0 length but is non-null

I have a spark dataframe for a parquet file. The column is string type. spark.sql("select col_a, length(col_a) from df where col_a is not null") +-------------------+------------------------...
Dozel
  • 159
1 vote
1 answer
49 views

Why do I get an exception when attempting automatic processing by the Hugging Face parquet-converter?

What file structure should I use on the Hugging Face Hub, if I have a /train.zip archive with PNG image files and a /metadata.csv file with annotations for them, so that the parquet-converter bot can ...
Artyom Ionash
0 votes
1 answer
49 views

Extracting data from blob storage to Databricks[automation]

I have blob data in different folders organized by year, month, and date (nested folders), refreshed daily. I need to design a pipeline which will efficiently load the historical data from blob to Azure ...
Saswat Ray
0 votes
0 answers
26 views

Writing a small parquet dataframe into Google Cloud Storage using Spark 3.5.0 is taking too long

We are using Spark on-premises to simply read a parquet file from GCS (Google Cloud Storage) into a DataFrame and write the DataFrame to another folder in parquet format in GCS, using the code below: ...
user3830120
0 votes
0 answers
21 views

How to append time-series data with PyArrow Datasets?

Problem: I'm looking to store time-series data that's being aggregated live to Parquet Datasets via PyArrow. I receive live batched data, for example, video view count each hour for the last 24 hours. ...
humanlikely
0 votes
1 answer
70 views

Data Factory Parquet Incorrectly Ingesting Decimals

I am working on an Azure Data Factory pipeline and noticed that when I use a parquet sink to ADLS Gen 2, certain decimals are becoming truncated, and are returning results not consistent with the data ...
Brandon Chan
0 votes
0 answers
26 views

ValueError: Appended dtypes differ when appending two simple tables with dask

I am using Dask to write multiple very large dataframes to a single parquet dataset in Python. The dataframes are simple and all the column types are either floats or strings. I iterate over the dask ...
Ian Sudbery
  • 1,728
0 votes
0 answers
94 views

ArrowInvalid: offset overflow while concatenating arrays when subsetting a Pandas Dataframe

When reading parquet files into a pandas DataFrame, I am doing so using the following: df = pd.read_parquet(PATH_TO_FILE, dtype_backend='pyarrow'). In this case, my dataframe is quite large, 52 million ...
Vincent
  • 8,506
0 votes
1 answer
44 views

Read parquet file using pandas and pyarrow fails for time values larger than 24 hours

I have exported a parquet file using parquet.net which includes a duration column that contains values that are greater than 24 hours. I've opened the file using the floor tool that's included with ...
Wouter
  • 2,262
0 votes
1 answer
26 views

How to update the set of values of an enum partition in Athena

Parquet files are stored in AWS S3 with prefixes like /fruit=.../year=.../month=.../day=.../. Their data are queried via AWS Athena, with a table in which fruit is typed as an enum: 'projection.fruit....
Pragmateek
  • 13.2k
1 vote
1 answer
67 views

DuckDB with Angular unable to load and query parquet file

I'm trying to use duckdb-wasm with Angular to load a parquet file and query it. I was able to create a connection, but I am unable to load the parquet file and get the error: Error: Uncaught (in promise): ...
sharath222
0 votes
0 answers
22 views

How to create a Hive table with JSON stored as parquet, AND there is a special character in one nested field

I am unable to create a table from nested JSON data that is stored as parquet and contains a hyphen "-" in one of the nested fields. Example JSON: { "level1": { "level-2&...
maicat11
-1 votes
0 answers
57 views

Read data from parquet file and write it into a SQL table

I have this code for writing a parquet file to a SQL database table, but it is not working: import pandas as pd import pyodbc try: cnxn = pyodbc.connect( r'Driver={SQL Server};' r'...
user25884369
1 vote
3 answers
64 views

Pandas read_parquet works on Python, but not in VSCode

I'm trying to read a parquet folder using the following code: import pandas as pd df = pd.read_parquet('PASP0001.parquet'). I'm working in a virtual environment. The code works perfectly if I open a ...
Osvaldo Carvalho
