Questions tagged [parquet]
Apache Parquet is a columnar storage format for Hadoop.
4,082 questions
1 vote, 0 answers, 27 views
Issue converting struct to parquet using xitongsys/parquet-go (runtime error: invalid memory address or nil pointer dereference)
The timestamp in my struct is causing the following error:
runtime error: invalid memory address or nil pointer dereference
The struct is:
type MyTs struct {
time.Time
}
func (myts *MyTs) ...
0 votes, 0 answers, 31 views
Spark SQL query returns a column that has 0 length but is non-null
I have a spark dataframe for a parquet file. The column is string type.
spark.sql("select col_a, length(col_a) from df where col_a is not null")
+-------------------+------------------------...
1 vote, 1 answer, 49 views
Why do I get an exception when attempting automatic processing by the Hugging Face parquet-converter?
What file structure should I use on the Hugging Face Hub, if I have a /train.zip archive with PNG image files and a /metadata.csv file with annotations for them, so that the parquet-converter bot can ...
0 votes, 1 answer, 49 views
Extracting data from blob storage to Databricks [automation]
I have blob data in different folders by year, month, and date (nested folders), refreshed daily.
I need to design a pipeline which will efficiently load the historical data from blob to azure ...
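For a year/month/date folder layout like the one described, one way to reason about the historical load versus the daily refresh is to generate the folder prefixes for a date range. This is only a sketch: the `container/data` root and the zero-padded `YYYY/MM/DD` layout are assumptions, since the excerpt does not show the real paths, and `daily_prefixes` is a hypothetical helper rather than part of any Azure or Databricks API.

```python
from datetime import date, timedelta


def daily_prefixes(start: date, end: date, root: str = "container/data") -> list:
    """Return one folder prefix per day from start to end, inclusive.

    The root and the YYYY/MM/DD layout are assumed for illustration.
    """
    days = (end - start).days
    return [
        f"{root}/{d.year:04d}/{d.month:02d}/{d.day:02d}/"
        for d in (start + timedelta(days=i) for i in range(days + 1))
    ]


# Historical backfill: every day in a range.
backfill = daily_prefixes(date(2024, 2, 27), date(2024, 3, 1))

# Daily incremental run: a single day.
today_only = daily_prefixes(date(2024, 3, 1), date(2024, 3, 1))
```

The same list could feed whatever loader is used, so the backfill and the daily job differ only in the date range they pass in.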
0 votes, 0 answers, 26 views
Writing a small Parquet DataFrame to Google Cloud Storage using Spark 3.5.0 takes too long
We are using Spark on-premises to read a Parquet file from GCS (Google Cloud Storage) into a DataFrame and write the DataFrame to another folder in GCS in Parquet format, using the code below:
...
0 votes, 0 answers, 21 views
How to append time-series data with PyArrow Datasets?
Problem
I'm looking to store time-series data that's being aggregated live to Parquet Datasets via PyArrow. I receive live batched data, for example, video view count each hour for the last 24 hours. ...
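If each batch covers the last 24 hours, consecutive batches presumably overlap, so appending them blindly would duplicate hours. The sketch below shows the upsert idea in plain Python, without PyArrow: rows are keyed by a video identifier and hour, and a newer batch overwrites older values. The field names `video_id`, `hour`, and `views` are hypothetical, and the overlap itself is an assumption because the excerpt is truncated.

```python
from datetime import datetime


def merge_batch(store: dict, batch: list) -> dict:
    """Upsert rows keyed by (video_id, hour); later batches overwrite earlier ones."""
    for row in batch:
        store[(row["video_id"], row["hour"])] = row["views"]
    return store


store = {}

# First batch: hours 10:00 and 11:00.
merge_batch(store, [
    {"video_id": "a", "hour": datetime(2024, 1, 1, 10), "views": 5},
    {"video_id": "a", "hour": datetime(2024, 1, 1, 11), "views": 7},
])

# Second batch overlaps on 11:00 with an updated count and adds 12:00.
merge_batch(store, [
    {"video_id": "a", "hour": datetime(2024, 1, 1, 11), "views": 9},
    {"video_id": "a", "hour": datetime(2024, 1, 1, 12), "views": 2},
])
```

After both batches the store holds three hours, with the 11:00 entry carrying the newer count.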
0 votes, 1 answer, 70 views
Data Factory Parquet Incorrectly Ingesting Decimals
I am working on an Azure Data Factory pipeline and noticed that when I use a parquet sink to ADLS Gen 2, certain decimals are becoming truncated, and are returning results not consistent with the data ...
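As a general illustration of how a decimal loses digits when it is forced into a smaller scale, here is a sketch using Python's standard `decimal` module. It makes no claim about what Data Factory does internally; the sample value and the scales are assumptions chosen for the example, and `to_scale` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def to_scale(value: Decimal, scale: int, rounding=ROUND_HALF_UP) -> Decimal:
    """Force a value to a fixed number of fractional digits."""
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal('0.01')
    return value.quantize(quantum, rounding=rounding)


source = Decimal("123.456789")

truncated = to_scale(source, 2, ROUND_DOWN)  # digits past the scale are dropped
rounded = to_scale(source, 2)                # digits past the scale are rounded
lossless = to_scale(source, 6)               # scale is wide enough for the value
```

Comparing the source value against the sink value at the declared scale can show whether a mismatch looks like truncation, rounding, or something else.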
0 votes, 0 answers, 26 views
ValueError: Appended dtypes differ when appending two simple tables with dask
I am using Dask to write multiple very large dataframes to a single Parquet dataset in Python.
The dataframes are simple and all the column types are either floats or strings.
I iterate over the dask ...
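When an append is rejected because dtypes differ, it can help to compare the column-to-dtype mappings of the two frames before writing. The sketch below does this with plain dictionaries; the column names and dtype strings are made up for the example, and `dtype_mismatches` is a hypothetical helper, not a Dask function.

```python
def dtype_mismatches(existing: dict, incoming: dict) -> dict:
    """Map column -> (existing dtype, incoming dtype) wherever the two differ.

    A column missing on one side shows up with None for that side.
    """
    columns = sorted(set(existing) | set(incoming))
    return {
        c: (existing.get(c), incoming.get(c))
        for c in columns
        if existing.get(c) != incoming.get(c)
    }


# Hypothetical schemas: the second frame's float column was inferred as object.
first = {"id": "string", "price": "float64"}
second = {"id": "string", "price": "object"}

diff = dtype_mismatches(first, second)
```

With real frames, the dictionaries could be built with something like `{c: str(t) for c, t in df.dtypes.items()}`, assuming `df.dtypes` is the usual pandas Series of dtypes.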
0 votes, 0 answers, 94 views
ArrowInvalid: offset overflow while concatenating arrays when subsetting a Pandas Dataframe
I read Parquet files into a pandas DataFrame using the following:
df = pd.read_parquet(PATH_TO_FILE, dtype_backend='pyarrow')
In this case, my dataframe is quite large, 52 million ...
0 votes, 1 answer, 44 views
Read parquet file using pandas and pyarrow fails for time values larger than 24 hours
I have exported a Parquet file using Parquet.Net which includes a duration column that contains values that are greater than 24 hours. I've opened the file using the floor tool that's included with ...
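One possible reason values above 24 hours are awkward on the Python side is that a time-of-day type cannot hold them, while a duration type can. The sketch below shows that distinction with the standard library only. Whether this is the actual cause of the failure in the question is an assumption, since the excerpt does not show the error; `as_time_of_day` is a hypothetical helper.

```python
from datetime import time, timedelta


def as_time_of_day(hours: int):
    """Return a datetime.time for the given hour, or None if it is out of range."""
    try:
        return time(hour=hours)
    except ValueError:
        # datetime.time only accepts hours 0 through 23.
        return None


ok = as_time_of_day(23)          # a valid time of day
too_large = as_time_of_day(25)   # not representable as a time of day

# A duration has no 24-hour ceiling.
duration = timedelta(hours=25)
```

If the column is meant to be a duration, checking which logical type the file actually declares for it is a reasonable first step.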
0 votes, 1 answer, 26 views
How to update the set of values of an enum partition in Athena
Parquet files are stored in AWS S3 with prefixes like /fruit=.../year=.../month=.../day=.../.
Their data are queried via AWS Athena, with a table in which fruit is typed as an enum:
'projection.fruit....
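With partition projection, the allowed values of an enum partition live in the table property `projection.<column>.values` as a comma-separated list, so updating the set amounts to rewriting that property. The sketch below only builds the statement text; the table name `my_db.my_table` and the fruit values are hypothetical, and `set_enum_values_sql` is an illustrative helper, not an AWS API. It assumes the values themselves contain no commas or quotes.

```python
def set_enum_values_sql(table: str, column: str, values: list) -> str:
    """Build an ALTER TABLE statement that replaces an enum projection's value list."""
    joined = ",".join(values)
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('projection.{column}.values' = '{joined}')"
    )


# Hypothetical table and values, for illustration only.
sql = set_enum_values_sql("my_db.my_table", "fruit", ["apple", "banana", "cherry"])
print(sql)
```

The resulting statement would then be run through whatever Athena client is already in use.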
1 vote, 1 answer, 67 views
DuckDB with Angular unable to load and query Parquet file
I'm trying to use duckdb-wasm with Angular to load a Parquet file and query it. I was able to create a connection, but I am unable to load the Parquet file and get this error:
Error: Uncaught (in promise): ...
0 votes, 0 answers, 22 views
How to create a Hive table with JSON stored as parquet, AND there is a special character in one nested field
I am unable to create a table from nested JSON data that is stored as parquet and contains a hyphen "-" in one of the nested fields.
Example JSON:
{
"level1": {
"level-2...
-1 votes, 0 answers, 57 views
Read data from parquet file and write it into a SQL table
I have this code for writing a Parquet file to a SQL database table, but it is not working:
import pandas as pd
import pyodbc
try:
cnxn = pyodbc.connect(
r'Driver={SQL Server};'
r'...
1 vote, 3 answers, 64 views
Pandas read_parquet works on Python, but not in VSCode
I'm trying to read a parquet folder using the following code:
import pandas as pd
df = pd.read_parquet('PASP0001.parquet')
I'm working in a virtual environment. The code works perfectly if I open a ...
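When the same script behaves differently in a terminal and in an editor, a common first check is whether both are running the same interpreter and can see the same packages. The sketch below prints that information using only the standard library. The module names checked are assumptions about what `read_parquet` might need in this setup, and `environment_report` is a hypothetical helper.

```python
import importlib.util
import sys


def environment_report(modules=("pandas", "pyarrow", "fastparquet")) -> dict:
    """Report the running interpreter and whether each module is importable."""
    report = {"executable": sys.executable, "prefix": sys.prefix}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this once from the terminal and once from the editor makes it easy to spot a different `executable` path or a missing package in one of the two.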