I've already tested three ways of converting a CSV file to a Parquet file; you can find them below. All three created the Parquet file, but whenever I try to view its contents with "APACHE PARQUET VIEWER" on Windows, I get the following error message:
"encoding RLE_DICTIONARY is not supported"
Is there any way to avoid this, for example by writing the file with a different encoding? The code is below:
1º Using pandas:
Python:
import pandas as pd
df = pd.read_csv("filename.csv")
df.to_parquet("filename.parquet")
2º Using pyarrow:
Python:
from pyarrow import csv, parquet
table = csv.read_csv("filename.csv")
parquet.write_table(table, "filename.parquet")
3º Using dask:
Python:
from dask.dataframe import read_csv
dask_df = read_csv("filename.csv", dtype={'column_xpto': 'float64'})
dask_df.to_parquet("filename.parquet")