Supported Data Formats

When you upload a structured data file to your Space, Zark automatically converts it into a queryable dataset. You can then ask analytical questions in plain language — no SQL, no formulas, no setup.

Supported Formats

Format	Extensions	Description
CSV	`.csv`, `.tsv`	Comma-separated or tab-separated values. The most common format for tabular data
Excel	`.xlsx`, `.xls`	Microsoft Excel workbooks in both modern and legacy formats. Multiple sheets are supported; related sheets may be combined and incompatible ones are processed separately (see Excel-Specific Behavior)
JSON	`.json`	JavaScript Object Notation. Ideal for nested or hierarchical data structures
Parquet	`.parquet`	Columnar storage format common in data engineering. Optimized for analytical queries on large datasets
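
To check whether a file falls into one of these formats before uploading it, a small lookup is enough. The sketch below is illustrative only: the `FORMAT_BY_EXTENSION` table and the `supported_format` helper are hypothetical names built from the table above, and they say nothing about how Zark itself detects formats.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup table copied from the formats listed above.
# This is not Zark's detection logic, only a local pre-upload check.
FORMAT_BY_EXTENSION = {
    ".csv": "CSV",
    ".tsv": "CSV",
    ".xlsx": "Excel",
    ".xls": "Excel",
    ".json": "JSON",
    ".parquet": "Parquet",
}


def supported_format(filename: str) -> Optional[str]:
    """Return the format name for a filename, or None if it is not listed."""
    # Compare extensions case-insensitively so "DATA.CSV" also matches.
    return FORMAT_BY_EXTENSION.get(Path(filename).suffix.lower())


print(supported_format("sales_2024.XLSX"))  # Excel
print(supported_format("notes.docx"))       # None
```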

What Happens on Upload

When you upload a data file, Zark:

1. Detects the format: identifies the file type and encoding
2. Parses the structure: identifies columns, rows, and data types
3. Cleans formatting: handles inconsistencies automatically (see Automatic Data Cleaning)
4. Creates a queryable table: stored in a format optimized for fast analytical queries
5. Registers metadata: column names, types, and row count become available for the planner

The entire process takes seconds for most files. Very large files (millions of rows) take longer but are still processed automatically.
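
The upload steps can be approximated locally for a delimited text file. This is a minimal sketch under stated assumptions, not Zark's implementation: the `ingest` and `infer_type` functions are hypothetical, encoding detection is reduced to a UTF-8 attempt with a Latin-1 fallback, and the cleaning and storage steps are left out.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of integer, float, or text that fits all values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "text"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            continue  # this type does not fit, try the next one
        return name
    return "text"


def ingest(raw: bytes) -> dict:
    """Hypothetical stand-in for the detect, parse, and register steps."""
    # Detect the encoding: try UTF-8 first, then fall back to Latin-1.
    try:
        text, encoding = raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        text, encoding = raw.decode("latin-1"), "latin-1"
    # Detect the delimiter from the first line (tab vs. comma).
    first_line = text.split("\n", 1)[0]
    delimiter = "\t" if first_line.count("\t") > first_line.count(",") else ","
    # Parse the structure: a header row plus data rows, skipping blank lines.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = [row for row in reader if row]
    if not rows:
        raise ValueError("file contains no rows")
    header, data = rows[0], rows[1:]
    # Register metadata: column names, inferred types, and row count.
    columns = {}
    for i, name in enumerate(header):
        values = [row[i] if i < len(row) else "" for row in data]
        columns[name] = infer_type(values)
    return {
        "encoding": encoding,
        "delimiter": delimiter,
        "columns": columns,
        "row_count": len(data),
    }


print(ingest(b"id,amount,city\n1,9.5,Oslo\n2,3,Bergen\n"))
```

The metadata dictionary returned here mirrors the kind of information the last step describes (names, types, row count); its exact shape is an assumption made for the example.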

Automatic Data Cleaning

Zark handles common data quality issues without manual intervention:

Issue	How Zark handles it
Inconsistent formatting	Normalizes values across rows
European number formats	Detects whether a comma is a decimal separator (as in 1.234,56) or a thousands separator (as in 1,234.56)
Currency symbols	Strips symbols from numeric columns while preserving the values
Misplaced headers	Detects header rows even when they’re not in row 1
Mixed types	Infers the most appropriate type per column
Encoding issues	Handles UTF-8, Latin-1, and other common encodings
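
To make the number-format and currency rows concrete, here is one possible heuristic for a single value. It is a sketch only: `parse_number` is a hypothetical helper, and the rules it applies (the last separator wins when both appear, and groups of exactly three digits mean thousands) are assumptions rather than Zark's documented behavior. A value such as `1,234` is genuinely ambiguous on its own, so a real system would likely look at the whole column.

```python
import re


def parse_number(raw: str) -> float:
    """Parse a value that may carry a currency symbol and US or European separators."""
    # Strip currency symbols, spaces, and letters; keep digits, separators, minus.
    s = re.sub(r"[^\d,.\-]", "", raw)
    has_comma, has_dot = "," in s, "." in s
    if has_comma and has_dot:
        # Both present: whichever separator appears last is the decimal one.
        if s.rfind(",") > s.rfind("."):
            s = s.replace(".", "").replace(",", ".")  # European: 1.234,56
        else:
            s = s.replace(",", "")                    # US: 1,234.56
    elif has_comma:
        if re.fullmatch(r"-?\d{1,3}(,\d{3})+", s):
            s = s.replace(",", "")   # groups of three digits: thousands
        else:
            s = s.replace(",", ".")  # otherwise treat the comma as decimal
    elif s.count(".") > 1:
        s = s.replace(".", "")       # 1.234.567: dots used as thousands
    return float(s)


for value in ["€1.234,56", "$1,234.56", "12,5", "1,234,567"]:
    print(value, "->", parse_number(value))
```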

Excel-Specific Behavior

For Excel files with multiple sheets:

Zark examines each sheet’s structure
Related sheets may be combined into a single table
Sheets with incompatible structures are processed separately
Formula results are captured (not the formulas themselves)
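
One way to picture the combine-or-separate decision is to group sheets by their header row. The sketch below assumes that "related" means "same column names after trimming and lowercasing", which is a simplification chosen for illustration; the source does not state the actual criterion. The workbook is modeled as a plain dictionary of sheet name to rows, and `group_sheets` is a hypothetical helper.

```python
from collections import defaultdict


def group_sheets(workbook: dict) -> list:
    """Combine sheets that share a header row; keep the others separate."""
    groups = defaultdict(lambda: {"sheets": [], "rows": []})
    for name, rows in workbook.items():
        if not rows:
            continue  # skip empty sheets
        # Normalize the header so "Date" and " date " count as the same column.
        header = tuple(str(cell).strip().lower() for cell in rows[0])
        groups[header]["sheets"].append(name)
        groups[header]["rows"].extend(rows[1:])
    return [
        {"columns": list(header), "sheets": g["sheets"], "rows": g["rows"]}
        for header, g in groups.items()
    ]


# Cell values are what a reader sees, i.e. formula results, not formulas.
workbook = {
    "Jan": [["Date", "Amount"], ["2024-01-05", 10]],
    "Feb": [["date", "amount "], ["2024-02-01", 20]],
    "Notes": [["Comment"], ["check totals"]],
}
for table in group_sheets(workbook):
    print(table["sheets"], table["columns"], len(table["rows"]))
```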

Large Datasets

Zark handles datasets with millions of rows. Data is stored in a columnar engine optimized for aggregation queries — counts, sums, averages, group-by, and time-series analysis all perform well at scale. Results are paginated at display time. The full dataset remains queryable regardless of size.
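
The distinction between querying the full dataset and paginating only the displayed result can be sketched in a few lines. This is an illustration with hypothetical helpers (`sum_by`, `page`) and in-memory rows, not a description of Zark's columnar engine.

```python
from collections import defaultdict
from itertools import islice


def sum_by(rows, key, value):
    """Aggregate over every row, regardless of how results are displayed."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return sorted(totals.items())


def page(results, page_number, page_size=2):
    """Return one page of already-computed results (pages start at 1)."""
    start = (page_number - 1) * page_size
    return list(islice(results, start, start + page_size))


rows = [
    {"region": "EU", "amount": 10.0},
    {"region": "US", "amount": 5.0},
    {"region": "EU", "amount": 2.5},
    {"region": "APAC", "amount": 1.0},
]
totals = sum_by(rows, "region", "amount")  # computed over all rows
print(page(totals, 1))  # [('APAC', 1.0), ('EU', 12.5)]
print(page(totals, 2))  # [('US', 5.0)]
```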

Verifying Your Data

After upload, you can verify the structure and contents:

How many rows are in this data?

What columns are available?

Show me a random sample of 10 rows

Are there any data quality issues?

Show me unique values in the status column

These verification queries help you confirm that Zark parsed your file correctly before running analysis.
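
If you have the original file at hand, you can compute the same facts locally and compare them with Zark's answers. The sketch below does this for CSV text using only the Python standard library; `profile` and `unique_values` are hypothetical helper names, and the `status` column comes from the example query above.

```python
import csv
import io
import random


def profile(csv_text: str, sample_size: int = 10, seed: int = 0) -> dict:
    """Row count, column names, and a random sample of rows."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    rows = list(reader)
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return {
        "row_count": len(rows),
        "columns": list(reader.fieldnames or []),
        "sample": rng.sample(rows, min(sample_size, len(rows))),
    }


def unique_values(csv_text: str, column: str) -> list:
    """Sorted distinct values in one column (missing cells become "")."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return sorted({row[column] or "" for row in reader})


csv_text = "id,status\n1,open\n2,closed\n3,open\n"
summary = profile(csv_text)
print(summary["row_count"], summary["columns"])
print(unique_values(csv_text, "status"))
```

A mismatch between these local numbers and Zark's answers, for example a row count that is off by one, is a useful hint that a header row or a blank line was interpreted differently than you expected.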
