When you upload a structured data file to your Space, Zark automatically converts it into a queryable dataset. You can then ask analytical questions in plain language — no SQL, no formulas, no setup.

Supported Formats

Format | Extensions | Description
CSV | .csv, .tsv | Comma-separated or tab-separated values. The most common format for tabular data
Excel | .xlsx, .xls | Microsoft Excel workbooks in both modern and legacy formats. Multiple sheets are supported. Zark intelligently determines how to handle the structure
JSON | .json | JavaScript Object Notation. Ideal for nested or hierarchical data structures
Parquet | .parquet | Columnar storage format common in data engineering. Optimized for analytical queries on large datasets

What Happens on Upload

When you upload a data file, Zark:
  1. Detects the format — identifies the file type and encoding
  2. Parses the structure — identifies columns, rows, and data types
  3. Cleans formatting — handles inconsistencies automatically (see below)
  4. Creates a queryable table — stored in a format optimized for fast analytical queries
  5. Registers metadata — column names, types, and row count become available for the planner
The entire process takes seconds for most files. Very large files (millions of rows) take longer but are still processed automatically.
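
As a rough illustration of the steps above, the following Python sketch walks a small CSV or TSV payload through format detection, parsing, columnar storage, and metadata registration. It is not Zark's implementation: the names `ingest` and `infer_type`, the UTF-8-then-Latin-1 fallback, and the delimiter heuristic are assumptions made for this example, and the cleaning step is left out.

```python
import csv
import io


def infer_type(values):
    """Return 'int', 'float', or 'str' for a column of raw strings."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "str"
    for name, caster in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def ingest(raw: bytes) -> dict:
    """Hypothetical upload pipeline for a CSV/TSV file with a header in row 1."""
    # Step 1: detect the encoding (assumed fallback order: UTF-8, then Latin-1).
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")
    # Step 1, continued: guess the delimiter from whichever character is more common.
    delimiter = "\t" if text.count("\t") > text.count(",") else ","

    # Step 2: parse rows and split off the header.
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        raise ValueError("file contains no rows")
    header, body = rows[0], rows[1:]

    # Step 4: store the data column by column (assumes every row has all columns).
    columns = {name: [row[i] for row in body] for i, name in enumerate(header)}

    # Step 5: register the metadata a query planner would need.
    return {
        "columns": header,
        "types": {name: infer_type(vals) for name, vals in columns.items()},
        "row_count": len(body),
    }
```

Calling `ingest(b"id,price,name\n1,9.5,a\n2,3,b\n")` in this sketch reports three columns typed as `int`, `float`, and `str`, with a row count of 2.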

Automatic Data Cleaning

Zark handles common data quality issues without manual intervention:
Issue | How Zark handles it
Inconsistent formatting | Normalizes values across rows
European number formats | Detects whether commas are decimal separators or thousands separators
Currency symbols | Strips symbols from numeric columns while preserving the values
Misplaced headers | Detects header rows even when they’re not in row 1
Mixed types | Infers the most appropriate type per column
Encoding issues | Handles UTF-8, Latin-1, and other common encodings
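
To make the number-format and currency rows concrete, here is a minimal Python sketch of one way such cleaning could work. The function `parse_number` and its rules are assumptions for illustration, not Zark's actual logic. It works one value at a time, so it cannot resolve genuinely ambiguous inputs such as `1,234` or `1.234`; a real system would more likely decide per column.

```python
import re


def parse_number(raw: str) -> float:
    """Hypothetical cleaner: strip currency symbols and normalize separators."""
    # Keep only digits, commas, dots, and minus signs (drops symbols and spaces).
    s = re.sub(r"[^\d,.\-]", "", raw)
    last_comma, last_dot = s.rfind(","), s.rfind(".")

    if last_comma != -1 and last_dot != -1:
        # Both separators present: the one that appears last is the decimal mark.
        if last_comma > last_dot:
            s = s.replace(".", "").replace(",", ".")  # European style: 1.234,56
        else:
            s = s.replace(",", "")  # US/UK style: 1,234.56
    elif last_comma != -1:
        # Only commas: a single comma not followed by exactly three digits is
        # treated as a decimal mark; anything else as thousands grouping.
        digits_after = len(s) - last_comma - 1
        if s.count(",") == 1 and digits_after != 3:
            s = s.replace(",", ".")
        else:
            s = s.replace(",", "")
    return float(s)
```

Under these rules, `"€1.234,56"` and `"$1,234.56"` both parse to the same value, and `"12,5"` is read as twelve and a half.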

Excel-Specific Behavior

For Excel files with multiple sheets:
  • Zark examines each sheet’s structure
  • Related sheets may be combined into a single table
  • Sheets with incompatible structures are processed separately
  • Formula results are captured (not the formulas themselves)
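
The sheet-handling behavior above can be sketched as follows. This example assumes that "related" means sheets share an identical header row, which is a simplification chosen for illustration; Zark's actual criteria are not documented here. The function `combine_sheets` and its input shape (a dict of sheet name to rows, header first) are hypothetical.

```python
from collections import defaultdict


def combine_sheets(sheets):
    """Group sheets by header row and stack the rows of matching sheets.

    sheets: dict mapping sheet name -> list of rows, where the first row
    is the header. Cell values are assumed to be computed results already.
    """
    groups = defaultdict(list)
    for name, rows in sheets.items():
        if not rows:
            continue  # skip empty sheets
        groups[tuple(rows[0])].append((name, rows[1:]))

    tables = []
    for header, members in groups.items():
        tables.append({
            "sheets": [name for name, _ in members],
            "header": list(header),
            "rows": [row for _, body in members for row in body],
        })
    return tables
```

Given sheets `Jan` and `Feb` with the same columns and a `Notes` sheet with different ones, this sketch returns two tables: one that combines `Jan` and `Feb`, and one that holds `Notes` on its own.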

Large Datasets

Zark handles datasets with millions of rows. Data is stored in a columnar engine optimized for aggregation queries — counts, sums, averages, group-by, and time-series analysis all perform well at scale. Results are paginated at display time. The full dataset remains queryable regardless of size.
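
The idea of aggregating over column-oriented data and paginating only the displayed result can be illustrated with a small Python sketch. The table layout (a dict of column name to list of values) and the helpers `group_sum` and `paginate` are assumptions for this example and say nothing about Zark's internal engine.

```python
from collections import defaultdict


def group_sum(table, key, value):
    """Sum the `value` column per distinct entry in the `key` column."""
    totals = defaultdict(float)
    # Only the two columns involved are read, which is the main benefit
    # of a columnar layout for aggregation queries.
    for k, v in zip(table[key], table[value]):
        totals[k] += v
    return sorted(totals.items())


def paginate(rows, page, page_size):
    """Return one page of results; pages are numbered from 1."""
    start = (page - 1) * page_size
    return rows[start:start + page_size]


table = {
    "region": ["EU", "US", "EU", "APAC"],
    "sales": [10.0, 5.0, 2.5, 1.0],
}
result = group_sum(table, "region", "sales")
first_page = paginate(result, page=1, page_size=2)
```

Here the aggregation runs over every row, while `paginate` limits only what is shown, which mirrors the separation between full-dataset queries and display-time pagination described above.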

Verifying Your Data

After upload, you can verify the structure and contents by asking questions such as:
  • How many rows are in this data?
  • What columns are available?
  • Show me a random sample of 10 rows
  • Are there any data quality issues?
  • Show me unique values in the status column
These verification queries help you confirm that Zark parsed your file correctly before running analysis.
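
If you want an independent reference to compare against, a few of these checks can be reproduced locally on the original file. The sketch below is one possible approach using only Python's standard library. It assumes a CSV file with a header in row 1; the function `local_checks` and its parameters are hypothetical and unrelated to Zark itself.

```python
import csv
import io
import random


def local_checks(csv_text, column, sample_size=10, seed=0):
    """Compute row count, column names, a random sample, and unique values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return {
        "row_count": len(rows),
        "columns": list(rows[0].keys()) if rows else [],
        "sample": rng.sample(rows, min(sample_size, len(rows))),
        "unique_values": sorted({row[column] for row in rows}),
    }
```

Comparing the row count and the unique values of a column such as `status` with the answers Zark gives is a quick way to spot a misdetected header or delimiter.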