Supported Formats
| Format | Extensions | Description |
|---|---|---|
| CSV | .csv, .tsv | Comma-separated or tab-separated values. The most common format for tabular data |
| Excel | .xlsx, .xls | Microsoft Excel workbooks in both modern and legacy formats. Multiple sheets are supported — Zark intelligently determines how to handle the structure |
| JSON | .json | JavaScript Object Notation. Ideal for nested or hierarchical data structures |
| Parquet | .parquet | Columnar storage format common in data engineering. Optimized for analytical queries on large datasets |
What Happens on Upload
When you upload a data file, Zark:- Detects the format — identifies the file type and encoding
- Parses the structure — identifies columns, rows, and data types
- Cleans formatting — handles inconsistencies automatically (see below)
- Creates a queryable table — stored in a format optimized for fast analytical queries
- Registers metadata — column names, types, and row count become available for the planner
Automatic Data Cleaning
Zark handles common data quality issues without manual intervention:| Issue | How Zark handles it |
|---|---|
| Inconsistent formatting | Normalizes values across rows |
| European number formats | Detects commas as decimal separators vs. thousands separators |
| Currency symbols | Strips symbols from numeric columns while preserving the values |
| Misplaced headers | Detects header rows even when they’re not in row 1 |
| Mixed types | Infers the most appropriate type per column |
| Encoding issues | Handles UTF-8, Latin-1, and other common encodings |
Excel-Specific Behavior
For Excel files with multiple sheets:- Zark examines each sheet’s structure
- Related sheets may be combined into a single table
- Sheets with incompatible structures are processed separately
- Formula results are captured (not the formulas themselves)