Importing and exporting custom text data using Templates and Regex in ClickHouse
We often have to deal with data in custom text formats. That could be a non-standard format, invalid JSON, or a broken CSV. Using standard parsers like CSV or JSON won't work in all such cases. But ClickHouse has us covered here with powerful Template and Regex formats.
Importing based on a template
Suppose we want to import data from the following log file:
We can use a Template format to import this data. We have to define a template string with values placeholders for each row of input data:
Let's create a table to import our data into:
To import data using a given template, we have to save our template string in a file (row.template in our case):
We define a name of a column and escaping rule in a ${name:escaping}
format. Multiple options are available here, like CSV, JSON, Escaped, or Quoted, which implement respective escaping rules.
Now we can use the given file as an argument to the format_template_row
settings option while importing data (note, that template and data files should not have an extra \n
symbol at the end of file):
And we can make sure our data was loaded into the table:
Skipping whitespaces
Consider using TemplateIgnoreSpaces, which allows skipping whitespaces between delimiters in a template:
Exporting data using templates
We can also export data to any text format using templates as well. In this case, we have to create two files:
Result set template, which defines the layout for the whole result set:
Here, rows_read
and time
are system metrics available for each request. While data
stands for generated rows (${data}
should always come as a first placeholder in this file), based on a template defined in a row template file:
Now let's use these templates to export the following query:
Exporting to HTML files
Template-based results can also be exported to files using an INTO OUTFILE
clause. Let's generate HTML files based on given resultset and row formats:
Exporting to XML
Template format can be used to generate all imaginable text format files, including XML. Just put a relevant template and do the export.
Also consider using an XML format to get standard XML results including metadata:
Importing data based on regular expressions
Regexp format addresses more sophisticated cases when input data needs to be parsed in a more complex way. Let's parse our error.log example file, but capture the file name and protocol this time to save them into separate columns. First, let's prepare a new table for that:
Now we can import data based on a regular expression:
ClickHouse will insert data from each capture group into the relevant column based on its order. Let's check the data:
By default, ClickHouse will raise an error in case of unmatched rows. If you want to skip unmatched rows instead, enable it using format_regexp_skip_unmatched option:
Other formats
ClickHouse introduces support for many formats, both text, and binary, to cover various scenarios and platforms. Explore more formats and ways to work with them in the following articles:
- CSV and TSV formats
- Parquet
- JSON formats
- Regex and templates
- Native and binary formats
- SQL formats
And also check clickhouse-local - a portable full-featured tool to work on local/remote files without the need for Clickhouse server.