A very common activity for those working with DB2 for i is to have to import information from external data sources and store it in DB2 tables.
There are various methods and contexts in which it is necessary to acquire data from sources outside DB2. In this article we focus on importing data from CSV type text files.
A CSV (Comma-Separated Value) file is a text file in which each line represents a record and within each line the “fields” are separated by a special character.
The reference document that describes the CSV format is RFC 4180 which reads as follows in the introductory part:
The comma separated values format (CSV) has been used for exchanging and converting data between various spreadsheet programs for quite some time. Surprisingly, while this forma[…]t is very common, it has never been formally documented
Index
The character with which the fields are delimited can obviously be the comma “,”, according to the name of the format itself, but other characters can be used. The most common are: tab, semicolon “;”, pipe “|”, space “”.
The choice of the field separator character is a choice that one undergoes, in the case in which the file is received from an external source that cannot be controlled by us and therefore one must adapt.
In case you have the possibility to choose the separator character personally I prefer either the pipe or the tabulation; or characters that very rarely may be present in the content of the fields.
As RFC 4180 highlights, there is NO recognized standard for the CSV format, so there is a variety of interpretations. In any case, RFC 4180 attempts to define some rules for writing “well” a csv file. I summarize the main rules:
It is not described in RFC 4180, but the parameter can be specified in the first line of a csv file
sep = <delimiter>
to indicate what is the separator character of the fields.
Although I have never seen a csv file containing this parameter.
To import a CSV file into a DB2 table, you must first copy it to a folder in the “root” file system and then use the CPYFRMIMPF command.
The CpyFrmImpF command copies an “import file” (in our case the CSV file) into a physical / logical file.
In general, “import file” is a file created for the purpose of copying data between heterogeneous systems.
Let’s analyze the most important parameters of this command useful for importing a csv file into a DB2 table.
The table must already exist in DB2 and the table fields must match the fields in the CSV file.
The data type of the DB2 table fields must be compatible with the contents of the CSV file.
In other words, if for example the 3rd field of the table is defined as numerical 5 along which 2 decimals, the content of the 3rd field in all lines of the CSV file must be either null or contain a valid number with at most 3 integers and 2 decimals and must there is the decimal separator character (or comma or dot).
For every error that the DB2 finds trying to import the contents of the various fields of the CSV file, a message is recorded in the job joblog and the line affected by the error is discarded. In the message present in the joblog it is in most cases specified exactly on which field the problem was found.
N.B. The import DOES NOT stop at the first line which has an error, but continues to import the following lines.
Let’s analyze in detail the most important parameters of the CPYFRMIMPF command:
The CpyFrmImpF command allows you to copy the lines of the CSV file that have not been imported into an “error” file. It is a possibility perhaps little known, but very convenient to identify more easily the lines that are rejected by the import. Very useful especially when the csv file contains hundreds or thousands of lines.
The parameters to use to manage the error file are:
CRTSRCPF FILE (QGPL / ERRORS) RCDLEN (550) MBR (* FILE) SIZE (* NOMAX)
If you have doubts about the end-of-line character of a csv file, you can open and check the last characters of the line by displaying them in hexadecimal mode.
Open the csv file stored in a folder on IFS with the DSPF comm
and or using option 5 = Display by the WRKLNK command
.
Move the view to the right until the end of the line is shown: type in the column placement the last column of the text file. Then press F10 = Hexadecimal display
In case you need to import text files with fixed columns into DB2, you can still use the CPYFRMIMPF command.
In addition to what has been written so far it is necessary to provide a definition file for the file to be imported or a FDF (field definition file) file.
Each line of the FDF file contains the following information:
The 4 values must be separated by a space.
Other useful information relating to the FDF file:
The null value indicator in the text file must have the values Y or N.
FDF file example:
-- item number CODART 1 15 0 -- article description DESART 16 50 0 - list price PRICE 51 13 0 * END these lines after * END will be ignored
Read this article and many others on the website https://www.markonetools.it
We are pleased to receive and share this "tip & trick" from Patrick Rizzi, which introduces a technique that allows…
I take inspiration from a response by Michael Mayer on the Midrange.com mailing lists to someone who asked how to…
Businesses are increasingly seeking tools to enhance efficiency, collaboration, and resource management. Enterprise Resource Planning (ERP) systems provide a comprehensive…
Early April saw the release of the "Spring Version" of ACS Access Client Solution, version 1.1.9.5 Interesting new features especially…
If the packed agenda of sessions at Common Europe Congress 2024, June 3-6 Milan, wasn't enough for you, here's another…
Debugging functions with Visual Studio Code have been available for some time but this new version 2.10.0 simplifies the handling…