BlogFaq400

Categories: 01 - Programming (EN)01e - Programming miscellanea (EN)

DB2 Import Text/CSV files

A very common activity for those working with DB2 for i is to have to import information from external data sources and store it in DB2 tables.

There are various methods and contexts in which it is necessary to acquire data from sources outside DB2. In this article we focus on importing data from CSV type text files.

A CSV (Comma-Separated Value) file is a text file in which each line represents a record and within each line the “fields” are separated by a special character.

The reference document that describes the CSV format is RFC 4180 which reads as follows in the introductory part:

The comma separated values format (CSV) has been used for exchanging and converting data between various spreadsheet programs for quite some time. Surprisingly, while this forma[…]t is very common, it has never been formally documented

Index

Toggle

Separator character

The character with which the fields are delimited can obviously be the comma “,”, according to the name of the format itself, but other characters can be used. The most common are: tab, semicolon “;”, pipe “|”, space “”.

The choice of the field separator character is a choice that one undergoes, in the case in which the file is received from an external source that cannot be controlled by us and therefore one must adapt.

In case you have the possibility to choose the separator character personally I prefer either the pipe or the tabulation; or characters that very rarely may be present in the content of the fields.

Main rules for a “well-formed” csv file

As RFC 4180 highlights, there is NO recognized standard for the CSV format, so there is a variety of interpretations. In any case, RFC 4180 attempts to define some rules for writing “well” a csv file. I summarize the main rules:

each line represents a record and each line must end with CRLF characters
optionally the first line of the file can contain the column headings
in each line one or more fields can be present. In any case all the lines must contain the same number of fields
optionally each field can be enclosed in double quotes “. (N.B. Microsoft Excel does not use double quotes)
if the fields are enclosed in double quotes and one field contains a double quote, this must be doubled

It is not described in RFC 4180, but the parameter can be specified in the first line of a csv file

sep = <delimiter>

to indicate what is the separator character of the fields.

Although I have never seen a csv file containing this parameter.

Importing a csv file into DB2 for i

To import a CSV file into a DB2 table, you must first copy it to a folder in the “root” file system and then use the CPYFRMIMPF command.

The CpyFrmImpF command copies an “import file” (in our case the CSV file) into a physical / logical file.

In general, “import file” is a file created for the purpose of copying data between heterogeneous systems.

Let’s analyze the most important parameters of this command useful for importing a csv file into a DB2 table.

The table must already exist in DB2 and the table fields must match the fields in the CSV file.

The data type of the DB2 table fields must be compatible with the contents of the CSV file.

In other words, if for example the 3rd field of the table is defined as numerical 5 along which 2 decimals, the content of the 3rd field in all lines of the CSV file must be either null or contain a valid number with at most 3 integers and 2 decimals and must there is the decimal separator character (or comma or dot).

For every error that the DB2 finds trying to import the contents of the various fields of the CSV file, a message is recorded in the job joblog and the line affected by the error is discarded. In the message present in the joblog it is in most cases specified exactly on which field the problem was found.

N.B. The import DOES NOT stop at the first line which has an error, but continues to import the following lines.

Let’s analyze in detail the most important parameters of the CPYFRMIMPF command:

RCDDLM: specifies the end-of-line character used in the CSV file. Normally the end-of-line character is * CRLF (or carriage return + line feed). If you have any doubts about which end of line character you can view the text file in hexadecimal mode. The hexadecimal code for the CR character is 0D, instead for the LF character it is 0A.
DTAFMT: * DLM specifies that you are importing a delimited text file
STRDLM: if in the csv file the alphanumeric fields are enclosed in double quotes, specify * DBLQUOTE, otherwise specify * NONE.
STRESCCHR: specify the escape character. By default the escape character coincides with the string delimiter (parameter STRDLM). Setting this parameter to * STRDLM complies with rule n. 5 described in the previous paragraph.
RMVBLANK: specifies whether blanks characters must be removed. I recommend using the value * BOTH (= Leading and trailing spaces are eliminated)
FLDDLM: specify the field delimiter. The default value is the comma. To specify the tabulation as the field delimiter use the special value * TAB. It can’t be space.
DECPNT: specifies the decimal separator. Note that the default value is * PERIOD (ie the point). To use the comma as a decimal separator, indicate * COMMA.
RPLNULLVAL: if the DB2 target table does not allow null values and there are null fields in the CSV file (that is, there are consecutive field separators without even a space between them) you need to set this parameter to * FLDDFT to make sure that the null value is replaced with the default value congruent with the data type of the DB2 field
RMVCOLNAM: if the first line of the CSV file contains the column headers, you must specify * YES for it to be excluded from the import.

Using a file to record errors

The CpyFrmImpF command allows you to copy the lines of the CSV file that have not been imported into an “error” file. It is a possibility perhaps little known, but very convenient to identify more easily the lines that are rejected by the import. Very useful especially when the csv file contains hundreds or thousands of lines.

The parameters to use to manage the error file are:

ERRRCDFILE: specifies the file in which the wrong records will be written. I recommend creating a source file of sufficient length to contain an entire line of the CSV file. Eg if the CSV file can have lines with a maximum length of 500 characters, the source file must be created with a length of 500 or more:

CRTSRCPF FILE (QGPL / ERRORS) RCDLEN (550) MBR (* FILE) SIZE (* NOMAX)

ERRRCDOPT: I recommend specifying * REPLACE so that with each execution of the CPYFRMIMPF command the error file is replaced.

Additional information

the fields of the text file that are longer than the corresponding field of the table will be truncated
two consecutive field delimiters or two consecutive string delimiters or a field delimiter followed by a record delimiter means that the corresponding field of the table will be null

How to display the end-of-line character

If you have doubts about the end-of-line character of a csv file, you can open and check the last characters of the line by displaying them in hexadecimal mode.

Open the csv file stored in a folder on IFS with the DSPF command or using option 5 = Display by the WRKLNK command.

Move the view to the right until the end of the line is shown: type in the column placement the last column of the text file. Then press F10 = Hexadecimal display

Import text files with fixed columns (fixed data)

In case you need to import text files with fixed columns into DB2, you can still use the CPYFRMIMPF command.

In addition to what has been written so far it is necessary to provide a definition file for the file to be imported or a FDF (field definition file) file.

Each line of the FDF file contains the following information:

field name (corresponding to the system field name of the physical file written in uppercase)
initial position in the text file
final position in the text file
position of the null value indicator in the text file (if the text file does not have the null value indicator specify 0)

The 4 values must be separated by a space.

Other useful information relating to the FDF file:

blank lines are ignored
the comment lines are preceded by a double dash –
the last line can be * END, the following lines will be ignored (ie will be considered as comments)
it is not necessary to specify all the fields of the physical file in the FDF file. The fields not defined in the FDF file will be set to the default value
the order in which the fields are written in the FDF file is not important
instead of the field name you can specify the keyword * COL. In this case, however, the FDF file must contain the fields in exact order.
the FDF file must be written either in a source file or in a physical file with only one field

The null value indicator in the text file must have the values Y or N.

FDF file example:

-- item number
CODART 1 15 0
-- article description
DESART 16 50 0
- list price
PRICE 51 13 0
* END
these lines after * END will be ignored

Bibliography

RFC 4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files, Oct-2005: https://www.ietf.org/rfc/rfc4180.txt
ITJungle vol. article 13 n. 18 “CPYFRMIMPF And Fixed Data”, Sep 25, 2013, by Ted Holt: https://www.itjungle.com/2013/09/25/fhg092513-story02/

Read this article and many others on the website https://www.markonetools.it

5 years ago

Next RTVDSKINF / PRTDSKINF Goodbye! »

Previous « Print spoolfiles via pdf streaming

Managing Locked Display Files on IBM i: An Effective Solution

We are pleased to receive and share this "tip & trick" from Patrick Rizzi, which introduces a technique that allows…

3 months ago

04 - System Administration (EN)

Monitoring QSYSOPR Messages: SQL to Retrieve Messages and Replies

I take inspiration from a response by Michael Mayer on the Midrange.com mailing lists to someone who asked how to…

3 months ago

06d - Business organization and consulting

Why ERP is the Key to Success for Modern Businesses

Businesses are increasingly seeking tools to enhance efficiency, collaboration, and resource management. Enterprise Resource Planning (ERP) systems provide a comprehensive…

5 months ago

04a - Access Client Solution / Client Access (EN)

ACS Access Client Solution 1.1.9.5

Early April saw the release of the "Spring Version" of ACS Access Client Solution, version 1.1.9.5 Interesting new features especially…

9 months ago

06a - Training and Books

Tim Rowe and Scott Forstie for CEC 2024 – Milan

If the packed agenda of sessions at Common Europe Congress 2024, June 3-6 Milan, wasn't enough for you, here's another…

9 months ago

01 - Programming (EN)

Code for IBM i 2.10.0 – Debug IBM i App with Visual Studio Code

Debugging functions with Visual Studio Code have been available for some time but this new version 2.10.0 simplifies the handling…

9 months ago

DB2 Import Text/CSV files

Separator character

Main rules for a “well-formed” csv file

Importing a csv file into DB2 for i

Using a file to record errors

Additional information

How to display the end-of-line character

Import text files with fixed columns (fixed data)

Bibliography

Related Post

Recent Posts

Managing Locked Display Files on IBM i: An Effective Solution

Monitoring QSYSOPR Messages: SQL to Retrieve Messages and Replies

Why ERP is the Key to Success for Modern Businesses

ACS Access Client Solution 1.1.9.5

Tim Rowe and Scott Forstie for CEC 2024 – Milan

Code for IBM i 2.10.0 – Debug IBM i App with Visual Studio Code

Headline