load xml file into table with validation/correction  [ 582 views ]

We have many xml files and we have a smart ETL tool to process the data but the situation is not simple. The files contains some structural error and this fact is blocked this solution. Oracle time!

1. load the xml file into a table (be careful the xml is incorrect – sometimes)
2. correct the xml structure if we need
3. validate the xml with the oracle
4. extract the data into a plain table

1. load the files into a table (the file content stored in a CLOB column)
[bs]

CREATE TABLE SAMPLE_XML_FILE
(
  DT_TM      DATE                               DEFAULT sysdate,
  FILE_NAME  VARCHAR2(300 BYTE),
  CONTENT    CLOB
)

The file source is a web server and the code is:

procedure pr_readFile (p_url varchar2, P_DT_TM varchar2 default to_char(sysdate,'yyyymmdd')) is
  l_http_request   UTL_HTTP.req;
  l_http_response  UTL_HTTP.resp;
  l_clob           CLOB;
  l_text           VARCHAR2(32767);
  
BEGIN

  -- Initialize the CLOB.
  DBMS_LOB.createtemporary(l_clob, FALSE);

  -- Make a HTTP request and get the response.
  l_http_request  := UTL_HTTP.begin_request(p_url);
  l_http_response := UTL_HTTP.get_response(l_http_request);

  -- Copy the response into the CLOB.
  BEGIN
    LOOP
      UTL_HTTP.read_text(l_http_response, l_text, 32767);
      DBMS_LOB.writeappend (l_clob, LENGTH(l_text), l_text);
    END LOOP;
  EXCEPTION
    WHEN UTL_HTTP.end_of_body THEN
      UTL_HTTP.end_response(l_http_response);
  END;

  -- Insert the data into the table.  
  l_clob := replace(l_clob,cnull,'');
  
  if instr(l_clob,'\<title\>The page cannot be found\</title\>') = 0 then  
    delete SAMPLE_XML_FILE a
    where a.FILE_NAME = p_url;
    commit;
   
    INSERT INTO SAMPLE_XML_FILE (DT_TM, FILE_NAME, CONTENT)
    VALUES (to_date(P_DT_TM,'yyyymmdd'), p_url, l_clob);
    commit;
  end if;
  
  -- Relase the resources associated with the temporary LOB.
  DBMS_LOB.freetemporary(l_clob);
EXCEPTION
  WHEN OTHERS THEN
    UTL_HTTP.end_response(l_http_response);
    DBMS_LOB.freetemporary(l_clob);
end;

[/bs]
2. correct the xml structure if we need (imagine this)

3. validate the xml with the oracle (with XMLTYPE type)
[bs]

...
l_rec long;          -- plain text (xml content)
l_recX XMLTYPE;      -- xml container
l_cnt number := 1;   -- record number
...
-- load the file CONTENT from the SAMPLE_XML_FILE table into l_rec variable
...
begin
    l_recX := XMLTYPE (l_rec);       -- here is the validation
    insert into SAMPLE_XML_FILE_REC  -- the rows from the xml file
     (FILE_NAME, RECNUM, REC)
    values
     (P_FILENAME,l_cnt,l_recX);
exception when others then
  null;
  -- the xml is invalid!
end;

the target table was:

CREATE TABLE SAMPLE_XML_FILE_REC
(
  FILE_NAME  VARCHAR2(300 BYTE),
  RECNUM     NUMBER,
  REC        SYS.XMLTYPE
)

[/bs]
4. extract the data to plain table. There are 2 steps.

[bs]

  • we need to extract the data from the xml structure
  • we need to load the extracted data into a plain table

to extract the xml there is a quick way to create a view based on the xml column:

CREATE OR REPLACE FORCE VIEW SCHEMA.V_SAMPLE_XML_FILE_REC
(..., CONTACT_NAME, CREATION_TIME, CNT_PHONELOG, ...)
AS 
SELECT 
  ...
  trim(extractvalue(t.REC, '/Bsn/Contact_Name')) CONTACT_NAME,
  to_date(extractvalue(t.REC, '/Bsn/Cr_Time'),'MM/DD/YYYY HH:MI:SS AM') CREATION_TIME,
  to_number(extractvalue(t.REC, '/Bsn/PhnLog_Cnt')) CNT_PHONELOG,
  ...
FROM SAMPLE_XML_FILE_REC t

to insert the data into a table is simple

 execute immediate 'truncate table SAMPLE_XML_START_POINT';
 
 insert /*+ APPEND */ into SAMPLE_XML_START_POINT
   select * from V_SAMPLE_XML_FILE_REC;

the table itself:

CREATE TABLE SAMPLE_XML_START_POINT
(
  ...
  CONTACT_NAME               VARCHAR2(60 BYTE),
  CREATION_TIME              DATE,
  CNT_PHONELOG               NUMBER,
  ...
)

[/bs]
Read xml files from a web server and load into a plain table – DONE.

#sidebar a { color:#fff; } #sidebar ul ul li { color: #DEF585; } #sidebar h2 { color: #fff; } #sidebar ul p, #sidebar ul select { color: #BEDDBE; } #backfly { background: url(images/golfBallWallPaper.jpg) left bottom fixed repeat-x #65a51d; }