load xml file into table with validation/correction [ 836 views ]
We have many xml files and we have a smart ETL tool to process the data but the situation is not simple. The files contains some structural error and this fact is blocked this solution. Oracle time!
1. load the xml file into a table (be careful the xml is incorrect – sometimes)
2. correct the xml structure if we need
3. validate the xml with the oracle
4. extract the data into a plain table
1. load the files into a table (the file content stored in a CLOB
column)
[bs]
CREATE TABLE SAMPLE_XML_FILE ( DT_TM DATE DEFAULT sysdate, FILE_NAME VARCHAR2(300 BYTE), CONTENT CLOB )
The file source is a web server and the code is:
procedure pr_readFile (p_url varchar2, P_DT_TM varchar2 default to_char(sysdate,'yyyymmdd')) is l_http_request UTL_HTTP.req; l_http_response UTL_HTTP.resp; l_clob CLOB; l_text VARCHAR2(32767); BEGIN -- Initialize the CLOB. DBMS_LOB.createtemporary(l_clob, FALSE); -- Make a HTTP request and get the response. l_http_request := UTL_HTTP.begin_request(p_url); l_http_response := UTL_HTTP.get_response(l_http_request); -- Copy the response into the CLOB. BEGIN LOOP UTL_HTTP.read_text(l_http_response, l_text, 32767); DBMS_LOB.writeappend (l_clob, LENGTH(l_text), l_text); END LOOP; EXCEPTION WHEN UTL_HTTP.end_of_body THEN UTL_HTTP.end_response(l_http_response); END; -- Insert the data into the table. l_clob := replace(l_clob,cnull,''); if instr(l_clob,'\<title\>The page cannot be found\</title\>') = 0 then delete SAMPLE_XML_FILE a where a.FILE_NAME = p_url; commit; INSERT INTO SAMPLE_XML_FILE (DT_TM, FILE_NAME, CONTENT) VALUES (to_date(P_DT_TM,'yyyymmdd'), p_url, l_clob); commit; end if; -- Relase the resources associated with the temporary LOB. DBMS_LOB.freetemporary(l_clob); EXCEPTION WHEN OTHERS THEN UTL_HTTP.end_response(l_http_response); DBMS_LOB.freetemporary(l_clob); end;
[/bs]
2. correct the xml structure if we need (imagine this)
3. validate the xml with the oracle (with XMLTYPE type)
[bs]
... l_rec long; -- plain text (xml content) l_recX XMLTYPE; -- xml container l_cnt number := 1; -- record number ... -- load the file CONTENT from the SAMPLE_XML_FILE table into l_rec variable ... begin l_recX := XMLTYPE (l_rec); -- here is the validation insert into SAMPLE_XML_FILE_REC -- the rows from the xml file (FILE_NAME, RECNUM, REC) values (P_FILENAME,l_cnt,l_recX); exception when others then null; -- the xml is invalid! end;
the target table was:
CREATE TABLE SAMPLE_XML_FILE_REC ( FILE_NAME VARCHAR2(300 BYTE), RECNUM NUMBER, REC SYS.XMLTYPE )
[/bs]
4. extract the data to plain table. There are 2 steps.
[bs]
- we need to extract the data from the xml structure
- we need to load the extracted data into a plain table
to extract the xml there is a quick way to create a view based on the xml column:
CREATE OR REPLACE FORCE VIEW SCHEMA.V_SAMPLE_XML_FILE_REC (..., CONTACT_NAME, CREATION_TIME, CNT_PHONELOG, ...) AS SELECT ... trim(extractvalue(t.REC, '/Bsn/Contact_Name')) CONTACT_NAME, to_date(extractvalue(t.REC, '/Bsn/Cr_Time'),'MM/DD/YYYY HH:MI:SS AM') CREATION_TIME, to_number(extractvalue(t.REC, '/Bsn/PhnLog_Cnt')) CNT_PHONELOG, ... FROM SAMPLE_XML_FILE_REC t
to insert the data into a table is simple
execute immediate 'truncate table SAMPLE_XML_START_POINT'; insert /*+ APPEND */ into SAMPLE_XML_START_POINT select * from V_SAMPLE_XML_FILE_REC;
the table itself:
CREATE TABLE SAMPLE_XML_START_POINT ( ... CONTACT_NAME VARCHAR2(60 BYTE), CREATION_TIME DATE, CNT_PHONELOG NUMBER, ... )
[/bs]
Read xml files from a web server and load into a plain table – DONE.