{"id":121,"date":"2014-05-29T20:44:02","date_gmt":"2014-05-29T20:44:02","guid":{"rendered":"http:\/\/localhost\/__sites\/3d_blog\/?p=121"},"modified":"2014-05-30T11:27:53","modified_gmt":"2014-05-30T11:27:53","slug":"load-xml-file-to-table","status":"publish","type":"post","link":"https:\/\/blog.silverterra.net\/?p=121","title":{"rendered":"load xml file into table with validation\/correction"},"content":{"rendered":"<p>We have many xml files and we have a smart ETL tool to process the data but the situation is not simple. The files contains some structural error and this fact is blocked this solution. Oracle time!<\/p>\n<blockquote><p>\n1. load the xml file into a table (be careful the xml is incorrect &#8211; sometimes)<br \/>\n2. correct the xml structure if we need<br \/>\n3. validate the xml with the oracle<br \/>\n4. extract the data into a plain table\n<\/p><\/blockquote>\n<p><strong>1. load the files into a table<\/strong> (the file content stored in a <code data-enlighter-language=\"sql\" class=\"EnlighterJSRAW\">CLOB<\/code> column)<br \/>\n[bs]<\/p>\n<pre data-enlighter-language=\"sql\" class=\"EnlighterJSRAW\">\r\nCREATE TABLE SAMPLE_XML_FILE\r\n(\r\n  DT_TM      DATE                               DEFAULT sysdate,\r\n  FILE_NAME  VARCHAR2(300 BYTE),\r\n  CONTENT    CLOB\r\n)\r\n<\/pre>\n<p>The file source is a web server and the code is:<\/p>\n<pre data-enlighter-language=\"sql\" class=\"EnlighterJSRAW\">\r\nprocedure pr_readFile (p_url varchar2, P_DT_TM varchar2 default to_char(sysdate,'yyyymmdd')) is\r\n  l_http_request   UTL_HTTP.req;\r\n  l_http_response  UTL_HTTP.resp;\r\n  l_clob           CLOB;\r\n  l_text           VARCHAR2(32767);\r\n  \r\nBEGIN\r\n\r\n  -- Initialize the CLOB.\r\n  DBMS_LOB.createtemporary(l_clob, FALSE);\r\n\r\n  -- Make a HTTP request and get the response.\r\n  l_http_request  := UTL_HTTP.begin_request(p_url);\r\n  l_http_response := UTL_HTTP.get_response(l_http_request);\r\n\r\n  -- Copy the response into the CLOB.\r\n  BEGIN\r\n    LOOP\r\n      UTL_HTTP.read_text(l_http_response, l_text, 32767);\r\n      DBMS_LOB.writeappend (l_clob, LENGTH(l_text), l_text);\r\n    END LOOP;\r\n  EXCEPTION\r\n    WHEN UTL_HTTP.end_of_body THEN\r\n      UTL_HTTP.end_response(l_http_response);\r\n  END;\r\n\r\n  -- Insert the data into the table.  \r\n  l_clob := replace(l_clob,cnull,'');\r\n  \r\n  if instr(l_clob,'\\&lt;title\\&gt;The page cannot be found\\&lt;\/title\\&gt;') = 0 then  \r\n    delete SAMPLE_XML_FILE a\r\n    where a.FILE_NAME = p_url;\r\n    commit;\r\n   \r\n    INSERT INTO SAMPLE_XML_FILE (DT_TM, FILE_NAME, CONTENT)\r\n    VALUES (to_date(P_DT_TM,'yyyymmdd'), p_url, l_clob);\r\n    commit;\r\n  end if;\r\n  \r\n  -- Relase the resources associated with the temporary LOB.\r\n  DBMS_LOB.freetemporary(l_clob);\r\nEXCEPTION\r\n  WHEN OTHERS THEN\r\n    UTL_HTTP.end_response(l_http_response);\r\n    DBMS_LOB.freetemporary(l_clob);\r\nend;\r\n<\/pre>\n<p>[\/bs]<br \/>\n<strong>2. correct the xml structure<\/strong> if we need (imagine this)<\/p>\n<p><strong>3. validate the xml with the oracle<\/strong> (with XMLTYPE type)<br \/>\n[bs]<\/p>\n<pre data-enlighter-language=\"sql\" class=\"EnlighterJSRAW\">\r\n...\r\nl_rec long;          -- plain text (xml content)\r\nl_recX XMLTYPE;      -- xml container\r\nl_cnt number := 1;   -- record number\r\n...\r\n-- load the file CONTENT from the SAMPLE_XML_FILE table into l_rec variable\r\n...\r\nbegin\r\n    l_recX := XMLTYPE (l_rec);       -- here is the validation\r\n    insert into SAMPLE_XML_FILE_REC  -- the rows from the xml file\r\n     (FILE_NAME, RECNUM, REC)\r\n    values\r\n     (P_FILENAME,l_cnt,l_recX);\r\nexception when others then\r\n  null;\r\n  -- the xml is invalid!\r\nend;\r\n<\/pre>\n<p>the target table was:<\/p>\n<pre data-enlighter-language=\"sql\" class=\"EnlighterJSRAW\">\r\nCREATE TABLE SAMPLE_XML_FILE_REC\r\n(\r\n  FILE_NAME  VARCHAR2(300 BYTE),\r\n  RECNUM     NUMBER,\r\n  REC        SYS.XMLTYPE\r\n)\r\n<\/pre>\n<p>[\/bs]<br \/>\n<strong>4. extract the data to plain table.<\/strong> There are 2 steps.<\/p>\n<p>[bs]<\/p>\n<ul>\n<li>we need to extract the data from the xml structure<\/li>\n<li>we need to load the extracted data into a plain table<\/li>\n<\/ul>\n<p>to extract the xml there is a quick way to create a view based on the xml column:<\/p>\n<pre data-enlighter-language=\"sql\" class=\"EnlighterJSRAW\">\r\nCREATE OR REPLACE FORCE VIEW SCHEMA.V_SAMPLE_XML_FILE_REC\r\n(..., CONTACT_NAME, CREATION_TIME, CNT_PHONELOG, ...)\r\nAS \r\nSELECT \r\n  ...\r\n  trim(extractvalue(t.REC, '\/Bsn\/Contact_Name')) CONTACT_NAME,\r\n  to_date(extractvalue(t.REC, '\/Bsn\/Cr_Time'),'MM\/DD\/YYYY HH:MI:SS AM') CREATION_TIME,\r\n  to_number(extractvalue(t.REC, '\/Bsn\/PhnLog_Cnt')) CNT_PHONELOG,\r\n  ...\r\nFROM SAMPLE_XML_FILE_REC t\r\n<\/pre>\n<p>to insert the data into a table is simple<\/p>\n<pre data-enlighter-language=\"sql\" class=\"EnlighterJSRAW\">\r\n execute immediate 'truncate table SAMPLE_XML_START_POINT';\r\n \r\n insert \/*+ APPEND *\/ into SAMPLE_XML_START_POINT\r\n   select * from V_SAMPLE_XML_FILE_REC;\r\n<\/pre>\n<p>the table itself:<\/p>\n<pre data-enlighter-language=\"sql\" class=\"EnlighterJSRAW\">\r\nCREATE TABLE SAMPLE_XML_START_POINT\r\n(\r\n  ...\r\n  CONTACT_NAME               VARCHAR2(60 BYTE),\r\n  CREATION_TIME              DATE,\r\n  CNT_PHONELOG               NUMBER,\r\n  ...\r\n)\r\n<\/pre>\n<p>[\/bs]<br \/>\nRead xml files from a web server and load into a plain table &#8211; DONE.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We have many xml files and we have a smart ETL tool to process the data but the situation is not simple. The files contains some structural error and this fact is blocked this solution. Oracle time! 1. load the xml file into a table (be careful the xml is incorrect &#8211; sometimes) 2. correct [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-121","post","type-post","status-publish","format-standard","hentry","category-oracle"],"_links":{"self":[{"href":"https:\/\/blog.silverterra.net\/index.php?rest_route=\/wp\/v2\/posts\/121","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.silverterra.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.silverterra.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.silverterra.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.silverterra.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=121"}],"version-history":[{"count":24,"href":"https:\/\/blog.silverterra.net\/index.php?rest_route=\/wp\/v2\/posts\/121\/revisions"}],"predecessor-version":[{"id":159,"href":"https:\/\/blog.silverterra.net\/index.php?rest_route=\/wp\/v2\/posts\/121\/revisions\/159"}],"wp:attachment":[{"href":"https:\/\/blog.silverterra.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=121"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.silverterra.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=121"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.silverterra.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=121"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}