Here is how to do it using regular expressions. The code is written in C#:
// Required at the top of the file:
using System.Data;
using System.Text.RegularExpressions;

// Converts every <table> found in the HTML string into a DataTable
// and returns all of them in a single DataSet.
private static DataSet ConvertHTMLTablesToDataSet(string HTML)
{
    // \b keeps <th from matching <thead> and <tr from matching longer tag names.
    const string TableExpression = @"<table\b[^>]*>(.*?)</table>";
    const string HeaderExpression = @"<th\b[^>]*>(.*?)</th>";
    const string RowExpression = @"<tr\b[^>]*>(.*?)</tr>";
    const string ColumnExpression = @"<td\b[^>]*>(.*?)</td>";
    // Singleline lets "." span line breaks. Multiline is dropped because
    // none of the patterns use ^ or $, so it had no effect.
    const RegexOptions Options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

    DataSet ds = new DataSet();

    foreach (Match Table in Regex.Matches(HTML, TableExpression, Options))
    {
        DataTable dt = new DataTable();
        MatchCollection Rows = Regex.Matches(Table.Value, RowExpression, Options);
        MatchCollection Headers = Regex.Matches(Table.Value, HeaderExpression, Options);
        bool HeadersExist = Headers.Count > 0;

        if (HeadersExist)
        {
            // Name the columns after the <th> cells.
            foreach (Match Header in Headers)
            {
                dt.Columns.Add(Header.Groups[1].Value);
            }
        }
        else if (Rows.Count > 0)
        {
            // No headers: count the <td> cells in the first row and use generic names.
            int ColumnCount = Regex.Matches(Rows[0].Value, ColumnExpression, Options).Count;
            for (int iColumn = 1; iColumn <= ColumnCount; iColumn++)
            {
                dt.Columns.Add("Column " + iColumn);
            }
        }

        int iCurrentRow = 0;
        foreach (Match Row in Rows)
        {
            // When headers exist, the first row holds the <th> cells, so skip it.
            if (!(iCurrentRow == 0 && HeadersExist))
            {
                DataRow dr = dt.NewRow();
                int iCurrentColumn = 0;
                foreach (Match Column in Regex.Matches(Row.Value, ColumnExpression, Options))
                {
                    // Ignore cells beyond the known columns instead of throwing.
                    if (iCurrentColumn >= dt.Columns.Count)
                    {
                        break;
                    }
                    dr[iCurrentColumn] = Column.Groups[1].Value;
                    iCurrentColumn++;
                }
                dt.Rows.Add(dr);
            }
            iCurrentRow++;
        }
        ds.Tables.Add(dt);
    }
    return ds;
}
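
Several readers in the comments below report that the function does not handle nested tables. The likely cause is that the lazy `(.*?)</table>` stops at the first closing tag, so an outer table is cut off where the inner one ends. Here is a minimal sketch of one possible workaround, assuming .NET's balancing-group regex syntax. The class `NestedTableMatcher`, the method `MatchOutermostTables`, and the pattern itself are illustrative names and are not part of the original code:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedTableMatcher
{
    // Hypothetical pattern: counts nested <table> tags with a balancing group
    // named "depth" so the match only ends at the matching </table>.
    private const string BalancedTableExpression =
        @"<table\b[^>]*>" +
        @"(?>" +
            @"(?:(?!<table\b|</table>).)+" +  // text that starts neither a table open nor close tag
            @"|<table\b[^>]*>(?<depth>)" +    // nested opening tag: push onto the depth stack
            @"|</table>(?<-depth>)" +         // nested closing tag: pop from the depth stack
        @")*" +
        @"(?(depth)(?!))" +                   // fail if a nested table is still open
        @"</table>";

    // Returns one match per outermost table, including any tables nested inside it.
    public static MatchCollection MatchOutermostTables(string html)
    {
        return Regex.Matches(html, BalancedTableExpression,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<table id=\"outer\"><tr><td>" +
                      "<table id=\"inner\"><tr><td>x</td></tr></table>" +
                      "</td></tr></table>";

        foreach (Match table in MatchOutermostTables(html))
        {
            // Compare lengths to check whether the outer table was captured whole.
            Console.WriteLine(table.Value.Length == html.Length);
        }
    }
}
```

Each match covers an outermost table together with everything nested inside it. Inner tables would still have to be extracted separately, for example by running the same pattern again on the content between the outer tags, and the row and cell patterns would need similar care so that inner rows are not attributed to the outer table.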
16 comments:
hi,
I'm assuming you found this code in VB.NET and converted it. If that's the case, could you please post the original code?
many thanks!
OK, this is what you need:
http://cid-38f3ba7902b36885.skydrive.live.com/self.aspx/class1%20html%20tables%20to%20ds%20by%20regex
many thanks!!!
You're welcome, any time.
Good one, but I found that it's not working for nested tables.
For example, if you have table_2 in table_1, the regular expression returned only table_2.
Hmm, good catch. I will work on solving this problem soon.
Thanks for your comment.
awesome man, thanks
Hey, I'm kind of interested in the problem with nested tables. Is there any news of that?
Your function works very well. If possible, please email me once your new C# code for handling nested tables is done. Thank you so much!
my email address schizoidia@yahoo.com.hk
Good job! But I ran into the same problem with nested tables as others did. Is there any way to solve it? I would be very grateful if you posted a solution :)
Works great for me (no nested tables). Thanks!
Great post, thanks. It really helped me out. :)
Man you are a life saver
Thanks for the code, this fit the bill perfectly. I was able to adjust the regex for some elements to populate table names, which makes the data easier to consume on the flip side.
Really good piece, man. Thanks.
Do you have this code in VB.NET? If so, can you post it?