Use sed or awk to fix date format
I'm trying to convert an HTML file containing a table to a .csv file using a
bash script.
So far I've accomplished the following steps:
1. Convert to Unix line endings (with dos2unix)
2. Remove all spaces and tabs (with sed 's/[ \t]//g')
3. Remove all line breaks, joining the file into a single line (with
   sed ':a;N;$!ba;s/\n//g'); this is necessary because the HTML file has a
   blank line for each cell of the table... that's not my fault
4. Remove the unnecessary <td> and <tr> opening tags (with sed 's/<t.>//g')
5. Replace </td> with ',' (with sed 's/<\/td>/,/g')
6. Replace </tr> with newline (\n) characters (with sed 's/<\/tr>/\n/g')
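Put together, the steps above might look roughly like the following bash function. This is a sketch, not the exact script from the question: it assumes GNU sed (for `\t` inside brackets and `\n` in the replacement), swaps `dos2unix` for `tr -d '\r'` so it works as a plain filter, and adds one extra, assumed cleanup stage that drops the trailing comma each row would otherwise carry (every row ends in `</td></tr>`) plus the empty line left after the last `</tr>`. The name `html_to_csv` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the HTML-table-to-CSV pipeline from the steps above.
# Assumes GNU sed; html_to_csv is a hypothetical name.
set -euo pipefail

html_to_csv() {
  tr -d '\r' |                     # step 1: CRLF -> LF (stand-in for dos2unix)
    sed 's/[ \t]//g' |             # step 2: strip spaces and tabs
    sed ':a;N;$!ba;s/\n//g' |      # step 3: join the whole file into one line
    sed 's/<t.>//g' |              # step 4: drop <td> and <tr> opening tags
    sed 's/<\/td>/,/g' |           # step 5: </td> -> field separator
    sed 's/<\/tr>/\n/g' |          # step 6: </tr> -> newline (GNU sed extension)
    sed -e 's/,$//' -e '/^$/d'     # extra (assumed): trailing commas, blank lines
}

# Usage (table.html is a hypothetical file name):
# html_to_csv < table.html > table.csv
```

Note that `<t.>` matches only a `t` plus exactly one character, so tags such as `<table>` pass through untouched; the sketch assumes the input holds just the table rows.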
Of course, I'm putting all this in a pipeline. So far, it's working great.
There's one final step I'm stuck on: the table has a date column in the
format dd/mm/yyyy, and I'd like to convert those dates to
yyyy-mm-dd.
Is there a (simple) way to do it (with sed or awk)?
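One simple option in sed is to capture the three parts of the date and write them back in reverse order. This sketch assumes the date is the only field matching a two-digit day, two-digit month, four-digit year pattern, and uses `-E` (extended regular expressions), which both GNU and BSD sed accept. Using `#` as the `s` delimiter avoids having to escape the slashes. The wrapper name `fix_dates_sed` is hypothetical.

```shell
#!/usr/bin/env bash
# Reorder dd/mm/yyyy into yyyy-mm-dd with sed.
# Assumption: only the date field matches NN/NN/NNNN; fix_dates_sed is hypothetical.
fix_dates_sed() {
  # \1 = day, \2 = month, \3 = year; '#' is the delimiter so '/' needs no escaping
  sed -E 's#([0-9]{2})/([0-9]{2})/([0-9]{4})#\3-\2-\1#g'
}

# Usage: append it as the last stage of the existing pipeline, e.g.
# ... | sed 's/<\/tr>/\n/g' | fix_dates_sed > table.csv
```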
Data sample (after the whole sed pipe):
500,2,13/09/2007,30000.00,12,B-1
501,2,15/09/2007,14000.00,8,B-2
Expected result:
500,2,2007-09-13,30000.00,12,B-1
501,2,2007-09-15,14000.00,8,B-2
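If only one specific column should be touched, awk can split just that field on `/`. In the sample the date is column 3, so this sketch defaults to 3 (an assumption based on the sample) and accepts another column number as its first argument. Rows whose field does not split into exactly three parts are printed unchanged. The name `fix_dates_awk` is hypothetical.

```shell
#!/usr/bin/env bash
# Convert one CSV column from dd/mm/yyyy to yyyy-mm-dd with awk.
# Assumption: column 3 holds the date (as in the sample); override with $1.
fix_dates_awk() {
  awk -v col="${1:-3}" 'BEGIN { FS = OFS = "," }
    {
      # d[1] = day, d[2] = month, d[3] = year
      if (split($col, d, "/") == 3)
        $col = d[3] "-" d[2] "-" d[1]   # assigning a field rebuilds $0 with OFS
      print
    }'
}

# Usage: ... | fix_dates_awk > table.csv
```

Unlike the sed version, this leaves any slash-separated text in other columns alone.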
I need this because I'm importing the data into MySQL. I could open the
file in Excel and change the format by hand, but I'd like to skip that
step.