I'm stuck on what I hope is the final hurdle.
I'm reading the metadata of an HTML file and capturing in the variable dcDateMatches1_1 the date that has been entered.
What I now need to do is determine whether that date has been entered correctly or not. If a date has been entered in the wrong format but is a valid date, ideally I'd like to convert it into the correct format.
The correct date format should be: YYYY-MM-DD
However, it could be entered as any date format variation such as:
DD-MM-YYYY
DD-MM-YY
MM-DD-YYYY
MM-DD-YY
YYYY/MM/DD
DD/MM/YYYY
MM/DD/YYYY
YYYY.MM.DD
etc etc
How can I check what the date format is when I can't be sure either of the delimiter or the pattern used? I had thought about using things like MidStr to read specific characters but got stumped by the fact that the year could be written first or last and could be either a 2 digit or 4 digit number. Then thought it was probably a thing that RegEx could handle but I don't know enough (anything) about that to work it out.
Detect Date Format & Change If Required?
Moderators: Dorian (MJT support), JRL
- Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
- Contact:
Re: Detect Date Format & Change If Required?
You could certainly check for all these formats - I'd check for each in turn. However you are not always going to distinguish between DD-MM-YYYY and MM-DD-YYYY or DD/MM/YYYY and MM/DD/YYYY. If the day is less than 13 you have no way of knowing which one it is.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
Re: Detect Date Format & Change If Required?
Good point! Hmm. Seeing as the format needs to be YYYY-MM-DD and that it's more important that the date is in that format then it is that the date itself is accurate to the day, I guess I could look at the first 4 characters and as long as all were numeric and that the 5th and 8th characters were - I could say that the date was correct. If either of the first 4 characters weren't all numeric then I could generate a new date and if the 5th and 8th characters weren't a - I could replace them.Marcus Tettmar wrote: ↑Tue Nov 12, 2024 2:15 pmIf the day is less than 13 you have no way of knowing which one it is.
Is there a way of identifying if any of the first 4 characters are not numeric in one go, i.e. it would read 2024 as a number but 11-2 it would detect as not being a valid number, or do I need to check each character individually?
- Dorian (MJT support)
- Automation Wizard
- Posts: 1385
- Joined: Sun Nov 03, 2002 3:19 am
- Contact:
Re: Detect Date Format & Change If Required?
Regex isn't my strong point but Mr Google and I cobbled this togetherRNIB wrote: ↑Tue Nov 12, 2024 2:37 pmIs there a way of identifying if any of the first 4 characters are not numeric in one go, i.e. it would read 2024 as a number but 11-2 it would detect as not being a valid number, or do I need to check each character individually?Marcus Tettmar wrote: ↑Tue Nov 12, 2024 2:15 pmIf the day is less than 13 you have no way of knowing which one it is.
Code: Select all
//Try each of these three
Let>text=2024/12/12
//Let>text=12/12/24
//Let>text=12/12/2024
Let>pattern=^\d{4}
RegEx>pattern,text,0,matches,num,0
if>num>0
MDL>First four characters are digits
Else
MDL>First four characters are not digits
Endif
Yes, we have a Custom Scripting Service. Message me or go here
Re: Detect Date Format & Change If Required?
Do you have any control over the web page (parsed html)? If yes, I would address it there. If no, I think you're left guessing. Looks to me like Dorian's over-thinking has given you a better than 50/50 shot at getting the correct month and day.
For identifying the year you might try VBScript to put the date into your system format then parse the known format.
For identifying the year you might try VBScript to put the date into your system format then parse the known format.
Code: Select all
Let>d=2024/02/12
Let>d=02/12/2024
Let>d=24-02-12
Let>d=2-12-24
VBEval>FormatDateTime("%d%"),res1
VBEval>FormatDateTime("%d%",1),res2
VBEval>FormatDateTime("%d%",2),res3
Re: Detect Date Format & Change If Required?
Thank you both for your help on this.
The reasoning that if the language were en-GB that the date format would be DD-MM-YYYY is certainly reasonable. However, unfortunately this isn't the case. Nor is it sadly possible to have any control over the parsed HTML file.
The files I'm working with are those of a DAISY audiobook. The DAISY format consists of a series of HTML, SMIL and MP3 files that are all generated by a DAISY authoring application. The DAISY standard states that the date format of one particular field of the metadata must be in YYYY-MM-DD format but this is entirely separate to the language field which simply says what language the book is in.
You would think, and I cannot understand why this is not the case, that a DAISY authoring application would prevent you from entering data in the wrong format but they don't. Sometimes they warn you, but they don't stop you.
All DAISY audiobooks are passed through a DAISY validation application which checks not only the metadata but the structure, timing information and a whole host of other things (between 2000-4000 checks per title on average). This can take several minutes to perform and what I'm trying to do is write something that just performs 20-30 checks just on the metadata, things that can easily be changed in a text editor rather than a DAISY authoring application. As a result I have to conform precisely to the requirements of the DAISY standard.
This particular date field, whilst required, isn't particularly important as it just contains the date that the book was made into a DAISY audiobook. It would be helpful to us on an administration level if the date was accurate and hence I'd rather not just replace them with a new date but if there was any uncertainty over the date format that had been entered, simply replacing it wouldn't be the end of the world.
I had managed to get different method working where I used MidStr to look at the individual characters and determine whether they were a number or a - and handle accordingly but, whilst it worked, it was much, much more lengthy so I'm definitely going to give both of these a try.
The reasoning that if the language were en-GB that the date format would be DD-MM-YYYY is certainly reasonable. However, unfortunately this isn't the case. Nor is it sadly possible to have any control over the parsed HTML file.
The files I'm working with are those of a DAISY audiobook. The DAISY format consists of a series of HTML, SMIL and MP3 files that are all generated by a DAISY authoring application. The DAISY standard states that the date format of one particular field of the metadata must be in YYYY-MM-DD format but this is entirely separate to the language field which simply says what language the book is in.
You would think, and I cannot understand why this is not the case, that a DAISY authoring application would prevent you from entering data in the wrong format but they don't. Sometimes they warn you, but they don't stop you.
All DAISY audiobooks are passed through a DAISY validation application which checks not only the metadata but the structure, timing information and a whole host of other things (between 2000-4000 checks per title on average). This can take several minutes to perform and what I'm trying to do is write something that just performs 20-30 checks just on the metadata, things that can easily be changed in a text editor rather than a DAISY authoring application. As a result I have to conform precisely to the requirements of the DAISY standard.
This particular date field, whilst required, isn't particularly important as it just contains the date that the book was made into a DAISY audiobook. It would be helpful to us on an administration level if the date was accurate and hence I'd rather not just replace them with a new date but if there was any uncertainty over the date format that had been entered, simply replacing it wouldn't be the end of the world.
I had managed to get different method working where I used MidStr to look at the individual characters and determine whether they were a number or a - and handle accordingly but, whilst it worked, it was much, much more lengthy so I'm definitely going to give both of these a try.