Detect Date Format & Change If Required?

Technical support and scripting issues

Moderators: Dorian (MJT support), JRL

Post Reply
RNIB
Macro Veteran
Posts: 192
Joined: Thu Jan 10, 2008 10:25 am
Location: London, UK

Detect Date Format & Change If Required?

Post by RNIB » Tue Nov 12, 2024 11:08 am

I'm stuck on what I hope is the final hurdle.

I'm reading the metadata of an HTML file and capturing in the variable dcDateMatches1_1 the date that has been entered.

What I now need to do is determine whether that date has been entered correctly or not. If a date has been entered in the wrong format but is a valid date, ideally I'd like to convert it into the correct format.

The correct date format should be: YYYY-MM-DD

However, it could be entered as any date format variation such as:
DD-MM-YYYY
DD-MM-YY
MM-DD-YYYY
MM-DD-YY
YYYY/MM/DD
DD/MM/YYYY
MM/DD/YYYY
YYYY.MM.DD
etc etc

How can I check what the date format is when I can't be sure either of the delimiter or the pattern used? I had thought about using things like MidStr to read specific characters but got stumped by the fact that the year could be written first or last and could be either a 2 digit or 4 digit number. Then thought it was probably a thing that RegEx could handle but I don't know enough (anything) about that to work it out.

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Re: Detect Date Format & Change If Required?

Post by Marcus Tettmar » Tue Nov 12, 2024 2:15 pm

You could certainly check for all these formats - I'd check for each in turn. However you are not always going to distinguish between DD-MM-YYYY and MM-DD-YYYY or DD/MM/YYYY and MM/DD/YYYY. If the day is less than 13 you have no way of knowing which one it is.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

RNIB
Macro Veteran
Posts: 192
Joined: Thu Jan 10, 2008 10:25 am
Location: London, UK

Re: Detect Date Format & Change If Required?

Post by RNIB » Tue Nov 12, 2024 2:37 pm

Marcus Tettmar wrote:
Tue Nov 12, 2024 2:15 pm
If the day is less than 13 you have no way of knowing which one it is.
Good point! Hmm. Seeing as the format needs to be YYYY-MM-DD and that it's more important that the date is in that format then it is that the date itself is accurate to the day, I guess I could look at the first 4 characters and as long as all were numeric and that the 5th and 8th characters were - I could say that the date was correct. If either of the first 4 characters weren't all numeric then I could generate a new date and if the 5th and 8th characters weren't a - I could replace them.

Is there a way of identifying if any of the first 4 characters are not numeric in one go, i.e. it would read 2024 as a number but 11-2 it would detect as not being a valid number, or do I need to check each character individually?

User avatar
Dorian (MJT support)
Automation Wizard
Posts: 1384
Joined: Sun Nov 03, 2002 3:19 am
Contact:

Re: Detect Date Format & Change If Required?

Post by Dorian (MJT support) » Tue Nov 12, 2024 4:23 pm

RNIB wrote:
Tue Nov 12, 2024 2:37 pm
Marcus Tettmar wrote:
Tue Nov 12, 2024 2:15 pm
If the day is less than 13 you have no way of knowing which one it is.
Is there a way of identifying if any of the first 4 characters are not numeric in one go, i.e. it would read 2024 as a number but 11-2 it would detect as not being a valid number, or do I need to check each character individually?
Regex isn't my strong point but Mr Google and I cobbled this together :)

Code: Select all

//Try each of these three
Let>text=2024/12/12
//Let>text=12/12/24
//Let>text=12/12/2024

Let>pattern=^\d{4}
RegEx>pattern,text,0,matches,num,0
if>num>0
  MDL>First four characters are digits
  Else
  MDL>First four characters are not digits
Endif
I couldn't figure out how we can determine if dates/months <13 would be MM/DD/YYYY or DD/MM/YYYY either - but I wonder if it's possible to make an educated guess based on the language you already grabbed in another thread. This may be a total red herring and the result of me overthinking, but maybe you can assume that en-GB could follow DD/MM whereas en-US is more likely to be MM/DD. Not sure I'd want to stake my life on it though, and I also wouldn't want the job of determining which date format all those language codes are most likely to use.
Yes, we have a Custom Scripting Service. Message me or go here

User avatar
JRL
Automation Wizard
Posts: 3526
Joined: Mon Jan 10, 2005 6:22 pm
Location: Iowa

Re: Detect Date Format & Change If Required?

Post by JRL » Tue Nov 12, 2024 9:19 pm

Do you have any control over the web page (parsed html)? If yes, I would address it there. If no, I think you're left guessing. Looks to me like Dorian's over-thinking has given you a better than 50/50 shot at getting the correct month and day.

For identifying the year you might try VBScript to put the date into your system format then parse the known format.

Code: Select all

Let>d=2024/02/12
Let>d=02/12/2024
Let>d=24-02-12
Let>d=2-12-24

VBEval>FormatDateTime("%d%"),res1
VBEval>FormatDateTime("%d%",1),res2
VBEval>FormatDateTime("%d%",2),res3

RNIB
Macro Veteran
Posts: 192
Joined: Thu Jan 10, 2008 10:25 am
Location: London, UK

Re: Detect Date Format & Change If Required?

Post by RNIB » Wed Nov 13, 2024 10:00 am

Thank you both for your help on this.

The reasoning that if the language were en-GB that the date format would be DD-MM-YYYY is certainly reasonable. However, unfortunately this isn't the case. Nor is it sadly possible to have any control over the parsed HTML file.

The files I'm working with are those of a DAISY audiobook. The DAISY format consists of a series of HTML, SMIL and MP3 files that are all generated by a DAISY authoring application. The DAISY standard states that the date format of one particular field of the metadata must be in YYYY-MM-DD format but this is entirely separate to the language field which simply says what language the book is in.

You would think, and I cannot understand why this is not the case, that a DAISY authoring application would prevent you from entering data in the wrong format but they don't. Sometimes they warn you, but they don't stop you.

All DAISY audiobooks are passed through a DAISY validation application which checks not only the metadata but the structure, timing information and a whole host of other things (between 2000-4000 checks per title on average). This can take several minutes to perform and what I'm trying to do is write something that just performs 20-30 checks just on the metadata, things that can easily be changed in a text editor rather than a DAISY authoring application. As a result I have to conform precisely to the requirements of the DAISY standard.

This particular date field, whilst required, isn't particularly important as it just contains the date that the book was made into a DAISY audiobook. It would be helpful to us on an administration level if the date was accurate and hence I'd rather not just replace them with a new date but if there was any uncertainty over the date format that had been entered, simply replacing it wouldn't be the end of the world.

I had managed to get different method working where I used MidStr to look at the individual characters and determine whether they were a number or a - and handle accordingly but, whilst it worked, it was much, much more lengthy so I'm definitely going to give both of these a try.

Post Reply
cron
Sign up to our newsletter for free automation tips, tricks & discounts