remove duplicate jpegs from folder
- Dorian (MJT support)
- Automation Wizard
- Posts: 1389
- Joined: Sun Nov 03, 2002 3:19 am
Hi Guys,
I have a folder containing 10,000+ jpegs and I want to automatically remove all the duplicates.
Where would I start?
I tried using GetFileList and CompareBitmaps, but of course that only compares .bmp files.
I also looked into the image recognition plugin, but that seems intended for recognising images displayed inside something, such as an application window, rather than image files sitting in a folder.
- Phil Pendlebury
- Automation Wizard
- Posts: 543
- Joined: Tue Jan 16, 2007 9:00 am
Presumably they're not named the same otherwise they wouldn't be allowed in the same folder...
I think the only easy way to do this is as steven mentioned - using file sizes.
It would take hours upon hours to compare thousands of actual bitmaps one at a time (that's if they were bitmaps).
Surely you don't want to actually open each file:
Code:
1. Open file 1 in a viewer
2. Take a screen capture
3. Open every OTHER file one at a time, comparing each to file 1
4. If it is the same as file 1, delete it
5. Start again from step 1 using file 2, and so on through all 10,000 files
Of course many files will have similar file sizes too.
I wonder if there's a clever way of getting the image dimensions, maybe by opening each file in a viewer and reading the window size into a variable, THEN checking the file size?
Phil Pendlebury
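Phil's size-based idea works best as a cheap pre-filter. The Python below is an illustration of that idea, not Macro Scheduler code, and the helper names `group_by_size` and `candidate_pairs` are hypothetical. It groups files by byte size and counts how many pairwise comparisons are still needed inside each group. For scale, comparing every file against every other in a 10,000-file folder means 10,000 × 9,999 / 2 = 49,995,000 pairs. Equal size only makes two files candidates; it does not prove they are duplicates.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_size(folder: Path, pattern: str = "*.jpg") -> dict[int, list[Path]]:
    """Group files by size in bytes; return only sizes shared by 2+ files."""
    groups: dict[int, list[Path]] = defaultdict(list)
    for path in folder.glob(pattern):  # note: pattern is case-sensitive on Linux
        if path.is_file():
            groups[path.stat().st_size].append(path)
    # A file with a unique size cannot have an exact duplicate, so drop it.
    return {size: paths for size, paths in groups.items() if len(paths) > 1}


def candidate_pairs(groups: dict[int, list[Path]]) -> int:
    """Count the pairwise comparisons still needed within same-size groups."""
    return sum(len(paths) * (len(paths) - 1) // 2 for paths in groups.values())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "a.jpg").write_bytes(b"1234")  # same size as b.jpg...
        (folder / "b.jpg").write_bytes(b"abcd")  # ...but different content
        (folder / "c.jpg").write_bytes(b"xyz")   # unique size
        groups = group_by_size(folder)
        print({size: sorted(p.name for p in paths) for size, paths in groups.items()})
        print(candidate_pairs(groups))
```

Files that end up in a group still need a content check (for example a hash) before anything is deleted.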
- Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
- Dorian (MJT support)
- Automation Wizard
- Posts: 1389
- Joined: Sun Nov 03, 2002 3:19 am
Thanks for the ideas, guys. Hmm, MD5, I'll have to investigate that. I hadn't even heard of it, so it'll give me the chance to learn something new.
Quite a few have similar file sizes, so I'd imagine many of them could easily be exactly the same size. Many also have the same dimensions. They are randomly named.
I'll try the MD5 method (if I can figure it out) and let you know how I get on.
Hi Horoscopes2000,
No need to reinvent the wheel: check out http://www.dupemaster.com/, which can do this for you, not just for jpegs but for duplicate files of all sorts.
If you have problems with v1.7, try v1.5, which I found works better for me.
However, if you do end up coding a macro solution to this, why not share it here on the forum? Trying out and extending other people's scripts is one of the best ways to learn more.
jpuziano
Note: If anyone else on the planet would find the following useful...
[Open] PlayWav command that plays from embedded script data
...then please add your thoughts/support at the above post -
-
- Automation Wizard
- Posts: 1101
- Joined: Fri Jan 07, 2005 5:55 pm
- Location: Somewhere else on the planet
How does one compare MD5s? I'm sure I must be doing something dumb, because I've tried regular If>s and complex expressions, but I can't get Macro Scheduler to believe they are the same.
mtettmar wrote: You could loop through the list of files and get an MD5 hash of each one - populate an array. If the hash is already in the array, delete the file (if the hash is the same the files must be identical).
Use the MD5 hash library on the plugins page.
Code:
Let>HashLib=c:\hashlib\HashLib.dll
LibFunc>Hashlib,FileMD5,r,c:\mydir\mypic.jpg,buf
Let>hash=r_2
MDL>hash
LibFunc>Hashlib,FileMD5,q,c:\mydir\mypic.jpg,buf2
Let>hash2=q_2
MDL>hash2
If>hash2=hash
Goto>match
Else
Goto>nomatch
Endif
Label>match
MessageModal>match
Goto>theend
Label>nomatch
MessageModal>nomatch
Label>theend
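The approach Marcus describes (hash each file, remember the hashes already seen, delete any file whose hash is already known) can be sketched in Python as follows. This is an illustration outside Macro Scheduler: Python's standard `hashlib` module stands in for the HashLib plugin, and the function names and the `dry_run` switch are hypothetical. It treats a matching MD5 as a duplicate, which is safe in practice for accidental copies; if you want to be certain, compare the bytes of the two files before deleting.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def remove_duplicates(folder: Path, pattern: str = "*.jpg", dry_run: bool = True) -> list[Path]:
    """Keep the first file seen for each MD5 hash; return the later copies.

    With dry_run=True nothing is deleted; with dry_run=False the copies are removed.
    """
    seen: dict[str, Path] = {}  # hash -> file that is kept
    duplicates: list[Path] = []
    # Sorting makes the choice of which copy survives predictable.
    for path in sorted(folder.glob(pattern)):
        if not path.is_file():
            continue
        digest = file_md5(path)
        if digest in seen:
            duplicates.append(path)
            if not dry_run:
                path.unlink()
        else:
            seen[digest] = path
    return duplicates


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "a.jpg").write_bytes(b"same picture")
        (folder / "b.jpg").write_bytes(b"same picture")
        (folder / "c.jpg").write_bytes(b"other picture")
        print([p.name for p in remove_duplicates(folder)])  # dry run: lists b.jpg
        remove_duplicates(folder, dry_run=False)
        print(sorted(p.name for p in folder.iterdir()))
```

Running with `dry_run=True` first lets you review the list before anything is removed. A dictionary lookup takes the place of searching an array, so each file is hashed once and the run stays linear in the number of files.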
- Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
"Functions return length of hash or zero if an error occurred."
Code:
//Get MD5 hash of a file
LibFunc>d:\Hashlib,FileMD5,r1,c:\mydir\mypic.jpg,buf1
Mid>r1_2,1,r1,hash1
//Get MD5 hash of a file
LibFunc>d:\Hashlib,FileMD5,r2,c:\mydir\mypic.jpg,buf2
Mid>r2_2,1,r2,hash2
If>hash1=hash2
MessageModal>Files are identical
Else
MessageModal>Files are different
Endif
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
- Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
"Functions return length of hash or zero if an error occurred."
The example just displays the contents in a message box, so any null chars past the end of the string won't matter.
But if you want to compare them, you need to extract just the pertinent data. DLLs pass and return references to memory, not literal strings. So when dealing with strings we pass a buffer, which is a reference to memory, and we need some way of knowing how much data was written to that buffer. As HashLib.txt says, "Functions return length of hash or zero if an error occurred." So the return value tells you how much data is in the buffer, and you should use it to retrieve exactly that much. Hence the Mid statement.
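To see why comparing the raw buffers fails while the Mid version works, here is a small Python simulation. `fake_file_md5` is a hypothetical stand-in for a DLL function like FileMD5: it writes the hex digest into a caller-supplied buffer and returns the number of characters written, following the contract quoted from HashLib.txt. The leftover bytes in the two buffers are made up for illustration.

```python
import hashlib


def fake_file_md5(data: bytes, buf: bytearray) -> int:
    """Hypothetical stand-in for a DLL function like HashLib's FileMD5.

    Writes the hex MD5 digest into the caller's buffer and returns the number
    of characters written, or 0 if the buffer is too small. Bytes after that
    length keep whatever was in the buffer before.
    """
    digest = hashlib.md5(data).hexdigest().encode("ascii")
    if len(digest) > len(buf):
        return 0
    buf[: len(digest)] = digest
    return len(digest)


# Two 64-byte buffers with different leftover contents (made up for illustration).
buf1 = bytearray(b"\x00" * 64)
buf2 = bytearray(b"#" * 64)

n1 = fake_file_md5(b"same picture bytes", buf1)
n2 = fake_file_md5(b"same picture bytes", buf2)

# Comparing the whole buffers fails: the tails after the hash differ.
whole_buffers_equal = buf1 == buf2
# Taking only the first n characters, as Mid>r1_2,1,r1,hash1 does, succeeds.
hash1 = buf1[:n1].decode("ascii")
hash2 = buf2[:n2].decode("ascii")

print(n1, n2)               # 32 32
print(whole_buffers_equal)  # False
print(hash1 == hash2)       # True
```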
- Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
No problem. I have also just updated HashLib.txt in the library distribution to use Mid in the examples. So hopefully that will avoid confusion in future.