File size
August 21, 2023

I have been doing some backups and I wrote a utility that determines whether files are an exact match. As a shortcut, I check the file size first. This worked for millions of files until I found something odd: getSize() and DirEntry's .size are producing different values for the same file.

This is the relevant code:

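	// sourceFile is a DirEntry, so .size comes from the directory iteration.
	if (sourceFile.size != getSize(destinationFilename)) {
		// Diagnostic: look the source file up again by name and compare that size.
		if (getSize(sourceFile.name) != getSize(destinationFilename))
			writeln("Also did not match");
		else
			writeln("Did match so this is odd");

		return ArchivalStatus.SizeDidNotMatch;
	}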

Whereas before it just returned SizeDidNotMatch, now it also prints "Did match so this is odd".

It seems really odd that getSize(sourceFile.name) is returning a different number than sourceFile.size. The program is reading an external HDD formatted as NTFS, and it is running on Windows. I believe I originally wrote the files to the file system in Windows, but then today I cut and pasted them (within the same drive) in Linux. However, this is the first time this has happened after millions of comparisons, and it only happened for about 6 files. It does happen consistently though.

I have verified that the actual file size is the one reported by getSize, not sourceFile.size, and that the files open correctly.

This is my compiler version:
DMD32 D Compiler v2.104.2-dirty

If this is actually a problem and I'm not missing something, I would not mind trying to fix this whenever I have some time.

August 21, 2023

On Monday, 21 August 2023 at 07:52:28 UTC, harakim wrote:

> I have been doing some backups and I wrote a utility that determines whether files are an exact match. As a shortcut, I check the file size first. This worked for millions of files until I found something odd: getSize() and DirEntry's .size are producing different values for the same file.
>
> ...
>
> It seems really odd that getSize(sourceFile.name) is returning a different number than sourceFile.size. The program is reading an external HDD formatted as NTFS, and it is running on Windows. I believe I originally wrote the files to the file system in Windows, but then today I cut and pasted them (within the same drive) in Linux. However, this is the first time this has happened after millions of comparisons, and it only happened for about 6 files. It does happen consistently though.
>
> I have verified that the actual file size is the one reported by getSize, not sourceFile.size, and that the files open correctly.
>
> ...

Can you print some of the wrong sizes? D's DirEntry iteration code just calls FindFirstFileW/FindNextFileW, so this shouldn't be a D-specific issue, and it should be possible to reproduce this in C.
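Something like the following would do for collecting them. It is only a minimal sketch: `reportSizeMismatches` and its `root` parameter are made-up names, not part of std.file, and it assumes the affected files all live under one directory tree.

```d
import std.file : dirEntries, getSize, SpanMode;
import std.stdio : writefln;

/// Walks `root` (hypothetical parameter) and prints every file whose size as
/// seen during directory iteration differs from the size looked up by name.
/// Returns the number of mismatches found.
size_t reportSizeMismatches(string root)
{
    size_t mismatches;
    foreach (entry; dirEntries(root, SpanMode.depth))
    {
        if (!entry.isFile)
            continue;

        immutable enumerated = entry.size;      // size attached to the DirEntry
        immutable byName = getSize(entry.name); // fresh lookup by path
        if (enumerated != byName)
        {
            writefln("%s: DirEntry.size=%s getSize=%s",
                     entry.name, enumerated, byName);
            ++mismatches;
        }
    }
    return mismatches;
}

void main(string[] args)
{
    // Defaults to the current directory when no path is given.
    immutable root = args.length > 1 ? args[1] : ".";
    writefln("%s mismatched file(s)", reportSizeMismatches(root));
}
```

Running it against the external drive should list only the handful of affected files, with both numbers side by side.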

August 22, 2023

On Monday, 21 August 2023 at 11:05:36 UTC, FeepingCreature wrote:

> Can you print some of the wrong sizes? D's DirEntry iteration code just calls FindFirstFileW/FindNextFileW, so this shouldn't be a D-specific issue, and it should be possible to reproduce this in C.

Yes! I will get that information tomorrow.

August 22, 2023

On Monday, 21 August 2023 at 11:05:36 UTC, FeepingCreature wrote:

> Can you print some of the wrong sizes? D's DirEntry iteration code just calls FindFirstFileW/FindNextFileW, so this shouldn't be a D-specific issue, and it should be possible to reproduce this in C.

Thanks for the suggestion. I was working on getting the list for you when I decided to first try to reproduce this on Linux. I was not able to do so. Then I opened the file explorer on Linux and went to one of the files. There were two files by that name, with names differing only by case.

In Windows, I only saw one, because Windows Explorer supports only one file per directory among names that differ only by case. Unsurprisingly, that is also the one getSize(filename) selected. The underlying Windows functions must ignore case as well and pick the same file Explorer does (which makes sense). That explains why Windows Explorer reported the same size as getSize(name) in every case, while DirEntry.size matched for the file whose case Windows recognized and not for the file with the different case. I was able to get into this state because I copied the files (merged directories) in Linux.
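For anyone who wants to check a directory for this situation, here is a rough sketch. The names `caseCollisions` and `caseCollisionsIn` are my own, and lowercasing with std.uni's toLower is only an approximation of how Windows compares file names, so treat the result as a hint rather than proof.

```d
import std.file : dirEntries, SpanMode;
import std.path : baseName;
import std.stdio : writeln;
import std.uni : toLower;

/// Groups names by their lowercased form and returns every group that has
/// more than one member, i.e. names that differ only by case.
string[][] caseCollisions(string[] names)
{
    string[][string] groups;
    foreach (name; names)
        groups[name.toLower] ~= name;

    string[][] result;
    foreach (group; groups)
        if (group.length > 1)
            result ~= group;
    return result;
}

/// Applies caseCollisions to the entries directly inside `dir`
/// (hypothetical parameter; subdirectories are not descended into).
string[][] caseCollisionsIn(string dir)
{
    string[] names;
    foreach (entry; dirEntries(dir, SpanMode.shallow))
        names ~= baseName(entry.name);
    return caseCollisions(names);
}

void main(string[] args)
{
    // Defaults to the current directory when no path is given.
    foreach (group; caseCollisionsIn(args.length > 1 ? args[1] : "."))
        writeln(group);
}
```

Each printed group is a set of names that a case-insensitive lookup cannot tell apart, which is the case where a lookup by name can land on a different file than the one being iterated.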

It was interesting to look into. It seems everything is working as designed. It shouldn't be an issue for me going forward either as I move more and more towards Linux.

August 23, 2023

On Tuesday, 22 August 2023 at 16:22:52 UTC, harakim wrote:

> On Monday, 21 August 2023 at 11:05:36 UTC, FeepingCreature wrote:
>
>> Can you print some of the wrong sizes? D's DirEntry iteration code just calls FindFirstFileW/FindNextFileW, so this shouldn't be a D-specific issue, and it should be possible to reproduce this in C.
>
> Thanks for the suggestion. I was working on getting the list for you when I decided to first try to reproduce this on Linux. I was not able to do so. Then I opened the file explorer on Linux and went to one of the files. There were two files by that name, with names differing only by case.
>
> In Windows, I only saw one, because Windows Explorer supports only one file per directory among names that differ only by case. Unsurprisingly, that is also the one getSize(filename) selected. The underlying Windows functions must ignore case as well and pick the same file Explorer does (which makes sense). That explains why Windows Explorer reported the same size as getSize(name) in every case, while DirEntry.size matched for the file whose case Windows recognized and not for the file with the different case. I was able to get into this state because I copied the files (merged directories) in Linux.
>
> It was interesting to look into. It seems everything is working as designed. It shouldn't be an issue for me going forward either as I move more and more towards Linux.

That's hilarious! I'm happy you found it.

August 25, 2023

On Wednesday, 23 August 2023 at 08:48:26 UTC, FeepingCreature wrote:

> That's hilarious! I'm happy you found it.

Me too! Thanks for the support.
(PS I've already reformatted that drive to ext4.)