macOS case-insensitive file system


(Noseglasses) #1

Unfortunately, I do not have access to any Apple products, nor do I know anyone who owns one. Nevertheless, I want to add macOS support (Travis CI testing on macOS) to one of my projects.

As far as I can see, most macOS systems use a case-insensitive file system by default. Because of that, I need a shell command that takes the absolute path to a file and returns that path with its proper case.

Say, there is a file /aBc/Def.txt and I know that the lower case representation of the filename’s full path is /abc/def.txt. Thus, what I want is

> <magiccommand> /abc/def.txt
/aBc/Def.txt

EDIT 1: Maybe ls /abc/def.txt is enough, but I just can’t check. Can someone try that, please?

Any help highly appreciated.


(Michael Richters) #2

Unfortunately, ls foo/bar will always output foo/bar, even if the filename is FOO/bAr. However, with bind "set completion-ignore-case on", tab completion will correct the case of filenames on HFS+. I’m not sure how bash/readline does that, but it’s a place to start looking.


(Michael Richters) #3

GNU Emacs M-x find-file tab completion also corrects case on case-insensitive filesystems, but if I recall correctly, you prefer vi, so maybe that’s not helpful. I’d look at the code, but I don’t have time today, I’m afraid.

I’ll check back here, though; I’m sure I can run tests occasionally on an HFS+ filesystem if you think you’ve got a solution.

[Big thumbs down to case-insensitive filesystems, by the way. Boo!]


(Noseglasses) #4

Yes, at least from a developer’s point of view, case-insensitive filesystems are pretty annoying.
Unfortunately, I think Emacs and vi are no solution for my problem, as I need some type of command that I can call from CMake and that is available on any fresh macOS system. After searching the web up and down, I gave up and implemented a brute-force, recursive, token-wise path comparison in CMake.
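
For illustration, a token-wise comparison of this kind can be sketched in plain shell rather than CMake. This is only a sketch under stated assumptions: the function name true_case_path is hypothetical, the input is an absolute path, file names contain no newlines, and it has not been run on HFS+.

```sh
#!/bin/sh

# Hypothetical helper: print the on-disk spelling of an absolute path by
# comparing each path component, lowercased, against the directory listing.
true_case_path () {
    rest=${1#/}                  # drop the leading slash
    result=''
    while [ -n "${rest}" ]; do
        token=${rest%%/*}        # next path component
        case "${rest}" in
            */*) rest=${rest#*/} ;;
            *)   rest='' ;;
        esac
        [ -n "${token}" ] || continue      # skip empty tokens from "//"
        want=$(printf '%s' "${token}" | tr '[:upper:]' '[:lower:]')
        match=''
        # Brute force: test every entry of the directory resolved so far.
        for entry in "${result}"/* "${result}"/.[!.]*; do
            [ -e "${entry}" ] || [ -L "${entry}" ] || continue
            name=${entry##*/}
            have=$(printf '%s' "${name}" | tr '[:upper:]' '[:lower:]')
            if [ "${have}" = "${want}" ]; then
                match=${name}
                break
            fi
        done
        [ -n "${match}" ] || return 1      # no such entry
        result="${result}/${match}"
    done
    printf '%s\n' "${result:-/}"
}
```

Called as true_case_path /abc/def.txt, it should print the spelling stored in the directories. On a case-sensitive filesystem with several entries that differ only in case, it simply returns the first one it finds.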

Thanks anyway.


(Michael Richters) #5

Oh, I didn’t mean that you could use Emacs to perform the conversion – I was thinking the source code might reveal a good way to do it.


(Michael Richters) #6

I don’t know why I didn’t think of this right away:

#!/bin/sh

DIR=$(dirname "${1}")

FILENAME=$(ls "${DIR}" | grep -i "${1}")

The argument to grep needs some escaping to handle some unusual filenames, of course, but this approach does work.
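
One way around the escaping problem, sketched here as an assumption rather than as the approach used above, is to have grep treat the name as a fixed string (-F) that must match the whole line (-x). The function name match_name is hypothetical, and the sketch only looks at the last path component.

```sh
#!/bin/sh

# Hypothetical helper: print the on-disk spelling of the last component of $1.
match_name () {
    dir=$(dirname "${1}")
    base=$(basename "${1}")
    # -i: ignore case, -x: match the whole line, -F: no regex metacharacters,
    # -e: treat the next argument as the pattern even if it starts with "-"
    ls -a "${dir}" | grep -i -x -F -e "${base}"
}
```

With this, a name such as foo[1].txt or one containing spaces is compared literally. On a case-sensitive filesystem, several entries may match and each is printed on its own line.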


(Noseglasses) #7

Does your shell script assume ${1} to be an absolute path? I tried it (on Linux) but it did not work.

I cannot see how grep-ing the absolute path in the path name returned by ls ${DIR} is supposed to work. Isn’t ls ${DIR} at most a substring of ${1} and not vice versa? Probably I am missing something.


(Michael Richters) #8

No, $1 is a relative pathname. I grep the dir listing (case-insensitive) to get just the filename of interest.


(Michael Richters) #9

Oh, sorry; that script just gets the relative filename, not the absolute path.


(Michael Richters) #10

A slightly better version:

#!/bin/sh

D=$(dirname "${1}")
F=$(basename "${1}")

(
    cd "${D}"
    REGEX='^'
    REGEX+="${F}"
    REGEX+='$'

    FILENAME=$(ls | grep -i "${REGEX}")

    echo $FILENAME
)

(Michael Richters) #11

I couldn’t get this off my mind, so here’s a much better version:

#!/bin/sh

get_canonical_name () {
    dir=$(dirname "${1}")
    file=$(basename "${1}")

    cd "${dir}"
    regexp='^'
    regexp+="${file}"
    regexp+='$'

    canonical_name=$(ls -1 | grep -i "${regexp}")
    
    echo "${canonical_name}"
}


D=$(dirname "${1}")
F=$(basename "${1}")

cd "${D}"

# Split the directory and the file name into one path element per line
PATH_ELEMENTS=$(echo "$(pwd)/${F}" | tr '/' '\n')

ABS_NAME=''
for dir in ${PATH_ELEMENTS}; do
    ABS_NAME+='/'
    ABS_NAME+=$(get_canonical_name "${ABS_NAME}/${dir}")
done

echo "${ABS_NAME}"

#12

No offense intended, but using the supplied filename as a regex gave me the heebie-jeebies.

Also, it broke when given a path or filename containing a space.

Therefore I propose the following:

#!/bin/sh

get_canonical_name () {
    D=$(dirname "${1}")
    I=$(stat -f '%i' "${1}")
    R=$(ls -1ia "${D}" | grep "^ *${I} " | sed "s/^ *${I} //")

    if [ "${C}" ]
    then
        C="${R}/${C}"
    else
        C="${R}"
    fi

    if [ "${D}" != '.' ]
    then
        get_canonical_name "${D}"
    fi
}

G="${1}"

if [ -e "${G}" ]
then
    C=""
    get_canonical_name "${G}"
    echo "${C}"
fi

It uses the inode number (unique within a directory, apart from hard links to the same file) to match up the supplied name with the name returned by an ls of the containing directory.
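
The inode idea can be shown for a single path component. This is a minimal sketch, assuming that ls -di is an acceptable stand-in for the BSD-style stat -f '%i' (GNU stat gives -f a different meaning) and that names contain no newlines. The function name name_by_inode is hypothetical.

```sh
#!/bin/sh

# Hypothetical helper: print the directory entry that shares an inode with $1.
name_by_inode () {
    dir=$(dirname "${1}")
    # "ls -di" prints "<inode> <name>"; keep only the inode number.
    inode=$(ls -di "${1}" | awk '{ print $1; exit }')
    # List the parent with inode numbers and print the first entry whose inode
    # matches, stripping the "<inode> " column so names with spaces survive.
    ls -1ia "${dir}" | awk -v want="${inode}" '
        ($1 "") == (want "") { sub(/^ *[0-9]+ /, ""); print; exit }'
}
```

Because the comparison is done on inode numbers only, the supplied name is never interpreted as a pattern.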


(Michael Richters) #13

Oh, certainly; I didn’t have sufficient time to make it robust. I do think you shouldn’t use $C like that, though; better to make it the second argument to get_canonical_name.


#14

This is shell. All of the variables, including $1 and $2, are global. Since their globalness can’t be avoided, it might as well be embraced, right? :slight_smile:

From a certain perspective there’s danger in writing code that disguises the fact that the variables are global, especially with a recursive function. $C was the only variable that was being used in a way that required it to be global, making it stand out.

But, I also noticed that my original version went into an infinite loop when given an absolute path name, so since I had to update it anyway I went ahead and modified it as you suggested.

One minor bug: if it is given a filename with one (or more) leading ./ it will strip off the first one (i.e. ./foo will become foo and ././bar will become ./bar). I can’t see an easy way to eliminate it so I have left it alone.

#!/bin/sh

get_canonical_name () {
    D=$(dirname "${1}")
    I=$(stat -f '%i' "${1}")
    C=$(ls -1ia "${D}" | grep "^ *${I} " | sed "s/^ *${I} //")

    if [ "${2}" ]
    then
        C="${C}/${2}"
    fi

    if [ "${D}" = "/" ]
    then
        echo "/${C}"
    elif [ "${D}" = '.' ]
    then
        echo "${C}"
    else
        get_canonical_name "${D}" "${C}"
    fi
}

G="${1}"

if [ -e "${G}" ]
then
    get_canonical_name "${G}" ""
fi

(Michael Richters) #15

Not true, at least in bash, which is the default /bin/sh on macOS. Try this:

#!/bin/sh

testprint () {
    echo $1
}
echo $1
testprint foo
echo $1

$1 is a local variable in the function. There’s also local…
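
The same behaviour can be checked one level deeper, with one function calling another. In this sketch (all function and variable names are made up for illustration), each call gets its own positional parameters, while an ordinary variable such as counter is shared between the two functions.

```sh
#!/bin/sh

inner_fn () {
    seen_inside=$1               # the inner function's own first argument
    counter=$((counter + 1))     # ordinary variable: shared with the caller
}

outer_fn () {
    counter=0
    before=$1
    inner_fn inner-value
    after=$1                     # still the outer function's first argument
    echo "before=${before} inside=${seen_inside} after=${after} counter=${counter}"
}

outer_fn outer-value
```

This prints before=outer-value inside=inner-value after=outer-value counter=1, so the call to inner_fn changed counter but left the caller’s $1 alone.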

There’s no need for this script to be universal; it has a very specific purpose, for a particular OS, where bash is the default shell. There’s no need for any of these shenanigans on Linux or *BSD systems, so it’s really pointless to insist on POSIX compliance.


#16

Portability is a major concern at my day job. It’s a habit I like to keep in practice. :slight_smile:


(Cy Rossignol) #17

Just out of curiosity…I’m not aware of any shells that place a function’s positional parameters back in the global scope. Is there a shell that does this? If so, I imagine that writing for such a shell would actually diminish a script’s portability because every other shell scopes function parameters locally.


(Cy Rossignol) #18

By the way @noseglasses, for simple cases, we don’t need to wrap the command if we’re using Bash. We can let the shell do the work with globbing:

<command> /a[Bb]c/[Dd]ef.txt

Of course, this will match both /abc/def.txt and /aBc/Def.txt if they both exist, but I’m guessing this probably isn’t the case.
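
Writing the bracket pattern by hand gets tedious for longer paths. As a rough sketch, a pattern of this shape can be generated from the path and then expanded by the shell. The helpers case_glob and expand_case_glob are hypothetical, and the sketch assumes the path contains no glob metacharacters of its own.

```sh
#!/bin/sh

# Hypothetical helper: turn "abc" into "[aA][bB][cC]"; other characters pass through.
case_glob () {
    printf '%s\n' "${1}" | awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            lo = tolower(c)
            up = toupper(c)
            if (lo != up)
                out = out "[" lo up "]"
            else
                out = out c
        }
        print out
    }'
}

# Hypothetical helper: print every existing path that matches $1 in any case.
# The body runs in a subshell so the IFS change does not leak to the caller.
expand_case_glob () (
    IFS=''                       # no word splitting, so spaces in names survive
    # Unquoted on purpose: the shell expands the generated glob pattern.
    for match in $(case_glob "${1}"); do
        if [ -e "${match}" ]; then
            printf '%s\n' "${match}"
        fi
    done
)
```

For example, expand_case_glob /abc/def.txt expands the pattern /[aA][bB][cC]/[dD][eE][fF].[tT][xX][tT] and prints whichever spellings exist.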

If you’ve got a Windows box, you can test on the case-insensitive filesystem in a Cygwin environment.


#19

I don’t recall, and I don’t have access to those systems any more. I spent a lot of time, mostly in korn shells, on several proprietary UNIXes, ending most of a decade ago.

Old habits die hard.