I've been configuring LDAP at work (specifically OpenLDAP). Why, oh why does libnss-ldap find it necessary to do a reverse lookup on the IP address of the LDAP server it is configured to use after it has already connected to it?
Presumably it's some sort of misguided attempt at checking that the name matches what it expects (but I haven't given it a name in the first place...) or an attempt at getting a name for logging (but I'd much rather have the IP I put in the config file, thank you very much).
It seems to serve no real purpose other than to fsck things up for poor unsuspecting people (like me) who get stuck wondering why all the tools depending on it suddenly hang in futex() syscalls all over the place. It was only after I noticed the connections to our DNS server in the strace output that it dawned on me why it was hanging.
/etc/nsswitch.conf helpfully warns that you need to make sure files or dns is consulted before LDAP for hosts entries to prevent infinite loops, but I had naively assumed that only applied to clients, and there was absolutely no warning that the entire thing would just refuse to work if there is no DNS or /etc/hosts entry for the LDAP server itself.
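The ordering that warning talks about can be checked mechanically. Here is a minimal sketch in Python; the function names are made up for illustration, and it only looks at the order of sources on the hosts line. It says nothing about whether the LDAP server itself can be resolved.

```python
import re


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in order."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            # Drop action items such as [NOTFOUND=return].
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []


def files_or_dns_before_ldap(sources):
    """True if ldap is absent, or files/dns is listed before it."""
    if "ldap" not in sources:
        return True
    before = sources[:sources.index("ldap")]
    return "files" in before or "dns" in before
```

To run it against the real file, read /etc/nsswitch.conf into a string and pass it to hosts_sources(), then hand the result to files_or_dns_before_ldap().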
The reason I specified the LDAP server (or rather the LDAP OpenVZ container) by IP instead of by name was precisely to avoid name resolution problems preventing LDAP lookups from working.
Multiple searches didn't turn up any hints either, so if you come across this, consider it a warning:
On Linux, always, always make sure your LDAP server has a reverse mapping, either by adding it to /etc/hosts on ALL your Linux clients or by adding it to DNS (in which case you'd better make sure DNS works reliably), even when you specify your LDAP server by IP. (Of course, if you specify it by name, you'd better make sure it resolves forward too, without LDAP.)
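A quick way to see whether a client has that reverse mapping is to ask the system resolver directly. This is a rough sketch, not what libnss-ldap does internally: reverse_name is a made-up helper, and the resolver parameter exists only so the function can be exercised without touching the network. Note that socket.gethostbyaddr goes through the same NSS configuration, so on a broken client the call may block in much the same way the other tools do.

```python
import socket


def reverse_name(ip, resolver=socket.gethostbyaddr):
    """Return the hostname the resolver reports for ip, or None if there is none."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname
```

Calling reverse_name("192.0.2.10") (a placeholder address from the documentation range; substitute your LDAP server's IP) should return a name on every client. If it returns None, the /etc/hosts or DNS entry is missing.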
Secondly, if you have problems, strace is your friend. Look for connect() and other network-related syscalls when doing anything that you've set up to require LDAP, such as "getent passwd" or "getent hosts". Assuming you have stopped nscd (which makes debugging infinitely simpler), you should see a connect() followed by reads and writes to your LDAP server. If you only see the connect(), it's either because the connect failed (look at the return code) or because the DNS lookup (forward and/or reverse) is still failing for libnss-ldap.
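Since the failure mode is a hang rather than an error, it can help to run the traced command under a timeout. The following is only a sketch with made-up helper names; it assumes strace is installed and uses its "-f" and "-e trace=network" options to follow child processes and show only network-related syscalls.

```python
import subprocess


def strace_network(cmd):
    """Prefix cmd so that strace prints only network-related syscalls."""
    return ["strace", "-f", "-e", "trace=network", *cmd]


def finishes_within(cmd, seconds=10.0):
    """Run cmd; return True if it exits in time, False if it timed out."""
    try:
        # stdout is discarded; stderr is left alone so that strace's
        # trace still shows up on the terminal.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, timeout=seconds)
    except subprocess.TimeoutExpired:
        return False
    return True
```

For example, finishes_within(strace_network(["getent", "passwd"])) returns False if the lookup is still stuck after ten seconds, and the last syscalls printed show where it stopped. On timeout only the strace process itself is killed, so the traced command may linger and need to be cleaned up by hand.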