6 years agoFix wrong size of ub0_percpu v2.6.24-ovz006.4
Konstantin Khlebnikov [Tue, 7 Oct 2008 14:22:54 +0000]
Fix wrong size of ub0_percpu

The struct percpu_data is dynamically allocated and has an array for only
1 cpu, so static usage of it does not work.
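
A minimal userspace sketch of the sizing problem, not the kernel's actual definition. The names demo_percpu_data, DEMO_NR_CPUS, demo_percpu_size() and demo_static_size() are hypothetical, and the single-slot array only models the layout described above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4 /* hypothetical CPU count, for illustration only */

/* Models the described layout: the array is declared with a single slot. */
struct demo_percpu_data {
    void *ptrs[1];
};

/* Bytes a dynamic allocation has to reserve for one pointer per CPU. */
static size_t demo_percpu_size(unsigned int ncpus)
{
    assert(ncpus >= 1);
    return offsetof(struct demo_percpu_data, ptrs) + ncpus * sizeof(void *);
}

/* Bytes a plain static object of this type actually provides. */
static size_t demo_static_size(void)
{
    return sizeof(struct demo_percpu_data);
}
```

With more than one CPU, demo_static_size() is smaller than demo_percpu_size(), which is why a statically declared object of this type is too small.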

Plus rework macros for static percpu variables declaration and

Signed-off-by: Konstantin Khlebnikov <>
Signed-off-by: Pavel Emelyanov <>

6 years agosunrpc: fix lost set_exec_env-back and unlock the op_sem v2.6.24-ovz006.3
Konstantin Khlebnikov [Wed, 1 Oct 2008 11:42:40 +0000]
sunrpc: fix lost set_exec_env-back and unlock the op_sem

Any NFS connection over TCP-IPv4 from a VE blocks the VE stop process.
This patch adds the missing unlock of op_sem and the missing set_exec_env.

(picked from openvz ubuntu branch patch
  2.6.18 not affected, 2.6.26+ already fixed by den@)

6 years agolinux-2.6.24-ovz006 released v2.6.24-ovz006
OpenVZ team [Wed, 24 Sep 2008 13:44:19 +0000]
linux-2.6.24-ovz006 released

6 years ago[NET]: sk_release_kernel needs to be exported to modules
David S. Miller [Fri, 29 Feb 2008 19:33:19 +0000]
[NET]: sk_release_kernel needs to be exported to modules


ERROR: "sk_release_kernel" [net/ipv6/ipv6.ko] undefined!

Signed-off-by: David S. Miller <>
(cherry picked from commit 45af1754bc09926b5e062bda24f789d7b320939f)

6 years ago[NET]: Make netlink_kernel_release publically available as sk_release_kernel.
Denis V. Lunev [Fri, 29 Feb 2008 19:18:32 +0000]
[NET]: Make netlink_kernel_release publically available as sk_release_kernel.

This stuff will be needed for non-netlink kernel sockets, which should
also not pin a namespace like tcp_socket and icmp_socket.

Signed-off-by: Denis V. Lunev <>
Acked-by: Daniel Lezcano <>
Signed-off-by: David S. Miller <>
(cherry picked from commit edf0208702007ec1f6a36756fdd005f771a4cf17)

6 years ago[NETLINK]: No need for a separate __netlink_release call.
Denis V. Lunev [Fri, 29 Feb 2008 19:17:56 +0000]
[NETLINK]: No need for a separate __netlink_release call.

Merge it to netlink_kernel_release.

Signed-off-by: Denis V. Lunev <>
Acked-by: Daniel Lezcano <>
Signed-off-by: David S. Miller <>
(cherry picked from commit 9dfbec1fb2bedff6b118504055cd9f0485edba45)

6 years ago[NETNS]: Fix race between put_net() and netlink_kernel_create().
Pavel Emelyanov [Thu, 31 Jan 2008 03:31:06 +0000]
[NETNS]: Fix race between put_net() and netlink_kernel_create().

The comment about "race free view of the set of network
namespaces" was a bit hasty. Look (there even can be only
one CPU, as discovered by Alexey Dobriyan and Denis Lunev):

  if (atomic_dec_and_test(&net->refcnt))
    /* true */

 * note: the net now has refcnt 0, but still in
 * the global list of net namespaces

== re-schedule ==

       * we call netlink_kernel_create() here
       * in some places
            get_net(net); /* refcnt = 1 */
          * now we drop the net refcount not to
          * block the net namespace exit in the
          * future (or this can be done on the
          * error path)
             if (atomic_dec_and_test(&...))
                    * true. BOOOM! The net is
                    * scheduled for release twice

While thinking about this problem, I decided that getting and
putting the net in an init callback is wrong. If some init
callback needs to have a refcount-less reference on the struct
net, _it_ has to be careful itself, rather than relying on
the infrastructure to handle this correctly.

In case of netlink_kernel_create(), the problem is that the
sk_alloc() gets the given namespace, but passing the info
that we don't want to get it inside this call is too heavy.

Instead, I propose to create the socket inside an init_net
namespace and then re-attach it to the desired one right
after the socket is created.

After doing this, we also have to be careful on error paths
not to drop a reference on the namespace that we never took.
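
The interleaving from the diagram can be replayed deterministically in userspace. This is only a sketch under stated assumptions: demo_net, demo_get_net(), demo_put_net() and demo_replay() are hypothetical stand-ins, and "scheduling a release" is reduced to bumping a counter.

```c
#include <assert.h>
#include <stdatomic.h>

struct demo_net {
    atomic_int refcnt;
    int releases_scheduled; /* how often the net was queued for release */
};

static void demo_get_net(struct demo_net *net)
{
    atomic_fetch_add(&net->refcnt, 1);
}

static void demo_put_net(struct demo_net *net)
{
    int old = atomic_fetch_sub(&net->refcnt, 1);

    assert(old >= 1);
    if (old == 1) /* the counter just dropped to zero */
        net->releases_scheduled++;
}

/*
 * Replays the sequence: the last user drops the net, then an init
 * callback runs while the net is still on the global list.
 * Returns how many times the net was scheduled for release.
 */
static int demo_replay(int init_takes_ref)
{
    struct demo_net net = { .releases_scheduled = 0 };

    atomic_init(&net.refcnt, 1);
    demo_put_net(&net);        /* 1 -> 0, release scheduled */
    if (init_takes_ref) {
        demo_get_net(&net);    /* 0 -> 1 */
        demo_put_net(&net);    /* 1 -> 0, release scheduled again */
    }
    return net.releases_scheduled;
}
```

demo_replay(1) models the get/put inside the init callback and schedules the release twice; demo_replay(0) models a callback that never touches the counter and schedules it once.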

Signed-off-by: Pavel Emelyanov <>
Acked-by: Denis Lunev <>
Signed-off-by: David S. Miller <>
(cherry picked from commit 23fe18669e7fdaf5b229747858d943a723124e2e)

6 years ago[NETNS]: Namespace stop vs 'ip r l' race.
Denis V. Lunev [Sat, 19 Jan 2008 07:55:19 +0000]
[NETNS]: Namespace stop vs 'ip r l' race.

backport mainline commit 775516bfa2bd7993620c9039191a0c30b8d8a496

During the network namespace stop process, kernel-side netlink sockets
belonging to the namespace should be closed. They should not prevent the
namespace from stopping, so they do not increment the namespace usage
counter. However, this counter will be put during the last sock_put.

The replacement of the correct netns with init_ns solves the problem
only partially, as the socket to be stopped remains a valid netlink
kernel socket until it is properly stopped and can be looked up by user
processes. This is not a problem until it resides in initial namespace
(no processes inside this net), but this is not true for init_net.

So, hold a reference on the socket, remove it from the lookup tables, and
only after that change the namespace and perform the last put.

Signed-off-by: Denis V. Lunev <>
Tested-by: Alexey Dobriyan <>
Signed-off-by: David S. Miller <>

6 years ago[NETNS]: Consolidate kernel netlink socket destruction.
Denis V. Lunev [Mon, 28 Jan 2008 22:41:19 +0000]
[NETNS]: Consolidate kernel netlink socket destruction.

backport mainline commit b7c6ba6eb1234e35a74fb8ba8123232a7b1ba9e4

Create a specific helper for netlink kernel socket disposal. This just
makes the code look better and provides the groundwork for proper disposal
inside a namespace.

Signed-off-by: Denis V. Lunev <>
Tested-by: Alexey Dobriyan <>
Signed-off-by: David S. Miller <>

6 years ago[NETNS]: Double free in netlink_release.
Denis V. Lunev [Sat, 19 Jan 2008 07:53:31 +0000]
[NETNS]: Double free in netlink_release.

Netlink protocol table is global for all namespaces. Some netlink
protocols have been virtualized, i.e. they have a per-namespace netlink
socket. This difference can easily lead to double free if more than 1
namespace is started. Count the number of kernel netlink sockets to
track that this table is not used any more.
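
A minimal sketch of the counting idea, assuming one shared table entry per protocol. demo_nl_entry, demo_kernel_create(), demo_kernel_release() and demo_frees_for() are hypothetical names, and the shared data is reduced to a single heap allocation.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_nl_entry {
    int *listeners;          /* stands in for the shared per-protocol data */
    unsigned int registered; /* kernel sockets currently using the entry */
    unsigned int frees;      /* how often the shared data was freed */
};

static int demo_kernel_create(struct demo_nl_entry *e)
{
    if (e->registered == 0) {
        e->listeners = calloc(1, sizeof(*e->listeners));
        if (!e->listeners)
            return -1;
    }
    e->registered++;
    return 0;
}

static void demo_kernel_release(struct demo_nl_entry *e)
{
    assert(e->registered > 0);
    if (--e->registered == 0) { /* last kernel socket is gone */
        free(e->listeners);
        e->listeners = NULL;
        e->frees++;
    }
}

/* One kernel socket per namespace, then release all; returns free count. */
static unsigned int demo_frees_for(unsigned int namespaces)
{
    struct demo_nl_entry e = { NULL, 0, 0 };
    unsigned int i;

    for (i = 0; i < namespaces; i++)
        if (demo_kernel_create(&e) != 0)
            break;
    while (e.registered > 0)
        demo_kernel_release(&e);
    return e.frees;
}
```

However many namespaces register a socket, the shared data is freed exactly once, when the count returns to zero.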

Signed-off-by: Denis V. Lunev <>
Tested-by: Alexey Dobriyan <>
Signed-off-by: David S. Miller <>
(cherry picked from commit 869e58f87094b1e8a0df49232e4a5172678d46c9)

6 years ago[UBC]: Double free for UDP socket aka
Denis Lunev [Tue, 23 Sep 2008 13:43:51 +0000]
[UBC]: Double free for UDP socket aka

A socket residing in the UB space waiting queue could be released. In this
case ub_snd_wakeup running on another CPU could hold/release that
socket, effectively hitting a 0 refcount a second time.

Signed-off-by: Denis V. Lunev <>
Signed-off-by: Pavel Emelyanov <>

6 years agoubc: uncharging too much for TCPSNDBUF
Denis V. Lunev [Mon, 14 Jul 2008 07:04:29 +0000]
ubc: uncharging too much for TCPSNDBUF

It is not allowed to go to the label wait_for_memory with chargesize != 0
when this space has already been placed into the skb.

Signed-off-by: Denis V. Lunev <>
Signed-off-by: Pavel Emelyanov <>

6 years agoEndless loop in __sk_stream_wait_memory.
Denis V. Lunev [Mon, 30 Jun 2008 07:05:14 +0000]
Endless loop in __sk_stream_wait_memory.

[UBC]: Endless loop in __sk_stream_wait_memory.

The loop in __sk_stream_wait_memory when tcp_sendmsg asks to wait for
TCPSNDBUF space is endless when the timeout is not specified. The only way
out is to queue a signal for that process.

Let's return a status flag from ub_sock_snd_queue_add indicating that UB space is
available. This is enough to make a correct decision to leave the loop.

Signed-off-by: Denis V. Lunev <>
Signed-off-by: Pavel Emelyanov <>

6 years agoAllow envID fields in /proc/self/status in VE. Also allow get VPid,
Vitaliy Gusev [Thu, 21 Aug 2008 08:33:08 +0000]
Allow envID fields in /proc/self/status in VE. Also allow getting VPid,
PNState, StopState, etc.

OpenVZ Bug #936

Signed-off-by: Vitaliy Gusev <>
Signed-off-by: Pavel Emelyanov <>

6 years agofutexes: fix fault handling in futex_lock_pi
Thomas Gleixner [Mon, 23 Jun 2008 23:30:13 +0000]
futexes: fix fault handling in futex_lock_pi

commit 1b7558e457ed0de61023cfc913d2c342c7c3d9f2 upstream

This patch addresses a very sporadic pi-futex related failure in
highly threaded java apps on large SMP systems.

David Holmes reported that the pi_state consistency check in
lookup_pi_state triggered with his test application. This means that
the kernel internal pi_state and the user space futex variable are out
of sync. First we assumed that this is a user space data corruption,
but deeper investigation revealed that the problem happened because the
pi-futex code is not handling a fault in the futex_lock_pi path when
the user space variable needs to be fixed up.

The fault happens when a fork mapped the anon memory which contains
the futex readonly for COW or the page got swapped out exactly between
the unlock of the futex and the return of either the new futex owner
or the task which was the expected owner but failed to acquire the
kernel internal rtmutex. The current futex_lock_pi() code drops out
with an inconsistent state in case it faults and returns -EFAULT to user
space. User space has no way to fix up that state.

When we wrote this code we thought that we could not drop the hash
bucket lock at this point to handle the fault.

After analysing the code again it turned out to be wrong because there
are only two tasks involved which might modify the pi_state and the
user space variable:

 - the task which acquired the rtmutex
 - the pending owner of the pi_state which did not get the rtmutex

Both tasks drop into the fixup_pi_state() function before returning to
user space. The first task which acquired the hash bucket lock faults
in the fixup of the user space variable, drops the spinlock and calls
futex_handle_fault() to fault in the page. Now the second task could
acquire the hash bucket lock and tries to fixup the user space
variable as well. It either faults as well or it succeeds because the
first task already faulted the page in.

One caveat is to avoid a double fixup. After returning from the fault
handling we reacquire the hash bucket lock and check whether the
pi_state owner has been modified already.
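
The drop-lock, fault, reacquire, recheck pattern described above can be sketched in a single thread. This is an illustration only: demo_pi_state, demo_fixup_owner() and the two fault callbacks are hypothetical, the "lock" is a plain flag, and the concurrent task is simulated by a callback.

```c
#include <assert.h>

struct demo_pi_state {
    int owner;  /* task id recorded as owner */
    int locked; /* stands in for the hash bucket lock */
    int fixups; /* number of owner updates performed */
};

typedef void (*demo_fault_fn)(struct demo_pi_state *st, int new_owner);

static void demo_lock(struct demo_pi_state *st)
{
    assert(!st->locked);
    st->locked = 1;
}

static void demo_unlock(struct demo_pi_state *st)
{
    assert(st->locked);
    st->locked = 0;
}

/* Fault handling during which nobody else touches the state. */
static void demo_fault_quiet(struct demo_pi_state *st, int new_owner)
{
    (void)st;
    (void)new_owner;
}

/* Fault handling during which the other task completes the fixup first. */
static void demo_fault_raced(struct demo_pi_state *st, int new_owner)
{
    demo_lock(st);
    st->owner = new_owner;
    st->fixups++;
    demo_unlock(st);
}

/* Called with the lock held; returns the total number of fixups done. */
static int demo_fixup_owner(struct demo_pi_state *st, int old_owner,
                            int new_owner, demo_fault_fn fault_in)
{
    demo_unlock(st);              /* drop the lock to handle the fault */
    fault_in(st, new_owner);
    demo_lock(st);                /* reacquire, then recheck */
    if (st->owner == old_owner) { /* nobody fixed it up meanwhile */
        st->owner = new_owner;
        st->fixups++;
    }
    return st->fixups;
}

static int demo_run(demo_fault_fn fault_in)
{
    struct demo_pi_state st = { 1, 0, 0 };
    int fixups;

    demo_lock(&st);
    fixups = demo_fixup_owner(&st, 1, 2, fault_in);
    demo_unlock(&st);
    return fixups;
}
```

In both the quiet and the raced case exactly one fixup is performed, which is the point of rechecking the owner after reacquiring the lock.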


Reported-by: David Holmes <>
Signed-off-by: Thomas Gleixner <>
Cc: Andrew Morton <>
Cc: David Holmes <>
Cc: Peter Zijlstra <>
Cc: Linus Torvalds <>
Signed-off-by: Ingo Molnar <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Pavel Emelyanov <>

6 years agoCPT: fix restore of inotify on symlink
Andrey Mirkin [Tue, 10 Jun 2008 14:47:57 +0000]
CPT: fix restore of inotify on symlink

Inside a VE, the file /etc/mtab is a symlink to /proc/mounts.
The FreeNX server with KDE creates an inotify watch on the /etc/mtab file.
To restore such a watch we need to obtain the dentry with path_lookup() and
restore inotify on it.

Bug #96464

6 years agoNETFILTER: destroy nf_conntrack_cache correctly
Alexey Dobriyan [Tue, 10 Jun 2008 12:55:18 +0000]
NETFILTER: destroy nf_conntrack_cache correctly

6 years agoCPT: fix EXIT_DEAD/TASK_DEAD checks
Alexey Dobriyan [Mon, 9 Jun 2008 16:06:27 +0000]

For one thing EXIT_DEAD was moved to ->exit_state only.
For another, this task state is called TASK_DEAD now and lives in ->state.

6 years agoCPT: assign ->net_ns of restored tun/tap device
Alexey Dobriyan [Mon, 9 Jun 2008 13:08:16 +0000]
CPT: assign ->net_ns of restored tun/tap device

otherwise init_net is used and device becomes invisible in CT.

6 years agoVE: let ->ve_netns live a bit more
Alexey Dobriyan [Mon, 9 Jun 2008 11:53:08 +0000]
VE: let ->ve_netns live a bit more

1. netns shutdown is done asynchronously
2. nsproxy free is done synchronously
which means we can't use "get_exec_env()->ve_ns->net_ns" construct
anywhere in netns teardown codepath. ->ve_ns will be NULL (fixable) or
will point to freed memory (hardly fixable).

The solution is to pin netns one more time, and use get_exec_env()->ve_netns .
get_exec_env() is always valid. Its ->ve_netns will also be valid during
shutdown. As for ->ve_ns, we don't care from now on.

6 years agoVE: introduce ->ve_netns
Alexey Dobriyan [Mon, 9 Jun 2008 11:46:39 +0000]
VE: introduce ->ve_netns

Preparations for fixing "NULL ->ve_ns" oops in inet6_rt_notify().

6 years agoCPT: fix compilation with CONFIG_SYSVIPC=n
Konstantin Khlebnikov [Sat, 7 Jun 2008 15:26:21 +0000]
CPT: fix compilation with CONFIG_SYSVIPC=n

6 years agoMemory leak on network namespace stop.
Denis V. Lunev [Fri, 6 Jun 2008 17:10:14 +0000]
Memory leak on network namespace stop.

mainline commit 4f84d82f7a623f8641af2574425c329431ff158f

Network namespace allocates 2 kernel netlink sockets, fibnl &
rtnl. These sockets should be disposed properly, i.e. by
sock_release. Plain sock_put is not enough.

Signed-off-by: Denis V. Lunev <>
Tested-by: Alexey Dobriyan <>
Signed-off-by: David S. Miller <>

6 years agoBackport "[NET]: Make rtnetlink infrastructure network namespace aware (v3)"
Alexey Dobriyan [Fri, 6 Jun 2008 16:26:24 +0000]
Backport "[NET]: Make rtnetlink infrastructure network namespace aware (v3)"

mainline commit 97c53cacf00d1f5aa04adabfebcc806ca8b22b10 + tweaks to get
netns from either netdevice or something else.

[NET]: Make rtnetlink infrastructure network namespace aware (v3)

After this patch none of the netlink callbacks support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.

Changes from v2:
- IPv6 addrlabel processing

Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc

Signed-off-by: Denis V. Lunev <>
Signed-off-by: Eric W. Biederman <>
Signed-off-by: David S. Miller <>

6 years agoIPv6: give owner_ve to fib_table and fib6_local_table
Alexey Dobriyan [Fri, 6 Jun 2008 16:20:58 +0000]
IPv6: give owner_ve to fib_table and fib6_local_table

otherwise eventually fib6_clean_all will execute code in NULL context
which is no-no.

6 years agonetlink: fix lookup check
Alexey Dobriyan [Fri, 6 Jun 2008 15:01:31 +0000]
netlink: fix lookup check

netlink_unicast() is done in init_net context because
a) rtnl socket is bound to init_net,
b) kernel-space socket is successfully looked up by any VE,
c) rtnl is a kernel-space socket.
which is b-r-o-k-e-n, because e.g. just about any manipulation with
netdevices via netlink will be projected onto VE0.

Fix (after per-netns rtnl socket patches)

6 years agoproc: fix proc_cwd_link
Alexey Dobriyan [Mon, 2 Jun 2008 13:53:28 +0000]
proc: fix proc_cwd_link

If d_root_check() in there fails, we shouldn't pretend everything is OK
and leave mnt uninitialized or NULL (in the /proc/*/cwd case).

6 years agoIPv6: get frag's owner VE from inet_frag_queue
Alexey Dobriyan [Thu, 29 May 2008 11:18:19 +0000]
IPv6: get frag's owner VE from inet_frag_queue

IPv6 specific frag queue doesn't need owner_ve, because it's already in core
data structure (struct inet_frag_queue).

And it's in fact NULL, which is the cause of

6 years agoRemove spurious warnings in kernel/time.c
Alexey Dobriyan [Wed, 28 May 2008 15:51:14 +0000]
Remove spurious warnings in kernel/time.c

E.g. code in clock_t_to_jiffies() divides ~0UL thus assuming that all
"unsigned long" range is valid. Ditto for other functions. Alexey said
these warnings are old debugging stuff.

6 years agoUBC: drop cpuset lock from OOM handling
Alexey Dobriyan [Mon, 26 May 2008 10:36:30 +0000]
UBC: drop cpuset lock from OOM handling

cpuset_lock dances around OOM killing are gone in main code, so
no need to account for them.

Mainline commit 3ff566963ce804809af9e32331b287eedeeff501
Bug 112959

[ BUG: bad unlock balance detected! ]
tstspoof/29391 is trying to release lock (callback_mutex) at: [<c04488d2>] ub_oom_lock+0x9a/0xd6
but there are no more locks to release!
other info that might help us debug this:
1 lock held by tstspoof/29391:
 #0:  (&mm->mmap_sem){----}, at: [<c060b06d>] do_page_fault+0x1d9/0x5fb
stack backtrace:
Pid: 29391, comm: tstspoof Not tainted 2.6.24-openvz #4
 [<c044dc24>] print_unlock_inbalance_bug+0xe7/0xf3
 [<c04488d2>] ub_oom_lock+0x9a/0xd6
 [<c0440265>] ktime_get_ts+0x16/0x44
 [<c0444b22>] tick_program_event+0x33/0x52
 [<c044e984>] mark_held_locks+0x39/0x53
 [<c040516b>] restore_nocheck+0x12/0x15
 [<c044eb6b>] trace_hardirqs_on+0x122/0x145
 [<c04488d2>] ub_oom_lock+0x9a/0xd6
 [<c044fdfc>] lock_release+0x148/0x16e
 [<c0608291>] __mutex_unlock_slowpath+0xd3/0x140
 [<c04488d2>] ub_oom_lock+0x9a/0xd6
 [<c043d365>] autoremove_wake_function+0x0/0x35
 [<c046a876>] out_of_memory+0x5d/0x177
 [<c046c823>] __alloc_pages+0xc3/0x38b
 [<c04761ff>] handle_mm_fault+0x226/0x87e
 [<c060b06d>] do_page_fault+0x1d9/0x5fb
 [<c060b115>] do_page_fault+0x281/0x5fb
 [<c040516b>] restore_nocheck+0x12/0x15
 [<c060ae94>] do_page_fault+0x0/0x5fb
 [<c0609a92>] error_code+0x72/0x78

6 years ago[PATCH] Stick back to mainline behaviour of zero length mmap(2)
Alexey Dobriyan [Thu, 22 May 2008 15:39:49 +0000]
[PATCH] Stick back to mainline behaviour of zero length mmap(2)

6 years agoVLAN: fix rmmod 8021q with vlan interface setup
Alexey Dobriyan [Tue, 20 May 2008 12:42:16 +0000]
VLAN: fix rmmod 8021q with vlan interface setup

6 years agoNETFILTER: make ip_conntrack_disable_ve0 option do something
Alexey Dobriyan [Fri, 16 May 2008 13:39:02 +0000]
NETFILTER: make ip_conntrack_disable_ve0 option do something

6 years agoNETFILTER: changes for conntrack CPT
Alexey Dobriyan [Fri, 16 May 2008 11:22:41 +0000]
NETFILTER: changes for conntrack CPT

6 years ago[PATCH] kernel.cap-bound sysctl cleanup
Vasily Averin [Fri, 16 May 2008 10:07:37 +0000]
[PATCH] kernel.cap-bound sysctl cleanup
- proc entry is global and therefore it is ReadOnly-accessible from inside VE

6 years agoAdd /proc/sys/fs/lsyscall_enable
Dmitry Monakhov [Fri, 16 May 2008 09:29:51 +0000]
Add /proc/sys/fs/lsyscall_enable

Sysctl introduced mostly for testing purposes.

6 years agoAllow to change SysRq in Alt+SysRq+* combo
Alexandr Andreev [Fri, 16 May 2008 08:53:42 +0000]
Allow to change SysRq in Alt+SysRq+* combo

You can get scancodes of your keyboard with programs like showkey or evtest.
The default Alt+SysRq combination still works after redefinition.

6 years agoCPT: SMP race in detecting state of ptraced processes
Alexey Kuznetsov [Thu, 15 May 2008 14:54:23 +0000]
CPT: SMP race in detecting state of ptraced processes

When suspending VE, we test state of processes while they are
still running. It is not a bug: we have to verify for invalid state
before checkpointing, real state is saved after processes are scheduled

The impact is that we can see a process in a bad state, e.g. stopped
without any reason. This is also not a bug, but it results in random
failures of checkpointing. The only way to fix this is to order updates
of state variables. The order is correct almost everywhere.

6 years agoVZDQ: correct size on /proc/vz/aquota/*/aquota.*
Vasily Tarasov [Thu, 15 May 2008 14:52:00 +0000]
VZDQ: correct size on /proc/vz/aquota/*/aquota.*

Bug #59920

Signed-off-by: Vasily Tarasov <>
Signed-off-by: Denis Lunev <>

6 years agoBRIDGE: correct checking for input packets
Vitaliy Gusev [Thu, 15 May 2008 14:45:50 +0000]
BRIDGE: correct checking for input packets

When the via_phys_dev flag is set, the bridge doesn't have any IP address.
Therefore IP traffic HW->VE passes only if the bridge has the same MAC address as
the real ethernet interface.

Bug #92737

6 years agoia64: generate cpu_khz
Alexey Dobriyan [Thu, 15 May 2008 14:28:06 +0000]
ia64: generate cpu_khz

6 years agoCPT: changes to core shmem to support iterative shmem migration
Alexey Kuznetsov [Thu, 15 May 2008 14:22:24 +0000]
CPT: changes to core shmem to support iterative shmem migration

New exported function shmem_insert_page() to insert a new page into a shmem inode.
No ifdefs. It cannot be private to CPT because that would trigger too many exports.

6 years agoLinux 2.6.24-ovz005 v2.6.24-ovz005
Alexey Dobriyan [Thu, 8 May 2008 09:04:01 +0000]
Linux 2.6.24-ovz005

6 years agoMerge
Alexey Dobriyan [Wed, 7 May 2008 08:12:29 +0000]

Merge branch 'master' of git:// into 2.6.24-openvz



6 years agoLinux
Greg Kroah-Hartman [Tue, 6 May 2008 23:22:34 +0000]

6 years agofix SMP ordering hole in fcntl_setlk() (CVE-2008-1669)
Al Viro [Tue, 6 May 2008 17:58:34 +0000]
fix SMP ordering hole in fcntl_setlk() (CVE-2008-1669)

commit 0b2bac2f1ea0d33a3621b27ca68b9ae760fca2e9 upstream.

fcntl_setlk()/close() race prevention has a subtle hole - we need to
make sure that if we *do* have an fcntl/close race on SMP box, the
access to descriptor table and inode->i_flock won't get reordered.

As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs.
STORE descriptor table entry, LOAD inode->i_flock with not a single
lock in common on both sides.  We do have BKL around the first STORE,
but check in locks_remove_posix() is outside of BKL and for a good
reason - we don't want BKL on common path of close(2).

Solution is to hold ->file_lock around fcheck() in there; that orders
us wrt removal from descriptor table that preceded locks_remove_posix()
on close path and we either come first (in which case eviction will be
handled by the close side) or we'll see the effect of close and do
eviction ourselves.  Note that even though it's read-only access,
we do need ->file_lock here - rcu_read_lock() won't be enough to
order the things.

Signed-off-by: Al Viro <>
Signed-off-by: Greg Kroah-Hartman <>

6 years agoMerge
Alexey Dobriyan [Mon, 5 May 2008 15:15:50 +0000]

Merge branch 'master' of git:// into 2.6.24-openvz



6 years agoCPT: fix shmat(2)'ted segments
Alexey Dobriyan [Sun, 4 May 2008 13:45:04 +0000]
CPT: fix shmat(2)'ted segments

Commit bc56bba8f31bd99f350a5ebfd43d50f411b620c7 aka
"[PATCH] shm: make sysv ipc shared memory use stacked files"...

It changed number and relationship of "struct file"s associated
with SysV shmem:

Before: one struct file for each shmem segment
 After: one struct file for each shmem segment
        + one struct file (different) for each shmat(2) call.

Obviously checkpointing broke horribly. There aren't any files of the second sort
in the image and they have to be recreated by hand.

What the code will do:
a) if CPT_OBJ_SYSV_SHM object is restored first -- fine, restore as previous kernels did
b) if CPT_VMA_TYPE_SHM is restored first -- restore the corresponding segment, then do
more or less what do_shmat() does.
c) if the shmem segment was already restored, correct the refcounting and just do the shmat() part

6 years agoLinux
Greg Kroah-Hartman [Thu, 1 May 2008 21:50:00 +0000]

6 years agoFix dnotify/close race (CVE-2008-1375)
Al Viro [Thu, 1 May 2008 02:52:22 +0000]
Fix dnotify/close race (CVE-2008-1375)

commit 214b7049a7929f03bbd2786aaef04b8b79db34e2 upstream.

We have a race between fcntl() and close() that can lead to
dnotify_struct inserted into inode's list *after* the last descriptor
had been gone from current->files.

Since that's the only point where dnotify_struct gets evicted, we are
screwed - it will stick around indefinitely.  Even after struct file in
question is gone and freed.  Worse, we can trigger send_sigio() on it at
any later point, which allows sending an arbitrary signal to an arbitrary
process if we manage to apply enough memory pressure to get the page
that used to host that struct file and fill it with the right pattern...

Signed-off-by: Al Viro <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>

6 years agoISDN: Do not validate ISDN net device address prior to interface-up
Paul Bolle [Mon, 14 Apr 2008 05:44:20 +0000]
ISDN: Do not validate ISDN net device address prior to interface-up

Commit bada339 (Validate device addr prior to interface-up) caused a regression
in the ISDN network code, see:
The trivial fix is to remove the pointer to eth_validate_addr() in the
net_device struct in isdn_net_init().

Signed-off-by: Paul Bolle <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>

6 years agoV4L: cx88: enable radio GPIO correctly
Steven Toth [Fri, 25 Apr 2008 00:52:42 +0000]
V4L: cx88: enable radio GPIO correctly

This patch fixes an issue on the HVR1300, where GPIO is blown away due to
the radio input being undefined, breaking the functionality of the DVB
demodulator and MPEG2 encoder used on the cx8802 mpeg TS port.

This is a minimal patch for 2.6.26 and the -stable series.  This must be
fixed a better way for 2.6.27.

Signed-off-by: Steven Toth <>
Signed-off-by: Mauro Carvalho Chehab <>
Signed-off-by: Michael Krufky <>
(cherry picked from commit 6b92b3bd7ac91b7e255541f4be9bfd55b12dae41)
Signed-off-by: Greg Kroah-Hartman <>

6 years agoV4L: Fix VIDIOCGAP corruption in ivtv
Alan Cox [Fri, 25 Apr 2008 00:52:26 +0000]
V4L: Fix VIDIOCGAP corruption in ivtv

Frank Bennett reported that ivtv was causing skype to crash. With help
from one of their developers he showed it was a kernel problem.
VIDIOCGCAP copies a name into a fixed length buffer - ivtv uses names
that are too long and does not truncate them so corrupts a few bytes of
the app data area.

Possibly the names also want trimming but for now this should fix the
corruption case.
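
A minimal sketch of a bounded copy into a fixed-length name field, not the driver's actual code. demo_capability, DEMO_NAME_LEN (the buffer size is an assumption) and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_NAME_LEN 32 /* assumed buffer size, for illustration */

struct demo_capability {
    char name[DEMO_NAME_LEN];
    int type; /* neighbouring field that an overflow would clobber */
};

/* Copies src into the fixed-size field, truncating and NUL-terminating. */
static void demo_set_name(struct demo_capability *cap, const char *src)
{
    assert(cap != NULL && src != NULL);
    snprintf(cap->name, sizeof(cap->name), "%s", src);
}

/* Returns the stored length and checks that the neighbouring field survived. */
static size_t demo_stored_len(const char *src)
{
    struct demo_capability cap;

    memset(&cap, 0, sizeof(cap));
    cap.type = 7;
    demo_set_name(&cap, src);
    assert(cap.type == 7);
    return strlen(cap.name);
}
```

A name longer than the field is cut to DEMO_NAME_LEN - 1 characters plus the terminator, so nothing beyond the buffer is written.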

Signed-off-by: Alan Cox <>
Signed-off-by: Hans Verkuil <>
Signed-off-by: Mauro Carvalho Chehab <>
Signed-off-by: Michael Krufky <>
(cherry picked from commit d2b213f7b76f187c4391079c7581d3a08b940133)
Signed-off-by: Greg Kroah-Hartman <>

6 years agoUSB: remove broken usb-serial num_endpoints check
Greg Kroah-Hartman [Thu, 17 Apr 2008 03:05:15 +0000]
USB: remove broken usb-serial num_endpoints check

commit: 07c3b1a1001614442c665570942a3107a722c314

The num_interrupt_in, num_bulk_in, and other checks in the usb-serial
code are just wrong, there are too many different devices out there with
different numbers of endpoints.  We need to just be sticking with the
device ids instead of trying to catch this kind of thing.  It broke too
many different devices.

This fixes a large number of usb-serial devices to get them working
properly again.

Cc: Oliver Neukum <>
Signed-off-by: Greg Kroah-Hartman <>

6 years agoIncrease the max_burst threshold from 3 to tp->reordering.
John Heffner [Fri, 25 Apr 2008 08:43:57 +0000]
Increase the max_burst threshold from 3 to tp->reordering.

[ Upstream commit: dd9e0dda66ba38a2ddd1405ac279894260dc5c36 ]

This change is necessary to allow cwnd to grow during persistent
reordering.  Cwnd moderation is applied when in the disorder state
and an ack that fills the hole comes in.  If the hole was greater
than 3 packets, but less than tp->reordering, cwnd will shrink when
it should not have.
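
A worked sketch of the effect, assuming that moderation caps cwnd at the packets in flight plus the burst allowance. demo_moderate_cwnd() and the numbers are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>

static unsigned int demo_min(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Cwnd moderation: cap cwnd at packets in flight plus the allowed burst. */
static unsigned int demo_moderate_cwnd(unsigned int cwnd,
                                       unsigned int in_flight,
                                       unsigned int max_burst)
{
    assert(max_burst > 0);
    return demo_min(cwnd, in_flight + max_burst);
}
```

With cwnd = 20 and an ack filling a 5-packet hole (15 packets left in flight), a burst of 3 shrinks cwnd to 18, while a burst equal to a reordering value of 8 leaves it at 20.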

Signed-off-by: John Heffner <jheffner@napa.none>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>

6 years agoJFFS2: Fix free space leak with in-band cleanmarkers
David Woodhouse [Wed, 23 Apr 2008 10:15:35 +0000]
JFFS2: Fix free space leak with in-band cleanmarkers

We were accounting for the cleanmarker by calling jffs2_link_node_ref()
(without locking!), which adjusted both superblock and per-eraseblock
accounting, subtracting the size of the cleanmarker from {jeb,c}->free_size
and adding it to {jeb,c}->used_size.

But only _then_ were we adding the size of the newly-erased block back
to the superblock counts, and we were adding each of jeb->{free,used}_size
to the corresponding superblock counts. Thus, the size of the cleanmarker
was effectively subtracted from the superblock's free_size _twice_.

Fix this, by always adding a full eraseblock size to c->free_size when
we've erased a block. And call jffs2_link_node_ref() under the proper
lock, while we're at it.
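
The double subtraction can be checked with plain arithmetic. This is a sketch with hypothetical names (demo_counts, demo_account_cleanmarker() and the two demo_free_gain_*() helpers); locking is left out and only the counters described above are modelled.

```c
#include <assert.h>

struct demo_counts {
    unsigned long free_size;
    unsigned long used_size;
};

/* Moves cm bytes from free to used in both superblock and eraseblock. */
static void demo_account_cleanmarker(struct demo_counts *c,
                                     struct demo_counts *jeb,
                                     unsigned long cm)
{
    assert(c->free_size >= cm && jeb->free_size >= cm);
    c->free_size -= cm;
    jeb->free_size -= cm;
    c->used_size += cm;
    jeb->used_size += cm;
}

/* Old order: account the cleanmarker, then add the per-block counts. */
static long demo_free_gain_old(unsigned long sector, unsigned long cm)
{
    struct demo_counts c = { 4 * sector, 0 }; /* arbitrary starting point */
    struct demo_counts jeb = { sector, 0 };
    unsigned long before = c.free_size;

    demo_account_cleanmarker(&c, &jeb, cm);
    c.free_size += jeb.free_size;
    c.used_size += jeb.used_size;
    return (long)(c.free_size - before);
}

/* New order: add a full eraseblock first, then account the cleanmarker. */
static long demo_free_gain_new(unsigned long sector, unsigned long cm)
{
    struct demo_counts c = { 4 * sector, 0 };
    struct demo_counts jeb = { sector, 0 };
    unsigned long before = c.free_size;

    c.free_size += sector;
    demo_account_cleanmarker(&c, &jeb, cm);
    return (long)(c.free_size - before);
}
```

The old order gains sector - 2 * cm of free space per erased block, the new order gains sector - cm, so each erase used to leak one cleanmarker's worth of free space.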

Thanks to Alexander Yurchenko and/or Damir Shayhutdinov for (almost)
pinpointing the problem.

[Backport of commit 014b164e1392a166fe96e003d2f0e7ad2e2a0bb7]

Signed-off-by: David Woodhouse <>
Signed-off-by: Greg Kroah-Hartman <>

6 years agoUSB: gadget: queue usb USB_CDC_GET_ENCAPSULATED_RESPONSE message
Jan Altenberg [Tue, 19 Feb 2008 00:44:50 +0000]
USB: gadget: queue usb USB_CDC_GET_ENCAPSULATED_RESPONSE message

backport of 41566bcf35a8b23ce4715dadb5acfd1098c1d3e4

commit 0cf4f2de0a0f4100795f38ef894d4910678c74f8 introduced a bug, which
prevents sending an USB_CDC_GET_ENCAPSULATED_RESPONSE message. This
breaks the RNDIS initialization (especially / only Windoze machines
dislike this behavior...).

Signed-off-by: Benedikt Spranger <>
Signed-off-by: Jan Altenberg <>
Acked-by: David Brownell <>
Cc: Vernon Sauder <>
Signed-off-by: Greg Kroah-Hartman <>

6 years agotehuti: move ioctl perm check closer to function start (CVE-2008-1675)
Jeff Garzik [Fri, 25 Apr 2008 07:11:31 +0000]
tehuti: move ioctl perm check closer to function start (CVE-2008-1675)

Commit f946dffed6334f08da065a89ed65026ebf8b33b4 upstream

Noticed by davem.

Signed-off-by: Jeff Garzik <>
Signed-off-by: Greg Kroah-Hartman <>

6 years agotehuti: check register size (CVE-2008-1675)
Francois Romieu [Sun, 20 Apr 2008 17:32:34 +0000]
tehuti: check register size (CVE-2008-1675)

Signed-off-by: Francois Romieu <>
Signed-off-by: Jeff Garzik <>
Signed-off-by: Greg Kroah-Hartman <>

6 years agox86: Fix 32-bit x86 MSI-X allocation leakage
PJ Waskiewicz [Mon, 28 Apr 2008 18:56:03 +0000]
x86: Fix 32-bit x86 MSI-X allocation leakage

commit 9d9ad4b51d2b29b5bbeb4011f5e76f7538119cf9 upstream

This bug was introduced in the 2.6.24 lguest tree merge, where
MSI-X vector allocation will eventually fail.  The cause is that the new
bit array tracking used vectors is not cleared properly on
IRQ destruction in the 32-bit APIC code.

This can be seen easily using the ixgbe 10 GbE driver on multi-core
systems by simply loading and unloading the driver a few times.
Depending on the number of available vectors on the host system, the
MSI-X allocation will eventually fail, and the driver will only be
able to use legacy interrupts.
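
A minimal sketch of the leak, assuming a small bitmap of used vectors. DEMO_NR_VECTORS and the demo_*() helpers are hypothetical; one load/unload cycle is reduced to one allocation followed by one destroy.

```c
#include <assert.h>

#define DEMO_NR_VECTORS 8 /* hypothetical number of available vectors */

/* Returns the first free vector and marks it used, or -1 if none is left. */
static int demo_alloc_vector(unsigned int *used)
{
    int i;

    for (i = 0; i < DEMO_NR_VECTORS; i++) {
        if (!(*used & (1u << i))) {
            *used |= 1u << i;
            return i;
        }
    }
    return -1;
}

/* IRQ destruction; clear_bit == 0 models the missing bitmap cleanup. */
static void demo_destroy_irq(unsigned int *used, int vec, int clear_bit)
{
    assert(vec >= 0 && vec < DEMO_NR_VECTORS);
    if (clear_bit)
        *used &= ~(1u << vec);
}

/* Load/unload cycles completed before allocation fails (capped). */
static int demo_cycles_before_failure(int clear_bit, int max_cycles)
{
    unsigned int used = 0;
    int cycle;

    for (cycle = 0; cycle < max_cycles; cycle++) {
        int vec = demo_alloc_vector(&used);

        if (vec < 0)
            break;
        demo_destroy_irq(&used, vec, clear_bit);
    }
    return cycle;
}
```

Without the cleanup the allocator runs dry after DEMO_NR_VECTORS cycles; with it, the same vector is reused indefinitely.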

Signed-off-by: Peter P Waskiewicz Jr <>
Signed-off-by: Greg Kroah-Hartman <>

6 years agofix oops on rmmod capidrv
Karsten Keil [Fri, 25 Jan 2008 10:55:28 +0000]
fix oops on rmmod capidrv

commit eb36f4fc019835cecf0788907f6cab774508087b upstream.

Fix overwriting the stack with the version string
(it is currently 10 bytes + zero) when unloading the
capidrv module. Safeguard against overwriting it
should the version string grow in the future.

Should fix Kernel Bug Tracker Bug 9696.

Signed-off-by: Gerd v. Egidy <>
Acked-by: Karsten Keil <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>

6 years agosplice: use mapping_gfp_mask
Hugh Dickins [Thu, 3 Apr 2008 22:35:22 +0000]
splice: use mapping_gfp_mask

upstream commit: 4cd13504652d28e16bf186c6bb2bbb3725369383

The loop block driver is careful to mask __GFP_IO|__GFP_FS out of its
mapping_gfp_mask, to avoid hangs under memory pressure.  But nowadays
it uses splice, usually going through __generic_file_splice_read.  That
must use mapping_gfp_mask instead of GFP_KERNEL to avoid those hangs.

Signed-off-by: Hugh Dickins <>
Cc: Jens Axboe <>
Cc: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Chris Wright <>

6 years agoFAIRSCHED: move to kernel/fairsched.c
Alexey Dobriyan [Wed, 30 Apr 2008 14:40:47 +0000]
FAIRSCHED: move to kernel/fairsched.c

It was there before, so make patch application slightly easier.

6 years agoNETFILTER: remove mismerge in mark_source_chains()
Alexey Dobriyan [Wed, 30 Apr 2008 09:55:27 +0000]
NETFILTER: remove mismerge in mark_source_chains()

7 years agoBackport "SLUB: Do not upset lockdep"
Peter Zijlstra [Fri, 25 Apr 2008 09:11:31 +0000]
Backport "SLUB: Do not upset lockdep"

commit ba84c73c7ae21fc891a3c2576fa3be42752fce53
Author: root <>
Date:   Mon Jan 7 23:20:28 2008 -0800

    SLUB: Do not upset lockdep

    inconsistent {softirq-on-W} -> {in-softirq-W} usage.
    swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
     (&n->list_lock){-+..}, at: [<ffffffff802935c1>] add_partial+0x31/0xa0
    {softirq-on-W} state was registered at:
      [<ffffffff80259fb8>] __lock_acquire+0x3e8/0x1140
      [<ffffffff80259838>] debug_check_no_locks_freed+0x188/0x1a0
      [<ffffffff8025ad65>] lock_acquire+0x55/0x70
      [<ffffffff802935c1>] add_partial+0x31/0xa0
      [<ffffffff805c76de>] _spin_lock+0x1e/0x30
      [<ffffffff802935c1>] add_partial+0x31/0xa0
      [<ffffffff80296f9c>] kmem_cache_open+0x1cc/0x330
      [<ffffffff805c7984>] _spin_unlock_irq+0x24/0x30
      [<ffffffff802974f4>] create_kmalloc_cache+0x64/0xf0
      [<ffffffff80295640>] init_alloc_cpu_cpu+0x70/0x90
      [<ffffffff8080ada5>] kmem_cache_init+0x65/0x1d0
      [<ffffffff807f1b4e>] start_kernel+0x23e/0x350
      [<ffffffff807f112d>] _sinittext+0x12d/0x140
      [<ffffffffffffffff>] 0xffffffffffffffff

    This change isn't really necessary for correctness, but it prevents lockdep
    from getting upset and then disabling itself.

Signed-off-by: Peter Zijlstra <>
Cc: Christoph Lameter <>
Cc: Kamalesh Babulal <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Christoph Lameter <>

7 years agoExpand VE0 cpu stats
Pavel Emelianov [Tue, 22 Apr 2008 15:39:43 +0000]
Expand VE0 cpu stats

Stable commit 28680bfb8269703def997e2269caf9bfe2de489c
shrank struct percpu_data from NR_CPUS pointers to just 1,
so space for VE0 cpu statistics (which is allocated very early)
was too small resulting in oops in

7 years agoMerge
Alexey Dobriyan [Tue, 22 Apr 2008 13:10:13 +0000]

7 years agoLeave irq state alone during call_console_drivers()
Alexey Dobriyan [Tue, 22 Apr 2008 11:20:19 +0000]
Leave irq state alone during call_console_drivers()

Mainline does so at least.

7 years agoFix dcache accounting interaction with SLUB
Alexey Dobriyan [Tue, 22 Apr 2008 10:47:50 +0000]
Fix dcache accounting interaction with SLUB

SLUB passes allocations greater than PAGE_SIZE/2 directly to page
allocator, so in case of large names there is no cache associated with
them and no ->objuse counter.

Account for PAGE_SIZE in such cases.
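
A rough sketch of the accounting rule, assuming a 4096-byte page; demo_charge_for_name and cache_objuse are hypothetical names standing in for the real accounting code and the per-cache ->objuse counter.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u   /* assumed page size */

/* Returns the number of bytes to charge for a name of `size` bytes.
 * `cache_objuse` stands in for the ->objuse value of the slab cache
 * that would serve the allocation. */
static size_t demo_charge_for_name(size_t size, size_t cache_objuse)
{
    if (size > DEMO_PAGE_SIZE / 2)
        return DEMO_PAGE_SIZE;   /* no cache involved: charge a page */
    return cache_objuse;         /* ordinary slab object */
}
```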

7 years agoLinux
Chris Wright [Sat, 19 Apr 2008 01:53:39 +0000]

7 years agolocks: fix possible infinite loop in fcntl(F_SETLKW) over nfs
J. Bruce Fields [Mon, 14 Apr 2008 19:03:02 +0000]
locks: fix possible infinite loop in fcntl(F_SETLKW) over nfs

upstream commit: 19e729a928172103e101ffd0829fd13e68c13f78

Miklos Szeredi found the bug:

"Basically what happens is that on the server nlm_fopen() calls
nfsd_open() which returns -EACCES, to which nlm_fopen() returns

"On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
which in turn will cause fcntl_setlk() to retry forever."

So, for example, opening a file on an nfs filesystem, changing
permissions to forbid further access, then trying to lock the file,
could result in an infinite loop.

And Trond Myklebust identified the culprit, from Marc Eshel and me:

7723ec9777d9832849b76475b1a21a2872a40d20 "locks: factor out
generic/filesystem switch from setlock code"

That commit claimed to just be reshuffling code, but actually introduced
a behavioral change by calling the lock method repeatedly as long as it
returned -EAGAIN.

We assumed this would be safe, since we assumed a lock of type SETLKW
would only return with either success or an error other than -EAGAIN.
However, nfs can in fact return -EAGAIN in this situation, and
independently of whether that behavior is correct or not, we don't
actually need this change, and it seems far safer not to depend on such
assumptions about the filesystem's ->lock method.

Therefore, revert the problematic part of the original commit.  This
leaves vfs_lock_file() and its other callers unchanged, while returning
fcntl_setlk and fcntl_setlk64 to their former behavior.
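
The difference between the two behaviours can be sketched as follows. This is a simplified userspace model, not the fs/locks.c code: the demo_* names are hypothetical, and the max_tries bound exists only so the sketch terminates.

```c
#include <errno.h>

/* Hypothetical ->lock method that keeps reporting -EAGAIN, as the
 * message says nfs can. Counts how often it was called. */
static int demo_calls;

static int demo_lock_eagain(void)
{
    demo_calls++;
    return -EAGAIN;
}

/* Behaviour introduced by the reshuffle: retry as long as the lock
 * method returns -EAGAIN. Unbounded in the real code. */
static int demo_setlk_retrying(int (*lock)(void), int max_tries)
{
    int err;

    do {
        err = lock();
    } while (err == -EAGAIN && --max_tries > 0);
    return err;
}

/* Behaviour after the revert: call once and hand back the result. */
static int demo_setlk_once(int (*lock)(void))
{
    return lock();
}
```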

Signed-off-by: J. Bruce Fields <>
Tested-by: Miklos Szeredi <>
Cc: Trond Myklebust <>
Cc: Marc Eshel <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Chris Wright <>

7 years agofile capabilities: remove cap_task_kill()
Serge Hallyn [Fri, 29 Feb 2008 15:14:57 +0000]
file capabilities: remove cap_task_kill()

upstream commit: aedb60a67c10a0861af179725d060765262ba0fb

The original justification for cap_task_kill() was as follows:

check_kill_permission() does appropriate uid equivalence checks.
However with file capabilities it becomes possible for an
unprivileged user to execute a file with file capabilities
resulting in a more privileged task with the same uid.

However now that cap_task_kill() always returns 0 (permission
granted) when p->uid==current->uid, the whole hook is worthless,
and only likely to create more subtle problems in the corner cases
where it might still be called but return -EPERM.  Those cases
are basically when uids are different but euid/suid is equivalent
as per the check in check_kill_permission().

One example of a still-broken application is 'at' for non-root users.

This patch removes cap_task_kill().

Signed-off-by: Serge Hallyn <>
Acked-by: Andrew G. Morgan <>
Earlier-version-tested-by: Luiz Fernando N. Capitulino <>
Acked-by: Casey Schaufler <>
Signed-off-by: Linus Torvalds <>
[ backport to]
Signed-off-by: Chris Wright <>

7 years agomacb: Call phy_disconnect on removing
Atsushi Nemoto [Thu, 10 Apr 2008 14:30:07 +0000]
macb: Call phy_disconnect on removing

upstream commit: 84b7901f8d5a17536ef2df7fd628ab865df8fe3a

Call phy_disconnect() on remove routine.  Otherwise the phy timer
causes a kernel crash when unloading.

Signed-off-by: Atsushi Nemoto <>
Signed-off-by: Jeff Garzik <>
Cc: Haavard Skinnemoen <>
Signed-off-by: Chris Wright <>

7 years agofbdev: fix /proc/fb oops after module removal
Alexey Dobriyan [Wed, 16 Apr 2008 02:45:07 +0000]
fbdev: fix /proc/fb oops after module removal

upstream commit: c43f89c2084f46e3ec59ddcbc52ecf4b1e9b015a

/proc/fb is not removed during rmmod.

Steps to reproduce:

modprobe fb
rmmod fb
ls /proc

BUG: unable to handle kernel paging request at ffffffffa0094370
IP: [<ffffffff802b92a1>] proc_get_inode+0x101/0x130
PGD 203067 PUD 207063 PMD 17e758067 PTE 0
Oops: 0000 [1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:05:02.0/resource
Modules linked in: nf_conntrack_irc xt_state iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables vfat fat usbhid ehci_hcd uhci_hcd usbcore sr_mod cdrom [last unloaded: fb]
Pid: 21205, comm: ls Not tainted 2.6.25-rc8-mm2 #14
RIP: 0010:[<ffffffff802b92a1>]  [<ffffffff802b92a1>] proc_get_inode+0x101/0x130
RSP: 0018:ffff81017c4bfc78  EFLAGS: 00010246
RAX: 0000000000008000 RBX: ffff8101787f5470 RCX: 0000000048011ccc
RDX: ffffffffa0094320 RSI: ffff810006ad43b0 RDI: ffff81017fc2cc00
RBP: ffff81017e450300 R08: 0000000000000002 R09: ffff81017c5d1000
R10: 0000000000000000 R11: 0000000000000246 R12: ffff81016b903a28
R13: ffff81017f822020 R14: ffff81017c4bfd58 R15: ffff81017f822020
FS:  00007f08e71696f0(0000) GS:ffff81017fc06480(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa0094370 CR3: 000000017e54a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ls (pid: 21205, threadinfo ffff81017c4be000, task ffff81017de48770)
Stack:  ffff81017c5d1000 00000000ffffffea ffff81017e450300 ffffffff802bdd1e
 ffff81017f802258 ffff81017c4bfe48 ffff81016b903a28 ffff81017f822020
 ffff81017c4bfd48 ffffffff802b9ba0 ffff81016b903a28 ffff81017f802258
Call Trace:
 [<ffffffff802bdd1e>] ? proc_lookup_de+0x8e/0x100
 [<ffffffff802b9ba0>] ? proc_root_lookup+0x20/0x60
 [<ffffffff802882a7>] ? do_lookup+0x1b7/0x210
 [<ffffffff8028883d>] ? __link_path_walk+0x53d/0x7f0
 [<ffffffff80295eb8>] ? mntput_no_expire+0x28/0x130
 [<ffffffff80288b4a>] ? path_walk+0x5a/0xc0
 [<ffffffff80288dd3>] ? do_path_lookup+0x83/0x1c0
 [<ffffffff80287785>] ? getname+0xe5/0x210
 [<ffffffff80289adb>] ? __user_walk_fd+0x4b/0x80
 [<ffffffff8028236c>] ? vfs_lstat_fd+0x2c/0x70
 [<ffffffff8028bf1e>] ? filldir+0xae/0xf0
 [<ffffffff802b92e9>] ? de_put+0x9/0x50
 [<ffffffff8029633d>] ? mnt_want_write+0x2d/0x80
 [<ffffffff8029339f>] ? touch_atime+0x1f/0x170
 [<ffffffff802b9b1d>] ? proc_root_readdir+0x7d/0xa0
 [<ffffffff802825e7>] ? sys_newlstat+0x27/0x50
 [<ffffffff8028bffb>] ? vfs_readdir+0x9b/0xd0
 [<ffffffff8028c0fe>] ? sys_getdents+0xce/0xe0
 [<ffffffff8020b39b>] ? system_call_after_swapgs+0x7b/0x80

Code: b7 83 b2 00 00 00 25 00 f0 00 00 3d 00 80 00 00 74 19 48 89 93 f0 00 00 00 48 89 df e8 39 9a fd ff 48 89 d8 48 83 c4 08 5b 5d c3 <48> 83 7a 50 00 48 c7 c0 60 16 45 80 48 c7 c2 40 17 45 80 48 0f
RIP  [<ffffffff802b92a1>] proc_get_inode+0x101/0x130
 RSP <ffff81017c4bfc78>
CR2: ffffffffa0094370
---[ end trace c71hiarjan8ab739 ]---

Signed-off-by: Alexey Dobriyan <>
"Antonino A. Daplas" <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Chris Wright <>

7 years agoacpi: bus: check once more for an empty list after locking it
Chuck Ebbert [Wed, 16 Apr 2008 02:45:05 +0000]
acpi: bus: check once more for an empty list after locking it

upstream commit: f0a37e008750ead1751b7d5e89d220a260a46147

List could have become empty after the unlocked check that was made earlier,
so check again inside the lock.
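
The check-again-under-the-lock pattern looks roughly like this. The queue type and the demo_* helpers are hypothetical, and the integer "lock" is only a stand-in for a real spinlock.

```c
#include <stddef.h>

struct demo_node {
    struct demo_node *next;
    int event;
};

struct demo_queue {
    struct demo_node *head;   /* NULL when empty */
    int locked;               /* stand-in for a spinlock */
};

static void demo_lock(struct demo_queue *q)   { q->locked = 1; }
static void demo_unlock(struct demo_queue *q) { q->locked = 0; }

/* Returns 0 and stores the event, or -1 if nothing is queued. */
static int demo_pop(struct demo_queue *q, int *event)
{
    struct demo_node *n;

    if (q->head == NULL)       /* unlocked fast-path check */
        return -1;

    demo_lock(q);
    if (q->head == NULL) {     /* re-check: may have been emptied meanwhile */
        demo_unlock(q);
        return -1;
    }
    n = q->head;
    q->head = n->next;
    *event = n->event;
    demo_unlock(q);
    return 0;
}
```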

Should fix

Signed-off-by: Chuck Ebbert <>
Cc: <>
Cc: Len Brown <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Chris Wright <>

7 years agoPARISC fix signal trampoline cache flushing
Kyle McMartin [Tue, 15 Apr 2008 22:36:38 +0000]
PARISC fix signal trampoline cache flushing

upstream commit: cf39cc3b56bc4a562db6242d3069f65034ec7549

The signal trampolines were accidentally flushing the kernel I$ instead of
the user's.  Fix that up, and also add a missing user D$ flush while
we're at it.

Signed-off-by: Kyle McMartin <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Chris Wright <>

7 years agoPARISC pdc_console: fix bizarre panic on boot
Kyle McMartin [Tue, 15 Apr 2008 16:46:03 +0000]
PARISC pdc_console: fix bizarre panic on boot

upstream commit ef1afd4d79f0479960ff36bb5fe6ec6eba1ebff2

commit 721fdf34167580ff98263c74cead8871d76936e6
Author: Kyle McMartin <>
Date:   Thu Dec 6 09:32:15 2007 -0800

    [PARISC] print more than one character at a time for pdc console

introduced a subtle bug by accidentally removing the "static" from
iodc_dbuf. This resulted in, what appeared to be, a trap without
*current set to a task. Probably the result of a trap in real mode
while calling firmware.

Also do other misc clean ups. Since the only input from firmware is non
blocking, share iodc_dbuf between input and output, and spinlock the
only callers.

[jejb: fixed up rejections against the stable tree]

Signed-off-by: Kyle McMartin <>
Signed-off-by: James Bottomley <>
Signed-off-by: Chris Wright <>

7 years agoPARISC futex: special case cmpxchg NULL in kernel space
Kyle McMartin [Tue, 15 Apr 2008 15:45:11 +0000]
PARISC futex: special case cmpxchg NULL in kernel space

upstream commit: c20a84c91048c76c1379011c96b1a5cee5c7d9a0

commit f9e77acd4060fefbb60a351cdb8d30fca27fe194
Author: Thomas Gleixner <>
Date:   Sun Feb 24 02:10:05 2008 +0000

    futex: runtime enable pi and robust functionality

which was backported to stable based on mainline Commit
a0c1e9073ef7428a14309cba010633a6cd6719ea added code to futex.c
to detect whether futex_atomic_cmpxchg_inatomic was implemented at run

+       curval = cmpxchg_futex_value_locked(NULL, 0, 0);
+       if (curval == -EFAULT)
+               futex_cmpxchg_enabled = 1;

This is bogus on parisc, since page zero in kernel virtual space is the
gateway page for syscall entry, and should not be read from the kernel.
(That, and we really don't like the kernel faulting on its own address

Signed-off-by: Kyle McMartin <>
Signed-off-by: James Bottomley <>
Signed-off-by: Chris Wright <>

7 years agopnpacpi: reduce printk severity for "pnpacpi: exceeded the max number of ..."
Len Brown [Tue, 15 Apr 2008 07:16:56 +0000]
pnpacpi: reduce printk severity for "pnpacpi: exceeded the max number of ..."

upstream commit 33fd7afd66ffdc6addf1b085fe6403b6af532f8e

We have been printing these messages at KERN_ERR since 2.6.24,

But KERN_ERR pops up on a console booted with "quiet"
and causes users to get alarmed and file bugs
about the message itself:

So reduce the severity of these messages to
KERN_WARNING, which is not printed by "quiet".

This message will still be seen without "quiet",
but a lot of messages are printed in that mode
and it will be less likely to cause undue alarm.

We could go all the way to KERN_DEBUG, but this
is a real warning after all, so it seems prudent
not to require "debug" to see it.

Signed-off-by: Len Brown <>
Signed-off-by: Chris Wright <>

7 years agoPOWERPC: Fix build of modular drivers/macintosh/apm_emu.c
Guido Guenther [Tue, 15 Apr 2008 13:45:51 +0000]
POWERPC: Fix build of modular drivers/macintosh/apm_emu.c

upstream commit: 620a245978d007279bc5c7c64e15f5f63af9af98

Currently, if drivers/macintosh/apm_emu is a module and the config
doesn't have CONFIG_SUSPEND we get:

ERROR: "pmu_batteries" [drivers/macintosh/apm_emu.ko] undefined!
ERROR: "pmu_battery_count" [drivers/macintosh/apm_emu.ko] undefined!
ERROR: "pmu_power_flags" [drivers/macintosh/apm_emu.ko] undefined!

on PPC32.  The variables aren't wrapped in '#if defined(CONFIG_SUSPEND)'
so we probably shouldn't wrap the exports either.  This removes the
CONFIG_SUSPEND part of the export, which fixes compilation on ppc32.

Signed-off-by: Guido Guenther <>
Signed-off-by: Paul Mackerras <> notes:

The details can be found at

Cc: Mike Pagano <>
Signed-off-by: Chris Wright <>

7 years agomd: close a livelock window in handle_parity_checks5
Dan Williams [Fri, 11 Apr 2008 16:55:06 +0000]
md: close a livelock window in handle_parity_checks5

upstream commit: bd2ab67030e9116f1e4aae1289220255412b37fd

If a failure is detected after a parity check operation has been initiated,
but before it completes, handle_parity_checks5 will never quiesce operations on
the stripe.

Explicitly handle this case by "canceling" the parity check, i.e.  clear the
STRIPE_OP_CHECK flags and queue the stripe on the handle list again to refresh
any non-uptodate blocks.

Kernel versions >= 2.6.23 are susceptible.

Cc: <>
Cc: NeilBrown <>
Signed-off-by: Dan Williams <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Chris Wright <>

7 years agosignalfd: fix for incorrect SI_QUEUE user data reporting
Davide Libenzi [Fri, 11 Apr 2008 16:55:04 +0000]
signalfd: fix for incorrect SI_QUEUE user data reporting

upstream commit: 0859ab59a8a48d2a96b9d2b7100889bcb6bb5818

Michael Kerrisk found out that signalfd was not reporting back user data
pushed using sigqueue:

The following patch makes signalfd report back the ssi_ptr and ssi_int members
of the signalfd_siginfo structure.
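
From userspace, the fixed behaviour can be observed with a round trip like the one below. It is a sketch that assumes Linux with signalfd support and a single-threaded caller; demo_roundtrip is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Queues SIGUSR1 to the current process with `value` attached and
 * returns the ssi_int read back through a signalfd, or -1 on failure. */
static int demo_roundtrip(int value)
{
    sigset_t mask;
    struct signalfd_siginfo info;
    union sigval sv;
    int fd, result = -1;

    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    /* The signal must be blocked so it is delivered via the fd. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;

    fd = signalfd(-1, &mask, 0);
    if (fd < 0)
        return -1;

    sv.sival_int = value;
    if (sigqueue(getpid(), SIGUSR1, sv) == 0 &&
        read(fd, &info, sizeof(info)) == (ssize_t)sizeof(info))
        result = info.ssi_int;   /* user data pushed with sigqueue() */

    close(fd);
    return result;
}
```

On a kernel with this fix, demo_roundtrip(42) is expected to return 42.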

Signed-off-by: Davide Libenzi <>
Acked-by: Michael Kerrisk <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Chris Wright <>

7 years agoplip: replace spin_lock_irq with spin_lock_irqsave in irq context
Mikulas Patocka [Mon, 31 Mar 2008 23:22:45 +0000]
plip: replace spin_lock_irq with spin_lock_irqsave in irq context

upstream commit: cabce28ec0a0ae3d0ddfa4461f0e8be94ade9e46

Plip uses spin_lock_irq/spin_unlock_irq in its IRQ handler (called from
the parport IRQ handler). The latter enables interrupts when the parport
subsystem IRQ handler does not expect it.

The bug can be seen if you compile kernel with lock dependency checking
and use plip --- it produces a warning.

This patch changes it to spin_lock_irqsave/spin_unlock_irqrestore, so that
it doesn't enable interrupts when already disabled.
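
The difference between the two lock flavours can be modelled in userspace. The boolean below is a stand-in for the CPU interrupt-enable state, and the demo_* helpers are hypothetical; no real locking is performed.

```c
#include <stdbool.h>

static bool demo_irqs_enabled;   /* stand-in for the CPU interrupt flag */

/* _irq flavour: unconditionally disable, then unconditionally enable. */
static void demo_lock_irq(void)   { demo_irqs_enabled = false; }
static void demo_unlock_irq(void) { demo_irqs_enabled = true; }

/* _irqsave/_irqrestore flavour: remember and restore the prior state. */
static void demo_lock_irqsave(bool *flags)
{
    *flags = demo_irqs_enabled;
    demo_irqs_enabled = false;
}

static void demo_unlock_irqrestore(bool flags)
{
    demo_irqs_enabled = flags;
}
```

Entered with interrupts already disabled, the first pair leaves them enabled on exit, while the second pair leaves them disabled as the caller expects.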

Signed-off-by: Linus Torvalds <>
Signed-off-by: Chris Wright <>

7 years agoacpi: fix "buggy BIOS check" when CPUs are hot removed
Alok Kataria [Thu, 10 Apr 2008 01:50:05 +0000]
acpi: fix "buggy BIOS check" when CPUs are hot removed

upstream commit: ba62b077871a5255e271f4fdae57167651839277

Fixes a BUG in ACPI hotplugging.

processor_device_array[pr->id] needs to be set to NULL when removing a CPU.
Else the "buggy BIOS check" in acpi_processor_start mistakenly fires when a
CPU is removed from the system and then later re-added.
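
A minimal model of why the slot has to be cleared. The array, the struct and the exact form of the check are assumptions made for illustration, not the ACPI driver's code.

```c
#include <stddef.h>

#define DEMO_NR_CPUS 4

struct demo_processor {
    int id;
};

static struct demo_processor *demo_array[DEMO_NR_CPUS];

/* Returns -1 when the slot is already taken by a different object,
 * which is the situation the "buggy BIOS check" is meant to catch. */
static int demo_start(struct demo_processor *pr)
{
    if (demo_array[pr->id] != NULL && demo_array[pr->id] != pr)
        return -1;
    demo_array[pr->id] = pr;
    return 0;
}

/* Clearing the slot on removal is the step the fix adds. */
static void demo_remove(struct demo_processor *pr)
{
    demo_array[pr->id] = NULL;
}
```

Without the reset in demo_remove(), re-adding a CPU under the same id with a freshly allocated object would trip the check.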

Signed-off-by: Alok N Kataria <>
Signed-off-by: Dan Arai <>
Cc: Len Brown <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Chris Wright <>

7 years agoHFS+: fix unlink of links
Roman Zippel [Wed, 9 Apr 2008 15:44:07 +0000]
HFS+: fix unlink of links

upstream commit: 76b0c26af2736b7e5b87e6ed7ab63901483d5736

Some time ago while attempting to handle invalid link counts, I botched
the unlink of links itself, so this patch fixes this now correctly, so
that only the link count of nodes that don't point to links is ignored.
Thanks to Vlado Plaga <> for notifying me of this

Signed-off-by: Roman Zippel <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Chris Wright <>

7 years agoDVB: tda10086: make the 22kHz tone for DISEQC a config option
Hartmut Hackmann [Wed, 9 Apr 2008 01:12:41 +0000]
DVB: tda10086: make the 22kHz tone for DISEQC a config option

(backported from commit ea75baf4b0f117564bd50827a49c4b14d61d24e9)

Some cards need the diseqc signal modulated, while some just need
the envelope to control the LNB supply.

This fixes Bug 9887

Signed-off-by: Hartmut Hackmann <>
Acked-by: Oliver Endriss <>
Signed-off-by: Mauro Carvalho Chehab <>
Cc: Hermann Pitton <>
Signed-off-by: Michael Krufky <>
Signed-off-by: Chris Wright <>

7 years agoSPARC64: Fix FPU saving in 64-bit signal handling.
David S. Miller [Tue, 8 Apr 2008 05:24:24 +0000]
SPARC64: Fix FPU saving in 64-bit signal handling.

Upstream commit: 7c3cce978e4f933ac13758ec5d2554fc8d0927d2

The calculation of the FPU reg save area pointer
was wrong.

Based upon an OOPS report from Tom Callaway.

Signed-off-by: David S. Miller <>
Signed-off-by: Chris Wright <>

7 years agobluetooth: hci_core: defer hci_unregister_sysfs()
Dave Young [Thu, 6 Mar 2008 02:45:59 +0000]
bluetooth: hci_core: defer hci_unregister_sysfs()

upstream commit: 147e2d59833e994cc99341806a88b9e59be41391

Alon Bar-Lev reports:

Feb 16 23:41:33 alon1 usb 3-1: configuration #1 chosen from 1 choice
Feb 16 23:41:33 alon1 BUG: unable to handle kernel NULL pointer
dereference at virtual address 00000008
Feb 16 23:41:33 alon1 printing eip: c01b2db6 *pde = 00000000
Feb 16 23:41:33 alon1 Oops: 0000 [#1] PREEMPT
Feb 16 23:41:33 alon1 Modules linked in: ppp_deflate zlib_deflate
zlib_inflate bsd_comp ppp_async rfcomm l2cap hci_usb vmnet(P)
vmmon(P) tun radeon drm autofs4 ipv6 aes_generic crypto_algapi
ieee80211_crypt_ccmp nf_nat_irc nf_nat_ftp nf_conntrack_irc
nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat ipt_REJECT
xt_tcpudp ipt_LOG xt_limit xt_state nf_conntrack_ipv4 nf_conntrack
iptable_filter ip_tables x_tables snd_pcm_oss snd_mixer_oss
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
bluetooth ppp_generic slhc ioatdma dca cfq_iosched cpufreq_powersave
cpufreq_ondemand cpufreq_conservative acpi_cpufreq freq_table uinput
fan af_packet nls_cp1255 nls_iso8859_1 nls_utf8 nls_base pcmcia
snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm nsc_ircc snd_timer
ipw2200 thinkpad_acpi irda snd ehci_hcd yenta_socket uhci_hcd
psmouse ieee80211 soundcore intel_agp hwmon rsrc_nonstatic pcspkr
e1000 crc_ccitt snd_page_alloc i2c_i801 ieee80211_crypt pcmcia_core
agpgart thermal battery nvram rtc sr_mod ac sg firmware_class button
processor cdrom
unix usbcore evdev ext3 jbd ext2 mbcache loop ata_piix libata sd_mod
Feb 16 23:41:33 alon1
Feb 16 23:41:33 alon1 Pid: 4, comm: events/0 Tainted: P
(2.6.24-gentoo-r2 #1)
Feb 16 23:41:33 alon1 EIP: 0060:[<c01b2db6>] EFLAGS: 00010282 CPU: 0
Feb 16 23:41:33 alon1 EIP is at sysfs_get_dentry+0x26/0x80
Feb 16 23:41:33 alon1 EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX:
Feb 16 23:41:33 alon1 ESI: f72eb900 EDI: f4803ae0 EBP: f4803ae0 ESP:
Feb 16 23:41:33 alon1 hcid[7004]: HCI dev 0 registered
Feb 16 23:41:33 alon1 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Feb 16 23:41:33 alon1 Process events/0 (pid: 4, ti=f7c48000
task=f7c3efc0 task.ti=f7c48000)
Feb 16 23:41:33 alon1 Stack: f7cb6140 f4822668 f7e71e10 c01b304d
ffffffff ffffffff fffffffe c030ba9c
Feb 16 23:41:33 alon1 f7cb6140 f4822668 f6da6720 f7cb6140 f4822668
f6da6720 c030ba8e c01ce20b
Feb 16 23:41:33 alon1 f6e9dd00 c030ba8e f6da6720 f6e9dd00 f6e9dd00
00000000 f4822600 00000000
Feb 16 23:41:33 alon1 Call Trace:
Feb 16 23:41:33 alon1 [<c01b304d>] sysfs_move_dir+0x3d/0x1f0
Feb 16 23:41:33 alon1 [<c01ce20b>] kobject_move+0x9b/0x120
Feb 16 23:41:33 alon1 [<c0241711>] device_move+0x51/0x110
Feb 16 23:41:33 alon1 [<f9aaed80>] del_conn+0x0/0x70 [bluetooth]
Feb 16 23:41:33 alon1 [<f9aaed99>] del_conn+0x19/0x70 [bluetooth]
Feb 16 23:41:33 alon1 [<c012c1a1>] run_workqueue+0x81/0x140
Feb 16 23:41:33 alon1 [<c02c0c88>] schedule+0x168/0x2e0
Feb 16 23:41:33 alon1 [<c012fc70>] autoremove_wake_function+0x0/0x50
Feb 16 23:41:33 alon1 [<c012c9cb>] worker_thread+0x9b/0xf0
Feb 16 23:41:33 alon1 [<c012fc70>] autoremove_wake_function+0x0/0x50
Feb 16 23:41:33 alon1 [<c012c930>] worker_thread+0x0/0xf0
Feb 16 23:41:33 alon1 [<c012f962>] kthread+0x42/0x70
Feb 16 23:41:33 alon1 [<c012f920>] kthread+0x0/0x70
Feb 16 23:41:33 alon1 [<c0104c2f>] kernel_thread_helper+0x7/0x18
Feb 16 23:41:33 alon1 =======================
Feb 16 23:41:33 alon1 Code: 26 00 00 00 00 57 89 c7 a1 50 1b 3a c0
56 53 8b 70 38 85 f6 74 08 8b 0e 85 c9 74 58 ff 06 8b 56 50 39 fa 74
47 89 fb eb 02 89 c3 <8b> 43 08 39 c2 75 f7 8b 46 08 83 c0 68 e8 98
e7 10 00 8b 43 10
Feb 16 23:41:33 alon1 EIP: [<c01b2db6>] sysfs_get_dentry+0x26/0x80
SS:ESP 0068:f7c49efc
Feb 16 23:41:33 alon1 ---[ end trace aae864e9592acc1d ]---

Defer hci_unregister_sysfs because hci device could be destructed
while hci conn devices still there.

Signed-off-by: Dave Young <>
Tested-by: Stefan Seyfried <>
Acked-by: Alon Bar-Lev <>
Signed-off-by: Andrew Morton <>
Acked-by: Marcel Holtmann <> notes:

This patch fixes

Cc: Daniel Drake <>
Signed-off-by: Chris Wright <>

7 years agosis190: read the mac address from the eeprom first
Francois Romieu [Mon, 18 Feb 2008 20:20:32 +0000]
sis190: read the mac address from the eeprom first

upstream commit: 563e0ae06ff18f0b280f11cf706ba0172255ce52

Reading a series of zeros from the CMOS SRAM area does not work
well with is_valid_ether_addr(). Let's read the mac address
from the eeprom first as it seems more reliable.

Fix for

Signed-off-by: Francois Romieu <>
Signed-off-by: Jeff Garzik <> notes:
This patch fixes

Cc: Daniel Drake <>
Signed-off-by: Chris Wright <>

7 years agolibata: assume no device is attached if both IDENTIFYs are aborted
Tejun Heo [Sun, 23 Mar 2008 06:16:53 +0000]
libata: assume no device is attached if both IDENTIFYs are aborted

upstream commit: 1ffc151fcddf524d0c76709d7e7a2af0255acb6b

This is to fix bugzilla #10254.  QSI cdrom attached to pata_sis as
secondary master appears as phantom device for the slave.
Interestingly, instead of not setting DRQ after IDENTIFY which
triggers NODEV_HINT, it aborts both IDENTIFY and IDENTIFY PACKET which
makes EH retry.

Modify EH such that it assumes no device is attached if both flavors
of IDENTIFY are aborted by the device.  There really isn't much point
in retrying when the device actively aborts the commands.

While at it, convert NODEV detection message to ata_dev_printk() to
help debugging obscure detection problems.

This problem was reported by Jan Bücken.

Signed-off-by: Tejun Heo <>
Cc: Jan Bücken <>
Acked-by: Alan Cox <>
Signed-off-by: Jeff Garzik <> notes:

This patch fixes

Cc: Daniel Drake <>
Signed-off-by: Chris Wright <>

7 years agoSPARC64: flush_ptrace_access() needs preemption disable.
David S. Miller [Mon, 7 Apr 2008 07:26:11 +0000]
SPARC64: flush_ptrace_access() needs preemption disable.

Upstream commit: f6a843d939ade435e060d580f5c56d958464f8a5

Based upon a report by Mariusz Kozlowski.

Signed-off-by: David S. Miller <>
Signed-off-by: Chris Wright <>

7 years agoSPARC64: Fix __get_cpu_var in preemption-enabled area.
David S. Miller [Mon, 7 Apr 2008 07:25:35 +0000]
SPARC64: Fix __get_cpu_var in preemption-enabled area.

Upstream commit: 69072f6e8e4bd4799d2a54e4ff8771d0657512c1

Reported by Mariusz Kozlowski.

Signed-off-by: David S. Miller <>
Signed-off-by: Chris Wright <>

7 years agoSPARC64: Fix atomic backoff limit.
David S. Miller [Mon, 7 Apr 2008 07:25:20 +0000]
SPARC64: Fix atomic backoff limit.

Upstream commit: 4cfea5a7dfcc2766251e50ca30271a782d5004ad

4096 will not fit into the immediate field of a compare instruction,
in fact it will end up being -4096 causing the check to fail every
time and thus disabling backoff.
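
The arithmetic can be checked with a small helper that sign-extends a 13-bit field, which is the width of the SPARC signed immediate. demo_simm13 is a hypothetical name used only for this sketch.

```c
#include <stdint.h>

/* Interpret the low 13 bits of `value` as a signed two's-complement
 * field, the way a 13-bit immediate is decoded. */
static int32_t demo_simm13(int32_t value)
{
    uint32_t field = (uint32_t)value & 0x1fffu;   /* keep 13 bits */

    if (field & 0x1000u)                          /* bit 12 is the sign bit */
        return (int32_t)field - 0x2000;
    return (int32_t)field;
}
```

The largest value that survives unchanged is 4095; 4096 sets only the sign bit and decodes as -4096.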

Signed-off-by: David S. Miller <>
Signed-off-by: Chris Wright <>

7 years agoVLAN: Don't copy ALLMULTI/PROMISC flags from underlying device
Patrick McHardy [Mon, 7 Apr 2008 06:46:45 +0000]
VLAN: Don't copy ALLMULTI/PROMISC flags from underlying device

Upstream commit: 0ed21b321a13421e2dfeaa70a6c324e05e3e91e6

Changing these flags requires using dev_set_allmulti/dev_set_promiscuity
or dev_change_flags. Setting it directly causes two unwanted effects:

- the next dev_change_flags call will notice a difference between
  dev->gflags and the actual flags, enable promisc/allmulti
  mode and incorrectly update dev->gflags

- this keeps the underlying device in promisc/allmulti mode until
  the VLAN device is deleted

[ Ported back to 2.6.24 VLAN code. -DaveM ]

Signed-off-by: Patrick McHardy <>
Signed-off-by: David S. Miller <>
Signed-off-by: Chris Wright <>

7 years agoTCP: Let skbs grow over a page on fast peers
Herbert Xu [Mon, 7 Apr 2008 06:43:38 +0000]
TCP: Let skbs grow over a page on fast peers

Upstream commit: 69d1506731168d6845a76a303b2c45f7c05f3f2c

While testing the virtio-net driver on KVM with TSO I noticed
that TSO performance with a 1500 MTU is significantly worse
compared to the performance of non-TSO with a 16436 MTU.  The
packet dump shows that most of the packets sent are smaller
than a page.

Looking at the code this actually is quite obvious as it always
stops extending the packet if it's the first packet yet to be
sent and if it's larger than the MSS.  Since each extension is
bound by the page size, this means that (given a 1500 MTU) we're
very unlikely to construct packets greater than a page, provided
that the receiver and the path are fast enough so that packets can
always be sent immediately.

The fix is also quite obvious.  The push calls inside the loop
are just an optimisation so that we don't end up doing all the
sending at the end of the loop.  Therefore there is no specific
reason why it has to do so at MSS boundaries.  For TSO, the
most natural extension of this optimisation is to do the pushing
once the skb exceeds the TSO size goal.

This is what the patch does and testing with KVM shows that the
TSO performance with a 1500 MTU easily surpasses that of a 16436
MTU and indeed the packet sizes sent are generally larger than

I don't see any obvious downsides for slower peers or connections,
but it would be prudent to test this extensively to ensure that
those cases don't regress.

Signed-off-by: Herbert Xu <>
Signed-off-by: David S. Miller <>
Signed-off-by: Chris Wright <>

7 years agoTCP: Fix shrinking windows with window scaling
Patrick McHardy [Mon, 7 Apr 2008 06:43:18 +0000]
TCP: Fix shrinking windows with window scaling

Upstream commit: 607bfbf2d55dd1cfe5368b41c2a81a8c9ccf4723

When selecting a new window, tcp_select_window() tries not to shrink
the offered window by using the maximum of the remaining offered window
size and the newly calculated window size. The newly calculated window
size is always a multiple of the window scaling factor, the remaining
window size however might not be since it depends on rcv_wup/rcv_nxt.
This means we're effectively shrinking the window when scaling it down.

The dump below shows the problem (scaling factor 2^7):

- Window size of 557 (71296) is advertised, up to 3111907257:

IP > . ack 3111835961 win 557 <...>

- New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
  below the last end:

IP > . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>

The number 40 results from downscaling the remaining window:

3111907257 - 3111841425 = 65832
65832 / 2^7 = 514
65832 % 2^7 = 40

If the sender uses up the entire window before it is shrunk, this can have
chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
will notice that the window has been shrunk since tcp_wnd_end() is before
tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
This will fail the receiver's checks in tcp_sequence() however since it
is before its tp->rcv_wup, making it respond with a dupack.

If both sides are in this condition, this leads to a constant flood of
ACKs until the connection times out.

Make sure the window is never shrunk by aligning the remaining window to
the window scaling factor.
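
The numbers from the dump can be reproduced with a few lines of C. The demo_* helpers are hypothetical, and rounding up to the next multiple is assumed here as the way to align the remaining window.

```c
#include <stdint.h>

/* Window value that goes on the wire for `bytes` at shift `wscale`. */
static uint32_t demo_advertised(uint32_t bytes, unsigned wscale)
{
    return bytes >> wscale;
}

/* Bytes dropped from the right edge when `bytes` is scaled down. */
static uint32_t demo_lost(uint32_t bytes, unsigned wscale)
{
    return bytes & ((1u << wscale) - 1);
}

/* Round up to a multiple of the scaling factor so that scaling down
 * and back up never moves the right edge backwards. */
static uint32_t demo_align(uint32_t bytes, unsigned wscale)
{
    uint32_t factor = 1u << wscale;

    return (bytes + factor - 1) & ~(factor - 1);
}
```

For the remaining window of 65832 bytes at scale 7, the advertised value is 514 and 40 bytes are lost; after alignment the window is 65920 bytes and nothing is lost.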

Signed-off-by: Patrick McHardy <>
Signed-off-by: David S. Miller <>
Signed-off-by: Chris Wright <>

7 years agoNET: Fix multicast device ioctl checks
Patrick McHardy [Mon, 7 Apr 2008 06:42:55 +0000]
NET: Fix multicast device ioctl checks

Upstream commit: 61ee6bd487b9cc160e533034eb338f2085dc7922

SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list
method to determine whether it supports multicast. Drivers implementing
secondary unicast support use set_rx_mode however.

Check for both dev->set_multicast_list and dev->set_rx_mode to determine
multicast capabilities.
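
In outline, the capability test becomes an either-or on the two method pointers. The struct below is a hypothetical two-field stand-in for struct net_device, not its real layout.

```c
#include <stddef.h>

struct demo_net_device {
    void (*set_multicast_list)(struct demo_net_device *dev);
    void (*set_rx_mode)(struct demo_net_device *dev);
};

/* Placeholder driver method used only to populate the pointers. */
static void demo_noop(struct demo_net_device *dev)
{
    (void)dev;
}

/* A device is treated as multicast-capable if it implements either hook. */
static int demo_supports_multicast(const struct demo_net_device *dev)
{
    return dev->set_multicast_list != NULL || dev->set_rx_mode != NULL;
}
```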

Signed-off-by: Patrick McHardy <>
Signed-off-by: David S. Miller <>
Signed-off-by: Chris Wright <>

7 years agoSCTP: Fix local_addr deletions during list traversals.
Chidambar 'ilLogict' Zinnoury [Mon, 7 Apr 2008 06:42:35 +0000]
SCTP: Fix local_addr deletions during list traversals.

Upstream commit: 22626216c46f2ec86287e75ea86dd9ac3df54265

Since the lists are circular, we need to explicitly tag
the address to be deleted since we might end up freeing
the list head instead.  This fixes some interesting SCTP

Signed-off-by: Chidambar 'ilLogict' Zinnoury <>
Signed-off-by: Vlad Yasevich <>
Signed-off-by: David S. Miller <>
Signed-off-by: Chris Wright <>

7 years agosch_htb: fix "too many events" situation
Martin Devera [Mon, 7 Apr 2008 06:42:10 +0000]
sch_htb: fix "too many events" situation

Upstream commit: 8f3ea33a5078a09eba12bfe57424507809367756

HTB is an event-driven algorithm and part of its work is to apply
scheduled events at the proper times. It tried to defend itself from
livelock by processing only a limited number of events per dequeue.
Because of faster computers some users already hit this hardcoded

This patch limits processing up to 2 jiffies (why not 1 jiffy?
because it might stop prematurely when only a fraction of a jiffy

Signed-off-by: Martin Devera <>
Signed-off-by: David S. Miller <>
Signed-off-by: Chris Wright <>

7 years agoNET: Add preemption point in qdisc_run
Herbert Xu [Mon, 7 Apr 2008 06:41:50 +0000]
NET: Add preemption point in qdisc_run

Upstream commit: 2ba2506ca7ca62c56edaa334b0fe61eb5eab6ab0

The qdisc_run loop is currently unbounded and runs entirely in a
softirq.  This is bad as it may create an unbounded softirq run.

This patch fixes this by calling need_resched and breaking out if

It also adds a break out if the jiffies value changes since that would
indicate we've been transmitting for too long which starves other

Signed-off-by: Herbert Xu <>
Signed-off-by: David S. Miller <>
Signed-off-by: Chris Wright <>