commit 6eb2b7e3bd16f6e63fd028f6213d352281d9663e Author: Jiri Slaby Date: Fri Oct 31 13:41:16 2014 +0100 Linux 3.12.32 commit 08f5fb0099a95c28676f8afe36e1d83f8234d002 Author: Jan Kara Date: Tue Nov 5 01:15:38 2013 +0100 ext2: Fix fs corruption in ext2_get_xip_mem() commit 7ba3ec5749ddb61f79f7be17b5fd7720eebc52de upstream. Commit 8e3dffc651cb "Ext2: mark inode dirty after the function dquot_free_block_nodirty is called" unveiled a bug in __ext2_get_block() called from ext2_get_xip_mem(). That function called ext2_get_block() mistakenly asking it to map 0 blocks while 1 was intended. Before the above mentioned commit things worked out fine by luck but after that commit we started returning that we allocated 0 blocks while we in fact allocated 1 block and thus allocation was looping until all blocks in the filesystem were exhausted. Fix the problem by properly asking for one block and also add assertion in ext2_get_blocks() to catch similar problems. Reported-and-tested-by: Andiry Xu Signed-off-by: Jan Kara Signed-off-by: Jiri Slaby commit 19911e16366f1a6ac8dc4ac7d4709c133e165638 Author: Johannes Weiner Date: Thu Oct 2 16:16:57 2014 -0700 mm: memcontrol: do not iterate uninitialized memcgs commit 2f7dd7a4100ad4affcb141605bef178ab98ccb18 upstream. The cgroup iterators yield css objects that have not yet gone through css_online(), but they are not complete memcgs at this point and so the memcg iterators should not return them. Commit d8ad30559715 ("mm/memcg: iteration skip memcgs not yet fully initialized") set out to implement exactly this, but it uses CSS_ONLINE, a cgroup-internal flag that does not meet the ordering requirements for memcg, and so the iterator may skip over initialized groups, or return partially initialized memcgs. The cgroup core can not reasonably provide a clear answer on whether the object around the css has been fully initialized, as that depends on controller-specific locking and lifetime rules. Thus, introduce a memcg-specific flag that is set after the memcg has been initialized in css_online(), and read before mem_cgroup_iter() callers access the memcg members. Signed-off-by: Johannes Weiner Cc: Tejun Heo Acked-by: Michal Hocko Cc: Hugh Dickins Cc: Peter Zijlstra Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit d9fc4e657d6fe0886a2a960eb09102c27a7babfd Author: Michael Ellerman Date: Tue Oct 14 12:07:56 2014 +1100 powerpc: Add smp_mb()s to arch_spin_unlock_wait() commit 78e05b1421fa41ae8457701140933baa5e7d9479 upstream. Similar to the previous commit which described why we need to add a barrier to arch_spin_is_locked(), we have a similar problem with spin_unlock_wait(). We need a barrier on entry to ensure any spinlock we have previously taken is visibly locked prior to the load of lock->slock. It's also not clear if spin_unlock_wait() is intended to have ACQUIRE semantics. For now be conservative and add a barrier on exit to give it ACQUIRE semantics. Signed-off-by: Michael Ellerman Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman Signed-off-by: Jiri Slaby commit 5aaee42d255a81d3b010432265254353946d88b3 Author: Michael Ellerman Date: Tue Oct 14 12:07:18 2014 +1100 powerpc: Add smp_mb() to arch_spin_is_locked() commit 51d7d5205d3389a32859f9939f1093f267409929 upstream. The kernel defines the function spin_is_locked(), which can be used to check if a spinlock is currently locked. Using spin_is_locked() on a lock you don't hold is obviously racy. That is, even though you may observe that the lock is unlocked, it may become locked at any time. There is (at least) one exception to that, which is if two locks are used as a pair, and the holder of each checks the status of the other before doing any update. Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic value: The first CPU does: spin_lock(*A) if spin_is_locked(*B) # nothing else smp_mb() LOAD r = *COUNTER r++ STORE *COUNTER = r spin_unlock(*A) And the second CPU does: spin_lock(*B) if spin_is_locked(*A) # nothing else smp_mb() LOAD r = *COUNTER r++ STORE *COUNTER = r spin_unlock(*B) Although this is a strange locking construct, it should work. It seems to be understood, but not documented, that spin_is_locked() is not a memory barrier, so in the examples above and below the caller inserts its own memory barrier before acting on the result of spin_is_locked(). For now we assume spin_is_locked() is implemented as below, and we break it out in our examples: bool spin_is_locked(*LOCK) { LOAD l = *LOCK return l.locked } Our intuition is that there should be no problem even if the two code sequences run simultaneously such as: CPU 0 CPU 1 ================================================== spin_lock(*A) spin_lock(*B) LOAD b = *B LOAD a = *A if b.locked # true if a.locked # true # nothing # nothing spin_unlock(*A) spin_unlock(*B) If one CPU gets the lock before the other then it will do the update and the other CPU will back off: CPU 0 CPU 1 ================================================== spin_lock(*A) LOAD b = *B spin_lock(*B) if b.locked # false LOAD a = *A else if a.locked # true smp_mb() # nothing LOAD r1 = *COUNTER spin_unlock(*B) r1++ STORE *COUNTER = r1 spin_unlock(*A) However in reality spin_lock() itself is not indivisible. On powerpc we implement it as a load-and-reserve and store-conditional. Ignoring the retry logic for the lost reservation case, it boils down to: spin_lock(*LOCK) { LOAD l = *LOCK l.locked = true STORE *LOCK = l ACQUIRE_BARRIER } The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as defined in memory-barriers.txt: This acts as a one-way permeable barrier. It guarantees that all memory operations after the ACQUIRE operation will appear to happen after the ACQUIRE operation with respect to the other components of the system. On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is also know as "lightweight sync", or "sync 1". As described in Power ISA v2.07 section B.2.1.1, in this scenario the lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to act as the barrier, preventing any loads or stores in the locked region from occurring prior to the load of *LOCK. Whether this behaviour is in accordance with the definition of ACQUIRE semantics in memory-barriers.txt is open to discussion, we may switch to a different barrier in future. What this means in practice is that the following can occur: CPU 0 CPU 1 ================================================== LOAD a = *A LOAD b = *B a.locked = true b.locked = true LOAD b = *B LOAD a = *A STORE *A = a STORE *B = b if b.locked # false if a.locked # false else else smp_mb() smp_mb() LOAD r1 = *COUNTER LOAD r2 = *COUNTER r1++ r2++ STORE *COUNTER = r1 STORE *COUNTER = r2 # Lost update spin_unlock(*A) spin_unlock(*B) That is, the load of *B can occur prior to the store that makes *A visibly locked. And similarly for CPU 1. The result is both CPUs hold their lock and believe the other lock is unlocked. The easiest fix for this is to add a full memory barrier to the start of spin_is_locked(), so adding to our previous definition would give us: bool spin_is_locked(*LOCK) { smp_mb() LOAD l = *LOCK return l.locked } The new barrier orders the store to the lock we are locking vs the load of the other lock: CPU 0 CPU 1 ================================================== LOAD a = *A LOAD b = *B a.locked = true b.locked = true STORE *A = a STORE *B = b smp_mb() smp_mb() LOAD b = *B LOAD a = *A if b.locked # true if a.locked # true # nothing # nothing spin_unlock(*A) spin_unlock(*B) Although the above example is theoretical, there is code similar to this example in sem_lock() in ipc/sem.c. This commit in addition to the next commit appears to be a fix for crashes we are seeing in that code where we believe this race happens in practice. Signed-off-by: Michael Ellerman Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Jiri Slaby commit 292b588419a9fdd4950ae7944bed6efafa7571ae Author: Ilya Dryomov Date: Fri Oct 10 16:39:05 2014 +0400 libceph: ceph-msgr workqueue needs a resque worker commit f9865f06f7f18c6661c88d0511f05c48612319cc upstream. Commit f363e45fd118 ("net/ceph: make ceph_msgr_wq non-reentrant") effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq. This is wrong - libceph is very much a memory reclaim path, so restore it. Cc: stable@vger.kernel.org # needs backporting for < 3.12 Signed-off-by: Ilya Dryomov Tested-by: Micha Krause Reviewed-by: Sage Weil Signed-off-by: Jiri Slaby commit 99a52137266e8cf196b0cd2facd325241e59e094 Author: Ezequiel Garcia Date: Tue Sep 2 09:51:15 2014 -0300 drm/tilcdc: Fix the error path in tilcdc_load() commit b478e336b3e75505707a11e78ef8b964ef0a03af upstream. The current error path calls tilcdc_unload() in case of an error to release the resources. However, this is wrong because not all resources have been allocated by the time an error occurs in tilcdc_load(). To fix it, this commit adds proper labels to bail out at the different stages in the load function, and release only the resources actually allocated. Tested-by: Darren Etheridge Tested-by: Johannes Pointner Signed-off-by: Ezequiel Garcia Signed-off-by: Dave Airlie Signed-off-by: Jiri Slaby commit 5436a41fe9dc95db899e96c10032c90626e289bb Author: Shen Guang Date: Wed Jan 8 14:45:42 2014 +0800 usb:hub set hub->change_bits when over-current happens commit 08d1dec6f4054e3613f32051d9b149d4203ce0d2 upstream. When we are doing compliance test with xHCI, we found that if we enable CONFIG_USB_SUSPEND and plug in a bad device which causes over-current condition to the root port, software will not be noticed. The reason is that current code don't set hub->change_bits in hub_activate() when over-current happens, and then hub_events() will not check the port status because it thinks nothing changed. If CONFIG_USB_SUSPEND is disabled, the interrupt pipe of the hub will report the change and set hub->event_bits, and then hub_events() will check what events happened.In this case over-current can be detected. Signed-off-by: Shen Guang Acked-by: Alan Stern Acked-by: Sarah Sharp Signed-off-by: Greg Kroah-Hartman Signed-off-by: Jiri Slaby commit 470023f49367ce285fc8c5644192e6ab6ea2772b Author: Pawel Moll Date: Fri Jun 13 16:03:32 2014 +0100 perf: Handle compat ioctl commit b3f207855f57b9c8f43a547a801340bb5cbc59e5 upstream. When running a 32-bit userspace on a 64-bit kernel (eg. i386 application on x86_64 kernel or 32-bit arm userspace on arm64 kernel) some of the perf ioctls must be treated with special care, as they have a pointer size encoded in the command. For example, PERF_EVENT_IOC_ID in 32-bit world will be encoded as 0x80042407, but 64-bit kernel will expect 0x80082407. In result the ioctl will fail returning -ENOTTY. This patch solves the problem by adding code fixing up the size as compat_ioctl file operation. Reported-by: Drew Richardson Signed-off-by: Pawel Moll Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Link: http://lkml.kernel.org/r/1402671812-9078-1-git-send-email-pawel.moll@arm.com Signed-off-by: Ingo Molnar Signed-off-by: Jiri Slaby commit 7eca260f3ff84dbc3759937b2796d723a3037549 Author: Vince Weaver Date: Mon Jul 14 15:33:25 2014 -0400 perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge commit 1996388e9f4e3444db8273bc08d25164d2967c21 upstream. This was discussed back in February: https://lkml.org/lkml/2014/2/18/956 But I never saw a patch come out of it. On IvyBridge we share the SandyBridge cache event tables, but the dTLB-load-miss event is not compatible. Patch it up after the fact to the proper DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK Signed-off-by: Vince Weaver Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1407141528200.17214@vincent-weaver-1.umelst.maine.edu Signed-off-by: Ingo Molnar Signed-off-by: Jiri Slaby commit 3ff95cbd5e1be3022a50be19bc0d7ceb1d845a34 Author: Dave Chinner Date: Tue Sep 23 15:36:27 2014 +1000 xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly commit 0d085a529b427d97710e6a41f8a4f23e1757cd12 upstream. XFS has been having trouble with stray delayed allocation extents beyond EOF for a long time. Recent changes to the collapse range code has triggered erroneous EBUSY errors on page invalidtion for block size smaller than page size filesystems. These have been caused by dirty buffers beyond EOF on a partial page which do not get written to disk during a sync. The issue is that write-ahead in xfs_cluster_write() finds such a partial page and handles it by leaving the page dirty but pushing it into a writeback state. This used to work just fine, as the write_cache_pages() code would then find the dirty partial page in the next mapping tree lookup as the dirty tag is still set. Unfortunately, when we moved to a mark and sweep approach to writeback to fix other writeback sync issues, we broken this. THe act of marking the page as under writeback now clears the TOWRITE tag in the radix tree, even though the page is still dirty. This causes the TOWRITE tag to be cleared, and hence the next lookup on the mapping tree does not find the dirty partial page and so doesn't try to write it again. This same writeback bug was found recently in ext4 and fixed in commit 1c8349a ("ext4: fix data integrity sync in ordered mode") without communication to the wider filesystem community. We can use exactly the same fix here so the TOWRITE flag is not cleared on partial page writes. cc: stable@vger.kernel.org # dependent on 1c8349a17137b93f0a83f276c764a6df1b9a116e Root-cause-found-by: Brian Foster Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner Signed-off-by: Jiri Slaby commit 660e27e875b6de94393634edea40d75f844b0f97 Author: Chao Yu Date: Thu Jul 24 17:25:42 2014 +0800 ecryptfs: avoid to access NULL pointer when write metadata in xattr commit 35425ea2492175fd39f6116481fe98b2b3ddd4ca upstream. Christopher Head 2014-06-28 05:26:20 UTC described: "I tried to reproduce this on 3.12.21. Instead, when I do "echo hello > foo" in an ecryptfs mount with ecryptfs_xattr specified, I get a kernel crash: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [] fsstack_copy_attr_all+0x2/0x61 PGD d7840067 PUD b2c3c067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: nvidia(PO) CPU: 3 PID: 3566 Comm: bash Tainted: P O 3.12.21-gentoo-r1 #2 Hardware name: ASUSTek Computer Inc. G60JX/G60JX, BIOS 206 03/15/2010 task: ffff8801948944c0 ti: ffff8800bad70000 task.ti: ffff8800bad70000 RIP: 0010:[] [] fsstack_copy_attr_all+0x2/0x61 RSP: 0018:ffff8800bad71c10 EFLAGS: 00010246 RAX: 00000000000181a4 RBX: ffff880198648480 RCX: 0000000000000000 RDX: 0000000000000004 RSI: ffff880172010450 RDI: 0000000000000000 RBP: ffff880198490e40 R08: 0000000000000000 R09: 0000000000000000 R10: ffff880172010450 R11: ffffea0002c51e80 R12: 0000000000002000 R13: 000000000000001a R14: 0000000000000000 R15: ffff880198490e40 FS: 00007ff224caa700(0000) GS:ffff88019fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000bb07f000 CR4: 00000000000007e0 Stack: ffffffff811826e8 ffff8800a39d8000 0000000000000000 000000000000001a ffff8800a01d0000 ffff8800a39d8000 ffffffff81185fd5 ffffffff81082c2c 00000001a39d8000 53d0abbc98490e40 0000000000000037 ffff8800a39d8220 Call Trace: [] ? ecryptfs_setxattr+0x40/0x52 [] ? ecryptfs_write_metadata+0x1b3/0x223 [] ? should_resched+0x5/0x23 [] ? ecryptfs_initialize_file+0xaf/0xd4 [] ? ecryptfs_create+0xf4/0x142 [] ? vfs_create+0x48/0x71 [] ? do_last.isra.68+0x559/0x952 [] ? link_path_walk+0xbd/0x458 [] ? path_openat+0x224/0x472 [] ? do_filp_open+0x2b/0x6f [] ? __alloc_fd+0xd6/0xe7 [] ? do_sys_open+0x65/0xe9 [] ? system_call_fastpath+0x16/0x1b RIP [] fsstack_copy_attr_all+0x2/0x61 RSP CR2: 0000000000000000 ---[ end trace df9dba5f1ddb8565 ]---" If we create a file when we mount with ecryptfs_xattr_metadata option, we will encounter a crash in this path: ->ecryptfs_create ->ecryptfs_initialize_file ->ecryptfs_write_metadata ->ecryptfs_write_metadata_to_xattr ->ecryptfs_setxattr ->fsstack_copy_attr_all It's because our dentry->d_inode used in fsstack_copy_attr_all is NULL, and it will be initialized when ecryptfs_initialize_file finish. So we should skip copying attr from lower inode when the value of ->d_inode is invalid. Signed-off-by: Chao Yu Signed-off-by: Tyler Hicks Signed-off-by: Jiri Slaby commit f7726152f1314004335f7516719fdb7558a7e3aa Author: Ludovic Desroches Date: Mon Sep 22 15:51:33 2014 +0200 ARM: at91/PMC: don't forget to write PMC_PCDR register to disable clocks commit cfa1950e6c6b72251e80adc736af3c3d2907ab0e upstream. When introducing support for sama5d3, the write to PMC_PCDR register has been accidentally removed. Reported-by: Nathalie Cyrille Signed-off-by: Ludovic Desroches Signed-off-by: Nicolas Ferre Signed-off-by: Jiri Slaby commit 479b576f2d6b1d226611b955b4d6ac887531fac6 Author: Andreas Henriksson Date: Tue Sep 23 17:12:52 2014 +0200 ARM: at91: fix at91sam9263ek DT mmc pinmuxing settings commit b65e0fb3d046cc65d0a3c45d43de351fb363271b upstream. As discovered on a custom board similar to at91sam9263ek and basing its devicetree on that one apparently the pin muxing doesn't get set up properly. This was discovered since the custom boards u-boot does funky stuff with the pin muxing and leaved it set to SPI which made the MMC driver not work under Linux. The fix is simply to define the given configuration as the default. This probably worked by pure luck before, but it's better to make the muxing explicitly set. Signed-off-by: Andreas Henriksson Acked-by: Boris Brezillon Signed-off-by: Nicolas Ferre Signed-off-by: Jiri Slaby commit 7158dc66f236fad1db58bb95f34ee748a758d572 Author: Anssi Hannula Date: Sun Oct 19 19:25:19 2014 +0300 ALSA: hda - hdmi: Fix missing ELD change event on plug/unplug commit 6acce400d9daf1353fbf497302670c90a3205e1d upstream. The ELD ALSA control change event is sent by hdmi_present_sense() when eld_changed is true. Currently, it is only true when the ELD buffer contents have been modified. However, the user-visible ELD controls also change to a zero-length value and back when eld_valid is unset/set, and no event is currently sent in such cases (such as when unplugging or replugging a sink). Fix the code to always set eld_changed if eld_valid value is changed, and therefore to always send the change event when the user-visible value changes. Signed-off-by: Anssi Hannula Cc: David Henningsson Signed-off-by: Takashi Iwai Signed-off-by: Jiri Slaby commit b36df611fcba2872673c859f2e6e85e5f71d461c Author: Vlad Catoi Date: Sat Oct 18 17:45:41 2014 -0500 ALSA: usb-audio: Add support for Steinberg UR22 USB interface commit f0b127fbfdc8756eba7437ab668f3169280bd358 upstream. Adding support for Steinberg UR22 USB interface via quirks table patch See Ubuntu bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317244 Also see threads: http://linux-audio.4202.n7.nabble.com/Support-for-Steinberg-UR22-Yamaha-USB-chipset-0499-1509-tc82888.html#a82917 http://www.steinberg.net/forums/viewtopic.php?t=62290 Tested by at least 4 people judging by the threads. Did not test MIDI interface, but audio output and capture both are functional. Built 3.17 kernel with this driver on Ubuntu 14.04 & tested with mpg123 Patch applied to 3.13 Ubuntu kernel works well enough for daily use. Signed-off-by: Vlad Catoi Acked-by: Clemens Ladisch Signed-off-by: Takashi Iwai Signed-off-by: Jiri Slaby commit 3bbcc0f4a75fc643aee7ab5d097bf017b16a4868 Author: Harsha Priya Date: Thu Oct 9 11:04:56 2014 +0000 ALSA: ALC283 codec - Avoid pop noise on headphones during suspend/resume commit b450b17c156e264bc44a198046d3ebaaef5a041d upstream. This patch sets the headphones mode to default before suspending which helps avoid the pop noise on headphones Signed-off-by: Harsha Priya Signed-off-by: Takashi Iwai Signed-off-by: Jiri Slaby commit 348d783dab0f4db707cb98e362d4d2ab959192c4 Author: Takashi Iwai Date: Mon Oct 13 23:18:02 2014 +0200 ALSA: emu10k1: Fix deadlock in synth voice lookup commit 95926035b187cc9fee6fb61385b7da9c28123f74 upstream. The emu10k1 voice allocator takes voice_lock spinlock. When there is no empty stream available, it tries to release a voice used by synth, and calls get_synth_voice. The callback function, snd_emu10k1_synth_get_voice(), however, also takes the voice_lock, thus it deadlocks. The fix is simply removing the voice_lock holds in snd_emu10k1_synth_get_voice(), as this is always called in the spinlock context. Reported-and-tested-by: Arthur Marsh Signed-off-by: Takashi Iwai Signed-off-by: Jiri Slaby commit 493d45eba80eb73cbca0eb809e10813dc612e410 Author: Anatol Pomozov Date: Fri Oct 17 12:43:34 2014 -0700 ALSA: pcm: use the same dma mmap codepath both for arm and arm64 commit a011e213f3700233ed2a676f1ef0a74a052d7162 upstream. This avoids following kernel crash when try to playback on arm64 [ 107.497203] [] snd_pcm_mmap_data_fault+0x90/0xd4 [ 107.503405] [] __do_fault+0xb0/0x498 [ 107.508565] [] handle_mm_fault+0x224/0x7b0 [ 107.514246] [] do_page_fault+0x11c/0x310 [ 107.519738] [] do_mem_abort+0x38/0x98 Tested: backported to 3.14 and tried to playback on arm64 machine Signed-off-by: Anatol Pomozov Signed-off-by: Takashi Iwai Signed-off-by: Jiri Slaby commit b58c3a9feb55d077a33789e3e6e2d27e5e70a4a8 Author: Victor Kamensky Date: Tue Oct 14 06:55:05 2014 +0100 arm64: compat: fix compat types affecting struct compat_elf_prpsinfo commit 971a5b6fe634bb7b617d8c5f25b6a3ddbc600194 upstream. The compat_elf_prpsinfo structure does not match the arch/arm struct elf_pspsinfo definition. As result NT_PRPSINFO note in core file created by arm64 kernel for aarch32 (compat) process has wrong size. So gdb cannot display command that caused process crash. Fix is to change size of __compat_uid_t, __compat_gid_t so it would match size of similar fields in arch/arm case. Signed-off-by: Victor Kamensky Acked-by: Arnd Bergmann Signed-off-by: Catalin Marinas Signed-off-by: Jiri Slaby commit 3e3bef45db6a9b5791b61a8d24d69ceb753a29e1 Author: Andy Shevchenko Date: Thu Sep 18 20:08:53 2014 +0300 spi: dw-mid: terminate ongoing transfers at exit commit 8e45ef682cb31fda62ed4eeede5d9745a0a1b1e2 upstream. Do full clean up at exit, means terminate all ongoing DMA transfers. Signed-off-by: Andy Shevchenko Signed-off-by: Mark Brown Signed-off-by: Jiri Slaby commit 478a5f81defe61a89083f3b719e142f250427098 Author: Sasha Levin Date: Mon Oct 13 15:51:05 2014 -0700 kernel: add support for gcc 5 commit 71458cfc782eafe4b27656e078d379a34e472adf upstream. We're missing include/linux/compiler-gcc5.h which is required now because gcc branched off to v5 in trunk. Just copy the relevant bits out of include/linux/compiler-gcc4.h, no new code is added as of now. This fixes a build error when using gcc 5. Signed-off-by: Sasha Levin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit c901375df3622717e3926038c0d350fbc291df2c Author: Yann Droneaud Date: Thu Oct 9 15:24:40 2014 -0700 fanotify: enable close-on-exec on events' fd when requested in fanotify_init() commit 0b37e097a648aa71d4db1ad108001e95b69a2da4 upstream. According to commit 80af258867648 ("fanotify: groups can specify their f_flags for new fd"), file descriptors created as part of file access notification events inherit flags from the event_f_flags argument passed to syscall fanotify_init(2)[1]. Unfortunately O_CLOEXEC is currently silently ignored. Indeed, event_f_flags are only given to dentry_open(), which only seems to care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in open_check_o_direct() and O_LARGEFILE in generic_file_open(). It's a pity, since, according to some lookup on various search engines and http://codesearch.debian.net/, there's already some userspace code which use O_CLOEXEC: - in systemd's readahead[2]: fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); - in clsync[3]: #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC) int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS); - in examples [4] from "Filesystem monitoring in the Linux kernel" article[5] by Aleksander Morgado: if ((fanotify_fd = fanotify_init (FAN_CLOEXEC, O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0) Additionally, since commit 48149e9d3a7e ("fanotify: check file flags passed in fanotify_init"). having O_CLOEXEC as part of fanotify_init() second argument is expressly allowed. So it seems expected to set close-on-exec flag on the file descriptors if userspace is allowed to request it with O_CLOEXEC. But Andrew Morton raised[6] the concern that enabling now close-on-exec might break existing applications which ask for O_CLOEXEC but expect the file descriptor to be inherited across exec(). In the other hand, as reported by Mihai Dontu[7] close-on-exec on the file descriptor returned as part of file access notify can break applications due to deadlock. So close-on-exec is needed for most applications. More, applications asking for close-on-exec are likely expecting it to be enabled, relying on O_CLOEXEC being effective. If not, it might weaken their security, as noted by Jan Kara[8]. So this patch replaces call to macro get_unused_fd() by a call to function get_unused_fd_flags() with event_f_flags value as argument. This way O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is interpreted and close-on-exec get enabled when requested. [1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294 [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631 https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38 [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/ [6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org [7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l [8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com Signed-off-by: Yann Droneaud Reviewed-by: Jan Kara Reviewed by: Heinrich Schuchardt Tested-by: Heinrich Schuchardt Cc: Mihai Don\u021bu Cc: Pádraig Brady Cc: Heinrich Schuchardt Cc: Jan Kara Cc: Valdis Kletnieks Cc: Michael Kerrisk-manpages Cc: Lino Sanfilippo Cc: Richard Guy Briggs Cc: Eric Paris Cc: Al Viro Cc: Michael Kerrisk Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit e0094b8a584c79d7cd683cb7c57fddad020a3287 Author: Junxiao Bi Date: Thu Oct 9 15:28:23 2014 -0700 mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set commit 934f3072c17cc8886f4c043b47eeeb1b12f8de33 upstream. commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation") introduces PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation, __GFP_IO is cleared when this flag is set, but __GFP_FS implies __GFP_IO, it should also be cleared. Or it may still run into I/O, like in superblock shrinker. And this will make the kernel run into the deadlock case described in that commit. See Dave Chinner's comment about io in superblock shrinker: Filesystem shrinkers do indeed perform IO from the superblock shrinker and have for years. Even clean inodes can require IO before they can be freed - e.g. on an orphan list, need truncation of post-eof blocks, need to wait for ordered operations to complete before it can be freed, etc. IOWs, Ext4, btrfs and XFS all can issue and/or block on arbitrary amounts of IO in the superblock shrinker context. XFS, in particular, has been doing transactions and IO from the VFS inode cache shrinker since it was first introduced.... Fix this by clearing __GFP_FS in memalloc_noio_flags(), this function has masked all the gfp_mask that will be passed into fs for the processes setting PF_MEMALLOC_NOIO in the direct reclaim path. v1 thread at: https://lkml.org/lkml/2014/9/3/32 Signed-off-by: Junxiao Bi Cc: Dave Chinner Cc: joyce.xue Cc: Ming Lei Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit 24fdb4c16fd0486ed2781c3877344e271c050f05 Author: Champion Chen Date: Sat Sep 6 14:06:08 2014 -0500 Bluetooth: Fix issue with USB suspend in btusb driver commit 85560c4a828ec9c8573840c9b66487b6ae584768 upstream. Suspend could fail for some platforms because btusb_suspend==> btusb_stop_traffic ==> usb_kill_anchored_urbs. When btusb_bulk_complete returns before system suspend and resubmits an URB, the system cannot enter suspend state. Signed-off-by: Champion Chen Signed-off-by: Larry Finger Signed-off-by: Marcel Holtmann Signed-off-by: Jiri Slaby commit d511101482bad689a3f19afa8d44d834e5c0e0a9 Author: Loic Poulain Date: Fri Aug 8 19:07:16 2014 +0200 Bluetooth: Fix HCI H5 corrupted ack value commit 4807b51895dce8aa650ebebc51fa4a795ed6b8b8 upstream. In this expression: seq = (seq - 1) % 8 seq (u8) is implicitly converted to an int in the arithmetic operation. So if seq value is 0, operation is ((0 - 1) % 8) => (-1 % 8) => -1. The new seq value is 0xff which is an invalid ACK value, we expect 0x07. It leads to frequent dropped ACK and retransmission. Fix this by using '&' binary operator instead of '%'. Signed-off-by: Loic Poulain Signed-off-by: Marcel Holtmann Signed-off-by: Jiri Slaby commit 82176ab062d34f524121f11af4aa770c517d1898 Author: Stanislaw Gruszka Date: Wed Sep 24 11:24:54 2014 +0200 rt2800: correct BBP1_TX_POWER_CTRL mask commit 01f7feeaf4528bec83798316b3c811701bac5d3e upstream. Two bits control TX power on BBP_R1 register. Correct the mask, otherwise we clear additional bit on BBP_R1 register, what can have unknown, possible negative effect. Signed-off-by: Stanislaw Gruszka Signed-off-by: John W. Linville Signed-off-by: Jiri Slaby commit f68f77bc97180325052d48255e747105447af4f7 Author: Ricardo Ribalda Delgado Date: Wed Aug 27 14:57:57 2014 +0200 PCI: Generate uppercase hex for modalias interface class commit 89ec3dcf17fd3fa009ecf8faaba36828dd6bc416 upstream. Some implementations of modprobe fail to load the driver for a PCI device automatically because the "interface" part of the modalias from the kernel is lowercase, and the modalias from file2alias is uppercase. The "interface" is the low-order byte of the Class Code, defined in PCI r3.0, Appendix D. Most interface types defined in the spec do not use alpha characters, so they won't be affected. For example, 00h, 01h, 10h, 20h, etc. are unaffected. Print the "interface" byte of the Class Code in uppercase hex, as we already do for the Vendor ID, Device ID, Class, etc. [bhelgaas: changelog] Signed-off-by: Ricardo Ribalda Delgado Signed-off-by: Bjorn Helgaas Acked-by: Greg Kroah-Hartman Signed-off-by: Jiri Slaby commit 3ffc90858ac8e620118e98494b1243b4a22fdaa5 Author: Douglas Lehr Date: Thu Aug 21 09:26:52 2014 +1000 PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size commit 9fe373f9997b48fcd6222b95baf4a20c134b587a upstream. The Crocodile chip occasionally comes up with 4k and 8k BAR sizes. Due to an erratum, setting the SR-IOV page size causes the physical function BARs to expand to the system page size. Since ppc64 uses 64k pages, when Linux tries to assign the smaller resource sizes to the now 64k BARs the address will be truncated and the BARs will overlap. Force Linux to allocate the resource as a full page, which avoids the overlap. [bhelgaas: print expanded resource, too] Signed-off-by: Douglas Lehr Signed-off-by: Anton Blanchard Signed-off-by: Bjorn Helgaas Acked-by: Milton Miller Signed-off-by: Jiri Slaby commit b25f6852b313d61224e2e693ad11a180ff055cb6 Author: Thomas Petazzoni Date: Wed Sep 17 17:58:27 2014 +0200 PCI: mvebu: Fix uninitialized variable in mvebu_get_tgt_attr() commit 56fab6e189441d714a2bfc8a64f3df9c0749dff7 upstream. Geert Uytterhoeven reported a warning when building pci-mvebu: drivers/pci/host/pci-mvebu.c: In function 'mvebu_get_tgt_attr': drivers/pci/host/pci-mvebu.c:887:39: warning: 'rtype' may be used uninitialized in this function [-Wmaybe-uninitialized] if (slot == PCI_SLOT(devfn) && type == rtype) { ^ And indeed, the code of mvebu_get_tgt_attr() may lead to the usage of rtype when being uninitialized, even though it would only happen if we had entries other than I/O space and 32 bits memory space. This commit fixes that by simply skipping the current DT range being considered, if it doesn't match the resource type we're looking for. Reported-by: Geert Uytterhoeven Signed-off-by: Thomas Petazzoni Signed-off-by: Bjorn Helgaas Signed-off-by: Jiri Slaby commit 1866cdf3c46ba5f02bb5162006e1928af77c5ebb Author: Oren Givon Date: Wed Sep 17 10:31:56 2014 +0300 iwlwifi: Add missing PCI IDs for the 7260 series commit 4f08970f5284dce486f0e2290834aefb2a262189 upstream. Add 4 missing PCI IDs for the 7260 series. Signed-off-by: Oren Givon Signed-off-by: Emmanuel Grumbach Signed-off-by: Jiri Slaby commit db90763491f9a60cd7814898cddee6f48d9a9233 Author: Andy Adamson Date: Mon Sep 29 12:31:57 2014 -0400 NFSv4.1: Fix an NFSv4.1 state renewal regression commit d1f456b0b9545f1606a54cd17c20775f159bd2ce upstream. Commit 2f60ea6b8ced ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat call, on the wire to renew the NFSv4.1 state if the flag was not set. The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal (cl_last_renewal) plus the lease time divided by 3. This is arbitrary and sometimes does the following: In normal operation, the only way a future state renewal call is put on the wire is via a call to nfs4_schedule_state_renewal, which schedules a nfs4_renew_state workqueue task. nfs4_renew_state determines if the NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence, which only gets sent if the NFS4_RENEW_TIMEOUT flag is set. Then the nfs41_proc_async_sequence rpc_release function schedules another state remewal via nfs4_schedule_state_renewal. Without this change we can get into a state where an application stops accessing the NFSv4.1 share, state renewal calls stop due to the NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover from this situation is with a clientid re-establishment, once the application resumes and the server has timed out the lease and so returns NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation. An example application: open, lock, write a file. sleep for 6 * lease (could be less) ulock, close. In the above example with NFSv4.1 delegations enabled, without this change, there are no OP_SEQUENCE state renewal calls during the sleep, and the clientid is recovered due to lease expiration on the close. This issue does not occur with NFSv4.1 delegations disabled, nor with NFSv4.0, with or without delegations enabled. Signed-off-by: Andy Adamson Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com Fixes: 2f60ea6b8ced (NFSv4: The NFSv4.0 client must send RENEW calls...) Signed-off-by: Trond Myklebust Signed-off-by: Jiri Slaby commit 655a42808ddd513ee764a1a2e8d62871bcface3b Author: Trond Myklebust Date: Sat Sep 27 17:41:51 2014 -0400 NFSv4: fix open/lock state recovery error handling commit df817ba35736db2d62b07de6f050a4db53492ad8 upstream. The current open/lock state recovery unfortunately does not handle errors such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping, just proceeds as if the state manager is finished recovering. This patch ensures that we loop back, handle higher priority errors and complete the open/lock state recovery. Signed-off-by: Trond Myklebust Signed-off-by: Jiri Slaby commit 1bb9921c95d773f093fe861b0ee11572b0c5b775 Author: Trond Myklebust Date: Sat Sep 27 17:02:26 2014 -0400 NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails commit a4339b7b686b4acc8b6de2b07d7bacbe3ae44b83 upstream. If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted a second time, then the client will currently take this to mean that it must declare all locks to be stale, and hence ineligible for reboot recovery. RFC3530 and RFC5661 both suggest that the client should instead rely on the server to respond to inelegible open share, lock and delegation reclaim requests with NFS4ERR_NO_GRACE in this situation. Signed-off-by: Trond Myklebust Signed-off-by: Jiri Slaby commit aa239d85e03075f3b4406d82171d879dae5bf0b2 Author: Frans Klaver Date: Thu Sep 25 11:19:51 2014 +0200 tty: omap-serial: fix division by zero commit dc3187564e61260f49eceb21a4e7eb5e4428e90a upstream. If the chosen baud rate is large enough (e.g. 3.5 megabaud), the calculated n values in serial_omap_is_baud_mode16() may become 0. This causes a division by zero when calculating the difference between calculated and desired baud rates. To prevent this, cap the n13 and n16 values on 1. Division by zero in kernel. [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (Ldiv0+0x8/0x10) [] (Ldiv0) from [] (serial_omap_baud_is_mode16+0x4c/0x68) [] (serial_omap_baud_is_mode16) from [] (serial_omap_set_termios+0x90/0x8d8) [] (serial_omap_set_termios) from [] (uart_change_speed+0xa4/0xa8) [] (uart_change_speed) from [] (uart_set_termios+0xa0/0x1fc) [] (uart_set_termios) from [] (tty_set_termios+0x248/0x2c0) [] (tty_set_termios) from [] (set_termios+0x248/0x29c) [] (set_termios) from [] (tty_mode_ioctl+0x1c8/0x4e8) [] (tty_mode_ioctl) from [] (tty_ioctl+0xa94/0xb18) [] (tty_ioctl) from [] (do_vfs_ioctl+0x4a0/0x560) [] (do_vfs_ioctl) from [] (SyS_ioctl+0x4c/0x74) [] (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x30) Signed-off-by: Frans Klaver Signed-off-by: Jiri Slaby commit 4277fc429c1ae9f815aa4e5713514d952032f2fa Author: Willy Tarreau Date: Sat Sep 27 12:31:37 2014 +0200 lzo: check for length overrun in variable length encoding. commit 72cf90124e87d975d0b2114d930808c58b4c05e4 upstream. This fix ensures that we never meet an integer overflow while adding 255 while parsing a variable length encoding. It works differently from commit 206a81c ("lzo: properly check for overruns") because instead of ensuring that we don't overrun the input, which is tricky to guarantee due to many assumptions in the code, it simply checks that the cumulated number of 255 read cannot overflow by bounding this number. The MAX_255_COUNT is the maximum number of times we can add 255 to a base count without overflowing an integer. The multiply will overflow when multiplying 255 by more than MAXINT/255. The sum will overflow earlier depending on the base count. Since the base count is taken from a u8 and a few bits, it is safe to assume that it will always be lower than or equal to 2*255, thus we can always prevent any overflow by accepting two less 255 steps. This patch also reduces the CPU overhead and actually increases performance by 1.1% compared to the initial code, while the previous fix costs 3.1% (measured on x86_64). The fix needs to be backported to all currently supported stable kernels. Reported-by: Willem Pinckaers Cc: "Don A. Bailey" Signed-off-by: Willy Tarreau Signed-off-by: Jiri Slaby commit 84d9ae2555acd65570bab9c4dc98c0345995ffbb Author: Willy Tarreau Date: Sat Sep 27 12:31:36 2014 +0200 Revert "lzo: properly check for overruns" commit af958a38a60c7ca3d8a39c918c1baa2ff7b6b233 upstream. This reverts commit 206a81c ("lzo: properly check for overruns"). As analysed by Willem Pinckaers, this fix is still incomplete on certain rare corner cases, and it is easier to restart from the original code. Reported-by: Willem Pinckaers Cc: "Don A. Bailey" Signed-off-by: Willy Tarreau Signed-off-by: Jiri Slaby commit 46f35744a3befcd6ee7c4897443ba1278affc68d Author: Willy Tarreau Date: Sat Sep 27 12:31:35 2014 +0200 Documentation: lzo: document part of the encoding commit d98a0526434d27e261f622cf9d2e0028b5ff1a00 upstream. Add a complete description of the LZO format as processed by the decompressor. I have not found a public specification of this format hence this analysis, which will be used to better understand the code. Cc: Willem Pinckaers Cc: "Don A. Bailey" Signed-off-by: Willy Tarreau Signed-off-by: Jiri Slaby commit 9ac8b73be7efd81c8560f1a3ca16e53efc155425 Author: Geert Uytterhoeven Date: Sun Sep 28 10:50:06 2014 +0200 m68k: Disable/restore interrupts in hwreg_present()/hwreg_write() commit e4dc601bf99ccd1c95b7e6eef1d3cf3c4b0d4961 upstream. hwreg_present() and hwreg_write() temporarily change the VBR register to another vector table. This table contains a valid bus error handler only, all other entries point to arbitrary addresses. If an interrupt comes in while the temporary table is active, the processor will start executing at such an arbitrary address, and the kernel will crash. While most callers run early, before interrupts are enabled, or explicitly disable interrupts, Finn Thain pointed out that macsonic has one callsite that doesn't, causing intermittent boot crashes. There's another unsafe callsite in hilkbd. Fix this for good by disabling and restoring interrupts inside hwreg_present() and hwreg_write(). Explicitly disabling interrupts can be removed from the callsites later. Reported-by: Finn Thain Signed-off-by: Geert Uytterhoeven Signed-off-by: Jiri Slaby commit 9085c3c4214282cb8fbf3024110bf5cd88999b7f Author: Alexander Usyskin Date: Mon Aug 25 16:46:53 2014 +0300 mei: bus: fix possible boundaries violation commit cfda2794b5afe7ce64ee9605c64bef0e56a48125 upstream. function 'strncpy' will fill whole buffer 'id.name' of fixed size (32) with string value and will not leave place for NULL-terminator. Possible buffer boundaries violation in following string operations. Replace strncpy with strlcpy. Signed-off-by: Alexander Usyskin Signed-off-by: Tomas Winkler Signed-off-by: Jiri Slaby commit 30063396d818b279a5571476a80db4b656d61610 Author: K. Y. Srinivasan Date: Wed Aug 27 16:25:35 2014 -0700 Drivers: hv: vmbus: Fix a bug in vmbus_open() commit 45d727cee9e200f5b351528b9fb063b69cf702c8 upstream. Fix a bug in vmbus_open() and properly propagate the error. I would like to thank Dexuan Cui for identifying the issue. Signed-off-by: K. Y. Srinivasan Tested-by: Sitsofe Wheeler Signed-off-by: Jiri Slaby commit 2b251def910acd8394c4da3c02ee55cdf5f37a69 Author: K. Y. Srinivasan Date: Wed Aug 27 16:25:34 2014 -0700 Drivers: hv: vmbus: Cleanup vmbus_establish_gpadl() commit 72c6b71c245dac8f371167d97ef471b367d0b66b upstream. Eliminate the call to BUG_ON() by waiting for the host to respond. We are trying to reclaim the ownership of memory that was given to the host and so we will have to wait until the host responds. Signed-off-by: K. Y. Srinivasan Tested-by: Sitsofe Wheeler Signed-off-by: Jiri Slaby commit 41d8d79c7431b537377de85be543e65fefa012e4 Author: K. Y. Srinivasan Date: Wed Aug 27 16:25:33 2014 -0700 Drivers: hv: vmbus: Cleanup vmbus_close_internal() commit 98d731bb064a9d1817a6ca9bf8b97051334a7cfe upstream. Eliminate calls to BUG_ON() in vmbus_close_internal(). We have chosen to potentially leak memory, than crash the guest in case of failures. In this version of the patch I have addressed comments from Dan Carpenter (dan.carpenter@oracle.com). Signed-off-by: K. Y. Srinivasan Tested-by: Sitsofe Wheeler Signed-off-by: Jiri Slaby commit 37bc0b98d04c7630543a0c5d4e78859db6db275f Author: K. Y. Srinivasan Date: Wed Aug 27 16:25:32 2014 -0700 Drivers: hv: vmbus: Cleanup vmbus_teardown_gpadl() commit 66be653083057358724d56d817e870e53fb81ca7 upstream. Eliminate calls to BUG_ON() by properly handling errors. In cases where rollback is possible, we will return the appropriate error to have the calling code decide how to rollback state. In the case where we are transferring ownership of the guest physical pages to the host, we will wait for the host to respond. Signed-off-by: K. Y. Srinivasan Tested-by: Sitsofe Wheeler Signed-off-by: Jiri Slaby commit 9c07ce2f1e4468aa013a52bd9b788b3b1f8cbff3 Author: K. Y. Srinivasan Date: Wed Aug 27 16:25:31 2014 -0700 Drivers: hv: vmbus: Cleanup vmbus_post_msg() commit fdeebcc62279119dbeafbc1a2e39e773839025fd upstream. Posting messages to the host can fail because of transient resource related failures. Correctly deal with these failures and increase the number of attempts to post the message before giving up. In this version of the patch, I have normalized the error code to Linux error code. Signed-off-by: K. Y. Srinivasan Tested-by: Sitsofe Wheeler Signed-off-by: Jiri Slaby commit 7c77b7f1a637718d4acdc6dd09e56b3016a5f42b Author: Kees Cook Date: Thu Sep 18 11:25:37 2014 -0700 firmware_class: make sure fw requests contain a name commit 471b095dfe0d693a8d624cbc716d1ee4d74eb437 upstream. An empty firmware request name will trigger warnings when building device names. Make sure this is caught earlier and rejected. The warning was visible via the test_firmware.ko module interface: echo -ne "\x00" > /sys/devices/virtual/misc/test_firmware/trigger_request Reported-by: Sasha Levin Signed-off-by: Kees Cook Tested-by: Sasha Levin Signed-off-by: Jiri Slaby commit 9235aef181b7c3829630cdfeef1b75c0f0cbe2db Author: Arun Easi Date: Thu Sep 25 06:14:45 2014 -0400 qla2xxx: Use correct offset to req-q-out for reserve calculation commit 75554b68ac1e018bca00d68a430b92ada8ab52dd upstream. Signed-off-by: Arun Easi Signed-off-by: Saurav Kashyap Signed-off-by: Christoph Hellwig Signed-off-by: Jiri Slaby commit bb51226f6521c7d64c8422dc4a93750c32d907c5 Author: Chris J Arges Date: Tue Sep 23 09:22:25 2014 -0500 mptfusion: enable no_write_same for vmware scsi disks commit 4089b71cc820a426d601283c92fcd4ffeb5139c2 upstream. When using a virtual SCSI disk in a VMWare VM if blkdev_issue_zeroout is used data can be improperly zeroed out using the mptfusion driver. This patch disables write_same for this driver and the vmware subsystem_vendor which ensures that manual zeroing out is used instead. BugLink: http://bugs.launchpad.net/bugs/1371591 Reported-by: Bruce Lucas Tested-by: Chris J Arges Signed-off-by: Chris J Arges Reviewed-by: Martin K. Petersen Signed-off-by: Christoph Hellwig Signed-off-by: Jiri Slaby commit 557e6425d7abb2b9766aaa512aaf85714189a51d Author: Mike Christie Date: Mon Sep 29 13:55:41 2014 -0500 be2iscsi: check ip buffer before copying commit a41a9ad3bbf61fae0b6bfb232153da60d14fdbd9 upstream. Dan Carpenter found a issue where be2iscsi would copy the ip from userspace to the driver buffer before checking the len of the data being copied: http://marc.info/?l=linux-scsi&m=140982651504251&w=2 This patch just has us only copy what we the driver buffer can support. Tested-by: John Soni Jose Signed-off-by: Mike Christie Signed-off-by: Christoph Hellwig Signed-off-by: Jiri Slaby commit a0b8d8d906d267987d507138003048c5fdf77473 Author: Xiubo Li Date: Sun Sep 28 17:09:54 2014 +0800 regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error. commit d6b41cb06044a7d895db82bdd54f6e4219970510 upstream. Since we cannot make sure the 'val_count' will always be none zero here, and then if it equals to zero, the kmemdup() will return ZERO_SIZE_PTR, which equals to ((void *)16). So this patch fix this with just doing the zero check before calling kmemdup(). Signed-off-by: Xiubo Li Signed-off-by: Mark Brown Signed-off-by: Jiri Slaby commit 94113759ec0fda45201a214418558164187c03a4 Author: Pankaj Dubey Date: Sat Sep 27 09:47:55 2014 +0530 regmap: fix NULL pointer dereference in _regmap_write/read commit 5336be8416a71b5568d2cf54a2f2066abe9f2a53 upstream. If LOG_DEVICE is defined and map->dev is NULL it will lead to NULL pointer dereference. This patch fixes this issue by adding check for dev->NULL in all such places in regmap.c Signed-off-by: Pankaj Dubey Signed-off-by: Mark Brown Signed-off-by: Jiri Slaby commit 71922da4fea3b75fd9ade739d6f97e4a9af77015 Author: Xiubo Li Date: Sun Sep 28 11:35:25 2014 +0800 regmap: debugfs: fix possbile NULL pointer dereference commit 2c98e0c1cc6b8e86f1978286c3d4e0769ee9d733 upstream. If 'map->dev' is NULL and there will lead dev_name() to be NULL pointer dereference. So before dev_name(), we need to have check of the map->dev pionter. We also should make sure that the 'name' pointer shouldn't be NULL for debugfs_create_dir(). So here using one default "dummy" debugfs name when the 'name' pointer and 'map->dev' are both NULL. Signed-off-by: Xiubo Li Signed-off-by: Mark Brown Signed-off-by: Jiri Slaby commit eeef301b6d8d548d8577cfb14ca1c56ca323bfb7 Author: Andy Shevchenko Date: Fri Sep 12 15:11:58 2014 +0300 spi: dw-mid: check that DMA was inited before exit commit fb57862ead652454ceeb659617404c5f13bc34b5 upstream. If the driver was compiled with DMA support, but DMA channels weren't acquired by some reason, mid_spi_dma_exit() will crash the kernel. Fixes: 7063c0d942a1 (spi/dw_spi: add DMA support) Signed-off-by: Andy Shevchenko Signed-off-by: Mark Brown Signed-off-by: Jiri Slaby commit 15bc7947fa4a4cf121c46f317c6f1c349b5894d8 Author: Andy Shevchenko Date: Thu Sep 18 20:08:51 2014 +0300 spi: dw-mid: respect 8 bit mode commit b41583e7299046abdc578c33f25ed83ee95b9b31 upstream. In case of 8 bit mode and DMA usage we end up with every second byte written as 0. We have to respect bits_per_word settings what this patch actually does. Signed-off-by: Andy Shevchenko Signed-off-by: Mark Brown Signed-off-by: Jiri Slaby commit 61ede5ae220215573de66fa5937f7236ac046f21 Author: Bryan O'Donoghue Date: Wed Sep 24 00:26:24 2014 +0100 x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead commit ee1b5b165c0a2f04d2107e634e51f05d0eb107de upstream. Quark x1000 advertises PGE via the standard CPUID method PGE bits exist in Quark X1000's PTEs. In order to flush an individual PTE it is necessary to reload CR3 irrespective of the PTE.PGE bit. See Quark Core_DevMan_001.pdf section 6.4.11 This bug was fixed in Galileo kernels, unfixed vanilla kernels are expected to crash and burn on this platform. Signed-off-by: Bryan O'Donoghue Cc: Borislav Petkov Link: http://lkml.kernel.org/r/1411514784-14885-1-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Ingo Molnar Signed-off-by: Jiri Slaby commit 007a4d98f9814eed5d6d14c658b73fe2b60f13b6 Author: David Matlack Date: Fri Sep 19 16:03:25 2014 -0700 kvm: don't take vcpu mutex for obviously invalid vcpu ioctls commit 2ea75be3219571d0ec009ce20d9971e54af96e09 upstream. vcpu ioctls can hang the calling thread if issued while a vcpu is running. However, invalid ioctls can happen when userspace tries to probe the kind of file descriptors (e.g. isatty() calls ioctl(TCGETS)); in that case, we know the ioctl is going to be rejected as invalid anyway and we can fail before trying to take the vcpu mutex. This patch does not change functionality, it just makes invalid ioctls fail faster. Signed-off-by: David Matlack Signed-off-by: Paolo Bonzini Signed-off-by: Jiri Slaby commit dc17be89b79e769835935762f62e7d25c903e7ba Author: Christian Borntraeger Date: Wed Sep 3 16:21:32 2014 +0200 KVM: s390: unintended fallthrough for external call commit f346026e55f1efd3949a67ddd1dcea7c1b9a615e upstream. We must not fallthrough if the conditions for external call are not met. Signed-off-by: Christian Borntraeger Reviewed-by: Thomas Huth Signed-off-by: Jiri Slaby commit 8d09d4afe2735d152447903f521623dc54ddafa4 Author: David Matlack Date: Mon Aug 18 15:46:06 2014 -0700 kvm: fix potentially corrupt mmio cache commit ee3d1570b58677885b4552bce8217fda7b226a68 upstream. vcpu exits and memslot mutations can run concurrently as long as the vcpu does not aquire the slots mutex. Thus it is theoretically possible for memslots to change underneath a vcpu that is handling an exit. If we increment the memslot generation number again after synchronize_srcu_expedited(), vcpus can safely cache memslot generation without maintaining a single rcu_dereference through an entire vm exit. And much of the x86/kvm code does not maintain a single rcu_dereference of the current memslots during each exit. We can prevent the following case: vcpu (CPU 0) | thread (CPU 1) --------------------------------------------+-------------------------- 1 vm exit | 2 srcu_read_unlock(&kvm->srcu) | 3 decide to cache something based on | old memslots | 4 | change memslots | (increments generation) 5 | synchronize_srcu(&kvm->srcu); 6 retrieve generation # from new memslots | 7 tag cache with new memslot generation | 8 srcu_read_unlock(&kvm->srcu) | ... | | ... | | | By incrementing the generation after synchronizing with kvm->srcu readers, we ensure that the generation retrieved in (6) will become invalid soon after (8). Keeping the existing increment is not strictly necessary, but we do keep it and just move it for consistency from update_memslots to install_new_memslots. It invalidates old cached MMIOs immediately, instead of having to wait for the end of synchronize_srcu_expedited, which makes the code more clearly correct in case CPU 1 is preempted right after synchronize_srcu() returns. To avoid halving the generation space in SPTEs, always presume that the low bit of the generation is zero when reconstructing a generation number out of an SPTE. This effectively disables MMIO caching in SPTEs during the call to synchronize_srcu_expedited. Using the low bit this way is somewhat like a seqcount---where the protected thing is a cache, and instead of retrying we can simply punt if we observe the low bit to be 1. Signed-off-by: David Matlack Reviewed-by: Xiao Guangrong Reviewed-by: David Matlack Signed-off-by: Paolo Bonzini Signed-off-by: Jiri Slaby commit 3fe0bc3399e85a9b96e668843ab69874be36939b Author: David Matlack Date: Mon Aug 18 15:46:07 2014 -0700 kvm: x86: fix stale mmio cache bug commit 56f17dd3fbc44adcdbc3340fe3988ddb833a47a7 upstream. The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. This patch fixes the issue by using the memslot generation number to validate the mmio cache. Signed-off-by: David Matlack [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.] Signed-off-by: Xiao Guangrong Reviewed-by: David Matlack Reviewed-by: Xiao Guangrong Tested-by: David Matlack Signed-off-by: Paolo Bonzini Signed-off-by: Jiri Slaby commit e7fd6c7a40afe78151c49ea69ef62a528f2f9ea7 Author: Josef Ahmad Date: Tue Sep 2 13:45:20 2014 +0300 pci_ids: Add support for Intel Quark ILB commit bb048713bba3ead39f6112910906d9fe3f88ede7 upstream. This patch adds the PCI id for Intel Quark ILB. It will be used for GPIO and Multifunction device driver. Signed-off-by: Josef Ahmad Acked-by: Bjorn Helgaas Signed-off-by: Andy Shevchenko Signed-off-by: Lee Jones Signed-off-by: Chang Rebecca Swee Fun Signed-off-by: Jiri Slaby commit ceac347d87281845424b1ce1ee8da0343c3b4d77 Author: Bryan O'Donoghue Date: Mon Aug 4 10:22:54 2014 -0700 usb: pch_udc: usb gadget device support for Intel Quark X1000 commit a68df7066a6f974db6069e0b93c498775660a114 upstream. This patch is to enable the USB gadget device for Intel Quark X1000 Signed-off-by: Bryan O'Donoghue Signed-off-by: Bing Niu Signed-off-by: Alvin (Weike) Chen Signed-off-by: Felipe Balbi Signed-off-by: Chang Rebecca Swee Fun Signed-off-by: Jiri Slaby commit 8ef9958bef569b081698d4ad25b263560d2a1720 Author: Sage Weil Date: Fri Sep 26 08:30:06 2014 -0700 Btrfs: fix race in WAIT_SYNC ioctl commit 42383020beb1cfb05f5d330cc311931bc4917a97 upstream. We check whether transid is already committed via last_trans_committed and then search through trans_list for pending transactions. If last_trans_committed is updated by btrfs_commit_transaction after we check it (there is no locking), we will fail to find the committed transaction and return EINVAL to the caller. This has been observed occasionally by ceph-osd (which uses this ioctl heavily). Fix by rechecking whether the provided transid <= last_trans_committed after the search fails, and if so return 0. Signed-off-by: Sage Weil Signed-off-by: Chris Mason Signed-off-by: Jiri Slaby commit b346e6df896c35f174c5613a78b97f4cea9143d6 Author: Josef Bacik Date: Fri Sep 19 15:43:34 2014 -0400 Btrfs: fix build_backref_tree issue with multiple shared blocks commit bbe9051441effce51c9a533d2c56440df64db2d7 upstream. Marc Merlin sent me a broken fs image months ago where it would blow up in the upper->checked BUG_ON() in build_backref_tree. This is because we had a scenario like this block a -- level 4 (not shared) | block b -- level 3 (reloc block, shared) | block c -- level 2 (not shared) | block d -- level 1 (shared) | block e -- level 0 (shared) We go to build a backref tree for block e, we notice block d is shared and add it to the list of blocks to lookup it's backrefs for. Now when we loop around we will check edges for the block, so we will see we looked up block c last time. So we lookup block d and then see that the block that points to it is block c and we can just skip that edge since we've already been up this path. The problem is because we clear need_check when we see block d (as it is shared) we never add block b as needing to be checked. And because block c is in our path already we bail out before we walk up to block b and add it to the backref check list. To fix this we need to reset need_check if we trip over a block that doesn't need to be checked. This will make sure that any subsequent blocks in the path as we're walking up afterwards are added to the list to be processed. With this patch I can now mount Marc's fs image and it'll complete the balance without panicing. Thanks, Reported-by: Marc MERLIN Signed-off-by: Josef Bacik Signed-off-by: Chris Mason Signed-off-by: Jiri Slaby commit 3ffb5771a40b0aa98c6c6de7976de5742f3d7c10 Author: Josef Bacik Date: Fri Sep 19 10:40:00 2014 -0400 Btrfs: cleanup error handling in build_backref_tree commit 75bfb9aff45e44625260f52a5fd581b92ace3e62 upstream. When balance panics it tends to panic in the BUG_ON(!upper->checked); test, because it means it couldn't build the backref tree properly. This is annoying to users and frankly a recoverable error, nothing in this function is actually fatal since it is just an in-memory building of the backrefs for a given bytenr. So go through and change all the BUG_ON()'s to ASSERT()'s, and fix the BUG_ON(!upper->checked) thing to just return an error. This patch also fixes the error handling so it tears down the work we've done properly. This code was horribly broken since we always just panic'ed instead of actually erroring out, so it needed to be completely re-worked. With this patch my broken image no longer panics when I mount it. Thanks, Signed-off-by: Josef Bacik Signed-off-by: Chris Mason Signed-off-by: Jiri Slaby commit 1125942d412f684b24d7a307888cd9dec99e3779 Author: Josef Bacik Date: Thu Sep 18 11:30:44 2014 -0400 Btrfs: try not to ENOSPC on log replay commit 1d52c78afbbf80b58299e076a159617d6b42fe3c upstream. When doing log replay we may have to update inodes, which traditionally goes through our delayed inode stuff. This will try to move space over from the trans handle, but we don't reserve space in our trans handle on replay since we don't know how much we will need, so instead we try to flush. But because we have a trans handle open we won't flush anything, so if we are out of reserve space we will simply return ENOSPC. Since we know that if an operation made it into the log then we definitely had space before the box bought the farm then we don't need to worry about doing this space reservation. Use the fs_info->log_root_recovering flag to skip the delayed inode stuff and update the item directly. Thanks, Signed-off-by: Josef Bacik Signed-off-by: Chris Mason Signed-off-by: Jiri Slaby commit 40b69f03d3371e16e66fa5b30f22b6df585399fc Author: David Sterba Date: Wed Jul 23 14:39:35 2014 +0200 btrfs: wake up transaction thread from SYNC_FS ioctl commit 2fad4e83e12591eb3bd213875b9edc2d18e93383 upstream. The transaction thread may want to do more work, namely it pokes the cleaner ktread that will start processing uncleaned subvols. This can be triggered by user via the 'btrfs fi sync' command, otherwise there was a delay up to 30 seconds before the cleaner started to clean old snapshots. Signed-off-by: David Sterba Signed-off-by: Chris Mason Signed-off-by: Jiri Slaby commit f0a4aeb48268770f88a2b1dabdeb2816f3fc8c0f Author: David S. Miller Date: Sat Sep 27 21:30:57 2014 -0700 sparc64: Kill unnecessary tables and increase MAX_BANKS. [ Upstream commit d195b71bad4347d2df51072a537f922546a904f1 ] swapper_low_pmd_dir and swapper_pud_dir are actually completely useless and unnecessary. We just need swapper_pg_dir[]. Naturally the other page table chunks will be allocated on an as-needed basis. Since the kernel actually accesses these tables in the PAGE_OFFSET view, there is not even a TLB locality advantage of placing them in the kernel image. Use the hard coded vmlinux.ld.S slot for swapper_pg_dir which is naturally page aligned. Increase MAX_BANKS to 1024 in order to handle heavily fragmented virtual guests. Even with this MAX_BANKS increase, the kernel is 20K+ smaller. Signed-off-by: David S. Miller Acked-by: Bob Picco commit b11148d717439f4632b7d5cbec78e95849026a1f Author: bob picco Date: Thu Sep 25 12:25:03 2014 -0700 sparc64: sparse irq [ Upstream commit ee6a9333fa58e11577c1b531b8e0f5ffc0fd6f50 ] This patch attempts to do a few things. The highlights are: 1) enable SPARSE_IRQ unconditionally, 2) kills off !SPARSE_IRQ code 3) allocates ivector_table at boot time and 4) default to cookie only VIRQ mechanism for supported firmware. The first firmware with cookie only support for me appears on T5. You can optionally force the HV firmware to not cookie only mode which is the sysino support. The sysino is a deprecated HV mechanism according to the most recent SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the cookie/sysino firmware versioning. The history of this interface is: 1) Major version 1.0 only supported sysino based interrupt interfaces. 2) Major version 2.0 added cookie based VIRQs, however due to the fact that OSs were using the VIRQs without negoatiating major version 2.0 (Linux and Solaris are both guilty), the VIRQs calls were allowed even with major version 1.0 To complicate things even further, the VIRQ interfaces were only actually hooked up in the hypervisor for LDC interrupt sources. VIRQ calls on other device types would result in HV_EINVAL errors. So effectively, major version 2.0 is unusable. 3) Major version 3.0 was created to signal use of VIRQs and the fact that the hypervisor has these calls hooked up for all interrupt sources, not just those for LDC devices. A new boot option is provided should cookie only HV support have issues. hvirq - this is the version for HV_GRP_INTR. This is related to HV API versioning. The code attempts major=3 first by default. The option can be used to override this default. I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalap?no. Signed-off-by: Bob Picco Signed-off-by: David S. Miller commit a31c967b2a9e869a12ee44f8f3aaf77901e206ee Author: David S. Miller Date: Sat Sep 27 11:05:21 2014 -0700 sparc64: Adjust vmalloc region size based upon available virtual address bits. [ Upstream commit bb4e6e85daa52a9f6210fa06a5ec6269598a202b ] In order to accomodate embedded per-cpu allocation with large numbers of cpus and numa nodes, we have to use as much virtual address space as possible for the vmalloc region. Otherwise we can get things like: PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000 So, once we select a value for PAGE_OFFSET, derive the size of the vmalloc region based upon that. Signed-off-by: David S. Miller Acked-by: Bob Picco commit 7f3fde55e3ad473a141dd19ae1baf31f81798d92 Author: David S. Miller Date: Wed Sep 24 21:49:29 2014 -0700 sparc64: Increase MAX_PHYS_ADDRESS_BITS to 53. commit 7c0fa0f24bb76ce3d67be7f737b799846a04570f upstream. Make sure, at compile time, that the kernel can properly support whatever MAX_PHYS_ADDRESS_BITS is defined to. On M7 chips, use a max_phys_bits value of 49. Based upon a patch by Bob Picco. Signed-off-by: David S. Miller Acked-by: Bob Picco commit b7be15979833db31402d59aaed3c9055452a3273 Author: David S. Miller Date: Wed Sep 24 21:20:14 2014 -0700 sparc64: Use kernel page tables for vmemmap. [ Upstream commit c06240c7f5c39c83dfd7849c0770775562441b96 ] For sparse memory configurations, the vmemmap array behaves terribly and it takes up an inordinate amount of space in the BSS section of the kernel image unconditionally. Just build huge PMDs and look them up just like we do for TLB misses in the vmalloc area. Kernel BSS shrinks by about 2MB. Signed-off-by: David S. Miller Acked-by: Bob Picco commit cac611bda6823d445f38578391c213b7f62e2cbd Author: David S. Miller Date: Wed Sep 24 20:56:11 2014 -0700 sparc64: Fix physical memory management regressions with large max_phys_bits. [ Upstream commit 0dd5b7b09e13dae32869371e08e1048349fd040c ] If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like DEBUG_PAGEALLOC stop working because the 3-level page tables only can cover up to 43 bits. Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to 47, several statically allocated tables became enormous. Compounding this is that we will need to support up to 49 bits of physical addressing for M7 chips. The two tables in question are sparc64_valid_addr_bitmap and kpte_linear_bitmap. The first holds a bitmap, with 1 bit for each 4MB chunk of physical memory, indicating whether that chunk actually exists in the machine and is valid. The second table is a set of 2-bit values which tell how large of a mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB chunk of ram in the system. These tables are huge and take up an enormous amount of the BSS section of the sparc64 kernel image. Specifically, the sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K. So let's solve the space wastage and the DEBUG_PAGEALLOC problem at the same time, by using the kernel page tables (as designed) to manage this information. We have to keep using large mappings when DEBUG_PAGEALLOC is disabled, and we do this by encoding huge PMDs and PUDs. On a T4-2 with 256GB of ram the kernel page table takes up 16K with DEBUG_PAGEALLOC disabled and 256MB with it enabled. Furthermore, this memory is dynamically allocated at run time rather than coded statically into the kernel image. Signed-off-by: David S. Miller Acked-by: Bob Picco commit 4d00d3e2811eae3346a713192c87447c95448140 Author: David S. Miller Date: Wed Sep 17 10:14:56 2014 -0700 sparc64: Adjust KTSB assembler to support larger physical addresses. [ Upstream commit 8c82dc0e883821c098c8b0b130ffebabf9aab5df ] As currently coded the KTSB accesses in the kernel only support up to 47 bits of physical addressing. Adjust the instruction and patching sequence in order to support arbitrary 64 bits addresses. Signed-off-by: David S. Miller Acked-by: Bob Picco commit f3106616c9043d975e7d13441c0354b4d4e98fb4 Author: David S. Miller Date: Fri Sep 26 21:58:33 2014 -0700 sparc64: Define VA hole at run time, rather than at compile time. [ Upstream commit 4397bed080598001e88f612deb8b080bb1cc2322 ] Now that we use 4-level page tables, we can provide up to 53-bits of virtual address space to the user. Adjust the VA hole based upon the capabilities of the cpu type probed. Signed-off-by: David S. Miller Acked-by: Bob Picco commit 0e44ea2e2de88f1d7bd9a27f2bc54171ddbf55a7 Author: David S. Miller Date: Fri Sep 26 21:19:46 2014 -0700 sparc64: Switch to 4-level page tables. [ Upstream commit ac55c768143aa34cc3789c4820cbb0809a76fd9c ] This has become necessary with chips that support more than 43-bits of physical addressing. Based almost entirely upon a patch by Bob Picco. Signed-off-by: David S. Miller Acked-by: Bob Picco commit 666892deef1df8c41eddec4df90136fd368b97df Author: bob picco Date: Tue Sep 16 10:09:06 2014 -0400 sparc64: T5 PMU commit 05aa1651e8b9ca078b1808a2fe7b50703353ec02 upstream. The T5 (niagara5) has different PCR related HV fast trap values and a new HV API Group. This patch utilizes these and shares when possible with niagara4. We use the same sparc_pmu niagara4_pmu. Should there be new effort to obtain the MCU perf statistics then this would have to be changed. Cc: sparclinux@vger.kernel.org Signed-off-by: Bob Picco Signed-off-by: David S. Miller commit 45a61adc8f2f3c62624e4d169618fa83aa63958b Author: Allen Pais Date: Mon Sep 8 11:48:55 2014 +0530 sparc64: cpu hardware caps support for sparc M6 and M7 commit 408316258521168614bfb4da0e070490d3e65a17 upstream. Signed-off-by: Allen Pais Signed-off-by: David S. Miller commit 4891b830195eb30dc861105e20a586bcd2fa0534 Author: Allen Pais Date: Mon Sep 8 11:48:54 2014 +0530 sparc64: support M6 and M7 for building CPU distribution map commit 9bd3ee33f6b97de092610d8dcabc4cb98d99505c upstream. Add M6 and M7 chip type in cpumap.c to correctly build CPU distribution map that spans all online CPUs. Signed-off-by: Allen Pais Signed-off-by: David S. Miller commit 6e7ffc94a368492fdb25aa5974d3564a0fb6208e Author: Allen Pais Date: Mon Sep 8 11:48:53 2014 +0530 sparc64: correctly recognise M6 and M7 cpu type commit cadbb58039f7cab1def9c931012ab04c953a6997 upstream. The following patch adds support for correctly recognising M6 and M7 cpu type. Signed-off-by: Allen Pais Signed-off-by: David S. Miller commit dca89ea9e03641d202971dccff411edec5252775 Author: David S. Miller Date: Wed Sep 24 21:05:30 2014 -0700 sparc64: Fix hibernation code refrence to PAGE_OFFSET. commit 9d0713edf72461438bc3526e4ea55fec47754cd9 upstream. We changed PAGE_OFFSET to be a variable rather than a constant, but this reference here in the hibernate assembler got missed. Signed-off-by: David S. Miller commit a553c52f811798d5cccdd7867db5728c6dc00cf0 Author: David S. Miller Date: Tue Apr 29 13:03:27 2014 -0700 sparc64: Add basic validations to {pud,pmd}_bad(). [ Upstream commit 26cf432551d749e7d581db33529507a711c6eaab ] Instead of returning false we should at least check the most basic things, otherwise page table corruptions will be very difficult to debug. PMD and PTE tables are of size PAGE_SIZE, so none of the sub-PAGE_SIZE bits should be set. We also complement this with a check that the physical address the pud/pmd points to is valid memory. PowerPC was used as a guide while implementating this. Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f215d50902fea10886d602560413d89e279e880b Author: David S. Miller Date: Sat May 3 22:52:50 2014 -0700 sparc64: Use 'ILOG2_4MB' instead of constant '22'. [ Upstream commit 0eef331a3d0ee970dcbebd1bd5fcb57ca33ece01 ] Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 08a33b83e7311de2285774dab61ac4db5f24e35d Author: David S. Miller Date: Tue Apr 29 12:58:03 2014 -0700 sparc64: Fix range check in kern_addr_valid(). [ Upstream commit ee73887e92a69ae0a5cda21c68ea75a27804c944 ] In commit b2d438348024b75a1ee8b66b85d77f569a5dfed8 ("sparc64: Make PAGE_OFFSET variable."), the MAX_PHYS_ADDRESS_BITS value was increased (to 47). This constant reference to '41UL' was missed. Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a55db39ca1b9d88cc6b71c042680ec4dd7aa3ff8 Author: David S. Miller Date: Mon Apr 28 19:11:27 2014 -0700 sparc64: Don't use _PAGE_PRESENT in pte_modify() mask. [ Upstream commit eaf85da82669b057f20c4e438dc2566b51a83af6 ] Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bfd0f4baa43d31cd9b4149bec9e706346336670a Author: David S. Miller Date: Sun Apr 27 21:01:56 2014 -0700 sparc64: Fix hex values in comment above pte_modify(). [ Upstream commit c2e4e676adb40ea764af79d3e08be954e14a0f4c ] When _PAGE_SPECIAL and _PAGE_PMD_HUGE were added to the mask, the comment was not updated. Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 69a46d905430c260f1e91408579282b62f82dd9a Author: David S. Miller Date: Fri Apr 25 10:21:12 2014 -0700 sparc64: Fix bugs in get_user_pages_fast() wrt. THP. [ Upstream commit 04df419de34104d8818b8c5cffaa062fa36d20ea ] The large PMD path needs to check _PAGE_VALID not _PAGE_PRESENT, to decide if it needs to bail and return 0. pmd_large() should therefore just check _PAGE_PMD_HUGE. Calls to gup_huge_pmd() are guarded with a check of pmd_large(), so we just need to add a valid bit check. Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 62af252347f7539a25767a9d01c3ec3930e18d09 Author: David S. Miller Date: Thu Apr 24 13:58:02 2014 -0700 sparc64: Fix huge PMD invalidation. [ Upstream commit 51e5ef1bb7ab0e5fa7de4e802da5ab22fe35f0bf ] On sparc64 "present" and "valid" are seperate PTE bits, this allows us to naturally distinguish between the user explicitly asking for PROT_NONE with mprotect() and other situations. However we weren't handling this properly in the huge PMD paths. First of all, the page table walker in the TSB miss path only checks for _PAGE_PMD_HUGE. So the generic pmdp_invalidate() would clear _PAGE_PRESENT but the TLB miss paths would still load it into the TLB as a valid huge PMD. Fix this by clearing the valid bit in pmdp_invalidate(), and also checking the valid bit in USER_PGTABLE_CHECK_PMD_HUGE using "brgez" since _PAGE_VALID is bit 63 in both the sun4u and sun4v pte layouts. Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0ed11e0d7809e6faefd1580260ec1fa68fbdcb0d Author: David S. Miller Date: Sun Apr 20 21:55:01 2014 -0400 sparc64: Fix executable bit testing in set_pmd_at() paths. [ Upstream commit 5b1e94fa439a3227beefad58c28c17f68287a8e9 ] This code was mistakenly using the exec bit from the PMD in all cases, even when the PMD isn't a huge PMD. If it's not a huge PMD, test the exec bit in the individual ptes down in tlb_batch_pmd_scan(). Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f68d72bc620b6b562c773ab4ffb26bad18fbc3ce Author: Dave Kleikamp Date: Mon Dec 16 15:01:00 2013 -0600 Revert "sparc64: Fix __copy_{to,from}_user_inatomic defines." This reverts commit 145e1c0023585e0e8f6df22316308ec61c5066b2. This commit broke the behavior of __copy_from_user_inatomic when it is only partially successful. Instead of returning the number of bytes not copied, it now returns 1. This translates to the wrong value being returned by iov_iter_copy_from_user_atomic. xfstests generic/246 and LTP writev01 both fail on btrfs and nfs because of this. Signed-off-by: Dave Kleikamp Cc: Hugh Dickins Cc: David S. Miller Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller commit 1d9e79d6ccb5b00cfe7e695b7bf29a9df421e037 Author: oftedal Date: Fri Oct 18 22:28:29 2013 +0200 sparc: PCI: Fix incorrect address calculation of PCI Bridge windows on Simba-bridges commit 557fc5873ef178c4b3e1e36a42db547ecdc43f9b upstream. The SIMBA APB Bridges lacks the 'ranges' of-property describing the PCI I/O and memory areas located beneath the bridge. Faking this information has been performed by reading range registers in the APB bridge, and calculating the corresponding areas. In commit 01f94c4a6ced476ce69b895426fc29bfc48c69bd ("Fix sabre pci controllers with new probing scheme.") a bug was introduced into this calculation, causing the PCI memory areas to be calculated incorrectly: The shift size was set to be identical for I/O and MEM ranges, which is incorrect. This patch set the shift size of the MEM range back to the value used before 01f94c4a6ced476ce69b895426fc29bfc48c69bd. Signed-off-by: Kjetil Oftedal Signed-off-by: David S. Miller commit eaf019930a792be35c923e73ebe4d9ce5ef1513b Author: David S. Miller Date: Thu Sep 26 13:45:15 2013 -0700 sparc64: Encode huge PMDs using PTE encoding. commit a7b9403f0e6d5f99139dca18be885819c8d380a1 upstream. Now that we have 64-bits for PMDs we can stop using special encodings for the huge PMD values, and just put real PTEs in there. We allocate a _PAGE_PMD_HUGE bit to distinguish between plain PMDs and huge ones. It is the same for both 4U and 4V PTE layouts. We also use _PAGE_SPECIAL to indicate the splitting state, since a huge PMD cannot also be special. All of the PMD --> PTE translation code disappears, and most of the huge PMD bit modifications and tests just degenerate into the PTE operations. In particular USER_PGTABLE_CHECK_PMD_HUGE becomes trivial. As a side effect, normal PMDs don't shift the physical address around. This also speeds up the page table walks in the TLB miss paths since they don't have to do the shifts any more. Another non-trivial aspect is that pte_modify() has to be changed to preserve the _PAGE_PMD_HUGE bits as well as the page size field of the pte. Signed-off-by: David S. Miller commit f82fe1d262dcaecb542c91afe3ef23576d773689 Author: David S. Miller Date: Wed Sep 25 14:33:16 2013 -0700 sparc64: Move to 64-bit PGDs and PMDs. commit 2b77933c28f5044629bb19e8045aae65b72b939d upstream. To make the page tables compact, we were using 32-bit PGDs and PMDs. We only had to support <= 43 bits of physical addresses so this was quite feasible. In order to support larger physical addresses we have to move to 64-bit PGDs and PMDs. Most of the changes are straight-forward: 1) {pgd,pmd}_t --> unsigned long 2) Anything that tries to use plain "unsigned int" types with pgd/pmd values needs to be adjusted. In particular things like "0U" become "0UL". 3) {PGDIR,PMD}_BITS decrease by one. 4) In the assembler page table walkers, use "ldxa" instead of "lduwa" and adjust the low bit masks to clear out the low 3 bits instead of just the low 2 bits during pgd/pmd address formation. Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the swapper_{pg_dir,low_pmd_dir} arrays. This patch does not try to take advantage of having 64-bits in the PMDs to simplify the hugepage code, that will come in a subsequent change. Signed-off-by: David S. Miller commit 86a68ad00e9811a656678fdcfb3f686ef52cdcc9 Author: David S. Miller Date: Wed Sep 25 13:48:49 2013 -0700 sparc64: Move from 4MB to 8MB huge pages. commit 37b3a8ff3e086cd5c369e77d2383b691b2874cd6 upstream. The impetus for this is that we would like to move to 64-bit PMDs and PGDs, but that would result in only supporting a 42-bit address space with the current page table layout. It'd be nice to support at least 43-bits. The reason we'd end up with only 42-bits after making PMDs and PGDs 64-bit is that we only use half-page sized PTE tables in order to make PMDs line up to 4MB, the hardware huge page size we use. So what we do here is we make huge pages 8MB, and fabricate them using 4MB hw TLB entries. Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in places that really need to operate on hardware 4MB pages. Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT, PGD_SHIFT, and the build time CPP test as needed. Use a CPP test to make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up. This makes the pgtable cache completely unused, so remove the code managing it and the state used in mm_context_t. Now we have less spinlocks taken in the page table allocation path. The technique we use to fabricate the 8MB pages is to transfer bit 22 from the missing virtual address into the PTEs physical address field. That takes care of the transparent huge pages case. For hugetlb, we fill things in at the PTE level and that code already puts the sub huge page physical bits into the PTEs, based upon the offset, so there is nothing special we need to do. It all just works out. So, a small amount of complexity in the THP case, but this code is about to get much simpler when we move the 64-bit PMDs as we can move away from the fancy 32-bit huge PMD encoding and just put a real PTE value in there. With bug fixes and help from Bob Picco. Signed-off-by: David S. Miller commit 05ce20e5c71e75eee79080861c544284d7618ea5 Author: David S. Miller Date: Fri Sep 20 21:50:41 2013 -0700 sparc64: Make PAGE_OFFSET variable. commit b2d438348024b75a1ee8b66b85d77f569a5dfed8 upstream. Choose PAGE_OFFSET dynamically based upon cpu type. Original UltraSPARC-I (spitfire) chips only supported a 44-bit virtual address space. Newer chips (T4 and later) support 52-bit virtual addresses and up to 47-bits of physical memory space. Therefore we have to adjust PAGE_SIZE dynamically based upon the capabilities of the chip. Note that this change alone does not allow us to support > 43-bit physical memory, to do that we need to re-arrange our page table support. The current encodings of the pmd_t and pgd_t pointers restricts us to "32 + 11" == 43 bits. This change can waste quite a bit of memory for the various tables. In particular, a future change should work to size and allocate kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically. This isn't easy as we really cannot take a TLB miss when accessing kern_linear_bitmap[]. We'd have to lock it into the TLB or similar. Signed-off-by: David S. Miller Acked-by: Bob Picco commit 7ae3f1206c8558f44af46fb5af0b0d786515012a Author: David S. Miller Date: Wed Sep 18 18:39:25 2013 -0700 sparc64: Fix inconsistent max-physical-address defines. commit f998c9c0d663b013e3aa3ba78908396c8c497218 upstream. Some parts of the code use '41' others use '42', make them all use the same value. Signed-off-by: David S. Miller Acked-by: Bob Picco commit 4832868cdf15d4deeed85f24ee5ec0768d1ffaba Author: David S. Miller Date: Wed Sep 18 15:39:06 2013 -0700 sparc64: Document the shift counts used to validate linear kernel addresses. commit bb7b435388b9f035ecfb16f42b5c6bf428359c63 upstream. This way we can see exactly what they are derived from, and in particular how they would change if we were to use a different PAGE_OFFSET value. Signed-off-by: David S. Miller Acked-by: Bob Picco commit 5c1bc5289ed4f7391c9722670f6da74de41fec0c Author: David S. Miller Date: Wed Sep 18 14:22:34 2013 -0700 sparc64: Define PAGE_OFFSET in terms of physical address bits. commit e0a45e3580a033669b24b04c3535515d69bb9702 upstream. This makes clearer the implications for a given choosen value. Based upon patches by Bob Picco. Signed-off-by: David S. Miller Acked-by: Bob Picco commit 642c2dbbb0907ec68ae3b3aed3a14312d9275567 Author: David S. Miller Date: Wed Sep 18 12:00:00 2013 -0700 sparc64: Use PAGE_OFFSET instead of a magic constant. commit 922631b988d8cbb821ebe2c67feffc0b95264894 upstream. This pertains to all of the computations of the kernel fast TLB miss xor values. Based upon a patch by Bob Picco. Signed-off-by: David S. Miller Acked-by: Bob Picco commit 02432d2dfb47bb556c7cd628af58ee69c27d48c4 Author: David S. Miller Date: Wed Sep 18 11:58:32 2013 -0700 sparc64: Clean up 64-bit mmap exclusion defines. commit c920745e6964bd4b9315a17b018d83fad66010d3 upstream. Older UltraSPARC chips had an address space hole due to the MMU only supporting 44-bit virtual addresses. The top end of this hole also has the same value as the current definition of PAGE_OFFSET, so this can be confusing. Consolidate the defines for the userspace mmap exclusion range into page_64.h and use them in sys_sparc_64.c and hugetlbpage.c Signed-off-by: David S. Miller Acked-by: Bob Picco commit e3d45a797b7742c26a847bd9b8c85f940a15860b Author: David S. Miller Date: Fri Oct 24 09:59:02 2014 -0700 sparc64: Implement __get_user_pages_fast(). [ Upstream commit 06090e8ed89ea2113a236befb41f71d51f100e60 ] It is not sufficient to only implement get_user_pages_fast(), you must also implement the atomic version __get_user_pages_fast() otherwise you end up using the weak symbol fallback implementation which simply returns zero. This is dangerous, because it causes the futex code to loop forever if transparent hugepages are supported (see get_futex_key()). Signed-off-by: David S. Miller commit 029b4cd37f0557cb0eecb026bbfa9efda7bd8d0d Author: David S. Miller Date: Thu Oct 23 12:58:13 2014 -0700 sparc64: Fix register corruption in top-most kernel stack frame during boot. [ Upstream commit ef3e035c3a9b81da8a778bc333d10637acf6c199 ] Meelis Roos reported that kernels built with gcc-4.9 do not boot, we eventually narrowed this down to only impacting machines using UltraSPARC-III and derivitive cpus. The crash happens right when the first user process is spawned: [ 54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 [ 54.451346] [ 54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab #96 [ 54.666431] Call Trace: [ 54.698453] [0000000000762f8c] panic+0xb0/0x224 [ 54.759071] [000000000045cf68] do_exit+0x948/0x960 [ 54.823123] [000000000042cbc0] fault_in_user_windows+0xe0/0x100 [ 54.902036] [0000000000404ad0] __handle_user_windows+0x0/0x10 [ 54.978662] Press Stop-A (L1-A) to return to the boot prom [ 55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 Further investigation showed that compiling only per_cpu_patch() with an older compiler fixes the boot. Detailed analysis showed that the function is not being miscompiled by gcc-4.9, but it is using a different register allocation ordering. With the gcc-4.9 compiled function, something during the code patching causes some of the %i* input registers to get corrupted. Perhaps we have a TLB miss path into the firmware that is deep enough to cause a register window spill and subsequent restore when we get back from the TLB miss trap. Let's plug this up by doing two things: 1) Stop using the firmware stack for client interface calls into the firmware. Just use the kernel's stack. 2) As soon as we can, call into a new function "start_early_boot()" to put a one-register-window buffer between the firmware's deepest stack frame and the top-most initial kernel one. Reported-by: Meelis Roos Tested-by: Meelis Roos Signed-off-by: David S. Miller commit 1fd3b1be66c8fc6c1fd201d6fb4e09a8de43e6a9 Author: Dave Kleikamp Date: Tue Oct 7 08:12:37 2014 -0500 sparc64: Increase size of boot string to 1024 bytes [ Upstream commit 1cef94c36bd4d79b5ae3a3df99ee0d76d6a4a6dc ] This is the longest boot string that silo supports. Signed-off-by: Dave Kleikamp Cc: Bob Picco Cc: David S. Miller Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller commit c973669c3e04dbab2c7c4cd74e6d778202d1a5d4 Author: David S. Miller Date: Sat Oct 18 23:12:33 2014 -0400 sparc64: Do not define thread fpregs save area as zero-length array. [ Upstream commit e2653143d7d79a49f1a961aeae1d82612838b12c ] This breaks the stack end corruption detection facility. What that facility does it write a magic value to "end_of_stack()" and checking to see if it gets overwritten. "end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is the beginning of the FPU register save area. So once the user uses the FPU, the magic value is overwritten and the debug checks trigger. Fix this by making the size explicit. Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we are limited to 7 levels of FPU state saves. So each FPU register set is 256 bytes, allocate 256 * 7 for the fpregs area. Reported-by: Meelis Roos Signed-off-by: David S. Miller commit df4ef7bad4c314681dcfed13be5d959d47a3dfc9 Author: David S. Miller Date: Tue Oct 14 19:37:58 2014 -0700 sparc64: Fix FPU register corruption with AES crypto offload. [ Upstream commit f4da3628dc7c32a59d1fb7116bb042e6f436d611 ] The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the key material is preloaded into the FPU registers, and then we loop over and over doing the crypt operation, reusing those pre-cooked key registers. There are intervening blkcipher*() calls between the crypt operation calls. And those might perform memcpy() and thus also try to use the FPU. The sparc64 kernel FPU usage mechanism is designed to allow such recursive uses, but with a catch. There has to be a trap between the two FPU using threads of control. The mechanism works by, when the FPU is already in use by the kernel, allocating a slot for FPU saving at trap time. Then if, within the trap handler, we try to use the FPU registers, the pre-trap FPU register state is saved into the slot. Then at trap return time we notice this and restore the pre-trap FPU state. Over the long term there are various more involved ways we can make this work, but for a quick fix let's take advantage of the fact that the situation where this happens is very limited. All sparc64 chips that support the crypto instructiosn also are using the Niagara4 memcpy routine, and that routine only uses the FPU for large copies where we can't get the source aligned properly to a multiple of 8 bytes. We look to see if the FPU is already in use in this context, and if so we use the non-large copy path which only uses integer registers. Furthermore, we also limit this special logic to when we are doing kernel copy, rather than a user copy. Signed-off-by: David S. Miller commit 91d02cb077c3a3984316a530bf65c240e5601901 Author: David S. Miller Date: Fri Oct 10 15:49:16 2014 -0400 sparc64: Fix lockdep warnings on reboot on Ultra-5 [ Upstream commit bdcf81b658ebc4c2640c3c2c55c8b31c601b6996 ] Inconsistently, the raw_* IRQ routines do not interact with and update the irqflags tracing and lockdep state, whereas the raw_* spinlock interfaces do. This causes problems in p1275_cmd_direct() because we disable hardirqs by hand using raw_local_irq_restore() and then do a raw_spin_lock() which triggers a lockdep trace because the CPU's hw IRQ state doesn't match IRQ tracing's internal software copy of that state. The CPU's irqs are disabled, yet current->hardirqs_enabled is true. ==================== reboot: Restarting system ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3536 check_flags+0x7c/0x240() DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) Modules linked in: openpromfs CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: G W 3.17.0-dirty #145 Call Trace: [000000000045919c] warn_slowpath_common+0x5c/0xa0 [0000000000459210] warn_slowpath_fmt+0x30/0x40 [000000000048f41c] check_flags+0x7c/0x240 [0000000000493280] lock_acquire+0x20/0x1c0 [0000000000832b70] _raw_spin_lock+0x30/0x60 [000000000068f2fc] p1275_cmd_direct+0x1c/0x60 [000000000068ed28] prom_reboot+0x28/0x40 [000000000043610c] machine_restart+0x4c/0x80 [000000000047d2d4] kernel_restart+0x54/0x80 [000000000047d618] SyS_reboot+0x138/0x200 [00000000004060b4] linux_sparc_syscall32+0x34/0x60 ---[ end trace 5c439fe81c05a100 ]--- possible reason: unannotated irqs-off. irq event stamp: 2010267 hardirqs last enabled at (2010267): [<000000000049a358>] vprintk_emit+0x4b8/0x580 hardirqs last disabled at (2010266): [<0000000000499f08>] vprintk_emit+0x68/0x580 softirqs last enabled at (2010046): [<000000000045d278>] __do_softirq+0x378/0x4a0 softirqs last disabled at (2010039): [<000000000042bf08>] do_softirq_own_stack+0x28/0x40 Resetting ... ==================== Use local_* variables of the hw IRQ interfaces so that IRQ tracing sees all of our changes. Reported-by: Meelis Roos Tested-by: Meelis Roos Signed-off-by: David S. Miller commit 9d31b733f3f2dbdfa3113e432b721d209276699b Author: David S. Miller Date: Sat Oct 4 21:05:14 2014 -0700 sparc64: Fix reversed start/end in flush_tlb_kernel_range() [ Upstream commit 473ad7f4fb005d1bb727e4ef27d370d28703a062 ] When we have to split up a flush request into multiple pieces (in order to avoid the firmware range) we don't specify the arguments in the right order for the second piece. Fix the order, or else we get hangs as the code tries to flush "a lot" of entries and we get lockups like this: [ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032] [ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core [ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608 [ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000 [ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000 Not tainted [ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40> [ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000 [ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006 [ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000 [ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54 [ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0> [ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000 [ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138 [ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000 [ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24 [ 4423.224640] I7: [ 4423.234193] Call Trace: [ 4423.239051] [0000000000530c24] free_vmap_area_noflush+0x64/0x80 [ 4423.251029] [0000000000531a7c] remove_vm_area+0x5c/0x80 [ 4423.261628] [0000000000531b80] __vunmap+0x20/0x120 [ 4423.271352] [000000000071cf18] n_tty_close+0x18/0x40 [ 4423.281423] [00000000007222b0] tty_ldisc_close+0x30/0x60 [ 4423.292183] [00000000007225a4] tty_ldisc_reinit+0x24/0xa0 [ 4423.303120] [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0 [ 4423.314232] [0000000000719aa0] __tty_hangup+0x280/0x3c0 [ 4423.324835] [0000000000724cb4] pty_close+0x134/0x1a0 [ 4423.334905] [000000000071aa24] tty_release+0x104/0x500 [ 4423.345316] [00000000005511d0] __fput+0x90/0x1e0 [ 4423.354701] [000000000047fa54] task_work_run+0x94/0xe0 [ 4423.365126] [0000000000404b44] __handle_signal+0xc/0x2c Fixes: 4ca9a23765da ("sparc64: Guard against flushing openfirmware mappings.") Signed-off-by: David S. Miller commit 9e9b1c45d28215e49d10e3595cf4ee67480adf2f Author: Andreas Larsson Date: Fri Aug 29 17:08:21 2014 +0200 sparc: Let memset return the address argument [ Upstream commit 74cad25c076a2f5253312c2fe82d1a4daecc1323 ] This makes memset follow the standard (instead of returning 0 on success). This is needed when certain versions of gcc optimizes around memset calls and assume that the address argument is preserved in %o0. Signed-off-by: Andreas Larsson Signed-off-by: David S. Miller commit 0f4b3a2ea626749e2a04181d2488876b5460463b Author: Sowmini Varadhan Date: Tue Sep 16 11:37:08 2014 -0400 sparc64: Move request_irq() from ldc_bind() to ldc_alloc() [ Upstream commit c21c4ab0d6921f7160a43216fa6973b5924de561 ] The request_irq() needs to be done from ldc_alloc() to avoid the following (caught by lockdep) [00000000004a0738] __might_sleep+0xf8/0x120 [000000000058bea4] kmem_cache_alloc_trace+0x184/0x2c0 [00000000004faf80] request_threaded_irq+0x80/0x160 [000000000044f71c] ldc_bind+0x7c/0x220 [0000000000452454] vio_port_up+0x54/0xe0 [00000000101f6778] probe_disk+0x38/0x220 [sunvdc] [00000000101f6b8c] vdc_port_probe+0x22c/0x300 [sunvdc] [0000000000451a88] vio_device_probe+0x48/0x60 [000000000074c56c] really_probe+0x6c/0x300 [000000000074c83c] driver_probe_device+0x3c/0xa0 [000000000074c92c] __driver_attach+0x8c/0xa0 [000000000074a6ec] bus_for_each_dev+0x6c/0xa0 [000000000074c1dc] driver_attach+0x1c/0x40 [000000000074b0fc] bus_add_driver+0xbc/0x280 Signed-off-by: Sowmini Varadhan Acked-by: Dwight Engen Signed-off-by: David S. Miller commit b1f46334729ec1ee534049b98cae2f1bdb846f23 Author: bob picco Date: Tue Sep 16 09:28:15 2014 -0400 sparc64: find_node adjustment [ Upstream commit 3dee9df54836d5f844f3d58281d3f3e6331b467f ] We have seen an issue with guest boot into LDOM that causes early boot failures because of no matching rules for node identitity of the memory. I analyzed this on my T4 and concluded there might not be a solution. I saw the issue in mainline too when booting into the control/primary domain - with guests configured. Note, this could be a firmware bug on some older machines. I'll provide a full explanation of the issues below. Should we not find a matching BEST latency group for a real address (RA) then we will assume node 0. On the T4-2 here with the information provided I can't see an alternative. Technically the LDOM shown below should match the MBLOCK to the favorable latency group. However other factors must be considered too. Were the memory controllers configured "fine" grained interleave or "coarse" grain interleaved - T4. Also should a "group" MD node be considered a NUMA node? There has to be at least one Machine Description (MD) "group" and hence one NUMA node. The group can have one or more latency groups (lg) - more than one memory controller. The current code chooses the smallest latency as the most favorable per group. The latency and lg information is in MLGROUP below. MBLOCK is the base and size of the RAs for the machine as fetched from OBP /memory "available" property. My machine has one MBLOCK but more would be possible - with holes? For a T4-2 the following information has been gathered: with LDOM guest MEMBLOCK configuration: memory size = 0x27f870000 memory.cnt = 0x3 memory[0x0] [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes memory[0x1] [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes memory[0x2] [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes reserved[0x1] [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes MBLOCK[0]: base[20000000] size[280000000] offset[0] (note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values) (note: (RA + offset) & mask = val is the formula to detect a match for the memory controller. should there be no match for find_node node, a return value of -1 resulted for the node - BAD) There is one group. It has these forward links MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000] MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000] NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8]) (note: "val" is the best lg's (smallest latency) "match") no LDOM guest - bare metal MEMBLOCK configuration: memory size = 0xfdf2d0000 memory.cnt = 0x3 memory[0x0] [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes memory[0x1] [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes memory[0x2] [0x00000fff766000-0x00000fff771fff], 0xc000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00000020800000-0x00000021a04580], 0x1204581 bytes reserved[0x1] [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes MBLOCK[0]: base[20000000] size[fe0000000] offset[0] there are two groups group node[16d5] MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000] MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000] NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8]) group node[171d] MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000] MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000] NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8]) (note: for this two "group" bare metal machine, 1/2 memory is in group one's lg and 1/2 memory is in group two's lg). Cc: sparclinux@vger.kernel.org Signed-off-by: Bob Picco Signed-off-by: David S. Miller commit af77bf8ebf6f70baea97d6aceec0c82aec74cb87 Author: David S. Miller Date: Sat Oct 18 23:03:09 2014 -0400 sparc64: Fix corrupted thread fault code. [ Upstream commit 84bd6d8b9c0f06b3f188efb479c77e20f05e9a8a ] Every path that ends up at do_sparc64_fault() must install a valid FAULT_CODE_* bitmask in the per-thread fault code byte. Two paths leading to the label winfix_trampoline (which expects the FAULT_CODE_* mask in register %g4) were not doing so: 1) For pre-hypervisor TLB protection violation traps, if we took the 'winfix_trampoline' path we wouldn't have %g4 initialized with the FAULT_CODE_* value yet. Resulting in using the TLB_TAG_ACCESS register address value instead. 2) In the TSB miss path, when we notice that we are going to use a hugepage mapping, but we haven't allocated the hugepage TSB yet, we still have to take the window fixup case into consideration and in that particular path we leave %g4 not setup properly. Errors on this sort were largely invisible previously, but after commit 4ccb9272892c33ef1c19a783cfa87103b30c2784 ("sparc64: sun4v TLB error power off events") we now have a fault_code mask bit (FAULT_CODE_BAD_RA) that triggers due to this bug. FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS (see #1 above) and thus we get seemingly random bus errors triggered for user processes. Fixes: 4ccb9272892c ("sparc64: sun4v TLB error power off events") Reported-by: Meelis Roos Signed-off-by: David S. Miller commit c32179b21810cb0be5511b8b34ec77a6f99a7216 Author: bob picco Date: Tue Sep 16 09:26:47 2014 -0400 sparc64: sun4v TLB error power off events [ Upstream commit 4ccb9272892c33ef1c19a783cfa87103b30c2784 ] We've witnessed a few TLB events causing the machine to power off because of prom_halt. In one case it was some nfs related area during rmmod. Another was an mmapper of /dev/mem. A more recent one is an ITLB issue with a bad pagesize which could be a hardware bug. Bugs happen but we should attempt to not power off the machine and/or hang it when possible. This is a DTLB error from an mmapper of /dev/mem: [root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1 SUN4V-DTLB: TPC<0xfffff80100903e6c> SUN4V-DTLB: O7[fffff801081979d0] SUN4V-DTLB: O7<0xfffff801081979d0> SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2] . This is recent mainline for ITLB: [ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc> [ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8] [ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8> [ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4] . Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call prom_halt() and drop us to OF command prompt "ok". This isn't the case for LDOMs and the machine powers off. For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal one("1"). Otherwise, for %tl > 1, we proceed eventually to die_if_kernel(). The logic of this patch was partially inspired by David Miller's feedback. Power off of large sparc64 machines is painful. Plus die_if_kernel provides more context. A reset sequence isn't a brief period on large sparc64 but better than power-off/power-on sequence. Cc: sparclinux@vger.kernel.org Signed-off-by: Bob Picco Signed-off-by: David S. Miller commit 7a55c4d1f2952a793bc3614b212a2fa5fccfc42b Author: Daniel Hellstrom Date: Wed Sep 10 14:17:52 2014 +0200 sparc32: dma_alloc_coherent must honour gfp flags [ Upstream commit d1105287aabe88dbb3af825140badaa05cf0442c ] dma_zalloc_coherent() calls dma_alloc_coherent(__GFP_ZERO) but the sparc32 implementations sbus_alloc_coherent() and pci32_alloc_coherent() doesn't take the gfp flags into account. Tested on the SPARC32/LEON GRETH Ethernet driver which fails due to dma_alloc_coherent(__GFP_ZERO) returns non zeroed pages. Signed-off-by: Daniel Hellstrom Signed-off-by: David S. Miller commit dd55fec8c50bbc799ddef0e0adb8a980c074e28d Author: David S. Miller Date: Mon Aug 11 15:38:46 2014 -0700 sparc64: Fix pcr_ops initialization and usage bugs. [ Upstream commit 8bccf5b313180faefce38e0d1140f76e0f327d28 ] Christopher reports that perf_event_print_debug() can crash in uniprocessor builds. The crash is due to pcr_ops being NULL. This happens because pcr_arch_init() is only invoked by smp_cpus_done() which only executes in SMP builds. init_hw_perf_events() is closely intertwined with pcr_ops being setup properly, therefore: 1) Call pcr_arch_init() early on from init_hw_perf_events(), instead of from smp_cpus_done(). 2) Do not hook up a PMU type if pcr_ops is NULL after pcr_arch_init(). 3) Move init_hw_perf_events to a later initcall so that it we will be sure to invoke pcr_arch_init() after all cpus are brought up. Finally, guard the one naked sequence of pcr_ops dereferences in __global_pmu_self() with an appropriate NULL check. Reported-by: Christopher Alexander Tobias Schulze Signed-off-by: David S. Miller commit adb54d2a80797cbcc639217ec3eb2315faecb687 Author: David S. Miller Date: Mon Aug 11 20:45:01 2014 -0700 sparc64: Do not disable interrupts in nmi_cpu_busy() [ Upstream commit 58556104e9cd0107a7a8d2692cf04ef31669f6e4 ] nmi_cpu_busy() is a SMP function call that just makes sure that all of the cpus are spinning using cpu cycles while the NMI test runs. It does not need to disable IRQs because we just care about NMIs executing which will even with 'normal' IRQs disabled. It is not legal to enable hard IRQs in a SMP cross call, in fact this bug triggers the BUG check in irq_work_run_list(): BUG_ON(!irqs_disabled()); Because now irq_work_run() is invoked from the tail of generic_smp_call_function_single_interrupt(). Signed-off-by: David S. Miller commit 53784254d9657ed4c643e755ec1c574a6a43410f Author: Arjun Sreedharan Date: Mon Aug 18 11:17:33 2014 +0530 usb: phy: return -ENODEV on failure of try_module_get commit 2c4e3dbf63b39d44a291db70016c718f45d9cd46 upstream. When __usb_find_phy_dev() does not return error and try_module_get() fails, return -ENODEV. Signed-off-by: Arjun Sreedharan Signed-off-by: Felipe Balbi Signed-off-by: Jiri Slaby commit 297b3ddd679ac4e4958661df855595bb49c42a18 Author: Michal Kubeček Date: Mon Aug 25 15:16:22 2014 +0200 net: fix checksum features handling in netif_skb_features() commit db115037bb57cdfe97078b13da762213f7980e81 upstream. This is follow-up to da08143b8520 ("vlan: more careful checksum features handling") which introduced more careful feature intersection in vlan code, taking into account that HW_CSUM should be considered superset of IP_CSUM/IPV6_CSUM. The same is needed in netif_skb_features() in order to avoid offloading mismatch warning when vlan is created on top of a bond consisting of slaves supporting IP/IPv6 checksumming but not vlan Tx offloading. Signed-off-by: Michal Kubecek Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby commit 57c68d49b58d2c6c942dbc89d2c4fec2041e0e9e Author: Slava Pestov Date: Fri Jul 11 12:17:41 2014 -0700 bcache: fix crash with incomplete cache set commit bf0c55c986540483c34ca640f2eef4c3314388b1 upstream. Change-Id: I6abde52afe917633480caaf4e2518f42a816d886 Signed-off-by: Jiri Slaby commit 2c30554fdfb22f41d66d6c5f4114147bc60a5fd9 Author: Slava Pestov Date: Thu Jun 19 15:05:59 2014 -0700 bcache: fix memory corruption in init error path commit c9a78332b42cbdcdd386a95192a716b67d1711a4 upstream. If register_cache_set() failed, we would touch ca->set after it had already been freed. Also, fix an assertion to catch this. Change-Id: I748e5f5b223e2d9b2602075dec2f997cced2394d Signed-off-by: Jiri Slaby commit 64521124425139ca5a5420660d2408b4f1c4770a Author: Surbhi Palande Date: Thu Apr 17 12:07:04 2014 -0700 bcache: Correct printing of btree_gc_max_duration_ms commit 5b25abade29616d42d60f9bd5e6a5ad07f7314e3 upstream. time_stats::btree_gc_max_duration_mc is not bit shifted by 8 Fixes BUG #138 Change-Id: I44fc6e1d0579674016acc533f1a546b080e5371a Signed-off-by: Surbhi Palande Signed-off-by: Jiri Slaby commit 56a60dd28baa8b6e38f16d54c60370002dd690e9 Author: Al Viro Date: Thu Jun 12 00:29:13 2014 -0400 lock_parent: don't step on stale ->d_parent of all-but-freed one commit c2338f2dc7c1e9f6202f370c64ffd7f44f3d4b51 upstream. Dentry that had been through (or into) __dentry_kill() might be seen by shrink_dentry_list(); that's normal, it'll be taken off the shrink list and freed if __dentry_kill() has already finished. The problem is, its ->d_parent might be pointing to already freed dentry, so lock_parent() needs to be careful. We need to check that dentry hasn't already gone into __dentry_kill() *and* grab rcu_read_lock() before dropping ->d_lock - the latter makes sure that whatever we see in ->d_parent after dropping ->d_lock it won't be freed until we drop rcu_read_lock(). Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 3fc5a7a953bd4cbbb7e0b6c56fa3c8c91a4aa22c Author: Linus Torvalds Date: Sat May 31 09:13:21 2014 -0700 dcache: add missing lockdep annotation commit 9f12600fe425bc28f0ccba034a77783c09c15af4 upstream. lock_parent() very much on purpose does nested locking of dentries, and is careful to maintain the right order (lock parent first). But because it didn't annotate the nested locking order, lockdep thought it might be a deadlock on d_lock, and complained. Add the proper annotation for the inner locking of the child dentry to make lockdep happy. Introduced by commit 046b961b45f9 ("shrink_dentry_list(): take parent's ->d_lock earlier"). Reported-and-tested-by: Josh Boyer Cc: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby commit a2d9d64978050f5cbf03292fa2656837b4a53918 Author: Al Viro Date: Thu May 29 09:18:26 2014 -0400 dentry_kill() doesn't need the second argument now commit 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4 upstream. it's 1 in the only remaining caller. Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 641955231ecc5ddaeb2e6b917dd2e606359af6d3 Author: Al Viro Date: Thu May 29 09:11:45 2014 -0400 dealing with the rest of shrink_dentry_list() livelock commit b2b80195d8829921506880f6dccd21cabd163d0d upstream. We have the same problem with ->d_lock order in the inner loop, where we are dropping references to ancestors. Same solution, basically - instead of using dentry_kill() we use lock_parent() (introduced in the previous commit) to get that lock in a safe way, recheck ->d_count (in case if lock_parent() has ended up dropping and retaking ->d_lock and somebody managed to grab a reference during that window), trylock the inode->i_lock and use __dentry_kill() to do the rest. Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit f0d0762945b54220e4dc34672f8841918a81cd95 Author: Al Viro Date: Thu May 29 08:54:52 2014 -0400 shrink_dentry_list(): take parent's ->d_lock earlier commit 046b961b45f93a92e4c70525a12f3d378bced130 upstream. The cause of livelocks there is that we are taking ->d_lock on dentry and its parent in the wrong order, forcing us to use trylock on the parent's one. d_walk() takes them in the right order, and unfortunately it's not hard to create a situation when shrink_dentry_list() can't make progress since trylock keeps failing, and shrink_dcache_parent() or check_submounts_and_drop() keeps calling d_walk() disrupting the very shrink_dentry_list() it's waiting for. Solution is straightforward - if that trylock fails, let's unlock the dentry itself and take locks in the right order. We need to stabilize ->d_parent without holding ->d_lock, but that's doable using RCU. And we'd better do that in the very beginning of the loop in shrink_dentry_list(), since the checks on refcount, etc. would need to be redone anyway. That deals with a half of the problem - killing dentries on the shrink list itself. Another one (dropping their parents) is in the next commit. locking parent is interesting - it would be easy to do rcu_read_lock(), lock whatever we think is a parent, lock dentry itself and check if the parent is still the right one. Except that we need to check that *before* locking the dentry, or we are risking taking ->d_lock out of order. Fortunately, once the D1 is locked, we can check if D2->d_parent is equal to D1 without the need to lock D2; D2->d_parent can start or stop pointing to D1 only under D1->d_lock, so taking D1->d_lock is enough. In other words, the right solution is rcu_read_lock/lock what looks like parent right now/check if it's still our parent/rcu_read_unlock/lock the child. Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 01980970e78a8f8ef6baf5a2745b3370fba2d986 Author: Al Viro Date: Wed May 28 13:59:13 2014 -0400 expand dentry_kill(dentry, 0) in shrink_dentry_list() commit ff2fde9929feb2aef45377ce56b8b12df85dda69 upstream. Result will be massaged to saner shape in the next commits. It is ugly, no questions - the point of that one is to be a provably equivalent transformation (and it might be worth splitting a bit more). Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit f6df1e4d6e6599607d2016edb2160f2748a61905 Author: Al Viro Date: Wed May 28 13:51:12 2014 -0400 split dentry_kill() commit e55fd011549eae01a230e3cace6f4d031b6a3453 upstream. ... into trylocks and everything else. The latter (actual killing) is __dentry_kill(). Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit e69a2ca7917a453a4cba8be4e722908c147ed16d Author: Al Viro Date: Wed May 28 09:48:44 2014 -0400 lift the "already marked killed" case into shrink_dentry_list() commit 64fd72e0a44bdd62c5ca277cb24d0d02b2d8e9dc upstream. It can happen only when dentry_kill() is called with unlock_on_failure equal to 0 - other callers had dentry pinned until the moment they've got ->d_lock and DCACHE_DENTRY_KILLED is set only after lockref_mark_dead(). IOW, only one of three call sites of dentry_kill() might end up reaching that code. Just move it there. Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 72b7f65091199b7ae38ad658c5ce8246f1496976 Author: Miklos Szeredi Date: Fri May 2 15:38:39 2014 -0400 dcache: don't need rcu in shrink_dentry_list() commit 60942f2f235ce7b817166cdf355eed729094834d upstream. Since now the shrink list is private and nobody can free the dentry while it is on the shrink list, we can remove RCU protection from this. Signed-off-by: Miklos Szeredi Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 2c52637bf31e69e2025cd6ad088b7cb1b726ad7b Author: Al Viro Date: Fri May 2 20:36:10 2014 -0400 more graceful recovery in umount_collect() commit 9c8c10e262e0f62cb2530f1b076de979123183dd upstream. Start with shrink_dcache_parent(), then scan what remains. First of all, BUG() is very much an overkill here; we are holding ->s_umount, and hitting BUG() means that a lot of interesting stuff will be hanging after that point (sync(2), for example). Moreover, in cases when there had been more than one leak, we'll be better off reporting all of them. And more than just the last component of pathname - %pd is there for just such uses... That was the last user of dentry_lru_del(), so kill it off... Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 8cb08644a8e14875d08c757fe111e15f20f6ecc0 Author: Al Viro Date: Sat May 3 00:02:25 2014 -0400 don't remove from shrink list in select_collect() commit fe91522a7ba82ca1a51b07e19954b3825e4aaa22 upstream. If we find something already on a shrink list, just increment data->found and do nothing else. Loops in shrink_dcache_parent() and check_submounts_and_drop() will do the right thing - everything we did put into our list will be evicted and if there had been nothing, but data->found got non-zero, well, we have somebody else shrinking those guys; just try again. Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 11662c7b4621457dd0fccd3bba23a4d9cee2f722 Author: Al Viro Date: Thu May 1 10:30:00 2014 -0400 dentry_kill(): don't try to remove from shrink list commit 41edf278fc2f042f4e22a12ed87d19c5201210e1 upstream. If the victim in on the shrink list, don't remove it from there. If shrink_dentry_list() manages to remove it from the list before we are done - fine, we'll just free it as usual. If not - mark it with new flag (DCACHE_MAY_FREE) and leave it there. Eventually, shrink_dentry_list() will get to it, remove the sucker from shrink list and call dentry_kill(dentry, 0). Which is where we'll deal with freeing. Since now dentry_kill(dentry, 0) may happen after or during dentry_kill(dentry, 1), we need to recognize that (by seeing DCACHE_DENTRY_KILLED already set), unlock everything and either free the sucker (in case DCACHE_MAY_FREE has been set) or leave it for ongoing dentry_kill(dentry, 1) to deal with. Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 1be1a028ca20541434bf93ae5961480e34974f05 Author: Al Viro Date: Tue Apr 29 23:42:52 2014 -0400 expand the call of dentry_lru_del() in dentry_kill() commit 01b6035190b024240a43ac1d8e9c6f964f5f1c63 upstream. Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit e2dab9fb82b262313ecd36454beea93adb73aea3 Author: Al Viro Date: Tue Apr 29 23:40:14 2014 -0400 new helper: dentry_free() commit b4f0354e968f5fabd39bc85b99fedae4a97589fe upstream. The part of old d_free() that dealt with actual freeing of dentry. Taken out of dentry_kill() into a separate function. Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 84a43622f3c578b78663cc2742a27e4d2613c697 Author: Al Viro Date: Tue Apr 29 16:13:18 2014 -0400 fold try_prune_one_dentry() commit 5c47e6d0ad608987b91affbcf7d1fc12dfbe8fb4 upstream. Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 5d78d22291f373ccf4658723bc81a8ed8b396639 Author: Al Viro Date: Tue Apr 29 15:45:28 2014 -0400 fold d_kill() and d_free() commit 03b3b889e79cdb6b806fc0ba9be0d71c186bbfaa upstream. Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 316845a0eb891d1705917d3139a1005910563e5b Author: Al Viro Date: Fri Oct 25 17:04:27 2013 -0400 fold try_to_ascend() into the sole remaining caller commit 31dec1327e377b6d91a8a6c92b5cd8513939a233 upstream. There used to be a bunch of tree-walkers in dcache.c, all alike. try_to_ascend() had been introduced to abstract a piece of logics duplicated in all of them. These days all these tree-walkers are implemented via the same iterator (d_walk()), which is the only remaining caller of try_to_ascend(), so let's fold it back... Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit e7f269e225873c08ec13c4f56d6dd2aa780a884c Author: Al Viro Date: Fri Oct 4 11:09:01 2013 -0400 fold __d_shrink() into its only remaining caller commit b61625d24596ea44555943867d5a5c1efd81074c upstream. Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit 346c766f1df256d3b5a8fddbcfd1ac9240d8dbc8 Author: Al Viro Date: Fri Nov 8 12:31:16 2013 -0500 switch shrink_dcache_for_umount() to use of d_walk() commit 42c326082d8a2c91506f951ace638deae1faf083 upstream. we have too many iterators in fs/dcache.c... Signed-off-by: Al Viro Signed-off-by: Jiri Slaby commit c3b8cdf7030f3f59fb36b15a8ee6f2d7e1df8c70 Author: Sanjeev Sharma Date: Tue Aug 12 12:10:21 2014 +0530 uas: replace WARN_ON_ONCE() with lockdep_assert_held() commit ab945eff8396bc3329cc97274320e8d2c6585077 upstream. on some architecture spin_is_locked() always return false in uniprocessor configuration and therefore it would be advise to replace with lockdep_assert_held(). Signed-off-by: Sanjeev Sharma Acked-by: Hans de Goede Signed-off-by: Greg Kroah-Hartman Signed-off-by: Jiri Slaby commit 941cc4ffa7252e3f99396b1e2743a1f83aef053c Author: Mark Knibbs Date: Tue Sep 23 11:20:17 2014 +0100 storage: Add quirks for Castlewood and Double-H USB-SCSI converters commit 57cde01a7b8111cdd43b6a261763aad1ead8161c upstream. Castlewood Systems supplied various models of USB-SCSI converter with their ORB external removable-media drive. The ORB Windows and Macintosh drivers support six USB IDs: 084B:A001 [VID 084B is Castlewood Systems] 04E6:0002 (*) ORB USB Smart Cable P/N 88205-001 (generic SCM ID) 2027:A001 Double-H Technology DH-2000SC 1822:0001 (*) Ariston iConnect/iSCSI 07AF:0004 (*) Microtech XpressSCSI (25-pin) 07AF:0005 (*) Microtech XpressSCSI (50-pin) *: quirk already in unusual-devs.h [Apparently the official VID for Double-H Technology is 0x07EB = 2027 decimal. That's another hex/decimal mix-up with these SCM-based products (in addition to the Ariston and Entrega ones). Perhaps the USB-IF informed companies of their allocated VID in decimal, but they assumed it was hex? It seems all Entrega products used VID 0x1645, not just the USB-SCSI converter.] Double-H Technology Co., Ltd. produced a USB-SCSI converter, model DH-2000SC, which is probably the one supported by the ORB drivers. Perhaps the Castlewood-bundled product had a different label or PID though? Castlewood mentioned Conmate as being one type of USB-SCSI converter. Conmate and Double-H seem related somehow; both company addresses in the same road, and at one point the Conmate web site mentioned DH-2000H4, DH-200D4/DH-2000C4 as models of USB hub (DH short for Double-H presumably). Conmate did show a USB-SCSI converter model CM-660 on their web site at one point. My guess is that was identical to the DH-2000SC. Mention of the Double-H product: http://web.archive.org/web/20010221010141/http://www.doubleh.com.tw/dh-2000sc.htm The only picture I could find is at http://jp.acesuppliers.com/catalog/j64/component/page03.html The casing design looks the same as my ORB USB Smart Cable which has ID 04E6:0002. Anyway, that's enough rambling. Here's the patch. storage: Add quirks for Castlewood and Double-H USB-SCSI converters Add quirks for two SCM-based USB-SCSI converters which were bundled with some Castlewood ORB removable drives. Without the quirk only the (single) drive with SCSI ID 0 can be accessed. Signed-off-by: Mark Knibbs Signed-off-by: Greg Kroah-Hartman Signed-off-by: Jiri Slaby commit 372ba9cd9e2c13e9cfcf52ce637e6d31863b8e35 Author: Mark Knibbs Date: Tue Sep 23 12:43:02 2014 +0100 storage: Add quirk for another SCM-based USB-SCSI converter commit 3512e7bfea6a459cad84712a021d856bd78cd7e4 upstream. There is apparently another SCM USB-SCSI converter with ID 04E6:000F. It is listed along with 04E6:000B in the Windows INF file for the Startech ICUSBSCSI2 as "eUSB SCSI Adapter (Bus Powered)". The quirk allows devices with SCSI ID other than 0 to be accessed. Also make a couple of existing SCM product IDs lower case to be consistent with other entries. Signed-off-by: Mark Knibbs Signed-off-by: Greg Kroah-Hartman Signed-off-by: Jiri Slaby commit e25ecd21ffdbf1f8287103ccf97205fc30088344 Author: Lu Baolu Date: Fri Sep 19 10:13:50 2014 +0800 USB: Add device quirk for ASUS T100 Base Station keyboard commit ddbe1fca0bcb87ca8c199ea873a456ca8a948567 upstream. This full-speed USB device generates spurious remote wakeup event as soon as USB_DEVICE_REMOTE_WAKEUP feature is set. As the result, Linux can't enter system suspend and S0ix power saving modes once this keyboard is used. This patch tries to introduce USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk. With this quirk set, wakeup capability will be ignored during device configure. This patch could be back-ported to kernels as old as 2.6.39. Signed-off-by: Lu Baolu Acked-by: Alan Stern Cc: stable Signed-off-by: Greg Kroah-Hartman Signed-off-by: Jiri Slaby commit 84197d64c418b1b9fb9512defa61f57411df1989 Author: Eliad Peller Date: Wed Jun 11 10:23:35 2014 +0300 regulatory: fix misapplied alpha2 fix Upstream commit a5fe8e7695dc3f547e955ad2b662e3e72969e506 (regulatory: add NUL to alpha2) contained a hunk that was supposed to be applied to struct ieee80211_reg_rule. However in stable 3.12 (3.12.31 in particular), it ended up in struct regulatory_request. Fix that now. Signed-off-by: Jiri Slaby Cc: Eliad Peller Cc: Johannes Berg commit 6fd64e7d017a6355e2debeaf5ceb61c2a4c407e1 Author: Pali Rohár Date: Mon Sep 29 15:10:51 2014 +0200 dell-wmi: Fix access out of memory commit a666b6ffbc9b6705a3ced704f52c3fe9ea8bf959 upstream. Without this patch, dell-wmi is trying to access elements of dynamically allocated array without checking the array size. This can lead to memory corruption or a kernel panic. This patch adds the missing checks for array size. Signed-off-by: Pali Rohár Signed-off-by: Darren Hart Signed-off-by: Jiri Slaby commit 8d9c8c3980a85d79db13c5ce4bb118fef32d4f50 Author: Andy Lutomirski Date: Wed Oct 8 12:32:47 2014 -0700 fs: Add a missing permission check to do_umount commit a1480dcc3c706e309a88884723446f2e84fedd5b upstream. Accessing do_remount_sb should require global CAP_SYS_ADMIN, but only one of the two call sites was appropriately protected. Fixes CVE-2014-7975. Signed-off-by: Andy Lutomirski Signed-off-by: Jiri Slaby