commit 4a9b1eb8bc3ba4ad8b3b1aa3317cf8d4a3aaad83 Author: Greg Kroah-Hartman Date: Sun Jul 28 08:28:39 2019 +0200 Linux 5.1.21 commit 51d036eb719b258bd5bc2be3a243a22c7368abad Author: Vlad Buslov Date: Sun Jul 21 17:44:12 2019 +0300 net: sched: verify that q!=NULL before setting q->flags commit 503d81d428bd598430f7f9d02021634e1a8139a0 upstream. In function int tc_new_tfilter() q pointer can be NULL when adding filter on a shared block. With recent change that resets TCQ_F_CAN_BYPASS after filter creation, following NULL pointer dereference happens in case parent block is shared: [ 212.925060] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 212.925445] #PF: supervisor write access in kernel mode [ 212.925709] #PF: error_code(0x0002) - not-present page [ 212.925965] PGD 8000000827923067 P4D 8000000827923067 PUD 827924067 PMD 0 [ 212.926302] Oops: 0002 [#1] SMP KASAN PTI [ 212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G B 5.2.0+ #512 [ 212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40 [ 212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 f b 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8 [ 212.928607] RSP: 0018:ffff88884fd5f5d8 EFLAGS: 00010296 [ 212.928905] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000 [ 212.929201] RDX: 0000000000000007 RSI: 0000000000000004 RDI: 0000000000000297 [ 212.929402] RBP: ffff88886bedd600 R08: ffffffffb91d4b51 R09: fffffbfff7616e4d [ 212.929609] R10: fffffbfff7616e4c R11: ffffffffbb0b7263 R12: ffff88886bc61040 [ 212.929803] R13: ffff88884fd5f950 R14: ffffc900039c5000 R15: ffff88835e927680 [ 212.929999] FS: 00007fe7c50b6480(0000) GS:ffff88886f980000(0000) knlGS:0000000000000000 [ 212.930235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 212.930394] CR2: 0000000000000010 CR3: 000000085bd04002 CR4: 00000000001606e0 [ 212.930588] Call Trace: [ 212.930682] ? tc_del_tfilter+0xa40/0xa40 [ 212.930811] ? __lock_acquire+0x5b5/0x2460 [ 212.930948] ? find_held_lock+0x85/0xa0 [ 212.931081] ? tc_del_tfilter+0xa40/0xa40 [ 212.931201] rtnetlink_rcv_msg+0x4ab/0x5f0 [ 212.931332] ? rtnl_dellink+0x490/0x490 [ 212.931454] ? lockdep_hardirqs_on+0x260/0x260 [ 212.931589] ? netlink_deliver_tap+0xab/0x5a0 [ 212.931717] ? match_held_lock+0x1b/0x240 [ 212.931844] netlink_rcv_skb+0xd0/0x200 [ 212.931958] ? rtnl_dellink+0x490/0x490 [ 212.932079] ? netlink_ack+0x440/0x440 [ 212.932205] ? netlink_deliver_tap+0x161/0x5a0 [ 212.932335] ? lock_downgrade+0x360/0x360 [ 212.932457] ? lock_acquire+0xe5/0x210 [ 212.932579] netlink_unicast+0x296/0x350 [ 212.932705] ? netlink_attachskb+0x390/0x390 [ 212.932834] ? _copy_from_iter_full+0xe0/0x3a0 [ 212.932976] netlink_sendmsg+0x394/0x600 [ 212.937998] ? netlink_unicast+0x350/0x350 [ 212.943033] ? move_addr_to_kernel.part.0+0x90/0x90 [ 212.948115] ? netlink_unicast+0x350/0x350 [ 212.953185] sock_sendmsg+0x96/0xa0 [ 212.958099] ___sys_sendmsg+0x482/0x520 [ 212.962881] ? match_held_lock+0x1b/0x240 [ 212.967618] ? copy_msghdr_from_user+0x250/0x250 [ 212.972337] ? lock_downgrade+0x360/0x360 [ 212.976973] ? rwlock_bug.part.0+0x60/0x60 [ 212.981548] ? __mod_node_page_state+0x1f/0xa0 [ 212.986060] ? match_held_lock+0x1b/0x240 [ 212.990567] ? find_held_lock+0x85/0xa0 [ 212.994989] ? do_user_addr_fault+0x349/0x5b0 [ 212.999387] ? lock_downgrade+0x360/0x360 [ 213.003713] ? find_held_lock+0x85/0xa0 [ 213.007972] ? __fget_light+0xa1/0xf0 [ 213.012143] ? sockfd_lookup_light+0x91/0xb0 [ 213.016165] __sys_sendmsg+0xba/0x130 [ 213.020040] ? __sys_sendmsg_sock+0xb0/0xb0 [ 213.023870] ? handle_mm_fault+0x337/0x470 [ 213.027592] ? page_fault+0x8/0x30 [ 213.031316] ? lockdep_hardirqs_off+0xbe/0x100 [ 213.034999] ? mark_held_locks+0x24/0x90 [ 213.038671] ? do_syscall_64+0x1e/0xe0 [ 213.042297] do_syscall_64+0x74/0xe0 [ 213.045828] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 213.049354] RIP: 0033:0x7fe7c527c7b8 [ 213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f 0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54 [ 213.060269] RSP: 002b:00007ffc3f7908a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 213.064144] RAX: ffffffffffffffda RBX: 000000005d34716f RCX: 00007fe7c527c7b8 [ 213.068094] RDX: 0000000000000000 RSI: 00007ffc3f790910 RDI: 0000000000000003 [ 213.072109] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fe7c5340cc0 [ 213.076113] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000080 [ 213.080146] R13: 0000000000480640 R14: 0000000000000080 R15: 0000000000000000 [ 213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common [<1;69;32Msb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pc lmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi _ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helpe r ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas [ 213.112326] CR2: 0000000000000010 [ 213.117429] ---[ end trace adb58eb0a4ee6283 ]--- Verify that q pointer is not NULL before setting the 'flags' field. Fixes: 3f05e6886a59 ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters") Signed-off-by: Vlad Buslov Acked-by: Jiri Pirko Signed-off-by: David S. Miller Cc: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit b4dd6aae6f930e5d4b02d4b34489621c38ea3158 Author: Kuo-Hsin Yang Date: Thu Jul 11 20:52:04 2019 -0700 mm: vmscan: scan anonymous pages on file refaults commit 2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa upstream. When file refaults are detected and there are many inactive file pages, the system never reclaim anonymous pages, the file pages are dropped aggressively when there are still a lot of cold anonymous pages and system thrashes. This issue impacts the performance of applications with large executable, e.g. chrome. With this patch, when file refault is detected, inactive_list_is_low() always returns true for file pages in get_scan_count() to enable scanning anonymous pages. The problem can be reproduced by the following test program. ---8<--- void fallocate_file(const char *filename, off_t size) { struct stat st; int fd; if (!stat(filename, &st) && st.st_size >= size) return; fd = open(filename, O_WRONLY | O_CREAT, 0600); if (fd < 0) { perror("create file"); exit(1); } if (posix_fallocate(fd, 0, size)) { perror("fallocate"); exit(1); } close(fd); } long *alloc_anon(long size) { long *start = malloc(size); memset(start, 1, size); return start; } long access_file(const char *filename, long size, long rounds) { int fd, i; volatile char *start1, *end1, *start2; const int page_size = getpagesize(); long sum = 0; fd = open(filename, O_RDONLY); if (fd == -1) { perror("open"); exit(1); } /* * Some applications, e.g. chrome, use a lot of executable file * pages, map some of the pages with PROT_EXEC flag to simulate * the behavior. */ start1 = mmap(NULL, size / 2, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0); if (start1 == MAP_FAILED) { perror("mmap"); exit(1); } end1 = start1 + size / 2; start2 = mmap(NULL, size / 2, PROT_READ, MAP_SHARED, fd, size / 2); if (start2 == MAP_FAILED) { perror("mmap"); exit(1); } for (i = 0; i < rounds; ++i) { struct timeval before, after; volatile char *ptr1 = start1, *ptr2 = start2; gettimeofday(&before, NULL); for (; ptr1 < end1; ptr1 += page_size, ptr2 += page_size) sum += *ptr1 + *ptr2; gettimeofday(&after, NULL); printf("File access time, round %d: %f (sec) ", i, (after.tv_sec - before.tv_sec) + (after.tv_usec - before.tv_usec) / 1000000.0); } return sum; } int main(int argc, char *argv[]) { const long MB = 1024 * 1024; long anon_mb, file_mb, file_rounds; const char filename[] = "large"; long *ret1; long ret2; if (argc != 4) { printf("usage: thrash ANON_MB FILE_MB FILE_ROUNDS "); exit(0); } anon_mb = atoi(argv[1]); file_mb = atoi(argv[2]); file_rounds = atoi(argv[3]); fallocate_file(filename, file_mb * MB); printf("Allocate %ld MB anonymous pages ", anon_mb); ret1 = alloc_anon(anon_mb * MB); printf("Access %ld MB file pages ", file_mb); ret2 = access_file(filename, file_mb * MB, file_rounds); printf("Print result to prevent optimization: %ld ", *ret1 + ret2); return 0; } ---8<--- Running the test program on 2GB RAM VM with kernel 5.2.0-rc5, the program fills ram with 2048 MB memory, access a 200 MB file for 10 times. Without this patch, the file cache is dropped aggresively and every access to the file is from disk. $ ./thrash 2048 200 10 Allocate 2048 MB anonymous pages Access 200 MB file pages File access time, round 0: 2.489316 (sec) File access time, round 1: 2.581277 (sec) File access time, round 2: 2.487624 (sec) File access time, round 3: 2.449100 (sec) File access time, round 4: 2.420423 (sec) File access time, round 5: 2.343411 (sec) File access time, round 6: 2.454833 (sec) File access time, round 7: 2.483398 (sec) File access time, round 8: 2.572701 (sec) File access time, round 9: 2.493014 (sec) With this patch, these file pages can be cached. $ ./thrash 2048 200 10 Allocate 2048 MB anonymous pages Access 200 MB file pages File access time, round 0: 2.475189 (sec) File access time, round 1: 2.440777 (sec) File access time, round 2: 2.411671 (sec) File access time, round 3: 1.955267 (sec) File access time, round 4: 0.029924 (sec) File access time, round 5: 0.000808 (sec) File access time, round 6: 0.000771 (sec) File access time, round 7: 0.000746 (sec) File access time, round 8: 0.000738 (sec) File access time, round 9: 0.000747 (sec) Checked the swap out stats during the test [1], 19006 pages swapped out with this patch, 3418 pages swapped out without this patch. There are more swap out, but I think it's within reasonable range when file backed data set doesn't fit into the memory. $ ./thrash 2000 100 2100 5 1 # ANON_MB FILE_EXEC FILE_NOEXEC ROUNDS PROCESSES Allocate 2000 MB anonymous pages active_anon: 1613644, inactive_anon: 348656, active_file: 892, inactive_file: 1384 (kB) pswpout: 7972443, pgpgin: 478615246 Access 100 MB executable file pages Access 2100 MB regular file pages File access time, round 0: 12.165, (sec) active_anon: 1433788, inactive_anon: 478116, active_file: 17896, inactive_file: 24328 (kB) File access time, round 1: 11.493, (sec) active_anon: 1430576, inactive_anon: 477144, active_file: 25440, inactive_file: 26172 (kB) File access time, round 2: 11.455, (sec) active_anon: 1427436, inactive_anon: 476060, active_file: 21112, inactive_file: 28808 (kB) File access time, round 3: 11.454, (sec) active_anon: 1420444, inactive_anon: 473632, active_file: 23216, inactive_file: 35036 (kB) File access time, round 4: 11.479, (sec) active_anon: 1413964, inactive_anon: 471460, active_file: 31728, inactive_file: 32224 (kB) pswpout: 7991449 (+ 19006), pgpgin: 489924366 (+ 11309120) With 4 processes accessing non-overlapping parts of a large file, 30316 pages swapped out with this patch, 5152 pages swapped out without this patch. The swapout number is small comparing to pgpgin. [1]: https://github.com/vovo/testing/blob/master/mem_thrash.c Link: http://lkml.kernel.org/r/20190701081038.GA83398@google.com Fixes: e9868505987a ("mm,vmscan: only evict file pages when we have plenty") Fixes: 7c5bd705d8f9 ("mm: memcg: only evict file pages when we have plenty") Signed-off-by: Kuo-Hsin Yang Acked-by: Johannes Weiner Cc: Michal Hocko Cc: Sonny Rao Cc: Mel Gorman Cc: Rik van Riel Cc: Vladimir Davydov Cc: Minchan Kim Cc: [4.12+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [backported to 4.14.y, 4.19.y, 5.1.y: adjust context] Signed-off-by: Kuo-Hsin Yang Signed-off-by: Greg Kroah-Hartman commit 14700c35fb6b7ef42189f79f9b5e0613629c6deb Author: Damien Le Moal Date: Mon Jul 1 14:09:18 2019 +0900 block: Limit zone array allocation size commit 26202928fafad8bda8b478edb7e62c885be623d7 upstream. Limit the size of the struct blk_zone array used in blk_revalidate_disk_zones() to avoid memory allocation failures leading to disk revalidation failure. Also further reduce the likelyhood of such failures by using kvcalloc() (that is vmalloc()) instead of allocating contiguous pages with alloc_pages(). Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation") Fixes: e76239a3748c ("block: add a report_zones method") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Signed-off-by: Damien Le Moal Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit fda734287b0d0cf9a303c29e85fdc7f0b044f26d Author: Damien Le Moal Date: Mon Jul 1 14:09:17 2019 +0900 sd_zbc: Fix report zones buffer allocation commit b091ac616846a1da75b1f2566b41255ce7f0e0a6 upstream. During disk scan and revalidation done with sd_revalidate(), the zones of a zoned disk are checked using the helper function blk_revalidate_disk_zones() if a configuration change is detected (change in the number of zones or zone size). The function blk_revalidate_disk_zones() issues report_zones calls that are very large, that is, to obtain zone information for all zones of the disk with a single command. The size of the report zones command buffer necessary for such large request generally is lower than the disk max_hw_sectors and KMALLOC_MAX_SIZE (4MB) and succeeds on boot (no memory fragmentation), but often fail at run time (e.g. hot-plug event). This causes the disk revalidation to fail and the disk capacity to be changed to 0. This problem can be avoided by using vmalloc() instead of kmalloc() for the buffer allocation. To limit the amount of memory to be allocated, this patch also introduces the arbitrary SD_ZBC_REPORT_MAX_ZONES maximum number of zones to report with a single report zones command. This limit may be lowered further to satisfy the disk max_hw_sectors limit. Finally, to ensure that the vmalloc-ed buffer can always be mapped in a request, the buffer size is further limited to at most queue_max_segments() pages, allowing successful mapping of the buffer even in the worst case scenario where none of the buffer pages are contiguous. Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation") Fixes: e76239a3748c ("block: add a report_zones method") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Signed-off-by: Damien Le Moal Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 644688604559a6db751b9f26dcc8d38b8c439b0e Author: Paolo Bonzini Date: Mon Jul 22 13:31:27 2019 +0200 Revert "kvm: x86: Use task structs fpu field for user" commit ec269475cba7bcdd1eb8fdf8e87f4c6c81a376fe upstream. This reverts commit 240c35a3783ab9b3a0afaba0dde7291295680a6b ("kvm: x86: Use task structs fpu field for user", 2018-11-06). The commit is broken and causes QEMU's FPU state to be destroyed when KVM_RUN is preempted. Fixes: 240c35a3783a ("kvm: x86: Use task structs fpu field for user") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 79a431342a92c98f96a6a35ffda37b25ab0fecbe Author: Jan Kiszka Date: Sun Jul 21 13:52:18 2019 +0200 KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested commit cf64527bb33f6cec2ed50f89182fc4688d0056b6 upstream. Letting this pend may cause nested_get_vmcs12_pages to run against an invalid state, corrupting the effective vmcs of L1. This was triggerable in QEMU after a guest corruption in L2, followed by a L1 reset. Signed-off-by: Jan Kiszka Reviewed-by: Liran Alon Cc: stable@vger.kernel.org Fixes: 7f7f1ba33cf2 ("KVM: x86: do not load vmcs12 pages while still in SMM") Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 93e7720cc238bfb9784bb2b8d340b4595e7891b1 Author: Paolo Bonzini Date: Fri Jul 19 18:41:10 2019 +0200 KVM: nVMX: do not use dangling shadow VMCS after guest reset commit 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 upstream. If a KVM guest is reset while running a nested guest, free_nested will disable the shadow VMCS execution control in the vmcs01. However, on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync the VMCS12 to the shadow VMCS which has since been freed. This causes a vmptrld of a NULL pointer on my machime, but Jan reports the host to hang altogether. Let's see how much this trivial patch fixes. Reported-by: Jan Kiszka Cc: Liran Alon Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 7f1f86276515f6816a98f6ca3ef99c827d54642f Author: Theodore Ts'o Date: Thu Jun 20 21:19:02 2019 -0400 ext4: allow directory holes commit 4e19d6b65fb4fc42e352ce9883649e049da14743 upstream. The largedir feature was intended to allow ext4 directories to have unmapped directory blocks (e.g., directory holes). And so the released e2fsprogs no longer enforces this for largedir file systems; however, the corresponding change to the kernel-side code was not made. This commit fixes this oversight. Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman commit 4c95bc41baca2a7570deb0414cf30e7c50942d5e Author: Ross Zwisler Date: Thu Jun 20 17:26:26 2019 -0400 ext4: use jbd2_inode dirty range scoping commit 73131fbb003b3691cfcf9656f234b00da497fcd6 upstream. Use the newly introduced jbd2_inode dirty range scoping to prevent us from waiting forever when trying to complete a journal transaction. Signed-off-by: Ross Zwisler Signed-off-by: Theodore Ts'o Reviewed-by: Jan Kara Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit cb9663114c534c1331f3e47b11ea8a34fa78612d Author: Ross Zwisler Date: Thu Jun 20 17:24:56 2019 -0400 jbd2: introduce jbd2_inode dirty range scoping commit 6ba0e7dc64a5adcda2fbe65adc466891795d639e upstream. Currently both journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() operate on the entire address space of each of the inodes associated with a given journal entry. The consequence of this is that if we have an inode where we are constantly appending dirty pages we can end up waiting for an indefinite amount of time in journal_finish_inode_data_buffers() while we wait for all the pages under writeback to be written out. The easiest way to cause this type of workload is do just dd from /dev/zero to a file until it fills the entire filesystem. This can cause journal_finish_inode_data_buffers() to wait for the duration of the entire dd operation. We can improve this situation by scoping each of the inode dirty ranges associated with a given transaction. We do this via the jbd2_inode structure so that the scoping is contained within jbd2 and so that it follows the lifetime and locking rules for that structure. This allows us to limit the writeback & wait in journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() respectively to the dirty range for a given struct jdb2_inode, keeping us from waiting forever if the inode in question is still being appended to. Signed-off-by: Ross Zwisler Signed-off-by: Theodore Ts'o Reviewed-by: Jan Kara Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 4d15f68ab90d1cb72f1834f65d35b6b4f5eae017 Author: Ross Zwisler Date: Thu Jun 20 17:05:37 2019 -0400 mm: add filemap_fdatawait_range_keep_errors() commit aa0bfcd939c30617385ffa28682c062d78050eba upstream. In the spirit of filemap_fdatawait_range() and filemap_fdatawait_keep_errors(), introduce filemap_fdatawait_range_keep_errors() which both takes a range upon which to wait and does not clear errors from the address space. Signed-off-by: Ross Zwisler Signed-off-by: Theodore Ts'o Reviewed-by: Jan Kara Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 3d762b2af4824ef9e124c3c66c9eff7d2f2e470e Author: Theodore Ts'o Date: Sun Jun 9 22:04:33 2019 -0400 ext4: enforce the immutable flag on open files commit 02b016ca7f99229ae6227e7b2fc950c4e140d74a upstream. According to the chattr man page, "a file with the 'i' attribute cannot be modified..." Historically, this was only enforced when the file was opened, per the rest of the description, "... and the file can not be opened in write mode". There is general agreement that we should standardize all file systems to prevent modifications even for files that were opened at the time the immutable flag is set. Eventually, a change to enforce this at the VFS layer should be landing in mainline. Until then, enforce this at the ext4 level to prevent xfstests generic/553 from failing. Signed-off-by: Theodore Ts'o Cc: "Darrick J. Wong" Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman commit e5723ddf999e8dd51e6494dfe8ee39bfccac8a35 Author: Darrick J. Wong Date: Sun Jun 9 21:41:41 2019 -0400 ext4: don't allow any modifications to an immutable file commit 2e53840362771c73eb0a5ff71611507e64e8eecd upstream. Don't allow any modifications to a file that's marked immutable, which means that we have to flush all the writable pages to make the readonly and we have to check the setattr/setflags parameters more closely. Signed-off-by: Darrick J. Wong Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman commit 539f1257906624abdcc81aeb91276791e543bfac Author: Peter Zijlstra Date: Sat Jul 13 11:21:25 2019 +0200 perf/core: Fix race between close() and fork() commit 1cf8dfe8a661f0462925df943140e9f6d1ea5233 upstream. Syzcaller reported the following Use-after-Free bug: close() clone() copy_process() perf_event_init_task() perf_event_init_context() mutex_lock(parent_ctx->mutex) inherit_task_group() inherit_group() inherit_event() mutex_lock(event->child_mutex) // expose event on child list list_add_tail() mutex_unlock(event->child_mutex) mutex_unlock(parent_ctx->mutex) ... goto bad_fork_* bad_fork_cleanup_perf: perf_event_free_task() perf_release() perf_event_release_kernel() list_for_each_entry() mutex_lock(ctx->mutex) mutex_lock(event->child_mutex) // event is from the failing inherit // on the other CPU perf_remove_from_context() list_move() mutex_unlock(event->child_mutex) mutex_unlock(ctx->mutex) mutex_lock(ctx->mutex) list_for_each_entry_safe() // event already stolen mutex_unlock(ctx->mutex) delayed_free_task() free_task() list_for_each_entry_safe() list_del() free_event() _free_event() // and so event->hw.target // is the already freed failed clone() if (event->hw.target) put_task_struct(event->hw.target) // WHOOPSIE, already quite dead Which puts the lie to the the comment on perf_event_free_task(): 'unexposed, unused context' not so much. Which is a 'fun' confluence of fail; copy_process() doing an unconditional free_task() and not respecting refcounts, and perf having creative locking. In particular: 82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp") seems to have overlooked this 'fun' parade. Solve it by using the fact that detached events still have a reference count on their (previous) context. With this perf_event_free_task() can detect when events have escaped and wait for their destruction. Debugged-by: Alexander Shishkin Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com Signed-off-by: Peter Zijlstra (Intel) Acked-by: Mark Rutland Cc: Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Fixes: 82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp") Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 421db87843e6f1885cebccd64ff29dddd1d77818 Author: Alexander Shishkin Date: Mon Jul 1 14:07:55 2019 +0300 perf/core: Fix exclusive events' grouping commit 8a58ddae23796c733c5dfbd717538d89d036c5bd upstream. So far, we tried to disallow grouping exclusive events for the fear of complications they would cause with moving between contexts. Specifically, moving a software group to a hardware context would violate the exclusivity rules if both groups contain matching exclusive events. This attempt was, however, unsuccessful: the check that we have in the perf_event_open() syscall is both wrong (looks at wrong PMU) and insufficient (group leader may still be exclusive), as can be illustrated by running: $ perf record -e '{intel_pt//,cycles}' uname $ perf record -e '{cycles,intel_pt//}' uname ultimately successfully. Furthermore, we are completely free to trigger the exclusivity violation by: perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}' even though the helpful perf record will not allow that, the ABI will. The warning later in the perf_event_open() path will also not trigger, because it's also wrong. Fix all this by validating the original group before moving, getting rid of broken safeguards and placing a useful one to perf_install_in_context(). Signed-off-by: Alexander Shishkin Signed-off-by: Peter Zijlstra (Intel) Cc: Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: mathieu.poirier@linaro.org Cc: will.deacon@arm.com Fixes: bed5b25ad9c8a ("perf: Add a pmu capability for "exclusive" events") Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 5c008c97dae1933a487511672ad734a450808b4a Author: Song Liu Date: Thu Jun 20 18:44:38 2019 -0700 perf script: Assume native_arch for pipe mode commit 9d49169c5958e429ffa6874fbef734ae7502ad65 upstream. In pipe mode, session->header.env.arch is not populated until the events are processed. Therefore, the following command crashes: perf record -o - | perf script (gdb) bt It fails when we try to compare env.arch against uts.machine: if (!strcmp(uts.machine, session->header.env.arch) || (!strcmp(uts.machine, "x86_64") && !strcmp(session->header.env.arch, "i386"))) native_arch = true; In pipe mode, it is tricky to find env.arch at this stage. To keep it simple, let's just assume native_arch is always true for pipe mode. Reported-by: David Carrillo Cisneros Signed-off-by: Song Liu Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: kernel-team@fb.com Cc: stable@vger.kernel.org #v5.1+ Fixes: 3ab481a1cfe1 ("perf script: Support insn output for normal samples") Link: http://lkml.kernel.org/r/20190621014438.810342-1-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 8206c6c1b7745871ce9875ae4f8aa85a88022423 Author: Paul Cercueil Date: Tue Jun 4 18:33:11 2019 +0200 MIPS: lb60: Fix pin mappings commit 1323c3b72a987de57141cabc44bf9cd83656bc70 upstream. The pin mappings introduced in commit 636f8ba67fb6 ("MIPS: JZ4740: Qi LB60: Add pinctrl configuration for several drivers") are completely wrong. The pinctrl driver name is incorrect, and the function and group fields are swapped. Fixes: 636f8ba67fb6 ("MIPS: JZ4740: Qi LB60: Add pinctrl configuration for several drivers") Cc: Signed-off-by: Paul Cercueil Reviewed-by: Linus Walleij Signed-off-by: Paul Burton Cc: Ralf Baechle Cc: James Hogan Cc: od@zcrc.me Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit dfc0a1b1013076fa14eec2fcc453e1f94d6ee400 Author: Keerthy Date: Mon Jul 8 14:19:04 2019 +0530 gpio: davinci: silence error prints in case of EPROBE_DEFER commit 541e4095f388c196685685633c950cb9b97f8039 upstream. Silence error prints in case of EPROBE_DEFER. This avoids multiple/duplicate defer prints during boot. Cc: Signed-off-by: Keerthy Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman commit f39659f4b3b4742310550a4225bfebf41117f378 Author: Nishka Dasgupta Date: Sat Jul 6 19:04:22 2019 +0530 gpiolib: of: fix a memory leak in of_gpio_flags_quirks() commit 89fea04c85e85f21ef4937611055abce82330d48 upstream. Each iteration of for_each_child_of_node puts the previous node, but in the case of a break from the middle of the loop, there is no put, thus causing a memory leak. Hence add an of_node_put before the break. Issue found with Coccinelle. Cc: Signed-off-by: Nishka Dasgupta [Bartosz: tweaked the commit message] Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman commit 4528cbb9659195c82bf80745e5abdc14e9ecbb79 Author: Chris Wilson Date: Tue Jun 4 13:53:23 2019 +0100 dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc commit f5b07b04e5f090a85d1e96938520f2b2b58e4a8e upstream. If we have to drop the seqcount & rcu lock to perform a krealloc, we have to restart the loop. In doing so, be careful not to lose track of the already acquired exclusive fence. Fixes: fedf54132d24 ("dma-buf: Restart reservation_object_get_fences_rcu() after writes") Signed-off-by: Chris Wilson Cc: Daniel Vetter Cc: Maarten Lankhorst Cc: Christian König Cc: Alex Deucher Cc: Sumit Semwal Cc: stable@vger.kernel.org #v4.10 Reviewed-by: Christian König Link: https://patchwork.freedesktop.org/patch/msgid/20190604125323.21396-1-chris@chris-wilson.co.uk Signed-off-by: Greg Kroah-Hartman commit 19fab82084e094800ceb5cb3e5f6411769524152 Author: Jérôme Glisse Date: Thu Dec 6 11:18:40 2018 -0500 dma-buf: balance refcount inbalance commit 5e383a9798990c69fc759a4930de224bb497e62c upstream. The debugfs take reference on fence without dropping them. Signed-off-by: Jérôme Glisse Cc: Christian König Cc: Daniel Vetter Cc: Sumit Semwal Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Cc: Stéphane Marchesin Cc: stable@vger.kernel.org Reviewed-by: Christian König Signed-off-by: Sumit Semwal Link: https://patchwork.freedesktop.org/patch/msgid/20181206161840.6578-1-jglisse@redhat.com Signed-off-by: Greg Kroah-Hartman commit 2a782edf08b450b0da551802e31ab47ba4e58179 Author: Aya Levin Date: Sun Jun 30 11:11:26 2019 +0300 net/mlx5e: Fix error flow in tx reporter diagnose [ Upstream commit 99d31cbd8953c6929da978bf049ab0f0b4e503d9 ] Fix tx reporter's diagnose callback. Propagate error when failing to gather diagnostics information or failing to print diagnostic data per queue. Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support") Signed-off-by: Aya Levin Reviewed-by: Tariq Toukan Acked-by: Jiri Pirko Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit d5596ded2ec330e3e72f5c541fc2e9eb4437275a Author: Aya Levin Date: Mon Jun 17 12:01:45 2019 +0300 net/mlx5e: Fix return value from timeout recover function [ Upstream commit 39825350ae2a52f8513741b36e42118bd80dd689 ] Fix timeout recover function to return a meaningful return value. When an interrupt was not sent by the FW, return IO error instead of 'true'. Fixes: c7981bea48fb ("net/mlx5e: Fix return status of TX reporter timeout recover") Signed-off-by: Aya Levin Acked-by: Jiri Pirko Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 2260ec46cd26dd98eabd16b3be183b33dde12017 Author: Saeed Mahameed Date: Fri May 3 13:14:59 2019 -0700 net/mlx5e: Rx, Fix checksum calculation for new hardware [ Upstream commit db849faa9bef993a1379dc510623f750a72fa7ce ] CQE checksum full mode in new HW, provides a full checksum of rx frame. Covering bytes starting from eth protocol up to last byte in the received frame (frame_size - ETH_HLEN), as expected by the stack. Fixing up skb->csum by the driver is not required in such case. This fix is to avoid wrong checksum calculation in drivers which already support the new hardware with the new checksum mode. Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 8c1dd131d9f818672d74e8a532e2fc5b01013b9a Author: Eli Britstein Date: Sun Jun 2 06:19:03 2019 +0000 net/mlx5e: Fix port tunnel GRE entropy control [ Upstream commit 914adbb1bcf89478ac138318d28b302704564d59 ] GRE entropy calculation is a single bit per card, and not per port. Force disable GRE entropy calculation upon the first GRE encap rule, and release the force at the last GRE encap rule removal. This is done per port. Fixes: 97417f6182f8 ("net/mlx5e: Fix GRE key by controlling port tunnel entropy calculation") Signed-off-by: Eli Britstein Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 06ff90ba3c8c4ddd4d411ce5bea3a0fe3a34e0a3 Author: Jakub Kicinski Date: Fri Jun 28 16:07:59 2019 -0700 net/tls: reject offload of TLS 1.3 [ Upstream commit 618bac45937a3dc6126ac0652747481e97000f99 ] Neither drivers nor the tls offload code currently supports TLS version 1.3. Check the TLS version when installing connection state. TLS 1.3 will just fallback to the kernel crypto for now. Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support") Signed-off-by: Jakub Kicinski Reviewed-by: Dirk van der Merwe Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 53911428585db5042649fc568b2e12facc55d184 Author: Jakub Kicinski Date: Thu Jul 4 14:50:36 2019 -0700 net/tls: fix poll ignoring partially copied records [ Upstream commit 13aecb17acabc2a92187d08f7ca93bb8aad62c6f ] David reports that RPC applications which use epoll() occasionally get stuck, and that TLS ULP causes the kernel to not wake applications, even though read() will return data. This is indeed true. The ctx->rx_list which holds partially copied records is not consulted when deciding whether socket is readable. Note that SO_RCVLOWAT with epoll() is and has always been broken for kernel TLS. We'd need to parse all records from the TCP layer, instead of just the first one. Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Reported-by: David Beckett Signed-off-by: Jakub Kicinski Reviewed-by: Dirk van der Merwe Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d7728216dee122a21fe8732c4f202301a07cdd60 Author: Frank de Brabander Date: Fri Jul 5 13:43:14 2019 +0200 selftests: txring_overwrite: fix incorrect test of mmap() return value [ Upstream commit cecaa76b2919aac2aa584ce476e9fcd5b084add5 ] If mmap() fails it returns MAP_FAILED, which is defined as ((void *) -1). The current if-statement incorrectly tests if *ring is NULL. Fixes: 358be656406d ("selftests/net: add txring_overwrite") Signed-off-by: Frank de Brabander Acked-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1cec7a0cf072a8b0039abaa88c73256f6d82bf4e Author: Cong Wang Date: Mon Jul 22 20:41:22 2019 -0700 netrom: hold sock when setting skb->destructor [ Upstream commit 4638faac032756f7eab5524be7be56bee77e426b ] sock_efree() releases the sock refcnt, if we don't hold this refcnt when setting skb->destructor to it, the refcnt would not be balanced. This leads to several bug reports from syzbot. I have checked other users of sock_efree(), all of them hold the sock refcnt. Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()") Reported-and-tested-by: Reported-and-tested-by: Reported-and-tested-by: Reported-and-tested-by: Cc: Ralf Baechle Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 897bcee6a1292b2a5d4e731ec5f12c6b97309465 Author: Cong Wang Date: Thu Jun 27 14:30:58 2019 -0700 netrom: fix a memory leak in nr_rx_frame() [ Upstream commit c8c8218ec5af5d2598381883acbefbf604e56b5e ] When the skb is associated with a new sock, just assigning it to skb->sk is not sufficient, we have to set its destructor to free the sock properly too. Reported-by: syzbot+d6636a36d3c34bd88938@syzkaller.appspotmail.com Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 961a727f2c1056195f8293db728d0ea9cf7c961d Author: Andreas Steinmetz Date: Sun Jun 30 22:46:45 2019 +0200 macsec: fix checksumming after decryption [ Upstream commit 7d8b16b9facb0dd81d1469808dd9a575fa1d525a ] Fix checksumming after decryption. Signed-off-by: Andreas Steinmetz Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a5e9991ff901aecead1fad8e5d31abaa8ad79856 Author: Andreas Steinmetz Date: Sun Jun 30 22:46:42 2019 +0200 macsec: fix use-after-free of skb during RX [ Upstream commit 095c02da80a41cf6d311c504d8955d6d1c2add10 ] Fix use-after-free of skb when rx_handler returns RX_HANDLER_PASS. Signed-off-by: Andreas Steinmetz Acked-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7edddedc3b87b9a2226893be6f94d06cf90d02c1 Author: Nikolay Aleksandrov Date: Tue Jul 2 15:00:21 2019 +0300 net: bridge: stp: don't cache eth dest pointer before skb pull [ Upstream commit 2446a68ae6a8cee6d480e2f5b52f5007c7c41312 ] Don't cache eth dest pointer before calling pskb_may_pull. Fixes: cf0f02d04a83 ("[BRIDGE]: use llc for receiving STP packets") Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 17f2557801f833007a296ce96f133c1da7e1f489 Author: Nikolay Aleksandrov Date: Tue Jul 2 15:00:20 2019 +0300 net: bridge: don't cache ether dest pointer on input [ Upstream commit 3d26eb8ad1e9b906433903ce05f775cf038e747f ] We would cache ether dst pointer on input in br_handle_frame_finish but after the neigh suppress code that could lead to a stale pointer since both ipv4 and ipv6 suppress code do pskb_may_pull. This means we have to always reload it after the suppress code so there's no point in having it cached just retrieve it directly. Fixes: 057658cb33fbf ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports") Fixes: ed842faeb2bd ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0b159b8b49b7c22eb4815710f365c8a53af0f148 Author: Nikolay Aleksandrov Date: Tue Jul 2 15:00:19 2019 +0300 net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query [ Upstream commit 3b26a5d03d35d8f732d75951218983c0f7f68dff ] We get a pointer to the ipv6 hdr in br_ip6_multicast_query but we may call pskb_may_pull afterwards and end up using a stale pointer. So use the header directly, it's just 1 place where it's needed. Fixes: 08b202b67264 ("bridge br_multicast: IPv6 MLD support.") Signed-off-by: Nikolay Aleksandrov Tested-by: Martin Weinelt Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit aa1a26b4d8b68dda13f997b67415632c1d963e45 Author: Nikolay Aleksandrov Date: Tue Jul 2 15:00:18 2019 +0300 net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling [ Upstream commit e57f61858b7cf478ed6fa23ed4b3876b1c9625c4 ] We take a pointer to grec prior to calling pskb_may_pull and use it afterwards to get nsrcs so record nsrcs before the pull when handling igmp3 and we get a pointer to nsrcs and call pskb_may_pull when handling mld2 which again could lead to reading 2 bytes out-of-bounds. ================================================================== BUG: KASAN: use-after-free in br_multicast_rcv+0x480c/0x4ad0 [bridge] Read of size 2 at addr ffff8880421302b4 by task ksoftirqd/1/16 CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: G OE 5.2.0-rc6+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x71/0xab print_address_description+0x6a/0x280 ? br_multicast_rcv+0x480c/0x4ad0 [bridge] __kasan_report+0x152/0x1aa ? br_multicast_rcv+0x480c/0x4ad0 [bridge] ? br_multicast_rcv+0x480c/0x4ad0 [bridge] kasan_report+0xe/0x20 br_multicast_rcv+0x480c/0x4ad0 [bridge] ? br_multicast_disable_port+0x150/0x150 [bridge] ? ktime_get_with_offset+0xb4/0x150 ? __kasan_kmalloc.constprop.6+0xa6/0xf0 ? __netif_receive_skb+0x1b0/0x1b0 ? br_fdb_update+0x10e/0x6e0 [bridge] ? br_handle_frame_finish+0x3c6/0x11d0 [bridge] br_handle_frame_finish+0x3c6/0x11d0 [bridge] ? br_pass_frame_up+0x3a0/0x3a0 [bridge] ? virtnet_probe+0x1c80/0x1c80 [virtio_net] br_handle_frame+0x731/0xd90 [bridge] ? select_idle_sibling+0x25/0x7d0 ? br_handle_frame_finish+0x11d0/0x11d0 [bridge] __netif_receive_skb_core+0xced/0x2d70 ? virtqueue_get_buf_ctx+0x230/0x1130 [virtio_ring] ? do_xdp_generic+0x20/0x20 ? virtqueue_napi_complete+0x39/0x70 [virtio_net] ? virtnet_poll+0x94d/0xc78 [virtio_net] ? receive_buf+0x5120/0x5120 [virtio_net] ? __netif_receive_skb_one_core+0x97/0x1d0 __netif_receive_skb_one_core+0x97/0x1d0 ? __netif_receive_skb_core+0x2d70/0x2d70 ? _raw_write_trylock+0x100/0x100 ? __queue_work+0x41e/0xbe0 process_backlog+0x19c/0x650 ? _raw_read_lock_irq+0x40/0x40 net_rx_action+0x71e/0xbc0 ? __switch_to_asm+0x40/0x70 ? napi_complete_done+0x360/0x360 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? __schedule+0x85e/0x14d0 __do_softirq+0x1db/0x5f9 ? takeover_tasklets+0x5f0/0x5f0 run_ksoftirqd+0x26/0x40 smpboot_thread_fn+0x443/0x680 ? sort_range+0x20/0x20 ? schedule+0x94/0x210 ? __kthread_parkme+0x78/0xf0 ? sort_range+0x20/0x20 kthread+0x2ae/0x3a0 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x35/0x40 The buggy address belongs to the page: page:ffffea0001084c00 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 flags: 0xffffc000000000() raw: 00ffffc000000000 ffffea0000cfca08 ffffea0001098608 0000000000000000 raw: 0000000000000000 0000000000000003 00000000ffffff7f 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888042130180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888042130200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ffff888042130280: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff888042130300: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888042130380: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Disabling lock debugging due to kernel taint Fixes: bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave") Reported-by: Martin Weinelt Signed-off-by: Nikolay Aleksandrov Tested-by: Martin Weinelt Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5a637c0b0d5e37076f5fec36879016f9308a328a Author: Aya Levin Date: Sun Jul 7 16:57:06 2019 +0300 net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn [ Upstream commit ef1ce7d7b67b46661091c7ccc0396186b7a247ef ] Check return value from mlx5e_attach_netdev, add error path on failure. Fixes: 48935bbb7ae8 ("net/mlx5e: IPoIB, Add netdevice profile skeleton") Signed-off-by: Aya Levin Reviewed-by: Feras Daoud Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 98ecf34e6b82d9c04bff5cb39e9ff964f505bb79 Author: Peter Kosyh Date: Fri Jul 19 11:11:47 2019 +0300 vrf: make sure skb->data contains ip header to make routing [ Upstream commit 107e47cc80ec37cb332bd41b22b1c7779e22e018 ] vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing using ip/ipv6 addresses, but don't make sure the header is available in skb->data[] (skb_headlen() is less then header size). Case: 1) igb driver from intel. 2) Packet size is greater then 255. 3) MPLS forwards to VRF device. So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound() functions. Signed-off-by: Peter Kosyh Reviewed-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2527ea8f115cdd0b1e8fc5cb161ff7d1a7a99351 Author: Christoph Paasch Date: Sat Jul 6 16:13:07 2019 -0700 tcp: Reset bytes_acked and bytes_received when disconnecting [ Upstream commit e858faf556d4e14c750ba1e8852783c6f9520a0e ] If an app is playing tricks to reuse a socket via tcp_disconnect(), bytes_acked/received needs to be reset to 0. Otherwise tcp_info will report the sum of the current and the old connection.. Cc: Eric Dumazet Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info") Fixes: bdd1f9edacb5 ("tcp: add tcpi_bytes_received to tcp_info") Signed-off-by: Christoph Paasch Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit dfc9148309629cd5c1f8252153496195dfc31045 Author: Eric Dumazet Date: Thu Jul 18 19:28:14 2019 -0700 tcp: fix tcp_set_congestion_control() use from bpf hook [ Upstream commit 8d650cdedaabb33e85e9b7c517c0c71fcecc1de9 ] Neal reported incorrect use of ns_capable() from bpf hook. bpf_setsockopt(...TCP_CONGESTION...) -> tcp_set_congestion_control() -> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) -> ns_capable_common() -> current_cred() -> rcu_dereference_protected(current->cred, 1) Accessing 'current' in bpf context makes no sense, since packets are processed from softirq context. As Neal stated : The capability check in tcp_set_congestion_control() was written assuming a system call context, and then was reused from a BPF call site. The fix is to add a new parameter to tcp_set_congestion_control(), so that the ns_capable() call is only performed under the right context. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Eric Dumazet Cc: Lawrence Brakmo Reported-by: Neal Cardwell Acked-by: Neal Cardwell Acked-by: Lawrence Brakmo Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 35618f4e85134bed345a93725a89d1e14f4fcfef Author: Eric Dumazet Date: Fri Jul 19 11:52:33 2019 -0700 tcp: be more careful in tcp_fragment() [ Upstream commit b617158dc096709d8600c53b6052144d12b89fab ] Some applications set tiny SO_SNDBUF values and expect TCP to just work. Recent patches to address CVE-2019-11478 broke them in case of losses, since retransmits might be prevented. We should allow these flows to make progress. This patch allows the first and last skb in retransmit queue to be split even if memory limits are hit. It also adds the some room due to the fact that tcp_sendmsg() and tcp_sendpage() might overshoot sk_wmem_queued by about one full TSO skb (64KB size). Note this allowance was already present in stable backports for kernels < 4.15 Note for < 4.15 backports : tcp_rtx_queue_tail() will probably look like : static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) { struct sk_buff *skb = tcp_send_head(sk); return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk); } Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits") Signed-off-by: Eric Dumazet Reported-by: Andrew Prout Tested-by: Andrew Prout Tested-by: Jonathan Lemon Tested-by: Michal Kubecek Acked-by: Neal Cardwell Acked-by: Yuchung Cheng Acked-by: Christoph Paasch Cc: Jonathan Looney Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0558547a6e1f08862b06a75a7dec3239e2091662 Author: Takashi Iwai Date: Tue Jul 23 17:15:25 2019 +0200 sky2: Disable MSI on ASUS P6T [ Upstream commit a261e3797506bd561700be643fe1a85bf81e9661 ] The onboard sky2 NIC on ASUS P6T WS PRO doesn't work after PM resume due to the infamous IRQ problem. Disabling MSI works around it, so let's add it to the blacklist. Unfortunately the BIOS on the machine doesn't fill the standard DMI_SYS_* entry, so we pick up DMI_BOARD_* entries instead. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1142496 Reported-and-tested-by: Marcus Seyfarth Signed-off-by: Takashi Iwai Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 066992c6328c637384f82a6832016a21fcd3a089 Author: Xin Long Date: Wed Jun 26 16:31:39 2019 +0800 sctp: not bind the socket in sctp_connect [ Upstream commit 9b6c08878e23adb7cc84bdca94d8a944b03f099e ] Now when sctp_connect() is called with a wrong sa_family, it binds to a port but doesn't set bp->port, then sctp_get_af_specific will return NULL and sctp_connect() returns -EINVAL. Then if sctp_bind() is called to bind to another port, the last port it has bound will leak due to bp->port is NULL by then. sctp_connect() doesn't need to bind ports, as later __sctp_connect will do it if bp->port is NULL. So remove it from sctp_connect(). While at it, remove the unnecessary sockaddr.sa_family len check as it's already done in sctp_inet_connect. Fixes: 644fbdeacf1d ("sctp: fix the issue that flags are ignored when using kernel_connect") Reported-by: syzbot+079bf326b38072f849d9@syzkaller.appspotmail.com Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2a8e1d0c9646f78fdd0f00e2eadf8210bbf810a7 Author: Marcelo Ricardo Leitner Date: Thu Jun 27 19:48:10 2019 -0300 sctp: fix error handling on stream scheduler initialization [ Upstream commit 4d1415811e492d9a8238f8a92dd0d51612c788e9 ] It allocates the extended area for outbound streams only on sendmsg calls, if they are not yet allocated. When using the priority stream scheduler, this initialization may imply into a subsequent allocation, which may fail. In this case, it was aborting the stream scheduler initialization but leaving the ->ext pointer (allocated) in there, thus in a partially initialized state. On a subsequent call to sendmsg, it would notice the ->ext pointer in there, and trip on uninitialized stuff when trying to schedule the data chunk. The fix is undo the ->ext initialization if the stream scheduler initialization fails and avoid the partially initialized state. Although syzkaller bisected this to commit 4ff40b86262b ("sctp: set chunk transport correctly when it's a new asoc"), this bug was actually introduced on the commit I marked below. Reported-by: syzbot+c1a380d42b190ad1e559@syzkaller.appspotmail.com Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Tested-by: Xin Long Signed-off-by: Marcelo Ricardo Leitner Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 36febc9885c7cda0fcafadb6fc7f77deb1811902 Author: David Howells Date: Tue Jul 2 15:59:12 2019 +0100 rxrpc: Fix send on a connected, but unbound socket [ Upstream commit e835ada07091f40dcfb1bc735082bd0a7c005e59 ] If sendmsg() or sendmmsg() is called on a connected socket that hasn't had bind() called on it, then an oops will occur when the kernel tries to connect the call because no local endpoint has been allocated. Fix this by implicitly binding the socket if it is in the RXRPC_CLIENT_UNBOUND state, just like it does for the RXRPC_UNBOUND state. Further, the state should be transitioned to RXRPC_CLIENT_BOUND after this to prevent further attempts to bind it. This can be tested with: #include #include #include #include #include #include static const unsigned char inet6_addr[16] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0xac, 0x14, 0x14, 0xaa }; int main(void) { struct sockaddr_rxrpc srx; struct cmsghdr *cm; struct msghdr msg; unsigned char control[16]; int fd; memset(&srx, 0, sizeof(srx)); srx.srx_family = 0x21; srx.srx_service = 0; srx.transport_type = AF_INET; srx.transport_len = 0x1c; srx.transport.sin6.sin6_family = AF_INET6; srx.transport.sin6.sin6_port = htons(0x4e22); srx.transport.sin6.sin6_flowinfo = htons(0x4e22); srx.transport.sin6.sin6_scope_id = htons(0xaa3b); memcpy(&srx.transport.sin6.sin6_addr, inet6_addr, 16); cm = (struct cmsghdr *)control; cm->cmsg_len = CMSG_LEN(sizeof(unsigned long)); cm->cmsg_level = SOL_RXRPC; cm->cmsg_type = RXRPC_USER_CALL_ID; *(unsigned long *)CMSG_DATA(cm) = 0; msg.msg_name = NULL; msg.msg_namelen = 0; msg.msg_iov = NULL; msg.msg_iovlen = 0; msg.msg_control = control; msg.msg_controllen = cm->cmsg_len; msg.msg_flags = 0; fd = socket(AF_RXRPC, SOCK_DGRAM, AF_INET); connect(fd, (struct sockaddr *)&srx, sizeof(srx)); sendmsg(fd, &msg, 0); return 0; } Leading to the following oops: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page ... RIP: 0010:rxrpc_connect_call+0x42/0xa01 ... Call Trace: ? mark_held_locks+0x47/0x59 ? __local_bh_enable_ip+0xb6/0xba rxrpc_new_client_call+0x3b1/0x762 ? rxrpc_do_sendmsg+0x3c0/0x92e rxrpc_do_sendmsg+0x3c0/0x92e rxrpc_sendmsg+0x16b/0x1b5 sock_sendmsg+0x2d/0x39 ___sys_sendmsg+0x1a4/0x22a ? release_sock+0x19/0x9e ? reacquire_held_locks+0x136/0x160 ? release_sock+0x19/0x9e ? find_held_lock+0x2b/0x6e ? __lock_acquire+0x268/0xf73 ? rxrpc_connect+0xdd/0xe4 ? __local_bh_enable_ip+0xb6/0xba __sys_sendmsg+0x5e/0x94 do_syscall_64+0x7d/0x1bf entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 2341e0775747 ("rxrpc: Simplify connect() implementation and simplify sendmsg() op") Reported-by: syzbot+7966f2a0b2c7da8939b4@syzkaller.appspotmail.com Signed-off-by: David Howells Reviewed-by: Marc Dionne Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3618e5919694df536f35b53e94416cb7163606aa Author: Heiner Kallweit Date: Sat Jul 13 13:45:47 2019 +0200 r8169: fix issue with confused RX unit after PHY power-down on RTL8411b [ Upstream commit fe4e8db0392a6c2e795eb89ef5fcd86522e66248 ] On RTL8411b the RX unit gets confused if the PHY is powered-down. This was reported in [0] and confirmed by Realtek. Realtek provided a sequence to fix the RX unit after PHY wakeup. The issue itself seems to have been there longer, the Fixes tag refers to where the fix applies properly. [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075 Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support") Tested-by: Ionut Radu Signed-off-by: Heiner Kallweit Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e1f7bc510d84532d3140da37830e9a84e5984e65 Author: Yang Wei Date: Mon Jul 8 22:57:39 2019 +0800 nfc: fix potential illegal memory access [ Upstream commit dd006fc434e107ef90f7de0db9907cbc1c521645 ] The frags_q is not properly initialized, it may result in illegal memory access when conn_info is NULL. The "goto free_exit" should be replaced by "goto exit". Signed-off-by: Yang Wei Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5edaba9e9e06b9c310eeec995ca7320c2dc64f2b Author: Jakub Kicinski Date: Fri Jun 28 16:11:39 2019 -0700 net/tls: make sure offload also gets the keys wiped [ Upstream commit acd3e96d53a24d219f720ed4012b62723ae05da1 ] Commit 86029d10af18 ("tls: zero the crypto information from tls_context before freeing") added memzero_explicit() calls to clear the key material before freeing struct tls_context, but it missed tls_device.c has its own way of freeing this structure. Replace the missing free. Fixes: 86029d10af18 ("tls: zero the crypto information from tls_context before freeing") Signed-off-by: Jakub Kicinski Reviewed-by: Dirk van der Merwe Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7a78bda3cad948939f006cb923893184a30a8d75 Author: Jose Abreu Date: Mon Jul 8 14:26:28 2019 +0200 net: stmmac: Re-work the queue selection for TSO packets [ Upstream commit 4993e5b37e8bcb55ac90f76eb6d2432647273747 ] Ben Hutchings says: "This is the wrong place to change the queue mapping. stmmac_xmit() is called with a specific TX queue locked, and accessing a different TX queue results in a data race for all of that queue's state. I think this commit should be reverted upstream and in all stable branches. Instead, the driver should implement the ndo_select_queue operation and override the queue mapping there." Fixes: c5acdbee22a1 ("net: stmmac: Send TSO packets always from Queue 0") Suggested-by: Ben Hutchings Signed-off-by: Jose Abreu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e6da0a5873732f9ea06345bb01a8117fd4529a43 Author: Cong Wang Date: Tue Jul 16 13:57:30 2019 -0700 net_sched: unset TCQ_F_CAN_BYPASS when adding filters [ Upstream commit 3f05e6886a595c9a29a309c52f45326be917823c ] For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS, notably fq_codel, it makes no sense to let packets bypass the TC filters we setup in any scenario, otherwise our packets steering policy could not be enforced. This can be reproduced easily with the following script: ip li add dev dummy0 type dummy ifconfig dummy0 up tc qd add dev dummy0 root fq_codel tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo ping -I dummy0 192.168.112.1 Without this patch, packets are sent directly to dummy0 without hitting any of the filters. With this patch, packets are redirected to loopback as expected. This fix is not perfect, it only unsets the flag but does not set it back because we have to save the information somewhere in the qdisc if we really want that. Note, both fq_codel and sfq clear this flag in their ->bind_tcf() but this is clearly not sufficient when we don't use any class ID. Fixes: 23624935e0c4 ("net_sched: TCQ_F_CAN_BYPASS generalization") Cc: Eric Dumazet Signed-off-by: Cong Wang Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ba5f4cb33e9bdcbc895902dc78fd990a526de005 Author: Andrew Lunn Date: Sun Jul 21 18:50:08 2019 +0200 net: phy: sfp: hwmon: Fix scaling of RX power [ Upstream commit 0cea0e1148fe134a4a3aaf0b1496f09241fb943a ] The RX power read from the SFP uses units of 0.1uW. This must be scaled to units of uW for HWMON. This requires a divide by 10, not the current 100. With this change in place, sensors(1) and ethtool -m agree: sff2-isa-0000 Adapter: ISA adapter in0: +3.23 V temp1: +33.1 C power1: 270.00 uW power2: 200.00 uW curr1: +0.01 A Laser output power : 0.2743 mW / -5.62 dBm Receiver signal average optical power : 0.2014 mW / -6.96 dBm Reported-by: chris.healy@zii.aero Signed-off-by: Andrew Lunn Fixes: 1323061a018a ("net: phy: sfp: Add HWMON support for module sensors") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit af0cab5c15d134cb88534f9479107bb36011cb2f Author: John Hurley Date: Thu Jun 27 14:37:30 2019 +0100 net: openvswitch: fix csum updates for MPLS actions [ Upstream commit 0e3183cd2a64843a95b62f8bd4a83605a4cf0615 ] Skbs may have their checksum value populated by HW. If this is a checksum calculated over the entire packet then the CHECKSUM_COMPLETE field is marked. Changes to the data pointer on the skb throughout the network stack still try to maintain this complete csum value if it is required through functions such as skb_postpush_rcsum. The MPLS actions in Open vSwitch modify a CHECKSUM_COMPLETE value when changes are made to packet data without a push or a pull. This occurs when the ethertype of the MAC header is changed or when MPLS lse fields are modified. The modification is carried out using the csum_partial function to get the csum of a buffer and add it into the larger checksum. The buffer is an inversion of the data to be removed followed by the new data. Because the csum is calculated over 16 bits and these values align with 16 bits, the effect is the removal of the old value from the CHECKSUM_COMPLETE and addition of the new value. However, the csum fed into the function and the outcome of the calculation are also inverted. This would only make sense if it was the new value rather than the old that was inverted in the input buffer. Fix the issue by removing the bit inverts in the csum_partial calculation. The bug was verified and the fix tested by comparing the folded value of the updated CHECKSUM_COMPLETE value with the folded value of a full software checksum calculation (reset skb->csum to 0 and run skb_checksum_complete(skb)). Prior to the fix the outcomes differed but after they produce the same result. Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel") Fixes: bc7cc5999fd3 ("openvswitch: update checksum in {push,pop}_mpls") Signed-off-by: John Hurley Reviewed-by: Jakub Kicinski Reviewed-by: Simon Horman Acked-by: Pravin B Shelar Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 821aa330150e493dec7379963211cc4145117aea Author: Lorenzo Bianconi Date: Sun Jul 14 23:36:11 2019 +0200 net: neigh: fix multiple neigh timer scheduling [ Upstream commit 071c37983d99da07797294ea78e9da1a6e287144 ] Neigh timer can be scheduled multiple times from userspace adding multiple neigh entries and forcing the neigh timer scheduling passing NTF_USE in the netlink requests. This will result in a refcount leak and in the following dump stack: [ 32.465295] NEIGH: BUG, double timer add, state is 8 [ 32.465308] CPU: 0 PID: 416 Comm: double_timer_ad Not tainted 5.2.0+ #65 [ 32.465311] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014 [ 32.465313] Call Trace: [ 32.465318] dump_stack+0x7c/0xc0 [ 32.465323] __neigh_event_send+0x20c/0x880 [ 32.465326] ? ___neigh_create+0x846/0xfb0 [ 32.465329] ? neigh_lookup+0x2a9/0x410 [ 32.465332] ? neightbl_fill_info.constprop.0+0x800/0x800 [ 32.465334] neigh_add+0x4f8/0x5e0 [ 32.465337] ? neigh_xmit+0x620/0x620 [ 32.465341] ? find_held_lock+0x85/0xa0 [ 32.465345] rtnetlink_rcv_msg+0x204/0x570 [ 32.465348] ? rtnl_dellink+0x450/0x450 [ 32.465351] ? mark_held_locks+0x90/0x90 [ 32.465354] ? match_held_lock+0x1b/0x230 [ 32.465357] netlink_rcv_skb+0xc4/0x1d0 [ 32.465360] ? rtnl_dellink+0x450/0x450 [ 32.465363] ? netlink_ack+0x420/0x420 [ 32.465366] ? netlink_deliver_tap+0x115/0x560 [ 32.465369] ? __alloc_skb+0xc9/0x2f0 [ 32.465372] netlink_unicast+0x270/0x330 [ 32.465375] ? netlink_attachskb+0x2f0/0x2f0 [ 32.465378] netlink_sendmsg+0x34f/0x5a0 [ 32.465381] ? netlink_unicast+0x330/0x330 [ 32.465385] ? move_addr_to_kernel.part.0+0x20/0x20 [ 32.465388] ? netlink_unicast+0x330/0x330 [ 32.465391] sock_sendmsg+0x91/0xa0 [ 32.465394] ___sys_sendmsg+0x407/0x480 [ 32.465397] ? copy_msghdr_from_user+0x200/0x200 [ 32.465401] ? _raw_spin_unlock_irqrestore+0x37/0x40 [ 32.465404] ? lockdep_hardirqs_on+0x17d/0x250 [ 32.465407] ? __wake_up_common_lock+0xcb/0x110 [ 32.465410] ? __wake_up_common+0x230/0x230 [ 32.465413] ? netlink_bind+0x3e1/0x490 [ 32.465416] ? netlink_setsockopt+0x540/0x540 [ 32.465420] ? __fget_light+0x9c/0xf0 [ 32.465423] ? sockfd_lookup_light+0x8c/0xb0 [ 32.465426] __sys_sendmsg+0xa5/0x110 [ 32.465429] ? __ia32_sys_shutdown+0x30/0x30 [ 32.465432] ? __fd_install+0xe1/0x2c0 [ 32.465435] ? lockdep_hardirqs_off+0xb5/0x100 [ 32.465438] ? mark_held_locks+0x24/0x90 [ 32.465441] ? do_syscall_64+0xf/0x270 [ 32.465444] do_syscall_64+0x63/0x270 [ 32.465448] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER' receiving a netlink request with NTF_USE flag set Reported-by: Marek Majkowski Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table") Signed-off-by: Lorenzo Bianconi Reviewed-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4912f3f1a8868afc6b36293454b5307368f88c22 Author: Florian Westphal Date: Wed Jun 26 20:40:45 2019 +0200 net: make skb_dst_force return true when dst is refcounted [ Upstream commit b60a77386b1d4868f72f6353d35dabe5fbe981f2 ] netfilter did not expect that skb_dst_force() can cause skb to lose its dst entry. I got a bug report with a skb->dst NULL dereference in netfilter output path. The backtrace contains nf_reinject(), so the dst might have been cleared when skb got queued to userspace. Other users were fixed via if (skb_dst(skb)) { skb_dst_force(skb); if (!skb_dst(skb)) goto handle_err; } But I think its preferable to make the 'dst might be cleared' part of the function explicit. In netfilter case, skb with a null dst is expected when queueing in prerouting hook, so drop skb for the other hooks. v2: v1 of this patch returned true in case skb had no dst entry. Eric said: Say if we have two skb_dst_force() calls for some reason on the same skb, only the first one will return false. This now returns false even when skb had no dst, as per Erics suggestion, so callers might need to check skb_dst() first before skb_dst_force(). Signed-off-by: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4b2e53dfcb599ecdaa3db1a2e7d5bef9bacd5502 Author: Baruch Siach Date: Thu Jun 27 21:17:39 2019 +0300 net: dsa: mv88e6xxx: wait after reset deactivation [ Upstream commit 7b75e49de424ceb53d13e60f35d0a73765626fda ] Add a 1ms delay after reset deactivation. Otherwise the chip returns bogus ID value. This is observed with 88E6390 (Peridot) chip. Signed-off-by: Baruch Siach Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 17f782cc4039d48436587b897658937158c2ef33 Author: Justin Chen Date: Wed Jul 17 14:58:53 2019 -0700 net: bcmgenet: use promisc for unsupported filters [ Upstream commit 35cbef9863640f06107144687bd13151bc2e8ce3 ] Currently we silently ignore filters if we cannot meet the filter requirements. This will lead to the MAC dropping packets that are expected to pass. A better solution would be to set the NIC to promisc mode when the required filters cannot be met. Also correct the number of MDF filters supported. It should be 17, not 16. Signed-off-by: Justin Chen Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 847b4237cfa6512dad7117207f1c5dcebeba35ec Author: Ido Schimmel Date: Wed Jul 17 23:39:33 2019 +0300 ipv6: Unlink sibling route in case of failure [ Upstream commit 54851aa90cf27041d64b12f65ac72e9f97bd90fd ] When a route needs to be appended to an existing multipath route, fib6_add_rt2node() first appends it to the siblings list and increments the number of sibling routes on each sibling. Later, the function notifies the route via call_fib6_entry_notifiers(). In case the notification is vetoed, the route is not unlinked from the siblings list, which can result in a use-after-free. Fix this by unlinking the route from the siblings list before returning an error. Audited the rest of the call sites from which the FIB notification chain is called and could not find more problems. Fixes: 2233000cba40 ("net/ipv6: Move call_fib6_entry_notifiers up for route adds") Signed-off-by: Ido Schimmel Reported-by: Alexander Petrovskiy Reviewed-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f2acb2903f1603643a7b683c04bd88a7781888dd Author: David Ahern Date: Wed Jul 17 15:08:43 2019 -0700 ipv6: rt6_check should return NULL if 'from' is NULL [ Upstream commit 49d05fe2c9d1b4a27761c9807fec39b8155bef9e ] Paul reported that l2tp sessions were broken after the commit referenced in the Fixes tag. Prior to this commit rt6_check returned NULL if the rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB entry. Restore that behavior. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Paul Donohue Tested-by: Paul Donohue Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7719f7253677b5a9a3db8b95d2861d32081cfca0 Author: Matteo Croce Date: Mon Jul 1 19:01:55 2019 +0200 ipv4: don't set IPv6 only flags to IPv4 addresses [ Upstream commit 2e60546368165c2449564d71f6005dda9205b5fb ] Avoid the situation where an IPV6 only flag is applied to an IPv4 address: # ip addr add 192.0.2.1/24 dev dummy0 nodad home mngtmpaddr noprefixroute # ip -4 addr show dev dummy0 2: dummy0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet 192.0.2.1/24 scope global noprefixroute dummy0 valid_lft forever preferred_lft forever Or worse, by sending a malicious netlink command: # ip -4 addr show dev dummy0 2: dummy0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet 192.0.2.1/24 scope global nodad optimistic dadfailed home tentative mngtmpaddr noprefixroute stable-privacy dummy0 valid_lft forever preferred_lft forever Signed-off-by: Matteo Croce Reviewed-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ce6994f56e7a6e1ac0630328e65fa2d9d919786d Author: Eric Dumazet Date: Thu Jun 27 01:27:01 2019 -0700 igmp: fix memory leak in igmpv3_del_delrec() [ Upstream commit e5b1c6c6277d5a283290a8c033c72544746f9b5b ] im->tomb and/or im->sources might not be NULL, but we currently overwrite their values blindly. Using swap() will make sure the following call to kfree_pmc(pmc) will properly free the psf structures. Tested with the C repro provided by syzbot, which basically does : socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3 setsockopt(3, SOL_IP, IP_ADD_MEMBERSHIP, "\340\0\0\2\177\0\0\1\0\0\0\0", 12) = 0 ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=0}) = 0 setsockopt(3, SOL_IP, IP_MSFILTER, "\340\0\0\2\177\0\0\1\1\0\0\0\1\0\0\0\377\377\377\377", 20) = 0 ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=IFF_UP}) = 0 exit_group(0) = ? BUG: memory leak unreferenced object 0xffff88811450f140 (size 64): comm "softirq", pid 0, jiffies 4294942448 (age 32.070s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00 ................ 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ backtrace: [<00000000c7bad083>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<00000000c7bad083>] slab_post_alloc_hook mm/slab.h:439 [inline] [<00000000c7bad083>] slab_alloc mm/slab.c:3326 [inline] [<00000000c7bad083>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 [<000000009acc4151>] kmalloc include/linux/slab.h:547 [inline] [<000000009acc4151>] kzalloc include/linux/slab.h:742 [inline] [<000000009acc4151>] ip_mc_add1_src net/ipv4/igmp.c:1976 [inline] [<000000009acc4151>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2100 [<000000004ac14566>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2484 [<0000000052d8f995>] do_ip_setsockopt.isra.0+0x1795/0x1930 net/ipv4/ip_sockglue.c:959 [<000000004ee1e21f>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1248 [<0000000066cdfe74>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2618 [<000000009383a786>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3126 [<00000000d8ac0c94>] __sys_setsockopt+0x98/0x120 net/socket.c:2072 [<000000001b1e9666>] __do_sys_setsockopt net/socket.c:2083 [inline] [<000000001b1e9666>] __se_sys_setsockopt net/socket.c:2080 [inline] [<000000001b1e9666>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2080 [<00000000420d395e>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301 [<000000007fd83a4b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when set link down") Signed-off-by: Eric Dumazet Cc: Hangbin Liu Reported-by: syzbot+6ca1abd0db68b5173a4f@syzkaller.appspotmail.com Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5b8385de62269b5344f4c15ed92619c06ec92526 Author: Haiyang Zhang Date: Fri Jul 19 17:33:51 2019 +0000 hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback() [ Upstream commit be4363bdf0ce9530f15aa0a03d1060304d116b15 ] There is an extra rcu_read_unlock left in netvsc_recv_callback(), after a previous patch that removes RCU from this function. This patch removes the extra RCU unlock. Fixes: 345ac08990b8 ("hv_netvsc: pass netvsc_device to receive callback") Signed-off-by: Haiyang Zhang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 14f7a56762aa6a290b34bcb037defd3ea4d312f2 Author: Taehee Yoo Date: Mon Jul 15 14:10:17 2019 +0900 caif-hsi: fix possible deadlock in cfhsi_exit_module() [ Upstream commit fdd258d49e88a9e0b49ef04a506a796f1c768a8e ] cfhsi_exit_module() calls unregister_netdev() under rtnl_lock(). but unregister_netdev() internally calls rtnl_lock(). So deadlock would occur. Fixes: c41254006377 ("caif-hsi: Add rtnl support") Signed-off-by: Taehee Yoo Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c73a6485a8aa9fe08ca8d31af3c9925ff59cf9c7 Author: Brian King Date: Mon Jul 15 16:41:50 2019 -0500 bnx2x: Prevent load reordering in tx completion processing [ Upstream commit ea811b795df24644a8eb760b493c43fba4450677 ] This patch fixes an issue seen on Power systems with bnx2x which results in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons load in bnx2x_tx_int. Adding a read memory barrier resolves the issue. Signed-off-by: Brian King Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman