aboutsummaryrefslogtreecommitdiffstats
path: root/mm/bootmem.c
AgeCommit message (Collapse)AuthorFilesLines
2018-10-31mm: remove bootmem allocator implementation.Mike Rapoport1-811/+0
All architectures have been converted to use MEMBLOCK + NO_BOOTMEM. The bootmem allocator implementation can be removed. Link: http://lkml.kernel.org/r/1536927045-23536-5-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02docs/mm: bootmem: add overview documentationMike Rapoport1-0/+47
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-08-02docs/mm: bootmem: fix kernel-doc warningsMike Rapoport1-2/+8
Add descriptions of the return value where they were missing and fixup the syntax for present ones. Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-08-02mm/bootmem: drop duplicated kernel-doc commentsMike Rapoport1-102/+0
Parts of the bootmem interfaces are duplicated in nobootmem.c along with the kernel-doc comments. There is no point to keep two copies of the comments, so let's drop the bootmem.c copy. Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-02-06mm: docs: fix parameter names mismatchMike Rapoport1-1/+1
There are several places where parameter descriptions do no match the actual code. Fix it. Link: http://lkml.kernel.org/r/1516700871-22279-3-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman1-0/+1
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-22mm/bootmem.c: cosmetic improvement of code readabilityAdygzhy Ondar1-1/+1
The obvious number of bits in a byte is replaced by BITS_PER_BYTE macro in bootmap_bytes() Link: http://lkml.kernel.org/r/1483781600-5136-1-git-send-email-ondar07@gmail.com Signed-off-by: Adygzhy Ondar <ondar07@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mappingCatalin Marinas1-3/+3
Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a physical address to a virtual one using __va(). However, such physical addresses may sometimes be located in highmem and using __va() is incorrect, leading to inconsistent object tracking in kmemleak. The following functions have been added to the kmemleak API and they take a physical address as the object pointer. They only perform the corresponding action if the address has a lowmem mapping: kmemleak_alloc_phys kmemleak_free_part_phys kmemleak_not_leak_phys kmemleak_ignore_phys The affected calling places have been updated to use the new kmemleak API. Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Vignesh R <vigneshr@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07mm/bootmem.c: replace kzalloc() by kzalloc_node()zijun_hu1-12/+2
In ___alloc_bootmem_node_nopanic(), replace kzalloc() by kzalloc_node() in order to allocate memory within given node preferentially when slab is available Link: http://lkml.kernel.org/r/1f487f12-6af4-5e4f-a28c-1de2361cdcd8@zoho.com Signed-off-by: zijun_hu <zijun_hu@htc.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: convert printk(KERN_<LEVEL> to pr_<level>Joe Perches1-4/+3
Most of the mm subsystem uses pr_<level> so make it consistent. Miscellanea: - Realign arguments - Add missing newline to format - kmemleak-test.c has a "kmemleak: " prefix added to the "Kmemleak testing" logging message via pr_fmt Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Tejun Heo <tj@kernel.org> [percpu] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-06x86/mm: Introduce max_possible_pfnIgor Mammedov1-0/+1
max_possible_pfn will be used for tracking max possible PFN for memory that isn't present in E820 table and could be hotplugged later. By default max_possible_pfn is initialized with max_pfn, but later it could be updated with highest PFN of hotpluggable memory ranges declared in ACPI SRAT table if any present. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: fujita.tomonori@lab.ntt.co.jp Cc: konrad.wilk@oracle.com Cc: pbonzini@redhat.com Cc: revers@redhat.com Cc: riel@redhat.com Link: http://lkml.kernel.org/r/1449234426-273049-2-git-send-email-imammedo@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08bootmem: avoid freeing to bootmem after bootmem is doneChris Metcalf1-0/+7
Bootmem isn't popular any more, but some architectures still use it, and freeing to bootmem after calling free_all_bootmem_core() can end up scribbling over random memory. Instead, make sure the kernel generates a warning in this case by ensuring the node_bootmem_map field is non-NULL when are freeing or marking bootmem. An instance of this bug was just fixed in the tile architecture ("tile: use free_bootmem_late() for initrd") and catching this case more widely seems like a good thing. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Paul McQuade <paulmcquad@gmail.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30mm: page_alloc: pass PFN to __free_pages_bootmemMel Gorman1-6/+7
__free_pages_bootmem prepares a page for release to the buddy allocator and assumes that the struct page is initialised. Parallel initialisation of struct pages defers initialisation and __free_pages_bootmem can be called for struct pages that cannot yet map struct page to PFN. This patch passes PFN to __free_pages_bootmem with no other functional change. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Nate Zimmer <nzimmer@sgi.com> Tested-by: Waiman Long <waiman.long@hp.com> Tested-by: Daniel J Blueman <daniel@numascale.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Nate Zimmer <nzimmer@sgi.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-13mem-hotplug: reset node managed pages when hot-adding a new pgdatTang Chen1-4/+5
In free_area_init_core(), zone->managed_pages is set to an approximate value for lowmem, and will be adjusted when the bootmem allocator frees pages into the buddy system. But free_area_init_core() is also called by hotadd_new_pgdat() when hot-adding memory. As a result, zone->managed_pages of the newly added node's pgdat is set to an approximate value in the very beginning. Even if the memory on that node has node been onlined, /sys/device/system/node/nodeXXX/meminfo has wrong value: hot-add node2 (memory not onlined) cat /sys/device/system/node/node2/meminfo Node 2 MemTotal: 33554432 kB Node 2 MemFree: 0 kB Node 2 MemUsed: 33554432 kB Node 2 Active: 0 kB This patch fixes this problem by reset node managed pages to 0 after hot-adding a new node. 1. Move reset_managed_pages_done from reset_node_managed_pages() to reset_all_zones_managed_pages() 2. Make reset_node_managed_pages() non-static 3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat is initialized Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09mm/bootmem.c: use include/linux/ headersPaul McQuade1-2/+2
Replace asm. headers with linux/headers: <linux/bug.h> <linux/io.h> Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13mm/bootmem.c: remove unused local `map'Daeseok Youn1-3/+3
Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13mm: use pgdat_end_pfn() to simplify the code in othersXishi Qiu1-1/+1
Use "pgdat_end_pfn()" instead of "pgdat->node_start_pfn + pgdat->node_spanned_pages". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03mm: kill free_all_bootmem_node()Jiang Liu1-18/+0
Now nobody makes use of free_all_bootmem_node(), kill it. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03mm: concentrate modification of totalram_pages into the mm coreJiang Liu1-1/+8
Concentrate code to modify totalram_pages into the mm core, so the arch memory initialized code doesn't need to take care of it. With these changes applied, only following functions from mm core modify global variable totalram_pages: free_bootmem_late(), free_all_bootmem(), free_all_bootmem_node(), adjust_managed_page_count(). With this patch applied, it will be much more easier for us to keep totalram_pages and zone->managed_pages in consistence. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: David Howells <dhowells@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03mm: accurately calculate zone->managed_pages for highmem zonesJiang Liu1-14/+18
Commit "mm: introduce new field 'managed_pages' to struct zone" assumes that all highmem pages will be freed into the buddy system by function mem_init(). But that's not always true, some architectures may reserve some highmem pages during boot. For example PPC may allocate highmem pages for giagant HugeTLB pages, and several architectures have code to check PageReserved flag to exclude highmem pages allocated during boot when freeing highmem pages into the buddy system. So treat highmem pages in the same way as normal pages, that is to: 1) reset zone->managed_pages to zero in mem_init(). 2) recalculate managed_pages when freeing pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan@kernel.org> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-29mm: Add alloc_bootmem_low_pages_nopanic()Yinghai Lu1-0/+8
We don't need to panic in some case, like for swiotlb preallocating. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-35-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-11mm: bootmem: fix free_all_bootmem_core() with odd bitmap alignmentMax Filippov1-6/+18
Currently free_all_bootmem_core ignores that node_min_pfn may be not multiple of BITS_PER_LONG. Eg commit 6dccdcbe2c3e ("mm: bootmem: fix checking the bitmap when finally freeing bootmem") shifts vec by lower bits of start instead of lower bits of idx. Also if (IS_ALIGNED(start, BITS_PER_LONG) && vec == ~0UL) assumes that vec bit 0 corresponds to start pfn, which is only true when node_min_pfn is a multiple of BITS_PER_LONG. Also loop in the else clause can double-free pages (e.g. with node_min_pfn == start == 1, map[0] == ~0 on 32-bit machine page 32 will be double-freed). This bug causes the following message during xtensa kernel boot: bootmem::free_all_bootmem_core nid=0 start=1 end=8000 BUG: Bad page state in process swapper pfn:00001 page:d04bd020 count:0 mapcount:-127 mapping: (null) index:0x2 page flags: 0x0() Call Trace: bad_page+0x8c/0x9c free_pages_prepare+0x5e/0x88 free_hot_cold_page+0xc/0xa0 __free_pages+0x24/0x38 __free_pages_bootmem+0x54/0x56 free_all_bootmem_core$part$11+0xeb/0x138 free_all_bootmem+0x46/0x58 mem_init+0x25/0xa4 start_kernel+0x11e/0x25c should_never_return+0x0/0x3be7 The fix is the following: - always align vec so that its bit 0 corresponds to start - provide BITS_PER_LONG bits in vec, if those bits are available in the map - don't free pages past next start position in the else clause. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Prasad Koya <prasad.koya@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()Lin Feng1-6/+0
reserve_bootmem_generic() has no caller, Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12mm: introduce new field "managed_pages" to struct zoneJiang Liu1-0/+21
Currently a zone's present_pages is calcuated as below, which is inaccurate and may cause trouble to memory hotplug. spanned_pages - absent_pages - memmap_pages - dma_reserve. During fixing bugs caused by inaccurate zone->present_pages, we found zone->present_pages has been abused. The field zone->present_pages may have different meanings in different contexts: 1) pages existing in a zone. 2) pages managed by the buddy system. For more discussions about the issue, please refer to: http://lkml.org/lkml/2012/11/5/866 https://patchwork.kernel.org/patch/1346751/ This patchset tries to introduce a new field named "managed_pages" to struct zone, which counts "pages managed by the buddy system". And revert zone->present_pages to count "physical pages existing in a zone", which also keep in consistence with pgdat->node_present_pages. We will set an initial value for zone->managed_pages in function free_area_init_core() and will adjust it later if the initial value is inaccurate. For DMA/normal zones, the initial value is set to: (spanned_pages - absent_pages - memmap_pages - dma_reserve) Later zone->managed_pages will be adjusted to the accurate value when the bootmem allocator frees all free pages to the buddy system in function free_all_bootmem_node() and free_all_bootmem(). The bootmem allocator doesn't touch highmem pages, so highmem zones' managed_pages is set to the accurate value "spanned_pages - absent_pages" in function free_area_init_core() and won't be updated anymore. This patch also adds a new field "managed_pages" to /proc/zoneinfo and sysrq showmem. [akpm@linux-foundation.org: small comment tweaks] Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Tested-by: Chris Clayton <chris2553@googlemail.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12bootmem: remove alloc_arch_preferred_bootmem()Joonsoo Kim1-16/+4
The name of this function is not suitable, and removing the function and open-coding it into each call sites makes the code more understandable. Additionally, we shouldn't do an allocation from bootmem when slab_is_available(), so directly return kmalloc()'s return value. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12bootmem: remove not implemented function call, bootmem_arch_preferred_node()Joonsoo Kim1-12/+0
There is no implementation of bootmem_arch_preferred_node() and a call to this function will cause a compilation error. So remove it. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11bootmem: fix wrong call parameter for free_bootmem()Joonsoo Kim1-10/+10
It is strange that alloc_bootmem() returns a virtual address and free_bootmem() requires a physical address. Anyway, free_bootmem()'s first parameter should be physical address. There are some call sites for free_bootmem() with virtual address. So fix them. [akpm@linux-foundation.org: improve free_bootmem() and free_bootmem_pate() documentation] Signed-off-by: Joonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-16revert "mm: fix-up zone present pages"Andrew Morton1-9/+1
Revert commit 7f1290f2f2a4 ("mm: fix-up zone present pages") That patch tried to fix a issue when calculating zone->present_pages, but it caused a regression on 32bit systems with HIGHMEM. With that change, reset_zone_present_pages() resets all zone->present_pages to zero, and fixup_zone_present_pages() is called to recalculate zone->present_pages when the boot allocator frees core memory pages into buddy allocator. Because highmem pages are not freed by bootmem allocator, all highmem zones' present_pages becomes zero. Various options for improving the situation are being discussed but for now, let's return to the 3.6 code. Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: fix-up zone present pagesJianguo Wu1-1/+9
I think zone->present_pages indicates pages that buddy system can management, it should be: zone->present_pages = spanned pages - absent pages - bootmem pages, but is now: zone->present_pages = spanned pages - absent pages - memmap pages. spanned pages: total size, including holes. absent pages: holes. bootmem pages: pages used in system boot, managed by bootmem allocator. memmap pages: pages used by page structs. This may cause zone->present_pages less than it should be. For example, numa node 1 has ZONE_NORMAL and ZONE_MOVABLE, it's memmap and other bootmem will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's present_pages should be spanned pages - absent pages, but now it also minus memmap pages(free_area_init_core), which are actually allocated from ZONE_MOVABLE. When offlining all memory of a zone, this will cause zone->present_pages less than 0, because present_pages is unsigned long type, it is actually a very large integer, it indirectly caused zone->watermark[WMARK_MIN] becomes a large integer(setup_per_zone_wmarks()), than cause totalreserve_pages become a large integer(calculate_totalreserve_pages()), and finally cause memory allocating failure when fork process(__vm_enough_memory()). [root@localhost ~]# dmesg -bash: fork: Cannot allocate memory I think the bug described in http://marc.info/?l=linux-mm&m=134502182714186&w=2 is also caused by wrong zone present pages. This patch intends to fix-up zone->present_pages when memory are freed to buddy system on x86_64 and IA64 platforms. Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Reported-by: Petr Tesarik <ptesarik@suse.cz> Tested-by: Petr Tesarik <ptesarik@suse.cz> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-27bootmem: Fix the short description of reserve_bootmem()Javi Merino1-1/+1
It marks pages as reserved, as the long description says. Signed-off-by: Javi Merino <javi.merino@arm.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-07-17bootmem: make ___alloc_bootmem_node_nopanic() really nopanicYinghai Lu1-0/+4
In reaction to commit 99ab7b19440a ("mm: sparse: fix usemap allocation above node descriptor section") Johannes said: | while backporting the below patch, I realised that your fix busted | f5bf18fa22f8 again. The problem was not a panicking version on | allocation failure but when the usemap size was too large such that | goal + size > limit triggers the BUG_ON in the bootmem allocator. So | we need a version that passes limit ONLY if the usemap is smaller than | the section. after checking the code, the name of ___alloc_bootmem_node_nopanic() does not reflect the fact. Make bootmem really not panic. Hope will kill bootmem sooner. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> [3.3.x, 3.4.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11mm: sparse: fix usemap allocation above node descriptor sectionYinghai Lu1-1/+1
After commit f5bf18fa22f8 ("bootmem/sparsemem: remove limit constraint in alloc_bootmem_section"), usemap allocations may easily be placed outside the optimal section that holds the node descriptor, even if there is space available in that section. This results in unnecessary hotplug dependencies that need to have the node unplugged before the section holding the usemap. The reason is that the bootmem allocator doesn't guarantee a linear search starting from the passed allocation goal but may start out at a much higher address absent an upper limit. Fix this by trying the allocation with the limit at the section end, then retry without if that fails. This keeps the fix from f5bf18fa22f8 of not panicking if the allocation does not fit in the section, but still makes sure to try to stay within the section at first. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> [3.3.x, 3.4.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm/bootmem.c: cleanup on addition to bootmem data listGavin Shan1-8/+8
The objects of "struct bootmem_data_t" are linked together to form double-linked list sequentially based on its minimal page frame number. The current implementation implicitly supports the following cases, which means the inserting point for current bootmem data depends on how "list_for_each" works. That makes the code a little hard to read. Besides, "list_for_each" and "list_entry" can be replaced with "list_for_each_entry". - The linked list is empty. - There has no entry in the linked list, whose minimal page frame number is bigger than current one. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm: remove sparsemem allocation details from the bootmem allocatorJohannes Weiner1-22/+0
alloc_bootmem_section() derives allocation area constraints from the specified sparsemem section. This is a bit specific for a generic memory allocator like bootmem, though, so move it over to sparsemem. As __alloc_bootmem_node_nopanic() already retries failed allocations with relaxed area constraints, the fallback code in sparsemem.c can be removed and the code becomes a bit more compact overall. [akpm@linux-foundation.org: fix build] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm: bootmem: pass pgdat instead of pgdat->bdata down the stackJohannes Weiner1-10/+10
Pass down the node descriptor instead of the more specific bootmem node descriptor down the call stack, like nobootmem does, when there is no good reason for the two to be different. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm: bootmem: unify allocation policy of (non-)panicking node allocationsJohannes Weiner1-20/+24
While the panicking node-specific allocation function tries to satisfy node+goal, goal, node, anywhere, the non-panicking function still does node+goal, goal, anywhere. Make it simpler: define the panicking version in terms of the non-panicking one, like the node-agnostic interface, so they always behave the same way apart from how to deal with allocation failure. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm: bootmem: allocate in order node+goal, goal, node, anywhereJohannes Weiner1-1/+13
Match the nobootmem version of __alloc_bootmem_node. Try to satisfy both the node and the goal, then just the goal, then just the node, then allocate anywhere before panicking. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm: bootmem: split out goal-to-node mapping from goal droppingJohannes Weiner1-2/+15
Matching the desired goal to the right node is one thing, dropping the goal when it can not be satisfied is another. Split this into separate functions so that subsequent patches can use the node-finding but drop and handle the goal fallback on their own terms. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm: bootmem: rename alloc_bootmem_core to alloc_bootmem_bdataJohannes Weiner1-7/+7
Callsites need to provide a bootmem_data_t *, make the naming more descriptive. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm: bootmem: remove redundant offset check when finally freeing bootmemJohannes Weiner1-1/+1
When bootmem releases an unaligned BITS_PER_LONG pages chunk of memory to the page allocator, it checks the bitmap if there are still unreserved pages in the chunk (set bits), but also if the offset in the chunk indicates BITS_PER_LONG loop iterations already. But since the consulted bitmap is only a one-word-excerpt of the full per-node bitmap, there can not be more than BITS_PER_LONG bits set in it. The additional offset check is unnecessary. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm: bootmem: fix checking the bitmap when finally freeing bootmemGavin Shan1-0/+1
When bootmem releases an unaligned chunk of memory at the beginning of a node to the page allocator, it iterates from that unaligned PFN but checks an aligned word of the page bitmap. The checked bits do not correspond to the PFNs and, as a result, reserved pages can be freed. Properly shift the bitmap word so that the lowest bit corresponds to the starting PFN before entering the freeing loop. This bug has been around since commit 41546c17418f ("bootmem: clean up free_all_bootmem_core") (2.6.27) without known reports. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21bootmem/sparsemem: remove limit constraint in alloc_bootmem_sectionNishanth Aravamudan1-3/+2
While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory Overcommit) on powerpc, we tripped the following: kernel BUG at mm/bootmem.c:483! cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940] pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c sp: c000000000c03bc0 msr: 8000000000021032 current = 0xc000000000b0cce0 paca = 0xc000000001d80000 pid = 0, comm = swapper kernel BUG at mm/bootmem.c:483! enter ? for help [c000000000c03c80] c000000000a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294 [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460 [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c This is BUG_ON(limit && goal + size > limit); and after some debugging, it seems that goal = 0x7ffff000000 limit = 0x80000000000 and sparse_early_usemaps_alloc_node -> sparse_early_usemaps_alloc_pgdat_section calls return alloc_bootmem_section(usemap_size() * count, section_nr); This is on a system with 8TB available via the AMS pool, and as a quirk of AMS in firmware, all of that memory shows up in node 0. So, we end up with an allocation that will fail the goal/limit constraints. In theory, we could "fall-back" to alloc_bootmem_node() in sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE defined, we'll BUG_ON() instead. A simple solution appears to be to unconditionally remove the limit condition in alloc_bootmem_section, meaning allocations are allowed to cross section boundaries (necessary for systems of this size). Johannes Weiner pointed out that if alloc_bootmem_section() no longer guarantees section-locality, we need check_usemap_section_nr() to print possible cross-dependencies between node descriptors and the usemaps allocated through it. That makes the two loops in sparse_early_usemaps_alloc_node() identical, so re-factor the code a bit. [akpm@linux-foundation.org: code simplification] Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Anton Blanchard <anton@au1.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ben Herrenschmidt <benh@kernel.crashing.org> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> [3.3.1] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm: bootmem: try harder to free pages in bulkJohannes Weiner1-12/+10
The loop that frees pages to the page allocator while bootstrapping tries to free higher-order blocks only when the starting address is aligned to that block size. Otherwise it will free all pages on that node one-by-one. Change it to free individual pages up to the first aligned block and then try higher-order frees from there. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm: bootmem: drop superfluous range check when freeing pages in bulkJohannes Weiner1-1/+1
The area node_bootmem_map represents is aligned to BITS_PER_LONG, and all bits in any aligned word of that map valid. When the represented area extends beyond the end of the node, the non-existant pages will be marked as reserved. As a result, when freeing a page block, doing an explicit range check for whether that block is within the node's range is redundant as the bitmap is consulted anyway to see whether all pages in the block are unreserved. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10bootmem: micro optimize freeing pages in bulkUwe Kleine-König1-2/+2
The first entry of bdata->node_bootmem_map holds the data for bdata->node_min_pfn up to bdata->node_min_pfn + BITS_PER_LONG - 1. So the test for freeing all pages of a single map entry can be slightly relaxed. Moreover use DIV_ROUND_UP in another place instead of open coding it. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31mm: Map most files to use export.h instead of module.hPaul Gortmaker1-1/+1
The files changed within are only using the EXPORT_SYMBOL macro variants. They are not using core modular infrastructure and hence don't need module.h but only the export.h header. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-23crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, ↵Olaf Hering1-8/+0
setup_elfcorehdr and saved_max_pfn The Xen PV drivers in a crashed HVM guest can not connect to the dom0 backend drivers because both frontend and backend drivers are still in connected state. To run the connection reset function only in case of a crashdump, the is_kdump_kernel() function needs to be available for the PV driver modules. Consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn into kernel/crash_dump.c Also export elfcorehdr_addr to make is_kdump_kernel() usable for modules. Leave 'elfcorehdr' as early_param(). This changes powerpc from __setup() to early_param(). It adds an address range check from x86 also on ia64 and powerpc. [akpm@linux-foundation.org: additional #includes] [akpm@linux-foundation.org: remove elfcorehdr_addr export] [akpm@linux-foundation.org: fix for Tejun's mm/nobootmem.c changes] Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-24bootmem: Move contig_page_data definition to bootmem.c/nobootmem.cYinghai Lu1-0/+7
Now that bootmem.c and nobootmem.c are separate, it's cleaner to define contig_page_data in each file than in page_alloc.c with #ifdef. Move it. This patch doesn't introduce any behavior change. -v2: According to Andrew, fixed the struct layout. -tj: Updated commit description. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2011-02-24bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.cYinghai Lu1-170/+3
mm/bootmem.c contained code paths for both bootmem and no bootmem configurations. They implement about the same set of APIs in different ways and as a result bootmem.c contains massive amount of #ifdef CONFIG_NO_BOOTMEM. Separate out CONFIG_NO_BOOTMEM code into mm/nobootmem.c. As the common part is relatively small, duplicate them in nobootmem.c instead of creating a common file or ifdef'ing in bootmem.c. The followings are duplicated. * {min|max}_low_pfn, max_pfn, saved_max_pfn * free_bootmem_late() * ___alloc_bootmem() * __alloc_bootmem_low() The followings are applicable only to nobootmem and moved verbatim. * __free_pages_memory() * free_all_memory_core_early() The followings are not applicable to nobootmem and omitted in nobootmem.c. * reserve_bootmem_node() * reserve_bootmem() The rest split function bodies according to CONFIG_NO_BOOTMEM. Makefile is updated so that only either bootmem.c or nobootmem.c is built according to CONFIG_NO_BOOTMEM. This patch doesn't introduce any behavior change. -tj: Rewrote commit description. Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-08-27x86, memblock: Replace e820_/_early string with memblock_Yinghai Lu1-2/+2
1.include linux/memblock.h directly. so later could reduce e820.h reference. 2 this patch is done by sed scripts mainly -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86: Use memblock to replace early_resYinghai Lu1-0/+3
1. replace find_e820_area with memblock_find_in_range 2. replace reserve_early with memblock_x86_reserve_range 3. replace free_early with memblock_x86_free_range. 4. NO_BOOTMEM will switch to use memblock too. 5. use _e820, _early wrap in the patch, in following patch, will replace them all 6. because memblock_x86_free_range support partial free, we can remove some special care 7. Need to make sure that memblock_find_in_range() is called after memblock_x86_fill() so adjust some calling later in setup.c::setup_arch() -- corruption_check and mptable_update -v2: Move reserve_brk() early Before fill_memblock_area, to avoid overlap between brk and memblock_find_in_range() that could happen We have more then 128 RAM entry in E820 tables, and memblock_x86_fill() could use memblock_find_in_range() to find a new place for memblock.memory.region array. and We don't need to use extend_brk() after fill_memblock_area() So move reserve_brk() early before fill_memblock_area(). -v3: Move find_smp_config early To make sure memblock_find_in_range not find wrong place, if BIOS doesn't put mptable in right place. -v4: Treat RESERVED_KERN as RAM in memblock.memory. and they are already in memblock.reserved already.. use __NOT_KEEP_MEMBLOCK to make sure memblock related code could be freed later. -v5: Generic version __memblock_find_in_range() is going from high to low, and for 32bit active_region for 32bit does include high pages need to replace the limit with memblock.default_alloc_limit, aka get_max_mapped() -v6: Use current_limit instead -v7: check with MEMBLOCK_ERROR instead of -1ULL or -1L -v8: Set memblock_can_resize early to handle EFI with more RAM entries -v9: update after kmemleak changes in mainline Suggested-by: David S. Miller <davem@davemloft.net> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27bootmem, x86: Add weak version of reserve_bootmem_genericYinghai Lu1-0/+6
It will be used memblock_x86_to_bootmem converting It is an wrapper for reserve_bootmem, and x86 64bit is using special one. Also clean up that version for x86_64. We don't need to take care of numa path for that, bootmem can handle it how Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-07-20x86,nobootmem: make alloc_bootmem_node fall back to other node when 32bit ↵Yinghai Lu1-4/+20
numa is used Borislav Petkov reported his 32bit numa system has problem: [ 0.000000] Reserving total of 4c00 pages for numa KVA remap [ 0.000000] kva_start_pfn ~ 32800 max_low_pfn ~ 375fe [ 0.000000] max_pfn = 238000 [ 0.000000] 8202MB HIGHMEM available. [ 0.000000] 885MB LOWMEM available. [ 0.000000] mapped low ram: 0 - 375fe000 [ 0.000000] low ram: 0 - 375fe000 [ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 1000 1000 => 34e7000 [ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 200 40 => 34c9d80 [ 0.000000] alloc (nid=0 100000 - 7ee00000) (1000000 - ffffffffffffffff) 180 40 => 34e6140 [ 0.000000] alloc (nid=1 80000000 - c7e60000) (1000000 - ffffffffffffffff) 240 40 => 80000000 [ 0.000000] BUG: unable to handle kernel paging request at 40000000 [ 0.000000] IP: [<c2c8cff1>] __alloc_memory_core_early+0x147/0x1d6 [ 0.000000] *pdpt = 0000000000000000 *pde = f000ff53f000ff00 ... [ 0.000000] Call Trace: [ 0.000000] [<c2c8b4f8>] ? __alloc_bootmem_node+0x216/0x22f [ 0.000000] [<c2c90c9b>] ? sparse_early_usemaps_alloc_node+0x5a/0x10b [ 0.000000] [<c2c9149e>] ? sparse_init+0x1dc/0x499 [ 0.000000] [<c2c79118>] ? paging_init+0x168/0x1df [ 0.000000] [<c2c780ff>] ? native_pagetable_setup_start+0xef/0x1bb looks like it allocates too much high address for bootmem. Try to cut limit with get_max_mapped() Reported-by: Borislav Petkov <borislav.petkov@amd.com> Tested-by: Conny Seidel <conny.seidel@amd.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: <stable@kernel.org> [2.6.34.x] Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds1-2/+15
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip: x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards x86: Increase CONFIG_NODES_SHIFT max to 10 ibft, x86: Change reserve_ibft_region() to find_ibft_region() x86, hpet: Fix bug in RTC emulation x86, hpet: Erratum workaround for read after write of HPET comparator bootmem, x86: Fix 32bit numa system without RAM on node 0 nobootmem, x86: Fix 32bit numa system without RAM on node 0 x86: Handle overlapping mptables x86: Make e820_remove_range to handle all covered case x86-32, resume: do a global tlb flush in S4 resume
2010-04-01bootmem, x86: Fix 32bit numa system without RAM on node 0Yinghai Lu1-1/+7
When 32bit numa is used, free_all_bootmem() will still only go over with node id 0. If node 0 doesn't have RAM installed, the lowest populated node becomes low RAM. This one fixes BOOTMEM path by iterating over the bdata_list. -v3: add more comments, and fix bootmem path too. -v4: seperate from one big patch Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4BB416D7.6090203@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-01nobootmem, x86: Fix 32bit numa system without RAM on node 0Yinghai Lu1-1/+8
On one system without RAM on node0, got following boot dump with a 32 bit NUMA kernel: early_node_map[4] active PFN ranges 1: 0x00000010 -> 0x00000099 1: 0x00000100 -> 0x0007da00 1: 0x0007e800 -> 0x0007ffa0 1: 0x0007ffae -> 0x0007ffb0 ... Subtract (29 early reservations) #000 [0000001000 - 0000002000] #001 [0000089000 - 000008f000] #002 [0000091000 - 0000093500] ... #027 [007cbfef40 - 007e800000] #028 [007e9ca000 - 007ff95000] (0 free memory ranges) Initializing HighMem for node 0 (00000000:00000000) Initializing HighMem for node 1 (00000000:00000000) Memory: 0k/2096832k available (6662k kernel code, 2096300k reserved, 4829k data, 484k init, 0k highmem) ... Checking if this processor honours the WP bit even in supervisor mode...Ok. swapper: page allocation failure. order:0, mode:0x0 Pid: 0, comm: swapper Not tainted 2.6.34-rc3-tip-03818-g4b1ea6c-dirty #35 Call Trace: [<4087a5dc>] ? printk+0xf/0x11 [<40286728>] __alloc_pages_nodemask+0x417/0x487 [<402a9ce1>] new_slab+0xe2/0x1fe [<402aa5b2>] kmem_cache_open+0x185/0x358 [<402abbc0>] T.954+0x1c/0x60 [<40d52a29>] kmem_cache_init+0x24/0x113 [<40d39738>] start_kernel+0x166/0x2e4 [<40d3940e>] ? unknown_bootoption+0x0/0x18e [<40d390ce>] i386_start_kernel+0xce/0xd5 Mem-Info: Node 1 DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 Node 1 Normal per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 active_anon:0 inactive_anon:0 isolated_anon:0 active_file:0 inactive_file:0 isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0 free:0 slab_reclaimable:0 slab_unreclaimable:0 mapped:0 shmem:0 pagetables:0 bounce:0 When 32bit NUMA is used, free_all_bootmem() will still only go over with node id 0. If node 0 doesn't have RAM installed, We need to go with node1 because early_node_map still use 1 for all ranges, and ram from node1 become low ram. Use MAX_NUMNODES like 64-bit NUMA does. Note: BOOTMEM path has the same problem. this bug exist before We have NO_BOOTMEM support. -v3: add more comments, and fix bootmem path too. -v4: seperate bootmem path fix Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4BB41689.9090502@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo1-0/+1
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24x86: Remove excessive early_res debug outputJiri Kosina1-13/+0
Commit 08677214e318297 ("x86: Make 64 bit use early_res instead of bootmem before slab") introduced early_res replacement for bootmem, but left code in __free_pages_memory() which dumps all the ranges that are beeing freed, without any additional information, causing some noise in dmesg during bootup. Just remove printing of the ranges, that doesn't provide anything useful anyway. While at it, remove other commented-out KERN_DEBUG messages in the NO_BOOTMEM code as well. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Found-OK-by: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <alpine.LNX.2.00.1003220931360.18642@pobox.suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-12x86: Make 64 bit use early_res instead of bootmem before slabYinghai Lu1-3/+192
Finally we can use early_res to replace bootmem for x86_64 now. Still can use CONFIG_NO_BOOTMEM to enable it or not. -v2: fix 32bit compiling about MAX_DMA32_PFN -v3: folded bug fix from LKML message below Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B747239.4070907@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-15mm/bootmem.c: properly __init-annotate helper functionsJan Beulich1-4/+4
Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-10bootmem: Add free_bootmem_late()FUJITA Tomonori1-0/+24
Add a new function for freeing bootmem after the bootmem allocator has been released and the unreserved pages given to the page allocator. This allows us to reserve bootmem and then release it if we later discover it was not needed. ( This new API will be used by the swiotlb code to recover a significant amount of RAM (64MB). ) Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: chrisw@sous-sol.org Cc: dwmw2@infradead.org Cc: joerg.roedel@amd.com Cc: muli@il.ibm.com Cc: hannes@cmpxchg.org Cc: tj@kernel.org Cc: akpm@linux-foundation.org Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1257849980-22640-7-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-27kmemleak: Do not report alloc_bootmem blocks as leaksCatalin Marinas1-1/+5
This patch sets the min_count for alloc_bootmem objects to 0 so that they are never reported as leaks. This is because many of these blocks are only referred via the physical address which is not looked up by kmemleak. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi>
2009-07-08kmemleak: Add callbacks to the bootmem allocatorCatalin Marinas1-0/+6
This patch adds kmemleak_alloc/free callbacks to the bootmem allocator. This would allow scanning of such blocks and help avoiding a whole class of false positives and more kmemleak annotations. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
2009-06-19bootmem.c: avoid c90 declaration warningJoe Perches1-5/+9
[akpm@linux-foundation.org: cleanup] Signed-off-by: Joe Perches <joe@perches.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11bootmem: fix slab fallback on numaPekka Enberg1-0/+9
If the user requested bootmem allocation on a specific node, we should use kzalloc_node() for the fallback allocation. Cc: Ingo Molnar <mingo@elte.hu> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-11bootmem: use slab if bootmem is no longer availablePekka Enberg1-0/+3
As a preparation for initializing the slab allocator early, make sure the bootmem allocator does not crash and burn if someone calls it after slab is up; otherwise we'd need a flag day for switching to early slab. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matt Mackall <mpm@selenic.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-03-01bootmem, x86: further fixes for arch-specific bootmem wrappingTejun Heo1-15/+30
Impact: fix new breakages introduced by previous fix Commit c132937556f56ee4b831ef4b23f1846e05fde102 tried to clean up bootmem arch wrapper but it wasn't quite correct. Before the commit, the followings were broken. * Low level interface functions prefixed with __ ignored arch preference. * reserve_bootmem(...) can't be mapped into reserve_bootmem_node(NODE_DATA(0)->bdata, ...) because the node is not preference here. The region specified MUST fall into the specified region; otherwise, it will panic. After the commit, * If allocation fails for the arch preferred node, it should fallback to whatever is available. Instead, it simply failed allocation. There are too many internal details to allow generic wrapping and still keep things simple for archs. Plus, all that arch wants is a way to prefer certain node over another. This patch drops the generic wrapping around alloc_bootmem_core() and add alloc_bootmem_core() instead. If necessary, arch can define bootmem_arch_referred_node() macro or function which takes all allocation information and returns the preferred node. bootmem generic code will always try the preferred node first and then fallback to other nodes as usual. Breakages noted and changes reviewed by Johannes Weiner. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
2009-02-24bootmem: clean up arch-specific bootmem wrappingTejun Heo1-3/+11
Impact: cleaner and consistent bootmem wrapping By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define arch-specific wrappers for bootmem allocation. However, this is done a bit strangely in that only the high level convenience macros can be changed while lower level, but still exported, interface functions can't be wrapped. This not only is messy but also leads to strange situation where alloc_bootmem() does what the arch wants it to do but the equivalent __alloc_bootmem() call doesn't although they should be able to be used interchangeably. This patch updates bootmem such that archs can override / wrap the backend function - alloc_bootmem_core() instead of the highlevel interface functions to allow simpler and consistent wrapping. Also, HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@saeurebad.de>
2009-01-06bootmem: print request details before BUG_ON(them)Johannes Weiner1-4/+4
Moving the request details print-out before the sanity checks that might panic() enables us to analyse invalid requests without having access to the line information of the stack dump. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16misc: replace __FUNCTION__ with __func__Harvey Harrison1-1/+1
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20bootmem: fix aligning of node-relative indexes and offsetsJohannes Weiner1-6/+29
Absolute alignment requirements may never be applied to node-relative offsets. Andreas Herrmann spotted this flaw when a bootmem allocation on an unaligned node was itself not aligned because the combination of an unaligned node with an aligned offset into that node is not garuanteed to be aligned itself. This patch introduces two helper functions that align a node-relative index or offset with respect to the node's starting address so that the absolute PFN or virtual address that results from combining the two satisfies the requested alignment. Then all the broken ALIGN()s in alloc_bootmem_core() are replaced by these helpers. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Reported-by: Andreas Herrmann <andreas.herrmann3@amd.com> Debugged-by: Andreas Herrmann <andreas.herrmann3@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15bootmem allocator: alloc_bootmem_core(): page-align the end offsetMikulas Patocka1-1/+1
This is the minimal sequence that jams the allocator: void *p, *q, *r; p = alloc_bootmem(PAGE_SIZE); q = alloc_bootmem(64); free_bootmem(p, PAGE_SIZE); p = alloc_bootmem(PAGE_SIZE); r = alloc_bootmem(64); after this sequence (assuming that the allocator was empty or page-aligned before), pointer "q" will be equal to pointer "r". What's hapenning inside the allocator: p = alloc_bootmem(PAGE_SIZE); in allocator: last_end_off == PAGE_SIZE, bitmap contains bits 10000... q = alloc_bootmem(64); in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 11000... free_bootmem(p, PAGE_SIZE); in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 01000... p = alloc_bootmem(PAGE_SIZE); in allocator: last_end_off == PAGE_SIZE, bitmap contains 11000... r = alloc_bootmem(64); and now: it finds bit "2", as a place where to allocate (sidx) it hits the condition if (bdata->last_end_off && PFN_DOWN(bdata->last_end_off) + 1 == sidx)) start_off = ALIGN(bdata->last_end_off, align); -you can see that the condition is true, so it assigns start_off = ALIGN(bdata->last_end_off, align); (that is PAGE_SIZE) and allocates over already allocated block. With the patch it tries to continue at the end of previous allocation only if the previous allocation ended in the middle of the page. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Johannes Weiner <hannes@saeurebad.de> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: replace node_boot_start in struct bootmem_dataJohannes Weiner1-19/+21
Almost all users of this field need a PFN instead of a physical address, so replace node_boot_start with node_min_pfn. [Lee.Schermerhorn@hp.com: fix spurious BUG_ON() in mark_bootmem()] Signed-off-by: Johannes Weiner <hannes@saeureba.de> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: revisit alloc_bootmem_sectionJohannes Weiner1-21/+6
Since alloc_bootmem_core does no goal-fallback anymore and just returns NULL if the allocation fails, we might now use it in alloc_bootmem_section without all the fixup code for a misplaced allocation. Also, the limit can be the first PFN of the next section as the semantics is that the limit is _above_ the allocated region, not within. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: Make __alloc_bootmem_low_node fall back to other nodesJohannes Weiner1-9/+16
__alloc_bootmem_node already does this, make the interface consistent. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: respect goal more likelyJohannes Weiner1-38/+54
The old node-agnostic code tried allocating on all nodes starting from the one with the lowest range. alloc_bootmem_core retried without the goal if it could not satisfy it and so the goal was only respected at all when it happened to be on the first (lowest page numbers) node (or theoretically if allocations failed on all nodes before to the one holding the goal). Introduce a non-panicking helper that starts allocating from the node holding the goal and falls back only after all thes tries failed, thus moving the goal fallback code out of alloc_bootmem_core. Make all other allocation functions benefit from this new helper. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: factor out the marking of a PFN rangeJohannes Weiner1-119/+69
Introduce new helpers that mark a range that resides completely on a node or node-agnostic ranges that might also span node boundaries. The free/reserve API functions will then directly use these helpers. Note that the free/reserve semantics become more strict: while the prior code took basically arbitrary range arguments and marked the PFNs that happen to fall into that range, the new code requires node-specific ranges to be completely on the node. The node-agnostic requests might span node boundaries as long as the nodes are contiguous. Passing ranges that do not satisfy these criteria is a bug. [akpm@linux-foundation.org: fix printk warnings] Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: free/reserve helpersJohannes Weiner1-22/+43
Factor out the common operation of marking a range on the bitmap. [akpm@linux-foundation.org: fix various warnings] Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: clean up alloc_bootmem_coreJohannes Weiner1-136/+76
alloc_bootmem_core has become quite nasty to read over time. This is a clean rewrite that keeps the semantics. bdata->last_pos has been dropped. bdata->last_success has been renamed to hint_idx and it is now an index relative to the node's range. Since further block searching might start at this index, it is now set to the end of a succeeded allocation rather than its beginning. bdata->last_offset has been renamed to last_end_off to be more clear that it represents the ending address of the last allocation relative to the node. [y-goto@jp.fujitsu.com: fix new alloc_bootmem_core()] Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: clean up free_all_bootmem_coreJohannes Weiner1-45/+38
Rewrite the code in a more concise way using less variables. [akpm@linux-foundation.org: fix printk warnings] Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: revisit bootmem descriptor list handlingJohannes Weiner1-13/+10
link_bootmem handles an insertion of a new descriptor into the sorted list in more or less three explicit branches; empty list, insert in between and append. These cases can be expressed implicite. Also mark the sorted list as initdata as it can be thrown away after boot as well. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: revisit bitmap size calculationsJohannes Weiner1-18/+9
Reincarnate get_mapsize as bootmap_bytes and implement bootmem_bootmap_pages on top of it. Adjust users of these helpers and make free_all_bootmem_core use bootmem_bootmap_pages instead of open-coding it. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: add debugging frameworkJohannes Weiner1-7/+44
Introduce the bootmem_debug kernel parameter that enables very verbose diagnostics regarding all range operations of bootmem as well as the initialization and release of nodes. [akpm@linux-foundation.org: fix printk warnings] Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: add documentation to API functionsJohannes Weiner1-1/+149
Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: clean up bootmem.c file headerJohannes Weiner1-9/+5
Change the description, move a misplaced comment about the allocator itself and add me to the list of copyright holders. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: reorder code to match new bootmem structureJohannes Weiner1-179/+177
This only reorders functions so that further patches will be easier to read. No code changed. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: introduce non panic alloc_bootmemAndi Kleen1-0/+12
Straight forward variant of the existing __alloc_bootmem_node, only subsequent patch when allocating giant hugepages at boot -- don't want to panic if we can't allocate as many as the user asked for. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: unexport __alloc_bootmem_core()Johannes Weiner1-12/+12
This function has no external callers, so unexport it. Also fix its naming inconsistency. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: normalize internal argument passing of bootmem dataJohannes Weiner1-8/+6
All _core functions only need the bootmem data, not the whole node descriptor. Adjust the two functions that take the node descriptor unneededly. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: fix free_all_bootmem_core alignment checkJohannes Weiner1-11/+10
The check for node_boot_start is bogus because we start freeing at the corresponding pfn. So check if the pfn is properly aligned instead in a more readable way and adjust the documentation. Also remove an unneeded accounting variable. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: move bootmem descriptors definition to a single placeJohannes Weiner1-0/+2
There are a lot of places that define either a single bootmem descriptor or an array of them. Use only one central array with MAX_NUMNODES items instead. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: make defensive checks around PFN values registered for memory usageMel Gorman1-0/+1
There are a number of different views to how much memory is currently active. There is the arch-independent zone-sizing view, the bootmem allocator and memory models view. Architectures register this information at different times and is not necessarily in sync particularly with respect to some SPARSEMEM limitations. This patch introduces mminit_validate_memmodel_limits() which is able to validate and correct PFN ranges with respect to the memory model. It is only SPARSEMEM that currently validates itself. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-21Add return value to reserve_bootmem_node()Bernhard Walle1-2/+4
This patch changes the function reserve_bootmem_node() from void to int, returning -ENOMEM if the allocation fails. This fixes a build problem on x86 with CONFIG_KEXEC=y and CONFIG_NEED_MULTIPLE_NODES=y Signed-off-by: Bernhard Walle <bwalle@suse.de> Reported-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28memory hotplug: make alloc_bootmem_section()Yasunori Goto1-0/+31
alloc_bootmem_section() can allocate specified section's area. This is used for usemap to keep same section with pgdat by later patch. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28memory hotplug: register section/node id to freeYasunori Goto1-0/+1
This patch set is to free pages which is allocated by bootmem for memory-hotremove. Some structures of memory management are allocated by bootmem. ex) memmap, etc. To remove memory physically, some of them must be freed according to circumstance. This patch set makes basis to free those pages, and free memmaps. Basic my idea is using remain members of struct page to remember information of users of bootmem (section number or node id). When the section is removing, kernel can confirm it. By this information, some issues can be solved. 1) When the memmap of removing section is allocated on other section by bootmem, it should/can be free. 2) When the memmap of removing section is allocated on the same section, it shouldn't be freed. Because the section has to be logical memory offlined already and all pages must be isolated against page allocater. If it is freed, page allocator may use it which will be removed physically soon. 3) When removing section has other section's memmap, kernel will be able to show easily which section should be removed before it for user. (Not implemented yet) 4) When the above case 2), the page isolation will be able to check and skip memmap's page when logical memory offline (offline_pages()). Current page isolation code fails in this case because this page is just reserved page and it can't distinguish this pages can be removed or not. But, it will be able to do by this patch. (Not implemented yet.) 5) The node information like pgdat has similar issues. But, this will be able to be solved too by this. (Not implemented yet, but, remembering node id in the pages.) Fortunately, current bootmem allocator just keeps PageReserved flags, and doesn't use any other members of page struct. The users of bootmem doesn't use them too. This patch: This is to register information which is node or section's id. Kernel can distinguish which node/section uses the pages allcated by bootmem. This is basis for hot-remove sections or nodes. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-26mm: allow reserve_bootmem() cross nodesYinghai Lu1-23/+69
split reserve_bootmem_core() into two functions, one which checks conflicts, and one which sets the bits. and make reserve_bootmem to loop bdata_list to cross the nodes. user could be crashkernel and ramdisk..., in case the range provided by those externalities crosses the nodes. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26mm: offset align in alloc_bootmem()Yinghai Lu1-26/+34
need offset alignment when node_boot_start's alignment is less than the alignment required. use local node_boot_start to match alignment - so don't add extra operation in search loop. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26mm: fix alloc_bootmem_core to use fast searching for all nodesYinghai Lu1-6/+12
Make the nodes other than node 0 use bdata->last_success for fast search too. We need to use __alloc_bootmem_core() for vmemmap allocation for other nodes when numa and sparsemem/vmemmap are enabled. Also, make fail_block path increase i with incr only after ALIGN to avoid extra increase when size is larger than align. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-24mm: fix boundary checking in free_bootmem_coreYinghai Lu1-6/+19
With numa enabled, some callers could have a range of memory on one node but try to free that on other node. This can cause some pages to be freed wrongly. For example: when we try to allocate 128g boot ram early for gart/swiotlb, and free that range later so gart/swiotlb can get some range afterwards. With this patch, we don't need to care which node holds the range, just loop to call free_bootmem_node for all online nodes. This patch makes free_bootmem_core() more robust by trimming the sidx and eidx according the ram range that the node has. And make the free_bootmem_core handle this out of range case. We could use bdata_list to make sure the range can be freed for sure. So next time, we don't need to loop online nodes and could use free_bootmem directly. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <ak@suse.de> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Ingo Molnar <mingo@elte.hu> Tested-by: Ingo Molnar <mingo@elte.hu> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07Introduce flags for reserve_bootmem()Bernhard Walle1-6/+21
This patchset adds a flags variable to reserve_bootmem() and uses the BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions between crashkernel area and already used memory. This patch: Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE. If that flag is set, the function returns with -EBUSY if the memory already has been reserved in the past. This is to avoid conflicts. Because that code runs before SMP initialisation, there's no race condition inside reserve_bootmem_core(). [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix powerpc build] Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: <linux-arch@vger.kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-12-07[PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbolsAdrian Bunk1-2/+0
In time for 2.6.20, we can get rid of this junk. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] enable booting a NUMA system where some nodes have no memoryChristian Krafft1-0/+4
When booting a NUMA system with nodes that have no memory (eg by limiting memory), bootmem_alloc_core tried to find pages in an uninitialized bootmem_map. This caused a null pointer access. This fix adds a check, so that NULL is returned. That will enable the caller (bootmem_alloc_nopanic) to alloc memory on other without a panic. Signed-off-by: Christian Krafft <krafft@de.ibm.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Martin Bligh <mbligh@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] bootmem: use MAX_DMA_ADDRESS instead of LOW32LIMITHeiko Carstens1-3/+8
Introduce ARCH_LOW_ADDRESS_LIMIT which can be set per architecture to override the 4GB default limit used by the bootmem allocater within __alloc_bootmem_low() and __alloc_bootmem_low_node(). E.g. s390 needs a 2GB limit instead of 4GB. Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] bootmem: miscellaneous coding style fixesFranck Bui-Huu1-37/+44
It fixes various coding style issues, specially when spaces are useless. For example '*' go next to the function name. Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] bootmem: use pfn/page conversion macrosFranck Bui-Huu1-37/+45
It also creates get_mapsize() helper in order to make the code more readable when it calculates the boot bitmap size. Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] bootmem: remove useless headers inclusionsFranck Bui-Huu1-7/+3
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] bootmem: limit to 80 columns widthFranck Bui-Huu1-10/+18
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] bootmem: mark link_bootmem() as part of the __init sectionFranck Bui-Huu1-1/+1
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] mm/bootmem.c: EXPORT_UNUSED_SYMBOLAdrian Bunk1-3/+1
This patch marks an unused export as EXPORT_UNUSED_SYMBOL. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09[PATCH] x86_64: Handle empty PXMs that only contain hotplug memoryAndi Kleen1-1/+8
The node setup code would try to allocate the node metadata in the node itself, but that fails if there is no memory in there. This can happen with memory hotplug when the hotplug area defines an so far empty node. Now use bootmem to try to allocate the mem_map in other nodes. And if it fails don't panic, but just ignore the node. To make this work I added a new __alloc_bootmem_nopanic function that does what its name implies. TBD should try to use nearby nodes here. Currently we just use any. It's hard to do it better because bootmem doesn't have proper fallback lists yet. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] for_each_online_pgdat: for_each_bootmemKAMEZAWA Hiroyuki1-10/+29
Add a list_head to bootmem_data_t and make bootmems use it. bootmem list is sorted by node_boot_start. Only nodes against which init_bootmem() is called are linked to the list. (i386 allocates bootmem only from one node(0) not from all online nodes.) A summary: 1. for_each_online_pgdat() traverses all *online* nodes. 2. alloc_bootmem() allocates memory only from initialized-for-bootmem nodes. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: Try to allocate node memmap near the end of nodeAndi Kleen1-1/+1
This fixes problems with very large nodes (over 128GB) filling up all of the first 4GB with their mem_map and not leaving enough space for the swiotlb. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] FRV: Clean up bootmem allocator's page freeing algorithmDavid Howells1-16/+4
The attached patch cleans up the way the bootmem allocator frees pages. A new function, __free_pages_bootmem(), is provided in mm/page_alloc.c that is called from mm/bootmem.c to turn pages over to the main allocator. All the bits of code to initialise pages (clearing PG_reserved and setting the page count) are moved to here. The checks on page validity are removed, on the assumption that the struct page arrays will have been prepared correctly. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] Cleanup bootmem allocator and fix alloc_bootmem_lowRavikiran G Thirumalai1-7/+31
Patch cleans up the alloc_bootmem fix for swiotlb. Patch removes alloc_bootmem_*_limit api and fixes alloc_boot_*low api to do the right thing -- allocate from low32 memory. Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] fix in __alloc_bootmem_core() when there is no free page in first ↵Haren Myneni1-0/+2
node's memory Hitting BUG_ON() in __alloc_bootmem_core() when there is no free page available in the first node's memory. For the case of kdump on PPC64 (Power 4 machine), the captured kernel is used two memory regions - memory for TCE tables (tce-base and tce-size at top of RAM and reserved) and captured kernel memory region (crashk_base and crashk_size). Since we reserve the memory for the first node, we should be returning from __alloc_bootmem_core() to search for the next node (pg_dat). Currently, find_next_zero_bit() is returning the n^th bit (eidx) when there is no free page. Then, test_bit() is failed since we set 0xff only for the actual size initially (init_bootmem_core()) even though rounded up to one page for bdata->node_bootmem_map. We are hitting the BUG_ON after failing to enter second "for" loop. Signed-off-by: Haren Myneni <haren@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29[PATCH] core remove PageReservedNick Piggin1-0/+1
Remove PageReserved() calls from core code by tightening VM_RESERVED handling in mm/ to cover PageReserved functionality. PageReserved special casing is removed from get_page and put_page. All setting and clearing of PageReserved is retained, and it is now flagged in the page_alloc checks to help ensure we don't introduce any refcount based freeing of Reserved pages. MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being deprecated. We never completely handled it correctly anyway, and is be reintroduced in future if required (Hugh has a proof of concept). Once PageReserved() calls are removed from kernel/power/swsusp.c, and all arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can be trivially removed. Last real user of PageReserved is swsusp, which uses PageReserved to determine whether a struct page points to valid memory or not. This still needs to be addressed (a generic page_is_ram() should work). A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and count towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue. Signed-off-by: Nick Piggin <npiggin@suse.de> Refcount bug fix for filemap_xip.c Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-19[PATCH] swiotlb: make sure initial DMA allocations really are in DMA memoryYasunori Goto1-10/+21
This introduces a limit parameter to the core bootmem allocator; The new parameter indicates that physical memory allocated by the bootmem allocator should be within the requested limit. We also introduce alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit, alloc_bootmem_low_pages_node_limit apis, but alloc_bootmem_low_pages_limit is the only api used for swiotlb. The existing alloc_bootmem_low_pages() api could instead have been changed and made to pass right limit to the core allocator. But that would make the patch more intrusive for 2.6.14, as other arches use alloc_bootmem_low_pages(). We may be done that post 2.6.14 as a cleanup. With this, swiotlb gets memory within 4G for both x86_64 and ia64 arches. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Ravikiran G Thirumalai <kiran@scalex86.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30Revert "x86-64: Reverse order of bootmem lists"Linus Torvalds1-11/+3
As requested by Thomas Gleixner <tglx@linutronix.de>: "5d3d0f7704ed0bc7eaca0501eeae3e5da1ea6c87 breaks a couple of ARM boards, which depend on the historical bootmem allocation order. There is a cleaner solution around to remove the pgdat list completely, but this is a topic for post 2.6.14 Andi signalled ACK already." Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12[PATCH] x86-64: Reverse order of bootmem listsAndi Kleen1-3/+11
This leads to bootmem allocating first from node 0 instead of from the last node. This avoids swiotlb allocating on the last node, which doesn't really work on a machine with >4GB. Note: there is a better patch around from someone else that gets rid of the pgdat list completely. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25[PATCH] Use ALIGN to remove duplicate codeNick Wilson1-3/+3
This patch makes use of ALIGN() to remove duplicate round-up code. Signed-off-by: Nick Wilson <njw@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25[PATCH] kdump: Retrieve saved max pfnVivek Goyal1-0/+8
This patch retrieves the max_pfn being used by previous kernel and stores it in a safe location (saved_max_pfn) before it is overwritten due to user defined memory map. This pfn is used to make sure that user does not try to read the physical memory beyond saved_max_pfn. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23[PATCH] sparsemem memory modelAndy Whitcroft1-2/+7
Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of mem_map[] is needed by discontiguous memory machines (like in the old CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually become a complete replacement. A significant advantage over DISCONTIGMEM is that it's completely separated from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA and DISCONTIG are often confused. Another advantage is that sparse doesn't require each NUMA node's ranges to be contiguous. It can handle overlapping ranges between nodes with no problems, where DISCONTIGMEM currently throws away that memory. Sparsemem uses an array to provide different pfn_to_page() translations for each SECTION_SIZE area of physical memory. This is what allows the mem_map[] to be chopped up. In order to do quick pfn_to_page() operations, the section number of the page is encoded in page->flags. Part of the sparsemem infrastructure enables sharing of these bits more dynamically (at compile-time) between the page_zone() and sparsemem operations. However, on 32-bit architectures, the number of bits is quite limited, and may require growing the size of the page->flags type in certain conditions. Several things might force this to occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of memory), an increase in the physical address space, or an increase in the number of used page->flags. One thing to note is that, once sparsemem is present, the NUMA node information no longer needs to be stored in the page->flags. It might provide speed increases on certain platforms and will be stored there if there is room. But, if out of room, an alternate (theoretically slower) mechanism is used. This patch introduces CONFIG_FLATMEM. It is used in almost all cases where there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM often have to compile out the same areas of code. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds1-0/+400
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!