06:13 olivia_fl[d]: mohamexiety[d]: I'm looking at the VK_EXT_host_image_copy implementation in nvk, and I think it's the only driver in mesa that allocates a temp buffer for the whole image in the tiled image->tiled image case. The other drivers all detile/tile smaller chunks at a time. How did you end up making that decision?
06:15 olivia_fl[d]: currently working on host image copy for panvk, and there at least copying one tile at a time the way the other drivers do would be pretty inefficient if either region is not tile-aligned. We would hit the slow-path for the tiling code every time. My current code works in fixed-size chunks that are larger than a tile to mitigate this
06:16 olivia_fl[d]: (I was kinda hesitant to buffer the whole image because a stated motivation in the VK_EXT_host_image_copy proposal was reducing memory spikes)
07:50 mohamexiety[d]: olivia_fl[d]: The original plan was to do what you do and work in smaller chunks at a time, but there were a few concerns I wasn't sure about, like what size the chunk should be and what the alignment constraints are. Ultimately I ended up going with the whole-image approach just to get something working at the time, since I had a few bugs I had to track down, and then kinda forgot about it tbh
07:54 mohamexiety[d]: For us it's not really a big deal either way, as memory isn't too tight and perf will always remain bad since we have to go over PCIe, though it is ofc not as optimal as it could be. I want to revisit it later but I am still not sure what the optimal strategy is
08:39 olivia_fl[d]: ah, that makes sense, thanks!
08:40 olivia_fl[d]: for sizing it appears that all the other drivers do one tile, which is almost certainly not optimal
08:42 olivia_fl[d]: gonna benchmark a few sizes once everything is working. Might also be worth trying a direct tiled->tiled copy implementation without an intermediate buffer...
09:20 mohamexiety[d]: olivia_fl[d]: for tiled -> tiled, can you guarantee that the two images will have identical tiling on mali, and both are also tile aligned? I couldn't do direct tiled -> tiled cuz we couldnt guarantee both :/
09:28 olivia_fl[d]: mohamexiety[d]: no alignment guarantees, which makes direct tiled -> tiled annoying (and likely slow) but not impossible. The actual tiling layout is always the same for a given block size
09:28 mohamexiety[d]: ahh nice then
09:28 olivia_fl[d]: for the special case where the src and dst have the same relative alignment, you can just memcpy the complete tiles which is neat
09:28 mohamexiety[d]: yup!
09:28 olivia_fl[d]: and I imagine is actually the common case
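A minimal sketch of that fast path, assuming src and dst use the same tiling layout and the copy regions share their alignment relative to the tile grid; the tile size and helper names here are made up for illustration and this is not panvk code:

/* Direct tiled->tiled copy for the case where complete tiles can be
 * memcpy'd: same tiling layout on both sides and identical relative
 * alignment, so no detile/retile pass is needed. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define TILE_W 16 /* assumed tile width in pixels  */
#define TILE_H 16 /* assumed tile height in pixels */

static void
copy_tiled_aligned(uint8_t *dst, uint32_t dst_stride_tl,
                   const uint8_t *src, uint32_t src_stride_tl,
                   uint32_t width_tl, uint32_t height_tl, uint32_t bpp)
{
   const uint32_t tile_B = TILE_W * TILE_H * bpp;

   for (uint32_t ty = 0; ty < height_tl; ty++) {
      for (uint32_t tx = 0; tx < width_tl; tx++) {
         /* Complete tiles are bit-identical in both images, so one
          * memcpy per tile is enough. */
         memcpy(dst + ((size_t)ty * dst_stride_tl + tx) * tile_B,
                src + ((size_t)ty * src_stride_tl + tx) * tile_B,
                tile_B);
      }
   }
}

Edge tiles that are only partially covered by the copy region would still have to go through the generic per-pixel path.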
09:29 olivia_fl[d]: I had trouble finding host-image-copy usage in the wild, but I suspect it's almost always just copying the whole image
09:30 mohamexiety[d]: yeah it's still not too common. and the issue is most apps are usually written for dGPU systems where host image copy just isn't optimal vs using the copy engines anyway
09:30 olivia_fl[d]: that makes sense
09:40 linkmauve: ffmpeg recently switched to using it, as it was faster on some drivers.
09:41 olivia_fl[d]: ooh, thanks, I'll check out what they're doing 🙂
09:42 olivia_fl[d]: tangentially related: the only thing that stops nvk from advertising 1.4 on maxwell is host image copy right?
09:43 olivia_fl[d]: and the only missing piece for host image copy on maxwell is somebody needs to RE the gob swizzling?
09:43 olivia_fl[d]: if so, I can _maybe_ give that a shot this weekend
09:43 olivia_fl[d]: would like to get back into doing nvk stuff
09:47 mohamexiety[d]: olivia_fl[d]: yeah
09:47 avhe[d]: isn't the maxwell gob layout described in the t210 reference manual?
09:47 mohamexiety[d]: olivia_fl[d]: and yeah. I have actually been working on it these days but it’s a bit… weird
09:47 mohamexiety[d]: avhe[d]: T210?
09:47 avhe[d]: tegra 210
09:48 avhe[d]: or tegra x1
09:48 mohamexiety[d]: It’s described in the X1 manual but it doesn’t match the dGPUs 😦
09:48 marysaka[d]: It's not matching yeah
09:48 avhe[d]: oh i thought it was identical
09:48 mohamexiety[d]: tegra in general seems to be different from the dGPUs in that regard. we were similarly burnt during the initial host image copy implementation when we used the Orin TRM documentation as a reference
09:50 avhe[d]: actually that makes sense. nvdec stuff has a config field called tileFormat which is set to 1 or 0 depending on discrete/tegra
12:02 avhe[d]: in case this is of interest to someone: on volta+ tegras they removed the SYNCPOINTA/B host methods that were used to synchronize between host1x and gpfifo.
12:02 avhe[d]: instead, gpu address spaces have a special read-only memory region which mirrors the syncpt state, and regular semaphore methods can be used instead
12:02 avhe[d]: <https://github.com/alliedvision/nvidia-nvgpu/blob/l4t/l4t-r36.3-avt/main/drivers/gpu/nvgpu/hal/sync/syncpt_cmdbuf_gv11b.c#L32>
12:02 avhe[d]: (i've been porting my code to my newly arrived orin nano and hit some illegal method error)
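A rough sketch of that scheme as described above; the per-syncpoint stride in the mirrored aperture and the push() helper are assumptions, the NVC36F_* names come from NVIDIA's Volta channel-class headers, and syncpoint wraparound handling is ignored:

#include <stdint.h>
#include "clc36f.h" /* Volta GPFIFO class methods (open-gpu-doc) */

#define SYNCPT_STRIDE 0x1000u /* assumption: one page of mirrored state per syncpoint */

/* Hypothetical helper that appends one method/data pair to the pushbuffer. */
void push(uint32_t method, uint32_t data);

static void
emit_syncpt_wait(uint64_t syncpt_ro_base_va, uint32_t syncpt_id, uint32_t threshold)
{
   /* Address of this syncpoint's mirrored 32-bit counter in the GPU VA
    * (must be 4-byte aligned). */
   uint64_t va = syncpt_ro_base_va + (uint64_t)syncpt_id * SYNCPT_STRIDE;

   push(NVC36F_SEM_ADDR_LO, (uint32_t)va);
   push(NVC36F_SEM_ADDR_HI, (uint32_t)(va >> 32));
   push(NVC36F_SEM_PAYLOAD_LO, threshold);
   push(NVC36F_SEM_PAYLOAD_HI, 0);
   /* Plain >= acquire; a real implementation would pick the exact
    * operation (e.g. circular compare) to cope with syncpoint wrap. */
   push(NVC36F_SEM_EXECUTE, NVC36F_SEM_EXECUTE_OPERATION_ACQ_STRICT_GEQ);
}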
16:46 mmu_kavya: I've recently got my hands on a beautiful 17 inch macbook with the NV84 GPU and a lot of free time. If I were to try implementing reclocking, where would I begin? How can I determine the proprietary blob's behaviour?
16:48 tiredchiku[d]: tesla..
17:29 joseph69: Hello! I would like to ask: what is the chance that H.264 decode on ION 2 will work one day on nouveau? Right now it's nvidia 340 only, which means an older kernel too.
17:48 gfxstrand[d]: Well this is fun... https://gitlab.freedesktop.org/drm/nouveau/-/issues/434
17:51 mhenning[d]: gfxstrand[d]: huh. I don't see the same thing here on 6.15.3
17:56 gfxstrand[d]: Fedora?
17:57 gfxstrand[d]: I wonder if it's a firmware loading issue
17:58 joseph69: i extracted the firmware using a python script and put it into /usr/lib/firmware/nouveau, and nouveau stopped complaining in the kernel logs, but still no h.264 decode with mpv --hwdec
17:58 mhenning[d]: gfxstrand[d]: no, I'm on arch
18:12 mhenning[d]: gfxstrand[d]: maybe you're missing the 535 firmware, or the wrong one is in initramfs, or something?
18:32 gfxstrand[d]: I think I have 535 but I've hacked up my firmwares a bit
18:55 airlied[d]: gfxstrand[d]: full dmesg with nouveau.debug=debug maybe
18:55 airlied[d]: There was a suspend resume problem in 6.15 and the fixes are headed for stable, but not sure they made it yet
19:48 anarsoul: airlied: I eventually got a PM resume failure even with your patch on 6.16-rc: https://gist.github.com/anarsoul/bc3288225ebbb92eb3181e0f4cbcde88
19:51 airlied: that looks like a more traditional failure than the regression one, but you never saw that with 6.14?
19:52 airlied[d]: asdqueerfromeu[d]: radv landed some of the nv coop mat2 ext bits btw
19:53 airlied[d]: oops wrong @ sorry, karolherbst[d]
19:53 karolherbst[d]: ahh cool
19:53 karolherbst[d]: in your branch?
19:54 karolherbst[d]: I was wondering whether to look into it myself, but I'm happy cleaning up stuff and making it ready to go upstream 🙃
19:54 airlied[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34793 just got merged
19:55 karolherbst[d]: ohh you meant for radv
19:55 karolherbst[d]: I thought for nvk
19:55 karolherbst[d]: ohh and you said "radv" I thought you were hacking on code 🙃
19:56 karolherbst[d]: yeah, I'm aware of that
19:56 karolherbst[d]: don't want to start _too_ many things at once, so mostly focusing on getting the things done that are already started
19:57 karolherbst[d]: anyway.. somebody still needs to review the main coop matrix MR
20:37 gfxstrand[d]: Ugh... VRAM maps are so fried pre-Turing
20:37 gfxstrand[d]: I thought it was just Kepler but mohamexiety[d] and I have found that it applies all the way to Pascal and probably Volta.
20:38 snowycoder[d]: Fried in what way? 0_o
20:38 gfxstrand[d]: Unclear
20:38 gfxstrand[d]: The data seems to show up, or at least some of it does
20:39 gfxstrand[d]: But it's swizzled around all funny
20:39 gfxstrand[d]: Best guess is that the GPU is trying to de-tile for us somewhat and just thrashing everything
20:39 gfxstrand[d]: On Maxwell+ it works well enough for linear memory that we can kinda use it
20:41 gfxstrand[d]: Or at least I thought it did
20:42 gfxstrand[d]: On Kepler it's so bad we just GART everything that needs to be mapped
20:44 gfxstrand[d]: But with what I'm seeing trying to hack on host_image_copy, IDK that I trust Maxwell, either.
20:45 gfxstrand[d]: Okay, enough Maxwell pain. Time to plug my Blackwell back in.
20:48 anarsoul: airlied: I never saw it with 6.14
20:49 anarsoul: I guess the blob doesn't support runtime PM on these cards for a reason? :)
20:50 anarsoul: anyway, I guess I'll just blacklist nouveau for now, at least I get decent battery life with no driver loaded and that's what I care about
20:52 snowycoder[d]: Can I ask a quick question on NAK?
20:52 snowycoder[d]: I've seen that codegen tracks resource utilization for instruction latencies (e.g. imul unit, tex unit, ld-st...) but NAK only seems to track data-hazards, is this correct?
20:52 snowycoder[d]: Should I add a resource tracking system similar to codegen's to avoid MUL to MUL or SFU to SFU hazards?
20:52 snowycoder[d]: (This is again related to Kepler instruction latencies)
21:14 gfxstrand[d]: Certainly could
21:16 mhenning[d]: snowycoder[d]: That's on my TODO list for later generations, not sure exactly how the hardware works
21:16 mhenning[d]: I'd probably skip it initially if I were you, we can always go back and add that to the model later
21:17 snowycoder[d]: If we skip it we could emit some wrong latencies and pay a hefty performance price
21:17 gfxstrand[d]: And it probably needs to be a separate pass anyway since it's likely going to involve rewriting imul to imad and stuff like that.
21:18 mhenning[d]: oh, yeah the most general version is one that can switch operations to different functional units
21:18 gfxstrand[d]: snowycoder[d]: Huh? Why would they be wrong? The latency callbacks have both the source and destination op.
21:18 mhenning[d]: snowycoder[d]: If it's a hefty performance price then we're paying it on every generation right now
21:20 snowycoder[d]: gfxstrand[d]: But they only check for register conflicts, if we do something like:
21:20 snowycoder[d]: mul r1, r2, r3
21:20 snowycoder[d]: mul r4, r5, r6
21:20 snowycoder[d]: These do not have any data-hazard but they both use the imul unit
21:20 mhenning[d]: yeah, and in that case ^ the hardware stalls for us
21:21 mhenning[d]: so it's correct everywhere but we're paying for stalls in some cases
21:21 mhenning[d]: I'm not convinced it's a hefty price but we are paying that price right now
21:22 gfxstrand[d]: Yeah, I would expect pipelining is good enough it's not a big deal most of the time.
21:22 gfxstrand[d]: Like maybe a 1 cycle stall to be able to shove another instruction in the pipe
21:22 snowycoder[d]: Huh, even on later generations? That's convenient
21:22 gfxstrand[d]: Yeah, it's pipelined. It doesn't wait for the first to be done before starting the second.
21:23 gfxstrand[d]: If it did, that would suuuuuuck
21:23 gfxstrand[d]: But there is an issue rate that's not infinite so there may be a small stall
21:24 snowycoder[d]: Should I create a gitlab issue to keep track of that?
21:25 gfxstrand[d]: <a:shrug_anim:1096500513106841673>
21:25 gfxstrand[d]: Maybe? It's the whole dual-issue thing, I think.
21:25 gfxstrand[d]: Which is really annoying to get right and potentially even affects RA. :blobcatnotlikethis:
21:29 mhenning[d]: I think it's worse than issuing every other cycle. I think some of the functional units can only take an input every ~4 cycles or something? I don't really know how the hardware works.
21:30 mhenning[d]: gfxstrand[d]: Dual-issue is really niche on nvidia, I think it's basically kepler only
21:30 gfxstrand[d]: I wonder if the answers are in Dave's magic docs. 🙂
21:30 mhenning[d]: I'd prefer not to worry about dual issue too much
21:30 mhenning[d]: gfxstrand[d]: Yeah, I'm not sure if the tables have that info or not
21:31 snowycoder[d]: mhenning[d]: On Kepler it's just ~3 cycles for most units except for TEX, which is 17?? "`// TEX to non-TEX delay 17 (0x11)`"
21:33 mhenning[d]: mhh "TEX to non-TEX" is strange, not sure why there would be a delay across functional units if they don't touch the same registers
21:33 mhenning[d]: but also take the model in codegen with a bit of a grain of salt. It's all reverse engineered and I think parts of it are guesses
21:34 snowycoder[d]: I'm still amazed that they could reverse engineer something so strange
21:35 mhenning[d]: yeah, it's remarkable how much they did get working
21:50 airlied[d]: the tables have all the units for each instr
21:52 airlied[d]: so ampere I can see alu, fma, fp16, ipa, lsu, cbu, xu64, mma etc
21:57 mhenning[d]: airlied[d]: right. does it say how many cycles you need to wait between two alu ops?
21:57 mhenning[d]: that is, what's the reciprocal throughput of each functional unit?
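A minimal sketch of what such resource tracking could look like on top of the existing data-hazard latencies; the unit names and issue intervals below are placeholders rather than real hardware numbers, and this is not how NAK or codegen currently model it:

#include <stdint.h>

enum fu_unit { FU_ALU, FU_FMA, FU_SFU, FU_TEX, FU_LSU, FU_COUNT };

/* Assumed cycles between back-to-back issues to the same unit
 * (reciprocal throughput). */
static const int64_t fu_issue_interval[FU_COUNT] = {
   [FU_ALU] = 1, [FU_FMA] = 1, [FU_SFU] = 4, [FU_TEX] = 4, [FU_LSU] = 4,
};

struct fu_tracker {
   int64_t last_issue[FU_COUNT]; /* cycle each unit last accepted an op */
};

static void
fu_tracker_init(struct fu_tracker *t)
{
   for (unsigned i = 0; i < FU_COUNT; i++)
      t->last_issue[i] = INT64_MIN / 2; /* "never issued" sentinel */
}

/* earliest_cycle is what the register hazards already allow.  Returns the
 * cycle the op can actually issue once its unit can take another input,
 * and records it so later ops on the same unit see the stall. */
static int64_t
fu_issue(struct fu_tracker *t, enum fu_unit unit, int64_t earliest_cycle)
{
   int64_t unit_ready = t->last_issue[unit] + fu_issue_interval[unit];
   int64_t cycle = earliest_cycle > unit_ready ? earliest_cycle : unit_ready;
   t->last_issue[unit] = cycle;
   return cycle;
}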
22:10 gfxstrand[d]: Ugh... I once again have no clue how nouveau.ko works. 😩
22:14 gfxstrand[d]: I wonder what `NV_MMU_PTE_KIND_PITCH_NO_SWIZZLE` is
22:15 gfxstrand[d]: mohamexiety[d]: ^^ Maybe something to play with?
22:30 gfxstrand[d]: Okay, I'm done for today. Tomorrow I review Blackwell.
22:37 gfxstrand[d]: Maxwell memory maps are a black hole I can't afford to get lost in, as much as they may bug me.
22:41 mohamexiety[d]: gfxstrand[d]: Will look into it :thonk:
22:41 mohamexiety[d]: But where is this from? I guess kernel playing with PTE kinds?
22:50 gfxstrand[d]: Really, someone needs to dig into the mapping code and figure out where the kinds are getting plumbed into maps and figure out how to tell the Kernel to knock it off.
22:53 skeggsb9778[d]: the kernel won't touch whatever kind you pass in (unless you're the GL driver, pass a compressed kind, and comptags can't be allocated - in which case it'll translate to the equivalent uncompressed kind)
22:58 mohamexiety[d]: skeggsb9778[d]: We’re seeing really really weird behavior where the memory is being altered
23:01 mohamexiety[d]: Like there's this simple RE tool we have that allocates an image, fills it up with 512B worth of data, making sure to fill only the first GOB, and then copies that image to a buffer and inspects it. The goal is to recover the swizzling layout… this works fine on Turing+. But on Maxwell through Volta the data gets really mangled.
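For context, one way such a probe can be structured (not the actual crucible test; the Vulkan plumbing is omitted): the raw host mapping of the image's first GOB is filled with 16-bit cells that hold their own byte offset, the copy engine then detiles the image into a linear buffer, and the readback tells you which raw offset each position came from:

#include <stdint.h>
#include <stdio.h>

#define GOB_SIZE_B 512

/* Written through a raw (host) mapping into the first GOB of the tiled
 * image: each 2-byte cell holds its own byte offset, so it is unique and
 * identifies where it lives in the raw layout. */
static void
fill_pattern(uint16_t pattern[GOB_SIZE_B / 2])
{
   for (uint32_t i = 0; i < GOB_SIZE_B / 2; i++)
      pattern[i] = (uint16_t)(i * 2);
}

/* readback is the linear buffer produced by an image->buffer copy (which
 * detiles).  For each byte position, print the raw GOB offset whose data
 * ended up there; row_bytes is just the display width (e.g. 64). */
static void
dump_gob_order(const uint16_t readback[GOB_SIZE_B / 2], uint32_t row_bytes)
{
   for (uint32_t off = 0; off < GOB_SIZE_B; off++) {
      printf("%03x ", readback[off / 2] + (off & 1));
      if ((off + 1) % row_bytes == 0)
         printf("\n");
   }
}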
23:01 mohamexiety[d]: Similarly image layout of anything in VRAM gets weirdly distorted, and we only get proper/correct layouts if we force GART
23:02 mohamexiety[d]: To give you an idea, this is what our test output looks like with VRAM on Maxwell through Volta
23:02 mohamexiety[d]: GV100:
23:02 mohamexiety[d]: crucible: info : re.nvidia-gob.2d.q0: gob_extent_B = { 32, 16, 1 }
23:02 mohamexiety[d]: crucible: info : re.nvidia-gob.2d.q0: GOB byte ordering for slice 0:
23:02 mohamexiety[d]: 020 021 022 023 024 025 026 027 028 029 02a 02b 02c 02d 02e 02f 040 041 042 043 044 045 046 047 048 049 04a 04b 04c 04d 04e 04f
23:02 mohamexiety[d]: 030 031 032 033 034 035 036 037 038 039 03a 03b 03c 03d 03e 03f 050 051 052 053 054 055 056 057 058 059 05a 05b 05c 05d 05e 05f
23:02 mohamexiety[d]: 000 001 002 003 004 005 006 007 008 009 00a 00b 00c 00d 00e 00f 060 061 062 063 064 065 066 067 068 069 06a 06b 06c 06d 06e 06f
23:02 mohamexiety[d]: 010 011 012 013 014 015 016 017 018 019 01a 01b 01c 01d 01e 01f 070 071 072 073 074 075 076 077 078 079 07a 07b 07c 07d 07e 07f
23:02 mohamexiety[d]: 0e0 0e1 0e2 0e3 0e4 0e5 0e6 0e7 0e8 0e9 0ea 0eb 0ec 0ed 0ee 0ef 080 081 082 083 084 085 086 087 088 089 08a 08b 08c 08d 08e 08f
23:02 mohamexiety[d]: 0f0 0f1 0f2 0f3 0f4 0f5 0f6 0f7 0f8 0f9 0fa 0fb 0fc 0fd 0fe 0ff 090 091 092 093 094 095 096 097 098 099 09a 09b 09c 09d 09e 09f
23:02 mohamexiety[d]: 0c0 0c1 0c2 0c3 0c4 0c5 0c6 0c7 0c8 0c9 0ca 0cb 0cc 0cd 0ce 0cf 0a0 0a1 0a2 0a3 0a4 0a5 0a6 0a7 0a8 0a9 0aa 0ab 0ac 0ad 0ae 0af
23:02 mohamexiety[d]: 0d0 0d1 0d2 0d3 0d4 0d5 0d6 0d7 0d8 0d9 0da 0db 0dc 0dd 0de 0df 0b0 0b1 0b2 0b3 0b4 0b5 0b6 0b7 0b8 0b9 0ba 0bb 0bc 0bd 0be 0bf
23:02 mohamexiety[d]: 120 121 122 123 124 125 126 127 128 129 12a 12b 12c 12d 12e 12f 140 141 142 143 144 145 146 147 148 149 14a 14b 14c 14d 14e 14f
23:02 mohamexiety[d]: 130 131 132 133 134 135 136 137 138 139 13a 13b 13c 13d 13e 13f 150 151 152 153 154 155 156 157 158 159 15a 15b 15c 15d 15e 15f
23:02 mohamexiety[d]: 100 101 102 103 104 105 106 107 108 109 10a 10b 10c 10d 10e 10f 160 161 162 163 164 165 166 167 168 169 16a 16b 16c 16d 16e 16f
23:02 mohamexiety[d]: 110 111 112 113 114 115 116 117 118 119 11a 11b 11c 11d 11e 11f 170 171 172 173 174 175 176 177 178 179 17a 17b 17c 17d 17e 17f
23:02 mohamexiety[d]: 1e0 1e1 1e2 1e3 1e4 1e5 1e6 1e7 1e8 1e9 1ea 1eb 1ec 1ed 1ee 1ef 180 181 182 183 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f
23:02 mohamexiety[d]: 1f0 1f1 1f2 1f3 1f4 1f5 1f6 1f7 1f8 1f9 1fa 1fb 1fc 1fd 1fe 1ff 190 191 192 193 194 195 196 197 198 199 19a 19b 19c 19d 19e 19f
23:02 mohamexiety[d]: 1c0 1c1 1c2 1c3 1c4 1c5 1c6 1c7 1c8 1c9 1ca 1cb 1cc 1cd 1ce 1cf 1a0 1a1 1a2 1a3 1a4 1a5 1a6 1a7 1a8 1a9 1aa 1ab 1ac 1ad 1ae 1af
23:02 mohamexiety[d]: 1d0 1d1 1d2 1d3 1d4 1d5 1d6 1d7 1d8 1d9 1da 1db 1dc 1dd 1de 1df 1b0 1b1 1b2 1b3 1b4 1b5 1b6 1b7 1b8 1b9 1ba 1bb 1bc 1bd 1be 1bf
23:03 mohamexiety[d]: And this is what it looks like on GART
23:03 mohamexiety[d]: crucible: info : re.nvidia-gob.2d.q0: gob_extent_B = { 64, 8, 1 }
23:03 mohamexiety[d]: crucible: info : re.nvidia-gob.2d.q0: GOB byte ordering for slice 0:
23:03 mohamexiety[d]: 020 021 022 023 024 025 026 027 028 029 02a 02b 02c 02d 02e 02f 040 041 042 043 044 045 046 047 048 049 04a 04b 04c 04d 04e 04f 120 121 122 123 124 125 126 127 128 129 12a 12b 12c 12d 12e 12f 140 141 142 143 144 145 146 147 148 149 14a 14b 14c 14d 14e 14f
23:03 mohamexiety[d]: 030 031 032 033 034 035 036 037 038 039 03a 03b 03c 03d 03e 03f 050 051 052 053 054 055 056 057 058 059 05a 05b 05c 05d 05e 05f 130 131 132 133 134 135 136 137 138 139 13a 13b 13c 13d 13e 13f 150 151 152 153 154 155 156 157 158 159 15a 15b 15c 15d 15e 15f
23:03 mohamexiety[d]: 000 001 002 003 004 005 006 007 008 009 00a 00b 00c 00d 00e 00f 060 061 062 063 064 065 066 067 068 069 06a 06b 06c 06d 06e 06f 100 101 102 103 104 105 106 107 108 109 10a 10b 10c 10d 10e 10f 160 161 162 163 164 165 166 167 168 169 16a 16b 16c 16d 16e 16f
23:03 mohamexiety[d]: 010 011 012 013 014 015 016 017 018 019 01a 01b 01c 01d 01e 01f 070 071 072 073 074 075 076 077 078 079 07a 07b 07c 07d 07e 07f 110 111 112 113 114 115 116 117 118 119 11a 11b 11c 11d 11e 11f 170 171 172 173 174 175 176 177 178 179 17a 17b 17c 17d 17e 17f
23:03 mohamexiety[d]: 0e0 0e1 0e2 0e3 0e4 0e5 0e6 0e7 0e8 0e9 0ea 0eb 0ec 0ed 0ee 0ef 080 081 082 083 084 085 086 087 088 089 08a 08b 08c 08d 08e 08f 1e0 1e1 1e2 1e3 1e4 1e5 1e6 1e7 1e8 1e9 1ea 1eb 1ec 1ed 1ee 1ef 180 181 182 183 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f
23:03 mohamexiety[d]: 0f0 0f1 0f2 0f3 0f4 0f5 0f6 0f7 0f8 0f9 0fa 0fb 0fc 0fd 0fe 0ff 090 091 092 093 094 095 096 097 098 099 09a 09b 09c 09d 09e 09f 1f0 1f1 1f2 1f3 1f4 1f5 1f6 1f7 1f8 1f9 1fa 1fb 1fc 1fd 1fe 1ff 190 191 192 193 194 195 196 197 198 199 19a 19b 19c 19d 19e 19f
23:03 mohamexiety[d]: 0c0 0c1 0c2 0c3 0c4 0c5 0c6 0c7 0c8 0c9 0ca 0cb 0cc 0cd 0ce 0cf 0a0 0a1 0a2 0a3 0a4 0a5 0a6 0a7 0a8 0a9 0aa 0ab 0ac 0ad 0ae 0af 1c0 1c1 1c2 1c3 1c4 1c5 1c6 1c7 1c8 1c9 1ca 1cb 1cc 1cd 1ce 1cf 1a0 1a1 1a2 1a3 1a4 1a5 1a6 1a7 1a8 1a9 1aa 1ab 1ac 1ad 1ae 1af
23:03 mohamexiety[d]: 0d0 0d1 0d2 0d3 0d4 0d5 0d6 0d7 0d8 0d9 0da 0db 0dc 0dd 0de 0df 0b0 0b1 0b2 0b3 0b4 0b5 0b6 0b7 0b8 0b9 0ba 0bb 0bc 0bd 0be 0bf 1d0 1d1 1d2 1d3 1d4 1d5 1d6 1d7 1d8 1d9 1da 1db 1dc 1dd 1de 1df 1b0 1b1 1b2 1b3 1b4 1b5 1b6 1b7 1b8 1b9 1ba 1bb 1bc 1bd 1be 1bf
23:05 mohamexiety[d]: mohamexiety[d]: And this distortion only preserves the data if we increase the height of the image. If we don’t, the data outright gets destroyed because for some reason we start getting 32x8 256B GOBs
23:05 mohamexiety[d]: The other 32x8 half is full of zeroes
23:12 mohamexiety[d]: This is what it looks like when we increase width but not height
23:12 mohamexiety[d]: crucible: info : re.nvidia-gob.2d.2bpp.q0: gob_extent_B = { 96, 8, 1 }
23:12 mohamexiety[d]: crucible: info : re.nvidia-gob.2d.2bpp.q0: GOB byte ordering for slice 0:
23:12 mohamexiety[d]: 020 021 022 023 024 025 026 027 028 029 02a 02b 02c 02d 02e 02f 040 041 042 043 044 045 046 047 048 049 04a 04b 04c 04d 04e 04f 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 120 121 122 123 124 125 126 127 128 129 12a 12b 12c 12d 12e 12f 140 141 142 143 144 145 146 147 148 149 14a 14b 14c 14d 14e 14f
23:12 mohamexiety[d]: 030 031 032 033 034 035 036 037 038 039 03a 03b 03c 03d 03e 03f 050 051 052 053 054 055 056 057 058 059 05a 05b 05c 05d 05e 05f 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 130 131 132 133 134 135 136 137 138 139 13a 13b 13c 13d 13e 13f 150 151 152 153 154 155 156 157 158 159 15a 15b 15c 15d 15e 15f
23:12 mohamexiety[d]: 000 001 002 003 004 005 006 007 008 009 00a 00b 00c 00d 00e 00f 060 061 062 063 064 065 066 067 068 069 06a 06b 06c 06d 06e 06f 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 100 101 102 103 104 105 106 107 108 109 10a 10b 10c 10d 10e 10f 160 161 162 163 164 165 166 167 168 169 16a 16b 16c 16d 16e 16f
23:12 mohamexiety[d]: 010 011 012 013 014 015 016 017 018 019 01a 01b 01c 01d 01e 01f 070 071 072 073 074 075 076 077 078 079 07a 07b 07c 07d 07e 07f 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 110 111 112 113 114 115 116 117 118 119 11a 11b 11c 11d 11e 11f 170 171 172 173 174 175 176 177 178 179 17a 17b 17c 17d 17e 17f
23:12 mohamexiety[d]: 0e0 0e1 0e2 0e3 0e4 0e5 0e6 0e7 0e8 0e9 0ea 0eb 0ec 0ed 0ee 0ef 080 081 082 083 084 085 086 087 088 089 08a 08b 08c 08d 08e 08f 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 1e0 1e1 1e2 1e3 1e4 1e5 1e6 1e7 1e8 1e9 1ea 1eb 1ec 1ed 1ee 1ef 180 181 182 183 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f
23:12 mohamexiety[d]: 0f0 0f1 0f2 0f3 0f4 0f5 0f6 0f7 0f8 0f9 0fa 0fb 0fc 0fd 0fe 0ff 090 091 092 093 094 095 096 097 098 099 09a 09b 09c 09d 09e 09f 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 1f0 1f1 1f2 1f3 1f4 1f5 1f6 1f7 1f8 1f9 1fa 1fb 1fc 1fd 1fe 1ff 190 191 192 193 194 195 196 197 198 199 19a 19b 19c 19d 19e 19f
23:12 mohamexiety[d]: 0c0 0c1 0c2 0c3 0c4 0c5 0c6 0c7 0c8 0c9 0ca 0cb 0cc 0cd 0ce 0cf 0a0 0a1 0a2 0a3 0a4 0a5 0a6 0a7 0a8 0a9 0aa 0ab 0ac 0ad 0ae 0af 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 1c0 1c1 1c2 1c3 1c4 1c5 1c6 1c7 1c8 1c9 1ca 1cb 1cc 1cd 1ce 1cf 1a0 1a1 1a2 1a3 1a4 1a5 1a6 1a7 1a8 1a9 1aa 1ab 1ac 1ad 1ae 1af
23:12 mohamexiety[d]: 0d0 0d1 0d2 0d3 0d4 0d5 0d6 0d7 0d8 0d9 0da 0db 0dc 0dd 0de 0df 0b0 0b1 0b2 0b3 0b4 0b5 0b6 0b7 0b8 0b9 0ba 0bb 0bc 0bd 0be 0bf 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 000 001 1d0 1d1 1d2 1d3 1d4 1d5 1d6 1d7 1d8 1d9 1da 1db 1dc 1dd 1de 1df 1b0 1b1 1b2 1b3 1b4 1b5 1b6 1b7 1b8 1b9 1ba 1bb 1bc 1bd 1be 1bf
23:13 mohamexiety[d]: (The 001s are supposed to be 000s, it’s a reporting error)
23:33 skeggsb9778[d]: I'm not sure what's going on there. But the kernel shouldn't be modifying the kinds you pass in. Older GPUs did some additional reordering of large pages, but I don't think that's true since maxwell
23:34 skeggsb9778[d]: And you're likely not using large pages either