04:17 duttasankha: I was wondering if someone could help me with using envydis for falcon... I have gone through the docs, and I think my problem is specifically with the map file... Could someone provide an example of a map file for falcon? I mean, what should I pass to the map file option for falcon?
07:05 pmoreau: karolherbst: OpenCL has the “-cl-denorms-are-zero”, but there is no equivalent in SPIR-V AFAICS. We can probably work around that though.
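For reference, here is a rough sketch of what a denormals-are-zero mode means for single-precision values. `-cl-denorms-are-zero` permits the implementation to flush subnormal float32 values to a signed zero. This is a plain-Python illustration only. It is not how clover, Nouveau, or any SPIR-V workaround handles the flag, and `flush_denorm_f32` is a hypothetical helper name.

```python
import struct

def flush_denorm_f32(x: float) -> float:
    """Round x to float32, then replace a subnormal result with a signed zero."""
    # Reinterpret the value as its raw IEEE-754 binary32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0 and mantissa != 0:
        # Subnormal: keep only the sign bit, giving +0.0 or -0.0.
        bits &= 0x80000000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(flush_denorm_f32(1e-40))   # 0.0
print(flush_denorm_f32(-1e-40))  # -0.0
print(flush_denorm_f32(1.5))     # 1.5
```

The smallest normal float32, 2**-126, passes through unchanged. Only values below it (other than zero) are flushed.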
12:14 imirkin: duttasankha: a bunch of lines with "C 0x1234 somename"
12:15 imirkin: then instead of "call 0x1234" it'll show as "call somename"
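To make the effect concrete, here is a small Python mimic of what the map file does to the output. It reads `C 0x1234 somename` lines, as imirkin describes them, and rewrites `call 0x1234` as `call somename`. It is a hypothetical illustration, not envydis's own parser. It assumes the addresses are written in hex with a `0x` prefix and that the disassembly text contains `call 0x…` exactly as in the example above. Real envydis output formatting may differ.

```python
import re

def parse_map(text: str) -> dict[int, str]:
    """Collect 'C <addr> <name>' lines into {address: name}; skip anything else."""
    names: dict[int, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "C":
            # int() with base 16 accepts an optional 0x prefix.
            names[int(parts[1], 16)] = parts[2]
    return names

def rename_calls(disasm: str, names: dict[int, str]) -> str:
    """Rewrite 'call 0x<addr>' as 'call <name>' wherever the map names that address."""
    def sub(m: re.Match) -> str:
        addr = int(m.group(1), 16)
        return f"call {names[addr]}" if addr in names else m.group(0)
    return re.sub(r"\bcall 0x([0-9a-fA-F]+)", sub, disasm)

names = parse_map("C 0x1234 somename\n")
print(rename_calls("call 0x1234\ncall 0x5678", names))
```

Addresses that have no map entry are left as raw hex.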
12:16 pmoreau: imirkin_: Is it possible to get the total amount of VRAM on a card in Mesa (from Nouveau’s PoV)? I’m looking for a better value to return for CL_DEVICE_GLOBAL_MEM_SIZE, as we currently return PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE which is `1 << 40` regardless of the card.
12:21 karolherbst: pmoreau: I have a patch
12:21 karolherbst: it isn't perfect though: https://github.com/karolherbst/mesa/commit/978af951f10980dba9eafe223af6bb6f2bb8cd87
12:22 karolherbst: only works if you really have vram
12:22 pmoreau: Ah, you can get it from nouveau_device, nice.
12:22 karolherbst: but that should get you started if you want to do it correctly
12:22 imirkin: pmoreau: sure. i thought we did that.
12:23 imirkin: pmoreau: check 'glxinfo' -- vram is listed
12:23 * pmoreau doesn’t have Nouveau at hand
12:24 pmoreau: Ah yes, looking for vram_size does return some results in nvXX_screen.c
12:25 pmoreau: I should have tried grepping for “vram” rather than “ram”, way less noise in the output.
12:26 karolherbst: :)
12:26 karolherbst: no idea what to do with GPUs without dedicated RAM though
12:26 karolherbst: also we usually don't want to report 100% of the available ram
12:26 pmoreau: Apparently robclark went with reporting sys_mem, which seems reasonable.
12:27 karolherbst: yeah...
12:27 karolherbst: but
12:27 karolherbst: no
12:27 karolherbst: we really should limit it
12:27 karolherbst: I think /2 in both cases (vram and sys mem) is reasonable
12:27 * robclark probably copied what i965 did..
12:28 karolherbst: robclark: I mean, it is correct, but for example nvidia limits it to 1/4 generally
12:28 karolherbst: you can't use 100% of the vram anyway
12:28 karolherbst: nor sys ram
12:28 robclark: tbh, if you had a device where you weren't running any desktop, just standalone compute thing, limiting it to 1/2 ram might not really be reasonable
12:28 pmoreau: karolherbst: Do they limit CL_DEVICE_MAX_MEM_ALLOC_SIZE to 1/4th, or CL_DEVICE_GLOBAL_MEM_SIZE?
12:28 pmoreau: (or both)
12:28 karolherbst: pmoreau: both
12:29 pmoreau: Okay
12:29 karolherbst: robclark: might be they check for running X or something
12:29 karolherbst: dunno
12:29 karolherbst: but I think applications can just exceed that anyway
12:29 karolherbst: it is no hard limit, or is it?
12:29 pmoreau: (Well, CL_DEVICE_MAX_MEM_ALLOC_SIZE is based on CL_DEVICE_GLOBAL_MEM_SIZE, so limiting the latter will limit the former ;-))
12:29 robclark: idk
12:29 karolherbst: https://devtalk.nvidia.com/default/topic/992502/cuda-programming-and-performance/why-is-cl_device_max_mem_alloc_size-never-larger-than-25-of-cl_device_global_mem_size-only-on-nvidia-/
12:29 karolherbst: that might explain it
12:32 karolherbst: pmoreau: pocl reports 3/4 when the total is below 7 GB, and total minus 2 GB above that
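Read literally, the pocl heuristic summarized above works like the sketch below. It is based on the one-line summary in this log, not on pocl's source. It treats "GB" as GiB, which is an assumption, and `pocl_style_global_mem` is a hypothetical name.

```python
GiB = 1024 ** 3

def pocl_style_global_mem(total_bytes: int) -> int:
    """Heuristic as summarized in the log: report 3/4 of RAM below 7 GiB,
    otherwise RAM minus 2 GiB (GiB units are an assumption)."""
    if total_bytes < 7 * GiB:
        return total_bytes * 3 // 4
    return total_bytes - 2 * GiB

print(pocl_style_global_mem(4 * GiB) // GiB)   # 3
print(pocl_style_global_mem(16 * GiB) // GiB)  # 14
```

The two branches only meet exactly at 8 GiB. Just past 7 GiB, the subtraction branch reports slightly less than 3/4 would.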
12:34 pmoreau: IIRC, NVIDIA exposes 100% (or at least 9x%) of VRAM in CUDA.
12:39 karolherbst: yeah, makes sense
12:39 karolherbst: I think 1/4 is a bit too much anyway
12:39 karolherbst: we can probably go with something like 1/2 or 3/4
12:42 pmoreau: (On a personal note: if Nouveau was ran well on new cards (and has proper compute support), I would patch it to expose the whole RAM: if I have a 12 GiB card, why should I be restrained to only using 9 GiB. I can make good use of those extra 3 GiB! Even more if we’re only exposing half.)
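The arithmetic behind pmoreau's numbers is simple: a fixed fraction of a 12 GiB card gives 3, 6, 9, or 12 GiB. The sketch below applies each fraction floated in the discussion (1/4, 1/2, 3/4, plus "everything"). It only illustrates those options and is not what Nouveau currently reports.

```python
from fractions import Fraction

GiB = 1024 ** 3

def reported_size(vram_bytes: int, fraction: Fraction) -> int:
    """Bytes exposed to OpenCL if the driver reports a fixed fraction of VRAM."""
    return vram_bytes * fraction.numerator // fraction.denominator

for frac in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    gib = reported_size(12 * GiB, frac) / GiB
    print(f"{frac}: {gib:g} GiB")
```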
12:42 pmoreau: s/was ran/ran
12:44 karolherbst: it is difficult
12:44 karolherbst: you can't just use all of the vram anyway
12:44 karolherbst: even with no X running
12:44 karolherbst: you still have driver-internal buffers and what not
12:45 karolherbst: in the worst case you run your kernel twice
12:46 karolherbst: and if running it twice on the same amount of data gives you a big perf penalty, then maybe you are doing something wrong in the first place
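The fallback karolherbst mentions, running the kernel more than once, amounts to splitting the input into passes that each fit under the allocation limit. Below is a hypothetical helper for that. It assumes the work is independent per element, so each chunk can be processed separately. Workloads with cross-chunk dependencies need more than this.

```python
def plan_passes(total_bytes: int, max_alloc_bytes: int) -> list[tuple[int, int]]:
    """Split [0, total_bytes) into (offset, length) chunks no larger than max_alloc_bytes."""
    if max_alloc_bytes <= 0:
        raise ValueError("max_alloc_bytes must be positive")
    return [
        (off, min(max_alloc_bytes, total_bytes - off))
        for off in range(0, total_bytes, max_alloc_bytes)
    ]

GiB = 1024 ** 3
# 10 GiB of input with a 6 GiB limit: two kernel launches.
for offset, length in plan_passes(10 * GiB, 6 * GiB):
    print(offset // GiB, length // GiB)
```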
12:47 karolherbst: not saying that there are indeed some workloads which may require a lot of VRAM, but then you usually have a GPU with sufficient VRAM anyway
13:18 karolherbst: pmoreau: how can I dump the spirv if it fails at clover::llvm::compile_to_spirv?
13:18 karolherbst: or the cl kernel
13:19 karolherbst: ohh, got it already, just extracted it via gdb
13:22 karolherbst: mhh outside of clover it compiles fine
13:27 pmoreau: karolherbst: Hum :-/
13:28 karolherbst: https://gist.github.com/karolherbst/289df071d4b28aef767c31fe77c7f609
13:28 karolherbst: that's the kernel
13:28 pmoreau: What’s the error that you get?
13:29 karolherbst: assert actually
13:29 karolherbst: test_bruteforce: ../lib/SPIRV/libSPIRV/SPIRVModule.cpp:623: virtual SPIRV::SPIRVEntry* SPIRV::SPIRVModuleImpl::getEntry(SPIRV::SPIRVId) const: Assertion `Loc != IdEntryMap.end() && "Id is not in map"' failed.
13:29 pmoreau: Ah, okay
13:30 pmoreau: Didn’t you hit that one at some point in the past already? It rings a bell, though I don’t remember how we solved it.
13:30 karolherbst: yeah, no clue
13:30 karolherbst: maybe a fix in llvm-spirv
13:30 pmoreau: Could be
13:31 pmoreau: What’s the SPIR-V for that?
13:33 karolherbst: well this happens before it gets written
13:52 karolherbst: pmoreau: the llvm ir: https://gist.github.com/karolherbst/289df071d4b28aef767c31fe77c7f609#file-b-ll
13:53 pmoreau: karolherbst: The blob advertises the whole amount of VRAM in CL_DEVICE_GLOBAL_MEM_SIZE for me, and then 1/4th of that for CL_DEVICE_MAX_MEM_ALLOC_SIZE.
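The 1/4 appears because CL_DEVICE_MAX_MEM_ALLOC_SIZE is tied to CL_DEVICE_GLOBAL_MEM_SIZE, as pmoreau noted earlier. For a non-custom device, the OpenCL 1.2 full-profile minimum is max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128 MiB). Later spec versions word this differently, so check the version you target. The helper name and the 6 GiB card below are hypothetical.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3

def min_max_alloc_cl12(global_mem_bytes: int) -> int:
    """Smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE that OpenCL 1.2 (full profile,
    non-custom device) allows for a given CL_DEVICE_GLOBAL_MEM_SIZE."""
    return max(global_mem_bytes // 4, 128 * MiB)

# Blob-like setup: report all VRAM as global memory, then the 1/4 minimum for allocs.
vram = 6 * GiB
print(min_max_alloc_cl12(vram) / GiB)  # 1.5
```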
13:54 karolherbst: mhh, interesting
13:55 karolherbst: with clover vs cli: no-frame-pointer-elim=false/true no-frame-pointer-elim-non-leaf=false/true uniform-work-group-size=true/false
13:56 karolherbst: ohh
13:57 karolherbst: in clover we compile against CL 1.1
13:57 karolherbst: okay, but that's not the issue
13:58 pmoreau: You can force it to 1.2 though, through the environment variables.
13:58 karolherbst: yeah, I doubt that changes anything though
13:59 karolherbst: only difference now is no-frame-pointer-elim and no-frame-pointer-elim-non-leaf
14:03 karolherbst: mhh
14:03 karolherbst: pmoreau: now the input is identical, the options as well, still different llvm ir result
14:05 pmoreau: That’s weird. :-/
14:06 pmoreau: I might have a look in a bit, doing some PRs for SPIRV-LLVM-Translator, and some extra changes to my branches before sending the updated version.
15:45 pendingchaos: imirkin: how does depth buffer compression work on GM20x+? does it depend on the sample locations?
15:45 imirkin_: no clue.
15:45 imirkin_: depth buffer compression is mostly a mystery to me
15:45 imirkin_: you fiddle some bits in the PTE's
15:45 imirkin_: and let the good times roll
15:46 imirkin_: there's also something called ZCULL which is kinda-sorta like HyperZ/S, but we know little about it
15:46 imirkin_: (and don't use it)
16:01 HdkR: imirkin_: It's all about having squishy depth of course ;)
16:02 imirkin_: :p
16:25 RSpliet: pmoreau: that's after choosing the pointer size. Isn't it possible to explicitly request a 32-bit ptr context for OpenCL even on a 64-bit machine, that upper bounds memory to like no more than 4GiB?
16:31 karolherbst: RSpliet: doesn't that only make sense if there would be any difference?
16:32 karolherbst: I don't think that it makes sense to just reduce the pointers to 32 bit on Fermi+ and I doubt there is any kind of 32 bit mode with any kind of benefit
16:34 karolherbst: RSpliet: and I am sure you can't change it anyway through the API
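RSpliet's bound follows from the pointer width: 32 address bits reach at most 2**32 bytes (4 GiB), whatever the card has. The sketch below shows only that arithmetic. The exchange above disputes whether a 32-bit pointer mode can be selected through the API at all. The `address_bits` parameter plays the role of OpenCL's CL_DEVICE_ADDRESS_BITS, and the helper name is hypothetical.

```python
GiB = 1024 ** 3

def addressable_limit(global_mem_bytes: int, address_bits: int) -> int:
    """Upper bound on memory reachable through pointers that are address_bits wide."""
    return min(global_mem_bytes, 1 << address_bits)

print(addressable_limit(12 * GiB, 32) // GiB)  # 4
print(addressable_limit(12 * GiB, 64) // GiB)  # 12
```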
17:53 duttasankha: imirkin: thank you... so envydis gives us the disassembly for whichever machine we choose... so let's say I want to disassemble falcon ISA... but how do I get falcon code to disassemble in the first place? Is there a way to dump the falcon instruction cache contents and disassemble them? My goal is to write code for falcon and load it onto the falcon, either through MMIO or some other method (which I don't know yet)....
17:56 duttasankha: I should mention that I am going through the falcon documentation in envytools... I am a bit confused about writing the code... I would really appreciate some pointers..
18:01 imirkin_: duttasankha: are you trying to write code or read code?
18:02 duttasankha: I want to do both.... My way of understanding how to write the code is to first read the code, I guess... but essentially I want to do both...
18:04 imirkin_: so the point of map files
18:04 imirkin_: is if you're decoding a bunch of code
18:04 imirkin_: and it wasn't written by a bunch of retarded monkeys
18:04 imirkin_: then you can assume that calls are to functions
18:05 imirkin_: and these functions are used a bunch
18:05 imirkin_: then you can name the functions
18:05 imirkin_: to make reading of the disassembly simpler
18:05 imirkin_: envyas supports names and labels natively, so you can just stick them in
18:05 imirkin_: when writing code
18:13 duttasankha: imirkin: If I understand it correctly I would pass the coded form to the envydis through the map file and I would use -u option to set the label value to make the reading of the code easier ...
18:14 duttasankha: also what do you mean by "and these functions are used a bunch"
18:15 imirkin_: like let's say the code has "call 0x1234" all over the place
18:15 imirkin_: and then you look at address 1234 and figure out what that function does
18:16 imirkin_: the map file would allow you to provide a name for 1234
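The workflow imirkin describes (spot a `call 0x1234` that shows up all over the place, work out what it does, then name it) can be bootstrapped by counting call targets in the disassembly. Below is a hypothetical helper that does this and drafts `C <addr> <name>` map lines with placeholder `func_XXXX` names, which you rename after reading each function. It assumes the disassembly text spells calls as `call 0x…`, as in the example above.

```python
import re
from collections import Counter

CALL_RE = re.compile(r"\bcall\s+0x([0-9a-fA-F]+)")

def frequent_call_targets(disasm: str, min_calls: int = 2) -> list[tuple[int, int]]:
    """Return (address, count) for call targets seen at least min_calls times,
    most frequent first."""
    counts = Counter(int(h, 16) for h in CALL_RE.findall(disasm))
    return [(addr, n) for addr, n in counts.most_common() if n >= min_calls]

def draft_map(targets: list[tuple[int, int]]) -> str:
    """Emit 'C 0x<addr> <name>' lines with placeholder names to rename by hand."""
    return "".join(f"C {addr:#x} func_{addr:x}\n" for addr, _ in targets)

disasm = "call 0x1234\ncall 0x40\ncall 0x1234\n"
print(draft_map(frequent_call_targets(disasm)), end="")
```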
18:16 imirkin_: also sometimes you have code and data mixed in together
18:16 imirkin_: so when decoding, envydis gets confused and mis-decodes instructions
18:16 imirkin_: (envydis = disasm. envyas = asm.)
18:17 duttasankha: oh okay...got it
18:20 duttasankha: but then again my question is how I can get the falcon code to put into the map file.... I don't know if this can be done, but is it possible to get the code currently running on the falcon in a format I can put into the map file?
18:21 duttasankha: so then dis would disassemble for me and so I could get better understanding
18:27 imirkin_: the map file is completely optional
18:27 imirkin_: you're on your own for getting the code
18:27 imirkin_: once you have it, you can use envydis to disassemble it
18:42 duttasankha: imirkin_: I saw that it is optional, and I tried running envydis without the map file or any other options (other than the machine name) but couldn't get any output.... I used something like "./envydis -m falcon" ... I don't know if I am doing something wrong
18:42 imirkin_: can you make the input file available?
18:43 imirkin_: you probably want to at least specify the falcon variant (fuc3 or fuc5 most likely)
18:48 duttasankha: imirkin_: now I ran it like this: " ./envydis -m falcon -V fuc4 -F fuc3op", which I had done previously as well... and still no output
18:58 imirkin_: can you make the input file available?
19:03 duttasankha: imirkin_: I am sorry I didn't see that...and I apologize if this is bothering you but I am not sure what to provide in the input file....
19:07 duttasankha: imirkin_: can you please tell me what should I provide inside the input file as nothing is mentioned in the documentation...
19:14 imirkin_: duttasankha: you said you were trying to use envydis
19:14 duttasankha: yes
19:14 imirkin_: it takes data and disassembles it
19:14 imirkin_: send me the data.
19:22 duttasankha: imirkin_: okay...but I don't have any falcon data ...
19:23 imirkin_: so what are you trying to disassemble?
19:23 imirkin_: it's like running "gcc" without input files.
19:23 imirkin_: expecting it to produce the program that you wanted to write is a bit ... ambitious
19:24 duttasankha: I thought that map file was the data.....
19:24 duttasankha: but anyways I got it
19:24 imirkin_: map file is a nice little extra
19:24 imirkin_: for decoding the instruction stream
19:24 imirkin_: but you gotta feed it an instruction stream
19:25 duttasankha: yeah yeah...that was stupid of me....
19:38 duttasankha: imirkin_: so I can poke the MMIOs to get the falcon data and provide it to the envydis right???
19:38 imirkin_: mmmm ... not sure how you'd poke the mmio's
19:38 imirkin_: but envydis just takes a file.
19:38 imirkin_: however you get data into that file isn't envydis's concern.
19:41 duttasankha: so I have a program to poke and read MMIO contents, but can I provide those raw contents to envydis?
19:41 duttasankha: I mean putting those contents to an input file and then to envydis
19:54 imirkin_: envydis -i the-file
19:54 imirkin_: i think
19:58 duttasankha: imirkin_: I am clear about the input file thing...I want to know about the falcon data that would go inside it..more specifically how can I get that falcon data....is there any example data for falcon that I can use....
20:00 imirkin_: sure...
20:00 duttasankha: imirkin_: can you please provide me that data?
20:01 imirkin_: https://raw.githubusercontent.com/skeggsb/nouveau/master/drm/nouveau/nvkm/engine/gr/fuc/gpcgk208.fuc5.h
20:01 imirkin_: might be a slight pain to feed that into envydis actually... not sure
20:01 imirkin_: it will take hex words with -w, but you have to get rid of the /* ... */ stuff
20:02 imirkin_: nuke the data array, that'll just confuse things
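Following imirkin's two tips (strip the `/* ... */` comments, drop the data array), here is a sketch that pulls the code words out of a header like `gpcgk208.fuc5.h`. It assumes the header declares brace-delimited arrays whose names end in `_code` and `_data`. The sample below uses made-up names and placeholder words, not real falcon code.

```python
import re

def code_words_from_header(header_text: str) -> list[str]:
    """Return the hex words of the *_code[] array in a nouveau .fuc*.h header,
    with /* ... */ comments removed and the *_data[] array ignored."""
    # Strip C block comments first (they contain addresses and labels, not code).
    text = re.sub(r"/\*.*?\*/", "", header_text, flags=re.DOTALL)
    # Assumption: the code array's name ends in "_code" and its body is in { }.
    m = re.search(r"\w+_code\s*\[\s*\]\s*=\s*\{(.*?)\}", text, flags=re.DOTALL)
    if m is None:
        raise ValueError("no *_code[] array found")
    return re.findall(r"0x[0-9a-fA-F]+", m.group(1))

sample = """
static uint32_t example_data[] = { 0x00000001, };
static uint32_t example_code[] = {
/* 0x0000: entry */
    0x11111111,
    0x22222222,
};
"""
print(" ".join(code_words_from_header(sample)))  # 0x11111111 0x22222222
```

You would then write the joined words to a file and hand that to envydis with `-w`. Whether `-w` wants the `0x` prefix kept is not confirmed in this log, so strip it if envydis complains.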
20:02 duttasankha: oh this is so nice....thanks so much......u made my day.....
20:02 imirkin_: the source is here: https://github.com/skeggsb/nouveau/tree/master/drm/nouveau/nvkm/engine/gr/fuc
20:02 imirkin_: there used to be a makefile for building with envyas, but i dunno where it went
20:04 duttasankha: Oh I have seen this ..... I didn't know what it was for then..... I will do the rest of the stuff... Thanks a ton....
20:05 imirkin_: i also have a tool to extract blob firmware...
20:05 imirkin_: https://github.com/envytools/firmware
20:08 duttasankha: I was trying to use the nvagetbios....and then shove the output to nvbios
20:09 imirkin_: preferred method is to use /sys/kernel/debug/dri/0/vbios.rom
20:09 imirkin_: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia/gk20a
20:10 imirkin_: the *_inst.bin files should be fuc5
20:10 imirkin_: (or fuc3. i forget.)
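If you want to use the same `-w` hex-word route for a raw `*_inst.bin` blob, you can convert the blob to words first. The sketch below assumes the file is a plain stream of little-endian 32-bit words, since falcon is little-endian. That is an assumption about these particular files. envydis may also accept the binary directly, which would make this step unnecessary. The file name in the usage comment is only an example.

```python
import struct

def bin_to_hex_words(blob: bytes) -> list[str]:
    """Split a raw firmware blob into little-endian 32-bit words, formatted as hex.
    A trailing partial word, if any, is zero-padded."""
    if len(blob) % 4:
        blob += b"\x00" * (4 - len(blob) % 4)
    return [f"0x{word:08x}" for (word,) in struct.iter_unpack("<I", blob)]

# Usage (file name assumed, e.g. a *_inst.bin copied from linux-firmware):
# with open("gpccs_inst.bin", "rb") as f:
#     print(" ".join(bin_to_hex_words(f.read())))
print(bin_to_hex_words(b"\x01\x02\x03\x04"))  # ['0x04030201']
```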
20:10 duttasankha: I got the output but I think there is something wrong in there....I have seen your tool as well and I would use it now ....okay...I would do that...do you know anyone in here working on writing falcon code for the bios?
20:11 duttasankha: I mean in the community?
20:11 imirkin_: er ... huh?
20:11 imirkin_: "falcon code for the bios"?
20:11 imirkin_: like for a GOP EFI driver or something?
20:11 imirkin_: actually that shouldn't need it either...
20:12 duttasankha: sorry on falcon in general...I mean writing
20:12 duttasankha: falcon code
20:13 duttasankha: I would like to collaborate if possible
20:14 duttasankha: Actually I was reading some IRC logs and saw that someone (maybe karolherbst) is working on falcon... I am not sure
20:15 duttasankha: that's why I am asking if you know and if it is possible to collaborate?
20:16 duttasankha: But thanks so much for all the information you provided .....let me work on the current information I have ....
22:30 duttasankha: imirkin_: So I fed in the code section of the falcon file, but most of the instructions come out as unknown instructions.... I think it is because of the source file format..... do you know why this might happen?
22:30 imirkin_: PEBKAC
22:31 imirkin_: unfortunately without more info, hard for me to tell
22:53 RSpliet: duttasankha: one thing to bear in mind is that there are multiple versions of falcon (and... there might be different flavours too for vdec vs. pdaemon/fecs/gpccs). Bisect the command line options for envydis to find how they work (soz, don't have that info available in the form of docs for you)
23:19 duttasankha: RSpliet: Thanks for the pointer