00:53 naptastic: Thank you everyone for your help! With the newer kernel, firmware, and Mesa, my graphics are smooth and fast again, and my CPU usage is way down!
00:53 naptastic: Now if I could just make V8 and Webkit suck less! XD
05:26 imirkin: skeggsb: does this seem right? https://bugs.freedesktop.org/attachment.cgi?id=123447
05:27 skeggsb: hm yes.. how did i miss that :/
05:31 imirkin: patch sent... will wait for the tester to try it out, if he can
06:35 mgoodwin: force edid for all displays?
10:32 karolherbst: mupuf: "u32 input_clk = 27e6;" is this even legal?
10:32 mlankhorst: i think it might be
10:33 mupuf: karolherbst: you mean that it would have to be a float?
10:33 karolherbst: no
10:33 karolherbst: because this is super implicit
10:33 karolherbst: think about this:
10:33 karolherbst: u32 input_clk = 27e6;
10:33 mlankhorst: 4004fa: c7 45 fc c0 fc 9b 01 movl $0x19bfcc0,-0x4(%rbp)
10:33 mupuf: in what way?
10:33 mlankhorst: looks valid
10:33 karolherbst: u32 freq = 100;
10:33 karolherbst: so
10:33 karolherbst: what is freq
10:33 mupuf: pwm_freq
10:33 karolherbst: 0x100 or 0x64?
10:34 mlankhorst: 27e6 means 27000000
10:34 mupuf: 0x64
10:34 karolherbst: ohhhhhh
10:34 karolherbst: of course
10:34 karolherbst: I am stupid
10:34 mupuf: ok, you got me worried for an instant
10:34 karolherbst: I thought it is 0x27e6 :D
10:34 mupuf: lol
10:35 karolherbst: mhh
10:35 karolherbst: still
10:35 karolherbst: 27e6 is super confusing
10:35 karolherbst: never saw this in any source code up until today :)
10:35 mupuf: is it? at least it shows I did not forget any 0
10:35 mupuf: well, /me does not really care
10:36 karolherbst: right
10:36 mlankhorst: confused me for a sec, but then remembered floats
10:36 karolherbst: it is fine, but it is a bit implicit
10:36 mupuf: hmm, if it confused people, either I need to add a comment ... or I need to add 6 zeroes
10:37 mlankhorst: nah keep people on the edge
10:38 mupuf: lol
10:39 mlankhorst: compiler knows what you mean!
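The point settled above can be checked with a small sketch: in C, `27e6` is a floating-point constant (27 × 10^6) that is converted to the integer 27000000 on initialization, which matches the `0x19bfcc0` immediate in mlankhorst's disassembly. The `u32` typedef and the function name `input_clk_hz` are stand-ins introduced here for illustration, not taken from the patch.

```c
#include <stdint.h>

/* Stand-in for the kernel's u32 typedef (assumption for this sketch). */
typedef uint32_t u32;

/* 27e6 has type double; the initialization converts it to u32.
 * The value is exactly representable, so nothing is lost. */
static u32 input_clk_hz(void)
{
    u32 input_clk = 27e6;
    return input_clk;
}
```

Calling `input_clk_hz()` yields 27000000, not the 0x27e6 (10214) that the literal can look like at first glance.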
10:58 karolherbst: mupuf: we could expose dmesg via morse code through the LED in case of a frozen display :)
10:59 karolherbst: or maybe some error codes
11:01 karolherbst: but seriously, maybe an error code could be nice
11:01 karolherbst: mhh, but the use is rather limited :/
11:08 karolherbst: mupuf: or instead of 27e6 you use the stored crystal clock value
11:09 karolherbst: it is somewhere
11:09 karolherbst: device->crystal I think
11:10 karolherbst: nvkm_device.crystal
11:59 mupuf: karolherbst: yes, I wanted to use it, but nvidia does not seem to use the actual clock speed
11:59 karolherbst: odd
11:59 mupuf: just like for PWM on maxwell
12:00 karolherbst: mhh
12:00 mupuf: it uses 27 MHz instead of 27.xyz MHz
12:00 karolherbst: maybe the crystal changed and we never noticed
12:00 mupuf: nope, I checked this
12:01 karolherbst: mhh odd
12:01 karolherbst: but right, nvatiming usually shows the same clocks
12:01 mupuf: ;)
12:01 karolherbst: still odd
12:01 karolherbst: maybe they have a PWM crystal?
12:02 mupuf: yep
12:02 mupuf: nope, that would cost money for what reason?
12:02 mupuf: no, they just hardcoded the valye
12:02 mupuf: value*
12:02 mupuf: because, you know, they DO NOT CARE
12:02 mupuf: as in, OMG, my LED runs at 100.1 Hz instead of 100!
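The arithmetic behind the 100.1 Hz remark can be sketched as follows. The 27.027 MHz "real" clock used in the usage note is purely hypothetical, picked so the numbers match the joke; the log only says "27.xyz MHz". The function names are illustrative too.

```c
#include <stdint.h>

/* Divider a PWM counter would be programmed with for a target frequency,
 * given the clock the driver assumes it is fed with. */
static uint32_t pwm_divider(uint32_t assumed_clk_hz, uint32_t target_hz)
{
    return assumed_clk_hz / target_hz;
}

/* Frequency actually produced, in millihertz, when the real clock differs
 * from the assumed one. 64-bit math avoids overflow in the multiplication. */
static uint64_t pwm_actual_millihz(uint32_t real_clk_hz, uint32_t divider)
{
    return (uint64_t)real_clk_hz * 1000u / divider;
}
```

With an assumed 27 MHz clock and a 100 Hz target, `pwm_divider(27000000, 100)` gives 270000. If the crystal really ran at the hypothetical 27.027 MHz, `pwm_actual_millihz(27027000, 270000)` gives 100100, i.e. 100.1 Hz.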
12:02 karolherbst: terrible
12:03 karolherbst: I am sure some hardcore case modder already complaint
12:03 karolherbst: *complained
12:03 mupuf: AHAH
12:03 mupuf: yeah, they want to overclock the LED too :p
12:03 karolherbst: no, that's not the issue
12:03 karolherbst: but their fancy light effects have to be on time
12:03 karolherbst: with their other LEDs in the case
12:04 karolherbst: like LEDs on spinning fans showing whatever
12:05 mupuf: ah ah. right
12:05 mupuf: so far, no complaints about this patch, good :D
12:06 mupuf: as in, no user saying: Why don't you work on fixing my bugs, instead of playing!?
12:06 karolherbst: ohh I think you missed those
12:06 karolherbst: it was subtle but some kind of made fun of it :D
12:07 karolherbst: at least here in IRC :p
12:08 mupuf: you mean Tom^? He does not count :D He is a well-informed user
12:08 mupuf: and tester
12:09 karolherbst: "[20:45] <Yoshimo> interesting priorities you have there" :p
12:09 karolherbst: well and Tom benefits from your patch
12:09 karolherbst: so he has no right to complain anyway
12:10 mwk: hey, does anyone remember what happened to shinpei's LLVM for Falcon?
12:10 mwk: I'm kind of considering resurrecting that idea
12:10 mupuf: mwk: I still have the code
12:11 karolherbst: I also think it would make sense
12:11 karolherbst: well maybe not using llvm, but having a real compiler
12:11 mupuf: karolherbst: what else then?
12:11 karolherbst: so we don't have to care about those stupid registers
12:11 mwk: I've done a lot of work on LLVM lately and I'm just itching to make a backend
12:11 karolherbst: mupuf: just something better than the current stuff
12:11 mwk: karolherbst: llvm gives you a lot for a modest price
12:11 karolherbst: if we have constraints, the assembler/compiler should error if we violate them uncontrolled
12:11 mupuf: mwk: this could come in time for us to rewrite pdaemon with it :)
12:12 karolherbst: well I don't care if this would be done in llvm or in gcc or we hack something up ourself in the end
12:12 mwk: for a start, we don't need a separate assembler with llvm
12:12 karolherbst: if the llvm stuff is maintainable, then this is good enough
12:13 mwk: envyas is kind of horrible
12:13 karolherbst: it is
12:13 karolherbst: I always run into the dumbest issues
12:14 mwk: the annoying thing is that, for a new backend, you have to fork llvm+clang
12:14 karolherbst: why not making it upstream?
12:14 karolherbst: mhh
12:14 mwk: well, if we can get it accepted, I'm all for it
12:14 karolherbst: well maybe the use is kind of limited
12:14 mupuf: mwk: so, shinpei deleted his repo
12:14 mupuf: but I have a local copy
12:15 mwk: hmm
12:15 karolherbst: mwk: but if we have a llvm backend, we automatically get the c++/C frontend stuff?
12:15 mwk: is there a license on it?
12:15 mupuf: Thu Jun 2 14:37:57 2011 -0400
12:15 mwk: karolherbst: not fully automatically, but with only minor tweaks to clang, yes
12:15 karolherbst: right
12:15 karolherbst: mwk: how do we handle this process stuff?
12:16 mwk: what process stuff?
12:16 karolherbst: mwk: separate archive files with a process tag and the final assemble stage merges those together?
12:16 mwk: uh?
12:16 mwk: are you asking about a linker?
12:16 mwk: that's a nontrivial issue
12:17 mupuf: mwk: well, we can have only one file, this is fine
12:17 karolherbst: mwk: https://github.com/karolherbst/nouveau/blob/master_4.5/drm/nouveau/nvkm/subdev/pmu/fuc/macros.fuc#L163
12:17 karolherbst: this part
12:17 mwk: llvm has its very own linker, lld
12:17 mwk: I think it should be reasonably easy to port
12:18 karolherbst: every "process" calls this macro like process(PROC_PERF, #perf_init, #perf_recv)
12:18 mwk: we'd get ELF .o files and ELF executable output
12:18 mwk: now... this is the most annoying part... we'd have to make some sort of objcopy tool that'd hammer it into a flat binary
12:18 mupuf: mwk: this is already done by shinpei
12:19 mupuf: so, can't find any license
12:19 mwk: ah, awesome
12:19 mwk: hm, not awesome
12:19 karolherbst: mwk: ahh this is a section thing
12:19 karolherbst: mwk: like .section #gf119_pmu_data and then you get all that process stuff
12:19 mwk: karolherbst: I imagine you'd just do it in C...
12:19 karolherbst: :/
12:19 karolherbst: kind of ugly though
12:19 mupuf: mwk: well, do you want me to send you the source code anyway?
12:20 mupuf: because it did work at some point at least
12:20 mwk: struct process myprocess __attribute__((section("gf119_pmu_data"))) = { ... };
12:20 mupuf: and it was really ugly, output-wise
12:20 mwk: which is equivalent
12:20 karolherbst: mwk: I would rather do it through the compiler with -xProcess=some_value
12:20 karolherbst: or at linking stage
12:20 mupuf: karolherbst: come on, this is a no-issue IMO
12:21 mwk: karolherbst: what I gave above is a direct translation
12:21 karolherbst: mwk: this is used for some really trivial IPC stuff
12:21 mwk: you could also go C++ and use global constructors
12:21 karolherbst: mupuf: yeah I know, but I think this is something which is far too much off the usual stuff you do in C :/
12:21 mwk: but C++ has its own problems
12:21 mupuf: karolherbst: we will adapt to writing this in C
12:21 mupuf: we will need to rewrite the entire damn thing anyway because of nvidia
12:21 mupuf: so...
12:21 karolherbst: yeah, but the thing is, in the falcon binaries, those perf declarations are in the same space
12:22 mwk: mupuf: send it in, why not
12:22 karolherbst: so if we split the processes like we do today
12:22 karolherbst: the linker has to merge those stuff into one area
12:22 mwk: karolherbst: so what? __attribute__((section)) was made exactly for that
12:24 karolherbst: mwk: so every process.c file needs three of those __attribute__((section)) things for the process_id, init and fini functions
12:24 karolherbst: mhh
12:24 mwk: karolherbst: C has a preprocessor
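A minimal sketch of what mwk is hinting at: one preprocessor macro hides the section attribute, so each process file needs a single line rather than several attributes. Everything here is hypothetical (the `struct process` fields, the `PROCESS` macro, the `pmu_procs` section name); none of it comes from the fuc sources. It also places a pointer to each descriptor in the section rather than the struct itself, a variation on mwk's direct translation that avoids the compiler padding between entries. It assumes GCC or clang on an ELF target, where the linker provides `__start_`/`__stop_` symbols for sections whose names are valid C identifiers.

```c
#include <stddef.h>

/* Hypothetical process descriptor; field names are illustrative only. */
struct process {
    int id;
    void (*init)(void);
    void (*recv)(int msg);
};

/* Define a descriptor and drop a pointer to it into the "pmu_procs" section. */
#define PROCESS(name, pid, init_fn, recv_fn)                          \
    static const struct process name = { (pid), (init_fn), (recv_fn) }; \
    static const struct process *const name##_entry                  \
        __attribute__((section("pmu_procs"), used)) = &name

static int perf_inited, idle_inited;
static void perf_init(void) { perf_inited = 1; }
static void perf_recv(int msg) { (void)msg; }
static void idle_init(void) { idle_inited = 1; }
static void idle_recv(int msg) { (void)msg; }

/* In a real tree each of these lines would live in its own source file. */
PROCESS(proc_perf, 1, perf_init, perf_recv);
PROCESS(proc_idle, 2, idle_init, idle_recv);

/* Bounds of the merged section, provided by the linker (GNU ld, lld). */
extern const struct process *const __start_pmu_procs[];
extern const struct process *const __stop_pmu_procs[];

/* Number of processes the linker collected into the section. */
static size_t process_count(void)
{
    return (size_t)(__stop_pmu_procs - __start_pmu_procs);
}

/* Walk the table and run every process's init hook. */
static void init_all_processes(void)
{
    const struct process *const *p;
    for (p = __start_pmu_procs; p < __stop_pmu_procs; p++)
        (*p)->init();
}
```

The linker does the merging karolherbst asks about: every object file contributes its entries to the same section, and the kernel loop only needs the two boundary symbols.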
12:24 karolherbst: maybe we could enforce it through an ABI
12:25 karolherbst: and the dev just declares a __init and a __fini function, and a field for the process_id
12:25 mwk: anyhow, the biggest annoyance will be dealing with the final output
12:26 karolherbst: right
12:26 mwk: objcopy is fine if it fits in the code segment, trouble starts when we need paging
12:27 karolherbst: how do we do the IPC stuff? __builtin functions?
12:28 mupuf:has sent shinpei's fucc to mwk
12:28 mupuf: karolherbst: boost's signal? :D
12:28 mwk: karolherbst: normal functions...
12:28 mwk: I don't know what's so special about IPC, really
12:28 mupuf: or just, you know, function pointers
12:28 mwk: asm() is a possibility if really needed
12:28 mupuf: virtual functions if we want to go the c++ way
12:28 karolherbst: yeah, but we shouldn't have to use asm
12:28 mupuf: of course not
12:28 karolherbst: and we shouldn't duplicate stuff for every falcon
12:29 karolherbst: which we do
12:29 karolherbst: currently
12:29 mwk: karolherbst: calm down
12:29 mwk: there's absolutely nothing that we can do now that we won't be able to do in C
12:29 karolherbst: yeah I know
12:29 mupuf: and calm down again, because there is LLVM backend yet either
12:30 mupuf: there is no
12:30 mwk: and we're not too good for asm
12:30 mwk: of course we'll have __builtin_iord etc.
12:30 mupuf: that was your modifications to clang you were talking about?
12:30 mupuf: adding fuc-related intrinsics?
12:30 mwk: but context switching will go through asm
12:30 mwk: mupuf: not just that, you also have to describe the ABI to clang
12:31 mwk: sizeof(int) etc.
12:31 mupuf: ah, right, that too
12:31 mwk: and crazy shit like va_arg
12:31 mupuf: mwk: what context switching? Do we want to have full preemption? :o
12:31 mwk: that's a fun one... va_start, va_end, va_copy are implemented by llvm, va_arg is implemented by clang
12:31 mupuf: or you are talking about handling IRQs?
12:31 mwk: mupuf: uh, don't you? what's this IPC doing?
12:31 karolherbst: mwk: calling find: https://github.com/karolherbst/nouveau/blob/master_4.5/drm/nouveau/nvkm/subdev/pmu/fuc/kernel.fuc#L425
12:32 mupuf: the current IPC design is just calling a function
12:32 karolherbst: and then bra $p1 #send_proc
12:32 mupuf: on the sender, you post messages
12:32 mwk: ah, there's no context switching?
12:32 mwk: fine, we may get away with only a single asm after all :)
12:32 mupuf: then when the sender is done, the process 0 keeps running
12:32 karolherbst: mupuf: not really, it is a bit more
12:32 mwk: the one to initialize stack pointer and call main
12:32 mupuf: and will post the messages
12:32 mupuf: karolherbst: like what?
12:33 karolherbst: mupuf: did you look into send_proc: ?
12:33 mupuf:fixed it
12:33 mupuf: but I may have forgotten
12:33 mupuf: yep, it does what I said
12:33 mwk: anyhow
12:34 mwk: I'll be getting started on this in ... 2 weeks or so?
12:34 mupuf: sounds good!
12:34 RSpliet: gives us some time to contact shinpei? :-P
12:34 mupuf: have fun writing another fuc assembler :D
12:34 karolherbst: mupuf: ahh okay, didn't saw your newest message
12:34 karolherbst: s
12:34 mwk: and when we're done with it, let's do a gm107 LLVM backend and switch mesa to it!
12:35 karolherbst: :D
12:35 mwk:now runs away
12:35 mupuf: Ah ah
12:35 karolherbst: I am pretty sure there will be at least one against it ;)
12:35 mwk: though in all seriousness
12:35 RSpliet:throws a NIR bomb and seeks for shelter
12:35 mwk: I'm curious whether this has been attempted before and what the results were
12:36 mwk: it's also getting my fingers twitchy, I have to admit
12:36 karolherbst: RSpliet: well, that would make actually sense maybe
12:36 mwk: esp. since clang already supports CUDA and OpenCL
12:36 mwk: but Falcon sounds much more doable
12:36 karolherbst: mwk: I thought only as frontends?
12:37 mwk: karolherbst: of course, clang is a frontend
12:37 mupuf: mwk: pmoreau is writing a LLVM to NVC0 IR converter
12:37 karolherbst: yeah, but even if we have the nvc0 backend, that wouldn't help us with OpenCL or cuda afaik
12:37 mupuf: mwk: so, this is coming anyway
12:37 mwk: so if you throw a g80 or whatever backend at llvm, the theory goes, you could just use clang with it to get a g80 OpenCL compiler
12:37 pmoreau: mupuf: Nope, Hans is
12:37 mupuf: pmoreau: oh, right!
12:37 mupuf: sorry, what are you working on then?
12:38 karolherbst: spirv :p
12:38 RSpliet: pmoreau: I thought Hans was working on TGSI support for OpenCL...
12:38 pmoreau: Oh no sorry: Hans is working on LLVM -> TGSI, and I work on SPIR-V -> NV50 IR
12:38 pmoreau: So no one is working on LLVM -> NV50 IR
12:39 mupuf: lol
12:39 mupuf: what's the point to have both?
12:39 karolherbst: pmoreau: LLVM -> TGSI -> NV50 IR ?
12:39 mupuf: less cpu usage?
12:39 karolherbst: yeah well
12:39 RSpliet: karolherbst: SPIR-V -> whatever is useful for shipping "binary" OpenCL kernels in software
12:39 karolherbst: I know
12:40 mupuf: RSpliet: but there is a SPIR-V -> LLVM already written
12:40 RSpliet: whether NV50 IR is the right target or not is a different issue :-)
12:40 mupuf: so, what Hans is working on seems more useful :s
12:40 mupuf:may be missing out on some things
12:40 mupuf: anyway, back to work!
12:41 karolherbst: mupuf: well who says that we will have a tgsi based vulkan thing later ;)
12:41 pmoreau: mupuf: Vulkan (and now OpenCL) takes SPIR-V as input, so having SPIR-V -> NV50 IR avoids going SPIR-V -> LLVM -> TGSI -> NV50 IR
12:41 RSpliet: nah, there's just an exponential amount of paths to get to the final IR, and we seem to be building all of them :-P
12:41 pmoreau: mupuf: And historically, I started working on SPIR-V -> NV50 IR before Hans started his work
12:42 RSpliet: pmoreau: would SPIR-V -> TGSI (god forbid... :-D) help other driver vendors?
12:42 RSpliet: but a more serious question: are LLVM and SPIR-V that far apart?
12:43 pmoreau: Hum… I don’t remember. I think some drivers were hoping for such a translation, but some people were saying they should rather go SPIR-V -> NIR.
12:44 pmoreau: RSpliet and mupuf: You were both on the e-mail thread about that (sometime around last Aug/Sept), after we learned Hans was working on it, where we were trying to find out which paths would be better
12:45 pmoreau: IIRC
12:45 RSpliet: I know
12:45 RSpliet: problem is that there seem to be no two drivers taking the same approach atm :-P
12:45 pmoreau: They are: SPIR was based on LLVM, but that’s not the case for SPIR-V
12:46 pmoreau: (Ok, maybe not *that* far apart, I don’t know LLVM IR to judge, but from what I heard they are different.)
12:46 RSpliet: who are the users for NIR, TGSI and LLVM directly? respectively (Intel, Freedreno, VC4), (Nouveau + some software renderers), (AMD)?
12:47 hakzsam_: the only thing we know is that NV50 IR is the final target, that's a good start :p
12:47 pmoreau: hakzsam_: ;-)
12:47 RSpliet: hakzsam_: and the start points are GLSL, OpenCL C, DirectX SM, SPIR-V :-P
12:48 hakzsam_: right :)
12:49 karolherbst: SPIR-V -> NV50 IR sounds efficient
12:50 RSpliet: karolherbst: depends on your definition of efficient
12:50 RSpliet: in terms of CPU cycles I reckon it's quite efficient for the nouveau case
12:50 karolherbst: yeah
12:50 loonycyborg: how is your progress with reclocking support for nvc0? Maybe I could help somehow?
12:50 RSpliet: in terms of code reusability not so much, if we can agree on a shared IR (be it NIR, TGSI or LLVM :-D)
12:50 karolherbst: but with a compiler cache it is unimportant anyway
12:50 loonycyborg: I have a GT 440
12:51 RSpliet: loonycyborg: stalled at this point in time... I thought I was fairly close to get my one card to get to the middle perflvl without much hardcoded faff, but not there yet
12:51 karolherbst: RSpliet: well what do you think how long it would take to use something NIR based? then we would use the same stuff intel/freedreno/vc4 does
12:52 pmoreau: I guess the shared IR should move to NIR from TGSI, but well.
12:52 RSpliet: and a combination of work pressure and personal reasons prevent me from working very actively on it atm
12:52 pmoreau: Give me a sec to read up the logs, because I have no context about why we came to talk about this :-D
12:52 RSpliet: pmoreau, karolherbst: you might want to ask robclark about that. He ported freedreno to NIR
12:53 karolherbst: I think porting to NIR is a good idea anyway
12:53 RSpliet: I don't know the specifics of NIR to judge
12:53 karolherbst: I heard that writing opts in nir on the glsl level is super easy
12:53 karolherbst: :D
12:54 RSpliet: sure, but as it stands we have quite a few things on the NV50 IR level already
12:54 karolherbst: yeah I know, that's why I said glsl level
12:54 karolherbst: we would just get a bit more optimized stuff already in the end
12:54 karolherbst: and would deal with more nvidia hardware related stuff
12:55 karolherbst: I think general stuff is a smaller problem if we get nir
12:55 karolherbst: RSpliet: like in TGSI we get empty if else endif branches
12:56 robclark: I think if nouveau didn't already have a fairly mature IR already then NIR would make a lot of sense..
12:57 RSpliet: robclark: imho the biggest motivation would be to benefit from front-end work done by the rest of the community
12:57 karolherbst: robclark: yeah, but maybe it is also more CPU efficient in the end, because NIR gives us better optimized starting points
12:57 karolherbst: and what RSpliet said
12:59 RSpliet: robclark: how much effort did you have to put in your NIR -> IR3 pass? Any difficulties? HW mismatches?
12:59 RSpliet: I guess you had to co-develop the NIR Vec4 support as well, didn't you?
13:00 robclark: no vec4.. that only applies to a2xx which isn't using NIR
13:00 robclark: a3xx and later are all scalar
13:01 robclark: most of the difficulties were ir3.. instruction scheduling is mandatory which makes some things complicated.. that and instantiating arrays in consecutive registers..
13:02 RSpliet: oh yes, no HW pipeline stalling... I guess nouveau wouldn't have to worry about that
13:02 robclark: RSpliet, karolherbst, keep in mind, currently you do opt/cleanup passes after translating into native IR.. which is better in the handful of cases where one nir instruction turns into multiple native instructions.. I still think one day that NIR needs to learn about hw specific alu instructions so we can have 1:1 between nir and hw instructions..
13:03 karolherbst: robclark: yeah I know
13:03 karolherbst: robclark: but we would get much better input already
13:03 RSpliet: robclark: and, forgive my lack of knowledge here, which front-ends do you get with NIR? OpenCL C? DirectX SM?
13:03 robclark: currently glsl/prog/spirv
13:03 karolherbst: robclark: we have cases where an empty if else endif clause in TGSI makes some shaders have 30 more instructions (300 instructions big shaders)
13:04 RSpliet: "prog"?
13:04 robclark: arb shader program stuff
13:04 robclark: (ie. don't care about it :-P)
13:04 RSpliet: that's not the really old ATI one is it?
13:04 robclark: no
13:04 robclark: although there I guess is tgsi support for that one now..
13:05 RSpliet: heheh... you just can't win with a single IR now can you?
13:05 robclark: (anyways, for simple shaders like arb shader / ati / internal blit shaders, etc, tgsi->nir is fine)
13:05 robclark: single IR doesn't even make sense, I think.. although doing less in glsl and more in nir does..
13:06 RSpliet: as far as OpenCL goes... I take it there'll either be an OpenCL C -> SPIR-V pass or a direct OpenCL C -> NIR pass in the pipeline somewhere?
13:06 robclark: I guess ocl c -> spirv -> nir.. although the current spirv->nir probably misses some compute related bits currently..
13:07 RSpliet: robclark: oclc->spirv makes sense, given how Khronos already did that for us :-P
13:08 robclark: RSpliet, karolherbst, btw if you build freedreno, there is ir3 cmdline compiler which you can feed tgsi.. give it --verbose and it will dump out various NIR and ir3 intermediate stages.. probably a good way to get an idea what NIR helps with..
13:09 karolherbst: robclark: well afaik tgsi doesn't do anything anyway?
13:09 karolherbst: at least I was told that optimizations aren't/shouldn't be done in tgsi
13:09 robclark: in general, given the amount of work gone into codegen, and the fact that making big changes in compiler is hard without regressing anything, I'm not sure if switching makes sense.. or at least there would have to be some things that NIR does significantly better..
13:09 robclark: tgsi does very little.. I guess most of what is done is done in glsl, but not 100% sure..
13:11 RSpliet: robclark: nouveau in isolation maybe doesn't, but if we get in a position that we're the only TGSI consumer left we'd have to maintain the front-end ourselves
13:11 RSpliet: which, given the limited manpower might not be ideal :-)
13:13 robclark: perhaps, but I don't think that will happen soon.. my bigger interest in skipping tgsi for freedreno is to avoid for every new feature having to add it both in tgsi_to_nir and ir3 ;-)
13:14 RSpliet: robclark: is anyone, besides Hans, actively looking at OpenCL support for any driver with TGSI in the middle?
13:15 robclark: idk, anyways, maybe it could be a good idea to make a list of what codegen does well and what is missing as far as opt passes, and compare that to NIR.. that should give an idea about whether all the work for nir->codegen is worth it or not..
13:15 robclark: I think just hans
13:15 RSpliet: robclark: "would OCLC -> TGSI ends up being more work than NIR -> NV50 IR?" is I think the appropriate question :-)
13:16 RSpliet: *end
13:16 robclark: hmm, not entirely sure..
13:17 robclark: although jekstrand did mention some compute related bits missing in spirv->nir (iirc, it was mostly about casting things to different sorts of pointers).. not sure what level of ocl that is required for..
13:17 robclark: (or how much work that would be to add)
13:18 RSpliet: exactly, OCLC -> TGSI would be our one-man project, NIR->NV50IR gains us an army of Intel/Broadcom/you on the job
13:18 robclark: well, "army" is a bit generous ;-)
13:19 RSpliet: haha, true, but it makes a difference :-)
13:19 RSpliet: anyway, that's my personal opinion on this
13:19 robclark: anyways, hopefully imirkin wakes up at some point.. he at least has some experience w/ both codegen and nir->ir3.. maybe keeping codegen as-is and just having a parallel nir->codegen pass might be interesting..
13:20 RSpliet: but unless I put my fingers where my mouth is, I'm not entitled to an opinion :-D
13:21 robclark: I guess if nir->codegen was small enough task that someone didn't mind wiring one up as an experiment then you could compare tgsi->codegen vs nir->codegen..
13:21 RSpliet: (I do realise that curro did a lot of work on OCLC -> TGSI in the past, and Hans is presumably leveraging that work, so the initial investment might have already been 90% there. However, maintenance isn't free either)
13:22 robclark: (yeah, afaiu it is reviving the old llvm->tgsi thing that curro did)
13:30 Tom^: mupuf: <3
13:30 mupuf: Tom^: ?
13:30 Tom^: mupuf: every addition to nouveau no matter the feature is a step in the right direction, i would never complain about such things :P
13:31 mupuf: ah ah
13:31 Tom^: i might mock the feature itself but not the code implementing it xD
13:31 mupuf: well, sure, but priorities, you know ;p
13:45 karolherbst: mupuf: well how much time did you spend with this feature? :D
13:47 karolherbst: mupuf: well anyway, those maxwell things will be a bit more complicated because of the color support :/
13:47 karolherbst: I doubt the LED subsystem has any support for it
13:47 mupuf: karolherbst: too long? :D
13:47 karolherbst: but maybe we can split it by color
13:47 mupuf: but yeah, ~4h
13:47 karolherbst: yeah
13:47 Tom^: also spend your own time on the stuff you enjoy, things gets much more fun that way.
13:47 Tom^: ;)
13:47 karolherbst: and you add code paths to check more backlight stuff
13:47 karolherbst: or not?
13:48 karolherbst: or was it just fallout
13:48 karolherbst: ohh wait
13:49 karolherbst: you just use nvkm_gpio_find and simple read/writes
13:50 karolherbst: mupuf: what is 0xc0000000? commit + enable + ?
14:01 imirkin: well, i don't think my feelings on nir are a big secret - i want nothing to do with it
14:03 karolherbst: imirkin: any special reasons for this?
14:03 imirkin: also, imo having a llvm backend for gpu code generation is a bad move - just look at all the trouble that amd has with versions, etc. not worth it. (not to mention the colossal effort to port things over, and get it working smoothly)
14:03 imirkin: and then you'd be stuck having to deal with llvm, which is a large, complex, slow project
14:04 karolherbst: imirkin: llvm is only for the falcon code compilers to replace envyas
14:04 imirkin: whereas at least i like to think i understand *most* of codegen. any such delusion would be gone with llvm.
14:04 imirkin: nir is just a big duplication of effort
14:05 imirkin: anything it does has to be done a second time by the backend
14:05 imirkin: so it might as well just not do anything
14:05 imirkin: since the backend will have to do it anyways
14:05 imirkin: looking at freedreno/vc4 as sample users is not a great idea - they don't have real backend compilers
14:06 imirkin: look at i965 - it has its own backend compiler, which has a bunch of the same opts as in nir
14:06 imirkin: but you have to do them a second time, because the IR doesn't always match up 1:1 with the hw's capabilities, so you end up emitting extra code, which has to be reoptimized
14:06 imirkin: might as well just do it in one go
14:07 karolherbst: imirkin: yeah, but nir is shared code, if we just use NIR instead of tgsi, we would get this, and would then still only develop mainly in nv50 ir
14:07 imirkin: nir vs tgsi is not a thing. tgsi is not an ir
14:07 karolherbst: in the end the question is, if tgsi ir or nir gives us a better starting point and if we would generate better code in the average case
14:07 imirkin: or at least it doesn't have 97% of what one might expect an ir to have.
14:08 imirkin: which is why i like it so much - nice and simple
14:08 imirkin: and simplicity in compilers is... very useful
14:08 imirkin: nir brings nothing to the table - it comes with a certain set of opts, which could easily be implemented in codegen, for example. (many of which are)
14:09 imirkin: i'm not aware of any plans to drop TGSI from the r600 or radeonsi drivers either, so it's not like nouveau is the last user
14:10 imirkin: in fact nouveau and r600/radeonsi are basically the only 2 users that matter - the rest are fringe GPU's. majority of users are on nvidia or amd gpu's.
14:10 karolherbst: well majority of users are on intel
14:11 imirkin: and yet it's taken up until now for NIR to gain fp64 support
14:11 imirkin: perhaps it's my personal failing, but every time i interact with NIR, my internal rage level goes way up
14:11 imirkin: i don't understand the API
14:12 imirkin: and i find that it's difficult to manipulate.
14:12 karolherbst: no fp64 support is indeed a big problem
14:12 imirkin: long story short, i have absolutely no interest in touching NIR for nouveau
14:13 karolherbst: or is there decent fp64 support now?
14:13 imirkin: that said, if some other person came along who loved nir and made a nir <-> nv50 ir connector, and was willing to take over maintainership of the whole compilation component, i'd be happy to retreat into the shadows
14:13 karolherbst: well we can have both and see what runs better on average
14:14 karolherbst: or check which path is better in which situations and improve the nv50 codegen
14:14 imirkin: (a) i don't care - i find nir hard to work with
14:14 imirkin: (b) any improvement from such a path would be easy to reproduce in the nv50 ir backend.
14:15 karolherbst: any hard feelings about using a llvm based falcon code compiler?
14:37 imirkin: if the thing works, sure, no problem
14:37 imirkin: in fact, very positive feelings on the matter
14:38 imirkin: the falcon isa is a good match for what llvm does with most backends, so it shouldn't be too bad
14:39 karolherbst: I think it will also give us a lot more stability and less "I forgot to restore the register" problems
14:40 imirkin: mmmm... but it'll also give more "the compiler miscompiled" bugs, so i'm sure it'll even out on that front
14:40 imirkin: but it'll make the fuc logic readable, which is a plus
14:41 karolherbst: yeah
14:41 karolherbst: having a C frontend is a big plus here
14:41 karolherbst: and we can also enforce specific compiler ABIs or enforce other stuff
14:43 mupuf: karolherbst: commit + enable, yes
14:44 karolherbst: funny how nvidia sometimes writes the value and commits later or does it with one write
14:46 mupuf: karolherbst: yeah, does not matter
14:48 hakzsam_: imirkin, btw, compute images override 3d images and vice versa on fermi, that sounds familiar :)
14:48 mupuf: mwk: that's for you! https://www.youtube.com/watch?v=bYQ_lq5dcvM
14:48 imirkin: hakzsam_: quite :)
14:49 mwk: mupuf: heeh, fun
14:49 mupuf: but I mean, watch it!
14:54 imirkin: hakzsam_: good thing you know how to deal with that now
14:55 hakzsam_: sure
15:00 mupuf: mwk: oh yeah, the end just rocks :o
15:01 mupuf: access to the framebuffer
15:01 mupuf: mandelbrot in 8 lines of python
15:03 karolherbst: though it was quite slow :D
15:03 mupuf: yeah :D
15:03 mupuf: but it runs in qemu
15:03 karolherbst: I guess changing every pixel at the time really slows it down
15:03 mupuf: so... :D
15:03 karolherbst: ahhh
15:04 karolherbst: but I still think most of the overhead is the framebugger operations
15:04 karolherbst: *are
15:04 karolherbst: *framebuffer
15:04 mupuf: and ... this is not fast on anything except GPUs
15:04 karolherbst: it is usually
15:04 karolherbst: rendering fullhd in realtime is possible
15:04 karolherbst: I've done it in school once
15:04 mupuf: ack
15:05 mupuf: oh, there is a question on the performance
15:05 mupuf: oh, pure software floating point
15:05 karolherbst: uhhh
15:06 mupuf: that was earlier in the talk
15:06 karolherbst: but an issue easy to solve
15:06 karolherbst: well
15:06 karolherbst: kind of
15:06 mupuf: still looks fast enough for me for writing tests on EFI :)
15:06 mupuf: or hacking around
15:06 karolherbst: :D
15:06 karolherbst: right
15:07 karolherbst: it makes a difference though if you modify each pixel
15:07 mupuf:found this video because he looked for the spelling of Josh's name
15:07 karolherbst: or just copy the data over once
15:23 robclark: imirkin, btw, offhand I think tex sample stuff in ir3 is where we are mostly not 1:1 but more of that could be moved into an ir3 specific nir->nir pass.. I just didn't think of that approach at the time.. and I don't think adding hw specific alu instructions to nir would be hard.. (but I do agree that it seems harder to justify adding nir support when you already have re-implemented all the different opt passes in your own backend already)
15:24 robclark: but I don't find nir hard to manipulate.. the nir_builder API is nice (I have written a good number of nir lowering passes by now.. compare some of them in my gallium-nir branch to equiv tgsi passes :-P)
15:28 karolherbst: robclark: one question: how big is the perf difference between having only nir opts and nir+ir3 opts?
15:28 robclark: well, I mean most of the ir3 opts are just clean-up for extra mov's/etc that you end up w/ in front end..
15:29 karolherbst: yeah well, if the input is a lot optimized already, then yeah
15:29 karolherbst: but this is my question in general
15:29 karolherbst: what if nouveau would use NIR and all the nouveau opts in addition to that only give like 10% more perf, because nir optimizes a lot already
15:30 karolherbst: then nv50 ir opts would just be for a bit of more perf
15:30 imirkin: karolherbst: it's not a fair comparison - ir3 never was a full optimizing compiler
15:30 karolherbst: imirkin: I know, I am still interested
15:31 robclark: so there is pretty much going to be some mandatory clean-up after the front end..
15:31 imirkin: karolherbst: ok. best of luck.
15:31 robclark: like avoiding moving uniforms into gprs if you have instructions that can take a const as a src reg..
15:31 karolherbst: robclark: well opts in the backend can produce stuff which the frontend could optimize again
15:32 karolherbst: robclark: right, nvidia hardware has some really odd things there, but usually you can use an immediate in many arithmetic instructions
15:32 robclark: I think probably most stuff that needs a feedback loop is handled in nir, at least for me.. (the exception might be spilling, I think)
15:32 karolherbst: imirkin: but I really don't think that, if the input is good enough, we still need those code deduplications and all that stuff. I can't believe that this would be useful at all
15:33 imirkin: robclark: when i was trying to add the atomic ops stuff in ttn, nir confused me greatly.
15:33 robclark: for ir3, the passes after nir->ir3 are mostly cleanup / scheduling / register-assignment, so nothing too hard-core..
15:33 robclark: imirkin, not sure if that was before or after some of the cleanups related to intrinsics..
15:33 imirkin: dunno
15:33 imirkin: i spent many hours on rather trivial matters though
15:34 imirkin: also nir goes against my thesis of "do less work in frontend, more work in backend", so... i'm less interested.
15:34 robclark: (like the const_index stuff and src stuff)
15:34 karolherbst: robclark: how long did it take to port stuff over?
15:34 robclark: well, switching over was kinda intertwined w/ a big re-work in ir3 to add flow control..
15:34 karolherbst: imirkin: I really think that doing more in the frontend would improve the compiler on nouveau's end, because less man time is needed, because I am pretty sure some passes wouldn't be needed anymore
15:36 imirkin: ok
15:37 imirkin: maybe i'm just bitter that nv50 ir does everything nir does and more but people for some reason love nir. who knows.
15:37 karolherbst: just a theory though and I think it is a good idea to at least try it out and check how big the difference is (CPU time and generated code)
15:37 robclark: karolherbst, anyways, ir3_compiler_nir.c is pretty much the whole nir->ir3 frontend, maybe adding an experimental nir frontend to codegen (and basically leave all the backend stuff the same) would be an interesting experiment to see what nir buys you.. at least it should be a smaller front/end than TGSI since you can use NIR to lower things (like idiv, etc)
15:37 karolherbst: imirkin: maybe they also think that the frontend deals with enough stuff already so the backend does need less code
15:37 karolherbst: at least this is how I see it
15:37 imirkin: karolherbst: check i965. note the tons of code :)
15:38 karolherbst: well they didn't use nir from the start and I doubt that nv50 codegen would be cleaned up after using nir
15:38 imirkin: robclark: yeah, but the nir lowering is less efficient
15:38 karolherbst: but yeah, maybe nir isn't as good as I think it is
15:38 imirkin: robclark: for example, it doesn't lower idiv-by-constant into something nice
15:38 robclark: tbh, I don't (yet) have a very good setup for profiling, but it seemed to me like glsl front-end was the bulk of the time (for example, glsl->tgsi->nir doesn't make that much of a diff compared to glsl->nir.. maybe a few percent)
15:39 imirkin: [and that whole div pass is copied from nv50, as you're well aware]
15:39 karolherbst: robclark: okay, so in fact glsl -> nir -> tgsi
15:39 imirkin: robclark: and iirc it doesn't even flip mul * 2^n into shl.
15:39 robclark: imirkin, the constant-propagation pass should cleanup to some degree idiv-by-const, I think..
15:39 imirkin: i don't think you understand
15:39 imirkin: idiv-by-const == mul * huge const
15:40 imirkin: e.g. look at the code gcc generates when you do x / 5
15:40 imirkin: it'll be x * 0xaaaaaaaaaaaa or something
15:40 imirkin: (i forget the exact details)
15:40 robclark: the mul/div into shifts is something that is missing.. but oneday someone who knows more py than me will get around to adding support for that sort of thing to algebraic opt stuff
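The mul-by-2^n rewrite mentioned here can be sketched as follows. This is a toy illustration in Python, not NIR's algebraic-opt syntax or nv50 ir code; the tuple-based instruction form and the names `is_pow2` and `lower_mul` are made up for the example.

```python
def is_pow2(c):
    """True if c is a positive power of two."""
    return c > 0 and (c & (c - 1)) == 0


def lower_mul(x, c):
    """Rewrite 'x * c' as a shift when c == 2**n (hypothetical tuple IR)."""
    if is_pow2(c):
        # bit_length() - 1 is log2(c) for an exact power of two.
        return ("shl", x, c.bit_length() - 1)
    return ("mul", x, c)
```

For instance, `lower_mul("%r1", 8)` gives a shift by 3, while a multiply by 6 is left alone.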
15:40 imirkin: anyways, nouveau implements that
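The "idiv-by-const == mul * huge const" idea can be checked with a small Python sketch for the unsigned 32-bit case. It is only an illustration of the arithmetic, not the nouveau peephole code; the helper names are made up. For x / 5 the constant is 0xCCCCCCCD (the 0xAAAA... pattern belongs to division by 3), and the "mul high" step is the part that keeps only the upper 32 bits of the 64-bit product.

```python
MASK32 = 0xFFFFFFFF


def umulhi32(a, b):
    """High 32 bits of the 64-bit product of two unsigned 32-bit values."""
    return ((a & MASK32) * (b & MASK32)) >> 32


def udiv5(x):
    """x // 5 for unsigned 32-bit x, using a multiply-high and a shift."""
    # 0xCCCCCCCD == ceil(2**34 / 5); mul-high supplies >> 32, the extra >> 2
    # completes the >> 34.
    return umulhi32(x, 0xCCCCCCCD) >> 2
```

Signed division and other divisors need different constants and fix-up steps, which this sketch does not cover.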
15:40 imirkin: and sure, anything you can implement as a nv50 ir opt you can also implement as a nir opt. and vice-versa.
15:41 robclark: well, anyways, I wouldn't recommend a new nir->codegen implement that (vs spiffing up nir idiv pass) at any rate..
15:41 imirkin: karolherbst: if you want to play with nir, be my guest. imo it's a total waste of time.
15:42 karolherbst: yeah and that's why I asked how much time it would take
15:42 robclark: I expect it should be ~2k loc ;-)
15:42 karolherbst: robclark: nir to codegen?
15:42 RSpliet: robclark: is that spaghetti before or after boiling?
15:42 imirkin: i expect it will be much much more.
15:42 robclark: well, guestimate based on nir->ir3, which I guess should be comparable..
15:43 karolherbst: robclark: is ir3 SSA?
15:43 karolherbst: although it shouldn't matter that much
15:43 robclark: (ok, yeah, tgsi->codegen probably supports some things that are missing in ir3.. and other compiler stages.. but just getting things to basic gl3 feature level is enough to compare tgsi->codegen vs tgsi->nir->codegen to see if it helps or not)
15:43 karolherbst: I just hope the nv50 pre SSA stage can deal with a lot of registers being used
15:44 imirkin: no, it matters
15:44 robclark: karolherbst, yes, ssa
15:44 imirkin: all the nv50 ir lowering is pre-ssa
15:44 imirkin: so you'd have to either rewrite all the lowering
15:44 imirkin: (and there are reasons why it's done pre-ssa, even if they're not great reasons)
15:44 karolherbst: or lower into pre-ssa
15:44 imirkin: or you have to go out-of-ssa from nir -> nv50 ir
15:44 imirkin: and then do the lowering
15:44 imirkin: and then re-ssa
15:45 robclark: karolherbst, imirkin, you can come out of SSA before nir->codegen too.. ir3 is the only one right now that doesn't come out of SSA
15:46 robclark: imirkin, ignoring things like idiv which possibly could be lowered better, are there things in codegen lowering which don't already have an equiv nir pass?
15:46 imirkin: hmmm... i think matt finally added the byte-extract stuff
15:47 imirkin: i dunno, i'd have to go over it opt-by-opt
15:47 imirkin: but like i said... anything that can be done in nv50 ir can also be done in nir
15:48 robclark: imirkin, btw, I added https://trello.com/c/blaU8BF0/115-nir-idiv-lowering-optimizations .. feel free to fill in details/pointers and I'll look at it at some point
15:49 robclark: anyways, just curious.. I know r600 (for example) could use some of the lowering passes that have been implemented in NIR.. but I know less about what sort of things (other than idiv) need lowering for nv..
15:49 imirkin: robclark: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#n1017
15:50 imirkin: (s = the arg that's the immediate. so 1 = the second arg)
15:50 imirkin: robclark: oh, note that this requires the "mul high" concept. which i'm sure nir has. but just pointing it out.
15:52 robclark: so might be another excuse to expose madsh/mull/etc stuff as nir instructions..
15:52 imirkin: nir already has a mulhi, at least
15:53 imirkin: (it's accessible from glsl... [ui]mulExtended() iirc )
15:57 imirkin: robclark: feel free to peruse nv50_ir_peephole.cpp
15:57 imirkin: it's basically all the opts that are done
15:57 imirkin: also observe that the opts are NOT done in a loop
15:57 imirkin: which ideally improves runtime without sacrificing too much perf
15:58 imirkin: the one thing that nv50 ir needs but doesn't have is GVN
16:00 robclark: (jfyi, with nir, the "loop" part, and really pretty much which opt passes are used vs not used, is all up to the driver.. I did have the vague idea of single-loop fast-compile and then background thread re-compile and switch out to more optimized compiler.. but bigger fires)
16:01 karolherbst: robclark: I don't think it matters with a compiler cache anymore then anyway
16:01 karolherbst: as long as cache lookups are cheaper than compilations the compilation can be as long as it wants (to a certain degree) as long as it is only done once
16:01 imirkin: i just meant that the opt passes are designed s.t. the loop is largely unnecessary
16:02 imirkin: there are *occasional* cases where they improve things
16:02 karolherbst: for stupid reasons mostly
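The difference between running the opt passes once and running them in a loop can be sketched with a toy example. Everything here is hypothetical (string "instructions", pass names, the `(ir, changed)` return convention); it only illustrates why a well-chosen pass order makes the loop largely unnecessary.

```python
def mov_to_nop(ir):
    """Toy pass: a self-move does nothing, so turn it into a nop."""
    new = ["nop" if insn == "mov r0, r0" else insn for insn in ir]
    return new, new != ir


def drop_nops(ir):
    """Toy pass: remove nops."""
    new = [insn for insn in ir if insn != "nop"]
    return new, new != ir


def run_once(ir, passes):
    """Run each pass exactly once, in order."""
    for opt in passes:
        ir, _ = opt(ir)
    return ir


def run_to_fixed_point(ir, passes, max_rounds=16):
    """Repeat the whole pass list until no pass reports a change."""
    for _ in range(max_rounds):
        progress = False
        for opt in passes:
            ir, changed = opt(ir)
            progress = progress or changed
        if not progress:
            break
    return ir
```

With the order `[mov_to_nop, drop_nops]` a single round already cleans everything up; with the reverse order only the looping driver removes the leftover nop, at the cost of extra rounds.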
16:54 karolherbst: robclark: in emit_instructions() freedreno converts nir to ir3?
16:56 robclark: karolherbst, probably emit_function() is the most interesting part..
16:57 karolherbst: yeah, seems that way
16:57 robclark: (which eventually ends up in emit_instr()..)
17:09 karolherbst: maybe I try out to have a tgsi -> nir -> nv50 ir translation working for rather simple shaders and see where it gets me
17:09 karolherbst: and with simple I mean not many different instructions are used
17:22 imirkin_: karolherbst: good luck! :)
17:22 karolherbst: yeah...
17:23 imirkin_: should all be easy until you hit control flow
17:23 karolherbst: and phis maybe
17:23 karolherbst: all that ssa stuff
17:23 karolherbst: robclark: you convert the ssa stuff by hand, don't you?
17:23 imirkin_: phi nodes appear with control flow
17:23 karolherbst: mhh right
17:24 karolherbst: but it can't be that hard in the end
17:25 robclark: karolherbst, I do come out of ssa myself (although probably not as efficiently as NIR's out-of-ssa pass)..
17:25 karolherbst: imirkin_: and I think I will start from tgsi, because that already exists might be a good starting point
17:25 karolherbst: robclark: ohh there is an out of ssa pass
17:25 robclark: but that is mainly because I want it to still be in SSA when I do instruction scheduling..
17:25 karolherbst: any reason not to use it?
17:25 imirkin_: karolherbst: to be clear though, if you want this to go forward in any significant way, you'll be the one maintaining it going forward
17:25 robclark: yeah, I think for anyone who doesn't have to do instruction scheduling, use NIR's out-of-ssa pass ;-)
17:25 karolherbst: imirkin_: well it would only be the translating part
17:25 imirkin_: [or at least it won't be me]
17:26 karolherbst: I don't plan to mess other parts
17:26 imirkin_: compiler is one unit.
17:26 karolherbst: except maybe marking some passes as "not with nir"
17:27 karolherbst: and nir still lacks fp64? Or are there patches for that now
17:27 imirkin_: i think those patches were just pushed
17:28 imirkin_: i965 still doesn't support it, but the nir support is there
17:28 imirkin_: at least for the bits of it that i965 needs
17:28 karolherbst: ahh right, seems that way
17:28 karolherbst: https://cgit.freedesktop.org/mesa/mesa/commit/src/compiler/nir?id=2ab2d2e5881d289a8239467a97516e4e410cebfb
17:28 karolherbst: :D
17:28 karolherbst: funny
17:28 imirkin_: note that nir is tuned to intel ops, not nvidia ops
17:28 karolherbst: fmod@32 and fmod@64 instructions
17:28 karolherbst: imirkin_: yeah, I already expect this
17:29 karolherbst: and if something is stupid inside nir they should change it though or make it more generic
17:29 karolherbst: or I do when I am up to it :D
17:29 karolherbst: or nobody does and I give up
17:31 karolherbst: and starting with tgsi ->nir has the benefit I can use nouveau_compiler :D
17:32 imirkin_: i think i said it before, but i'll say it again: this is a colossal waste of time.
17:32 imirkin_: spend 1/10th of the time you'd spend on this bs instead on improving codegen, and we'll be way better off.
17:33 karolherbst: yeah, but I am always annoyed by the bs we get from tgsi :/
17:33 imirkin_: the solution to that problem is to teach codegen how to deal with it
17:33 imirkin_: because anything we get from tgsi
17:33 imirkin_: we might cause to happen internally due to various optimizations
17:34 imirkin_: the fact that the tgsi code is dumb is irrelevant. it's not SUPPOSED to be optimized.
17:34 karolherbst: yeah I understand this, but maybe we are better off if we get something optimized already
17:34 imirkin_: codegen is supposed to be able to optimize it. any failing to do so should be remedied by improving codegen.
17:34 imirkin_: no, because codegen alters code, adds/removes it
17:34 imirkin_: and can generate those stupid scenarios on its own
17:35 imirkin_: it needs to be able to handle them nicely
17:41 karolherbst: yeah I know and I don't say it shouldn't be added, but maybe (and this is something I would like to verify) starting with NIR gives us a much better starting point and let us generate better code without improving codegen, so that further codegen improvements aren't as important as while starting with tgsi
17:42 karolherbst: but maybe I am wrong, but we can't be sure about that, can we?
17:42 imirkin_: what i can be sure of is that the time spent in attempting this would be much better spent improving codegen.
17:43 karolherbst: right with a perfect codegen this is true for sure
17:43 imirkin_: no, with the current codegen
17:44 karolherbst: well I could work on that empty branch elimination stuff first, but I failed real hard on this :/
17:45 imirkin_: in that case, prepare for huge fail when trying to hook up nir.
17:45 imirkin_: anyways... it's your time... do whatever you want
17:45 imirkin_: i have no plans on supporting it... ever.
17:47 karolherbst: well at least I would learn a lot about nir and nv50 ir this way
17:49 imirkin_: i think i've said my piece. do whatever.
17:50 karolherbst: well in any case I will deal with the empty branch elimination first
17:51 karolherbst: robclark: by the way, do you know if it is possible to add a nir -> tgsi converter or wouldn't that work for various reasons?
17:52 robclark: there is a (partially complete) nir->tgsi converter..
17:52 karolherbst: ohh
17:52 karolherbst: that sounds interesting
17:52 robclark: anholt was using it initially, I think airlied has been playing with it more recently..
17:52 karolherbst: because then I would simple to tgsi -> nir -> nir_opts -> tgsi
17:52 karolherbst: *do
17:52 robclark: not sure how complete it is tho..
17:53 karolherbst: well at least this could be used by other drivers
17:53 karolherbst: and new gallium based drivers would get a crap load of opts already
17:54 robclark: well, probably new drivers should start off w/ NIR directly..
17:54 imirkin_: there are two types of drivers
17:54 imirkin_: drivers that are going to have a real optimizing compiler - those should stay as far away from nir as possible
17:55 imirkin_: and those that want to have the most barebones backend compiler possible - those should use nir for everything and just live with the fact that they won't have the best codegen.
17:55 robclark: I disagree with that.. especially if that real optimizing compiler isn't written yet..
17:55 karolherbst: well using tgsi as the input and just doing a simple tgsi -> nir + opts -> tgsi round should be quite useful
17:55 karolherbst: in both cases
17:56 karolherbst: even if only used as a starting point
17:56 karolherbst: removed later
17:56 imirkin_: karolherbst: waste of (cpu) time
17:56 robclark: I am more skeptical about usefulness of tgsi->nir->tgsi..
17:56 karolherbst: well yeah, but nir->tgsi kind of exists as you said
17:57 karolherbst: and if it turns out to be useless, then less time spent on this
17:58 karolherbst: do softpipe or llvmpipe do anything with tgsi?
17:58 karolherbst: never checked how they work
17:59 imirkin_: "do"?
17:59 imirkin_: they receive it...
17:59 karolherbst: okay
17:59 robclark: I think all anyone does w/ tgsi is translate it into something else immediately ;-)
17:59 imirkin_: llvmpipe feeds it to llvm, which jit's it
17:59 imirkin_: yeah, tgsi isn't meant to be manipulated
17:59 karolherbst: yeah, but this is what I meant
18:00 karolherbst: the receiving part
18:00 imirkin_: softpipe executes it directly in ~the slowest way possible
18:00 imirkin_: r300 and r600 and radeonsi also take in tgsi
18:00 imirkin_: as does nv30
18:00 imirkin_: as does ilo, although that one would be prime for flipping to nir
18:03 karolherbst: yeah then I don't see why a tgsi->nir->tgsi round might be a bad thing to have. Even if it turns out to be useless for nv50, maybe nv30 speeds up a little or ilo or something else
18:03 imirkin_: it'll most likely cause nv30 to no longer work
18:04 imirkin_: (nv30 doesn't even do RA... the nir -> tgsi pass will destroy it)
18:04 karolherbst: ohh, that sounds bad then
18:06 RSpliet: not to mention you'd want to check for all those PIPE_CAP_TGSIs in your nir->tgsi pass and lower accordingly, which undoes a lot of what nir might have achieved
18:07 imirkin_: and i wonder if nir supports lack of native ints... dunno. maybe it does.
18:07 karolherbst: RSpliet: well nir opts are controlled from outside of nir afaik
18:08 karolherbst: so you basically convert to nir, run each pass you want to have and convert to something you want to deal with later
18:09 robclark: imirkin, it should, there is nir_shader_opts::native_integers flag..
18:10 imirkin_: doubtful anyone's tested it since vc4 moved to native ints
18:12 robclark: imirkin, I doubt it matters anyways, until you start doing glsl_to_nir directly..
18:17 karolherbst: imirkin_: okay, so what should happen when a BB only has one instruction and this instruction is an unconditional bra?
18:17 imirkin_: karolherbst: if the BB in question only has a single incoming edge
18:18 imirkin_: then you should change the destination of that incoming edge
18:18 imirkin_: as well as change the destination of the branch along that edge
18:18 karolherbst: is it important that there is just one incoming edge? Why not change all incoming edges?
18:18 imirkin_: mmmmmmmm
18:18 imirkin_: yes, that sounds reasonable.
18:18 karolherbst: they would end up in the same BB after going through the empty one anyway, or is there something else?
18:19 imirkin_: you might also look at the flattening pass
18:19 imirkin_: what you want to do is probably a subset of that
18:19 imirkin_: not sure, would have to look in more detail
18:20 karolherbst: flattening is after RA, right?
18:20 imirkin_: yeah, but your pass should happen before RA
18:20 karolherbst: I had already fun with it because it crashed after I changed the target BB of branches
18:20 karolherbst: ...
18:20 karolherbst: *bra
18:20 karolherbst: because my first idea was
18:20 imirkin_: right, you have to be careful to make sure you fix everything up
18:20 imirkin_: including the edges
18:20 karolherbst: if you bra to a BB with nothing
18:20 imirkin_: not just the bra instructions
18:20 karolherbst: then why not jump to the next
18:21 karolherbst: ahh right
18:21 karolherbst: okay, then that was what I missed
18:21 imirkin_: and make sure you don't remove/add edges
18:21 imirkin_: since that will destroy the universe
18:21 karolherbst: okay
18:21 karolherbst: so every bra instruction has a edge associated?
18:21 imirkin_: BB's are in a control flow graph (cfg)
18:21 imirkin_: branches specify which edge gets taken when at the end of the BB
18:22 karolherbst: I think the issue I had was that I didn't find the right edge of a flow instruction
18:22 imirkin_: sorry, i can't talk about all this right now
18:23 imirkin_: make a patch which doesn't work, i can take a look at it later and see if i can't notice some issues
18:23 karolherbst: okay
18:23 karolherbst: but any idea how I get the right edge of a flow instruction? well I try to find it out, but I didn't last time either :/
18:23 imirkin_: a flow instruction has a destination bb
18:23 karolherbst: right
18:24 imirkin_: edges have a start/end bb as well...
18:24 karolherbst: ahh and then I can lookup insn->bb and insn->asFlow()->target.bb and get the edge?
18:24 karolherbst: okay
18:25 karolherbst: thanks for the input by the way
18:26 imirkin_: there are edge iterators which ... iterate through the edges
18:26 karolherbst: yeah I saw those
18:26 karolherbst: ahh getOrigin and getTarget which return Nodes
18:27 karolherbst: I am curious if there are two edges with the same start/end bb
18:29 imirkin_: sure, that can happen
18:29 imirkin_: like a while loop
18:29 imirkin_: without internal control flow
18:30 karolherbst: okay
18:30 karolherbst: but it doesn't matter anyway...
18:30 karolherbst: because the result is the same
18:30 karolherbst: if the target is empty I just point the target somewhere else
18:32 karolherbst: okay, then I will do this first: optimizing those edges when the target is an empty BB (or only contains a stupid bra)
18:32 karolherbst: and then I can check if a conditional bra at the end of an BB points to the same destination
18:32 karolherbst: oh well, I would also have to rebind those bras in the first thing :/
18:33 karolherbst: well I think I will manage somehoe
18:33 karolherbst: *somehow
18:35 karolherbst: how should I call that pass? DBE (dead-branch elimination)? :D
18:35 karolherbst: although it is more useless-branch
18:36 karolherbst: ohh or empty
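The plan above (point every edge and bra that targets an empty BB at that BB's own destination, without adding or removing edges) can be sketched on a toy CFG. The dict layout, the `("op", arg)` tuples and the `cbra` opcode name are assumptions made for this example and are not the nv50 ir data structures.

```python
def is_empty(cfg, name):
    """A block counts as empty here if it holds only one unconditional bra."""
    insns = cfg[name]["insns"]
    return len(insns) == 1 and insns[0][0] == "bra"


def final_target(cfg, name):
    """Follow chains of empty blocks; 'seen' guards against empty loops."""
    seen = set()
    while is_empty(cfg, name) and name not in seen:
        seen.add(name)
        name = cfg[name]["insns"][0][1]
    return name


def retarget(cfg):
    """Change edge destinations and branch targets; edge count stays the same."""
    for bb in cfg.values():
        bb["succ"] = [final_target(cfg, s) for s in bb["succ"]]
        bb["insns"] = [
            (op, final_target(cfg, arg)) if op in ("bra", "cbra") else (op, arg)
            for op, arg in bb["insns"]
        ]


# The if/else-with-empty-arms shape: BB1 branches to BB3 or BB2, both go to BB4.
cfg = {
    "BB1": {"insns": [("set", "$p0"), ("cbra", "BB3"), ("bra", "BB2")],
            "succ": ["BB3", "BB2"]},
    "BB2": {"insns": [("bra", "BB4")], "succ": ["BB4"]},
    "BB3": {"insns": [("bra", "BB4")], "succ": ["BB4"]},
    "BB4": {"insns": [("exit", None)], "succ": []},
}
retarget(cfg)
```

After the call BB1 has two edges to BB4 and both of its branches target BB4; BB2 and BB3 are left in place but are no longer reachable from BB1.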
20:10 gouchi: hi, I was wondering if there is another way to change fan control for NVC0 card without hwmon ?
20:10 gouchi: because this documentation said we can't https://nouveau.freedesktop.org/wiki/PowerManagement/
20:27 karolherbst: gouchi: why do you need a different interface?
20:30 gouchi: karolherbst: just want to try to change fan control because it seems to run really fast
20:31 karolherbst: gouchi: well why don't you find out why it does run really fast?
20:31 karolherbst: this can be usually fixed
20:31 karolherbst: and may be fixed already
20:32 karolherbst: gouchi: what gpu do you have and I would also like to take a look into your vbios /sys/kernel/debug/dri/0/vbios.rom
20:32 gouchi: karolherbst: we are running kernel 4.4.2
20:32 imirkin_: gouchi: do you have a GF108?
20:32 imirkin_: gouchi: lspci -nn -d 10de:
20:33 gouchi: imirkin: yes
20:34 gouchi: imirkin_: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF108 [GeForce GT 730] [10de:0f02] (rev a1)
20:34 gouchi: imirkin_: 01:00.1 Audio device [0403]: NVIDIA Corporation GF108 High Definition Audio Controller [10de:0bea] (rev a1)
20:36 gouchi: karolherbst: we are trying to follow this doc https://wiki.archlinux.org/index.php/nouveau#Fan_Control
20:37 karolherbst: gouchi: first mistake: you don't
20:37 gouchi: karolherbst: ok :)
20:37 karolherbst: well you usually don't have to do anything
20:37 karolherbst: if the fan runs too fast, but still changes speed through temperature changes, then it is alright
20:37 karolherbst: we don't parse all fan management related things right
20:37 karolherbst: especially with Kepler there was a lot of new
20:38 gouchi: alright
20:38 karolherbst: but if it is too fast
20:38 karolherbst: like noticeable difference compared to nvidia
20:38 karolherbst: we might be able to fix that
20:41 gouchi: karolherbst: the fan seems to run at full speed and doesn't seem to speed up or slow down with temperature
20:41 karolherbst: mhh
20:42 imirkin_: gouchi: iirc some GF108's have messed up fan controls
20:42 karolherbst: we fixed something like that in 4.5 or 4.6
20:42 imirkin_: whereby the lower we set the fan, the faster it spins
20:42 imirkin_: or something
20:42 karolherbst: but no idea if that helps for fermi
20:42 imirkin_: iirc mupuf was looking into it
20:42 imirkin_: but that was like 6 months ago... =/
20:42 gouchi: ok I will ask to make some test running kernel 4.5 or 4.6
20:42 karolherbst: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/gpu/drm/nouveau/nvkm/subdev?id=a814a29d7bbfdfe56fe1bb9641a185077066eb9f
20:43 karolherbst: ohh wait
20:43 karolherbst: this was something else
20:43 imirkin_: afaik the issue i'm thinking of was never fixed
20:43 karolherbst: gouchi: maybe just check dmesg
20:43 imirkin_: although mupuf was able to locate a GF108 which had the issue
20:43 karolherbst: imirkin_: stupid values we can't handle in the vbios?
20:44 imirkin_: karolherbst: different programming model for the fans.
20:44 karolherbst: ahh
20:44 imirkin_: or the pwms, dunno
20:44 karolherbst: well we control all keplers wrong anyway
20:44 imirkin_: i don't remember the details
20:46 gouchi: karolherbst: http://www.hastebin.com/ebazihaqeb.vhdl
20:47 karolherbst: okay, then it seems something we parse wrongly
20:47 karolherbst: gouchi: would be nice to have your vbios so we can look into the issue at some point
20:48 gouchi: karolherbst: cat /sys/kernel/debug/dri/0/vbios.rom > /tmp/vbios.txt ?
20:48 karolherbst: yeah
20:48 karolherbst: as root
20:49 karolherbst: user can't read debugfs
20:56 gouchi: karolherbst: http://www.filedropper.com/vbios
20:56 karolherbst: imirkin_: BB is considered empty if bb->getEntry is NULL or !bb->getEntry()->asFlow()->isPredicated() or bb->getEntry()->asFlow()->target.bb == nextBB(bb)? Or did I miss something?
20:56 karolherbst: gouchi: thanks
20:56 gouchi: karolherbst: no problem you are welcome
20:57 JamesWorts: cd ..
20:57 imirkin_: karolherbst: for you, i think it's if (a) bb->getEntry() == bb->getExit() and (b) bb->getExit()->next == null.
20:57 JamesWorts: oops
20:58 imirkin_: karolherbst: actually maybe if bb->getFirst()... if there are phi nodes in there you can't just drop the bb like that
20:58 karolherbst: mhh right
20:58 karolherbst: but why should there be phi nodes in an empty bb?
20:58 imirkin_: i forget what the accessors are actually - there's one that starts at the real beginning, one that starts after the phi nodes
20:58 imirkin_: i dunno, there's a million reasons that could happen
20:58 imirkin_: it wouldn't be empty
20:58 imirkin_: since it'd have the phi nodes :) and the branch at the end.
20:59 karolherbst: mhh okay
20:59 karolherbst: so I check for getFirst()
20:59 karolherbst: but why getEntry() == getExit()?
20:59 karolherbst: ...
20:59 karolherbst: ohh wait
20:59 karolherbst: yeah, you are right
20:59 imirkin_: because you want the branch to be the first (and last) thing
20:59 karolherbst: mhh
20:59 karolherbst: not true though
21:00 karolherbst: if it is an unpredicated branch
21:00 karolherbst: the next instructions are dead either way
21:00 imirkin_: can't have instructions after a branch
21:00 imirkin_: that's what makes a block basic
21:00 karolherbst: see, doesn't matter then anyway
21:00 imirkin_: it's that it doesn't have instructions in the middle
21:00 imirkin_: er
21:00 imirkin_: it's that it doesn't have branches in the middle
21:00 karolherbst: right
21:00 karolherbst: and I check for getEntry being a branch
21:00 imirkin_: (and in case you haven't picked it up, BB = basic block)
21:00 karolherbst: well maybe there is more flow instructions I am not aware of
21:01 karolherbst: and I should explicitly check for op == OP_BRA
21:01 imirkin_: yeah you should
21:01 karolherbst: yeah I know
21:01 karolherbst: but then I don't need to check getEntry == getExit
21:01 imirkin_: it might be something dumb... like OP_BRK
21:01 imirkin_: etc
21:01 karolherbst: ahh right
21:01 karolherbst: yeah I check for BRA then
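The conditions collected so far (no phi nodes, the branch is both the first and the last instruction, and the opcode is explicitly BRA rather than something like BRK) can be put together in a small sketch. The `Insn` and `Block` classes are hypothetical stand-ins written for this example; they are not the nv50 ir classes or accessors.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Insn:
    op: str
    predicated: bool = False


@dataclass
class Block:
    phis: List[Insn] = field(default_factory=list)   # phi nodes at the top
    body: List[Insn] = field(default_factory=list)   # everything after the phis

    def entry(self) -> Optional[Insn]:
        return self.body[0] if self.body else None

    def exit(self) -> Optional[Insn]:
        return self.body[-1] if self.body else None


def is_lone_bra_block(bb: Block) -> bool:
    """True if the block holds nothing but one unpredicated bra."""
    if bb.phis:
        return False  # a block with phi nodes is not empty
    first = bb.entry()
    if first is None or first is not bb.exit():
        return False  # no instructions at all, or more than one
    return first.op == "bra" and not first.predicated
```

A block with no instructions at all is deliberately reported as `False` here; whether such blocks should also be skipped is left open in the discussion.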
21:04 karolherbst: imirkin_: uhhh
21:04 karolherbst: getFirst also returns nice stuff like vfetch b128 { %r368 %r372 %r376 %r405 } a[0x0]
21:04 karolherbst: ...
21:05 imirkin_: but then it all gets DCE'd?
21:05 karolherbst: ahh
21:05 karolherbst: getPhi
21:08 karolherbst: imirkin_: what do you mean?
21:09 karolherbst: removing empty branches should in fact only change the destination of branches and edges of BBs and the flattening pass removes those bras itself
21:10 karolherbst: the issue I found is just that a conditional branch went into BB:3, the branch itself into BB:2
21:10 karolherbst: BB:2 and BB:3 into BB:4 (with bra BB:4)
21:10 karolherbst: and BB:2 and BB:3 were empty except the bra instruction
21:11 karolherbst: so the bra from BB:1 can have its predicate removed and be turned into a normal bra, and the source can be DCEed away
21:11 imirkin_: karolherbst: i have no idea what you're talking about
21:11 imirkin_: however if you present a simple tgsi program
21:11 imirkin_: along with your patch
21:12 karolherbst: yeah I already did, wait a second
21:12 imirkin_: and a description of what you perceive to be the problem
21:12 imirkin_: and send me an email with all that
21:12 imirkin_: i could have a look tonight
21:12 karolherbst: well it was your idea in the end, because you figured out what the problem was :p
21:12 karolherbst: well not the solution I have, but what has to be changed
21:13 karolherbst: it is code like this: if (some_condition) { // empty } else { //empty }
21:13 imirkin_: right
21:13 karolherbst: and the flattening pass removed those branches (and a PostDCE would eliminate the condition)
21:13 imirkin_: flattening is too late
21:13 imirkin_: that's post-ra
21:13 karolherbst: right
21:13 imirkin_: you want a pre-ra block elimination pass
21:13 karolherbst: that's why I have to remove the condition in pre RA
21:14 karolherbst: I just need to detect pointless conditional bras
21:14 imirkin_: and then you want a pre-ra branch elimination pass
21:14 imirkin_: e.g. if all the branches at the end of a block point to the same place, get rid of the conditionals
21:14 karolherbst: right
21:14 karolherbst: that's what I try to do
21:14 imirkin_: but... fix one problem at a time
21:15 imirkin_: otherwise you'll lose your head :)
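The second step, dropping the conditionals when every branch at the end of a block points to the same place, can be sketched like this. The `("op", arg)` tuples and the `cbra` name are assumptions for the example, and the sketch assumes every block ends in an explicit unconditional bra (no fall-through).

```python
def drop_pointless_conditionals(insns):
    """Collapse the trailing branches into one bra if they share a target."""
    targets = {arg for op, arg in insns if op in ("bra", "cbra")}
    if len(targets) != 1:
        return insns  # no branches, or branches that really diverge
    (target,) = targets
    kept = [(op, arg) for op, arg in insns if op not in ("bra", "cbra")]
    return kept + [("bra", target)]
```

Once the conditional branch is gone, whatever computed its predicate has no users left, so an ordinary DCE pass can remove it.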
21:15 karolherbst: right
21:15 karolherbst: first thing: point to latest possible BB
21:15 karolherbst: this should get me figuring out the edge thing
21:15 karolherbst: and shouldn't hurt anything
21:17 karolherbst: imirkin_: do you know if there is an easy way to get the edge created because of a given flowInstruction?
21:17 karolherbst: I kind of need this :/
21:17 imirkin_: no. just iterate over all the outgoing edges
21:19 karolherbst: mhh right, if I change all it shouldn't matter
21:19 karolherbst: I just change all the fitting edges and all the bra instructions
21:20 karolherbst: any instruction I have to take care of too?
21:23 karolherbst: gouchi: could you also install envytools and run nvapeek 101000 if that's no issue?
21:23 karolherbst: isn't that important though
21:23 imirkin_: karolherbst: you need to change the edge's "destination" pointers
21:24 gouchi: karolherbst: ok I will try with https://nouveau.pmoreau.org
21:24 imirkin_: that should be it, i think
21:24 naptastic: so, I updated my kernel to 4.6-rc6, installed linux-firmware from git, and Mesa 1.2.1 from Debian "experimental".
21:24 naptastic: Now only one monitor gets detected (lol)
21:24 naptastic: and while video playback was great last night, after a reboot it's choppy, and this is in dmesg:
21:24 karolherbst: imirkin_: right, but I also have to change the target of the FlowInstructions, right?
21:24 naptastic: [ 1.964129] nouveau 0000:08:00.0: Direct firmware load for nvidia/gm206/gr/sw_nonctx.bin failed with error -2
21:24 imirkin_: karolherbst: in the original BB, yeah
21:24 karolherbst: imirkin_: right
21:24 imirkin_: naptastic: i guess you're still missing some firmware?
21:25 imirkin_: naptastic: perhaps you're loading nouveau at a time when the firmware is not available?
21:25 karolherbst: gouchi: your vbios is funny :/
21:25 naptastic: imirkin, you saying that makes me realize my mistake. Hold plz...
21:25 karolherbst: huh
21:25 karolherbst: the heck
21:26 gouchi: karolherbst: using distribution based on OpenElec
21:26 karolherbst: mupuf: this is awesomely bad :D "Voltage entry 90" in a pm_mode 0x40 table
21:26 karolherbst: mupuf: and the vmap table has 4 entries, all 0
21:26 naptastic: imirkin_, like the derp I am, I compiled nouveau in =y instead of =m, and didn't compile the firmware into the kernel. Lemme fix that and get back to you. :D
21:27 karolherbst: mupuf: never thought I would see something like that in a fermi card, especially a 730 :/
21:29 imirkin_: naptastic: fwiw just about everybody does nouveau=m
21:29 naptastic: imirkin_, yeah I know. I just like having KMS that much earlier.
21:30 naptastic: (Ever watch Willy Wonka And The Chocolate Factory? I am Veruca Salt. I'm also extremely bad luck with computers, which is amazing, because I'm a developer by vocation.)
21:35 imirkin_: "but i want one NOW!"
21:35 naptastic: :D :D :D
21:39 naptastic: I do wish kernel compiles were faster though. At some point recently they went from ~9 minutes to ~25 minutes for me. I have no idea what happened.
21:39 naptastic:gets back to work
21:39 imirkin_: you used a distro .config
21:39 naptastic: well, I started with one, but then I went in and turned off everything I was sure I didn't need (which was a lot)
21:39 airlied: make localmodconfig
21:40 imirkin_: i've yet to try it... i start with defconfig and add to it
21:40 karolherbst: airlied: all loaded modules?
21:40 imirkin_: (occasionally removing things)
21:40 naptastic: I always regret doing that... it invariably leaves things out that I need (masq target, for example, or drivers for my SATA controller, amazingly enough)
21:40 karolherbst: naptastic: these days you use ahci
21:40 airlied:hasn't used a monolithic kernel in a long time
21:40 imirkin_: yeah... back in like the 2.0 and 2.2 days i'd forget the disk controller... or the filesystem
21:40 imirkin_: but since then i've gotten a lot better at it :)
21:40 karolherbst: :D
21:41 karolherbst: yeah I made the mistake once too
21:41 karolherbst: or used the UUID in the command line....
21:41 naptastic: I've made that mistake... "once"
21:41 naptastic: for very large values of "once" XD
21:41 karolherbst: well PARTUUID works though
21:41 imirkin_: meh. that all requires an initrd i think
21:41 karolherbst: not PARTUUID
21:41 karolherbst: I use it myself without initrd
21:42 imirkin_: maybe recent userspace has gotten smarter
21:42 karolherbst: using sdc1 or something is kind of hacky if you have to remove a disc :D
21:42 imirkin_: i don't switch disks around often enough to care
21:42 karolherbst: yeah, but then one breaks
21:42 karolherbst: and your machine doesn't boot :p
21:42 imirkin_: nah, that can never happen
21:42 karolherbst: :D
21:42 karolherbst: well PARTUUID is the way to go
21:42 imirkin_: unless it's physically broken, in which case all bets are off
21:43 karolherbst: no initrd needed and works inside the kernel
21:46 karolherbst: imirkin_: is bb->dom the Node with all incoming edges?
21:46 gouchi: karolherbst: is this correct http://www.hastebin.com/siyugevuko.coffee ?
21:46 gouchi: karolherbst: I had to type it :(
21:46 karolherbst: 8040488e seems good
21:57 karolherbst: imirkin_: got it: eIt = bb->cfg.incident(); BasicBlock::get(eIt.getNode()); :)
21:58 imirkin_: karolherbst: bb->cfg
21:58 imirkin_: karolherbst: bb->cfg.incident()
21:58 imirkin_: oh right, you got it
21:58 karolherbst: :)
21:58 imirkin_: dom = dominator tree
21:59 imirkin_: which is a SSA thing, don't worry about it
21:59 karolherbst: okay
21:59 karolherbst: now I just need to know how to rebind those edges :/
21:59 karolherbst: changing the target of the bras is easy
21:59 imirkin_: and also just adjust those edges
21:59 imirkin_: it might take a bit of doing, not 100% sure.
22:00 imirkin_: you might look at nv50_ir_graph.*
22:02 karolherbst: mhh no crash when just changing the targets of the bras...
22:13 karolherbst: uhh, node->data is the BasicBlock :/
22:15 karolherbst: well doesn't matter because I will only touch the edges :/
22:16 karolherbst: imirkin_: okay, I think I just have to change the Edge.target to the Node of the next BB...
22:17 imirkin_: yes :)
22:17 imirkin_: although you need to be careful
22:17 imirkin_: yeah
22:17 imirkin_: so
22:17 imirkin_: right
22:17 imirkin_: that won't quite work
22:17 karolherbst: I am careful, really careful :)
22:17 imirkin_: even though it feels like it should :)
22:17 karolherbst: :D
22:17 imirkin_: you need to find the relevant *incoming* edge in the target bb
22:18 imirkin_: and the relevant outgoing edge in the source bb
22:18 karolherbst: ohh right
22:18 karolherbst: I have to change both
22:18 imirkin_: welllll
22:18 imirkin_: no.
22:18 imirkin_: heh
22:18 karolherbst: :D
22:18 imirkin_: so normally they're one and the same Edge object
22:18 imirkin_: but now you're stuck with 2
22:18 karolherbst: shouldn't it be the same Edge object?
22:18 karolherbst: ohhhh
22:18 imirkin_: so you need to pick one of them to "survive" and patch the other container object to realize that desire
22:19 karolherbst: okay
22:19 karolherbst: so I have two edges
22:19 imirkin_: sadly the way the graph is represented is largely incomprehensible to the average person with an advanced degree in hyperbolic topology
22:19 karolherbst: edge for parent->bb
22:19 karolherbst: and edge for bb->next
22:19 karolherbst: and I have to merge those
22:19 imirkin_: basically yeah
22:19 karolherbst: what if
22:20 karolherbst: how hard would it be to create a new incoming edge?
22:20 karolherbst: because this would be the sane thing to do
22:20 karolherbst: mod the outgoing edge and create a fitting edge for this and remove the incoming edge for bb :/
22:20 karolherbst: mhhh
22:20 karolherbst: messy too
22:20 karolherbst: the graph has to stay sane no matter what I do
22:22 karolherbst: because when I rebind the edge from the bb->parent, bb has an invalid incoming edge
22:22 karolherbst: ohh no
22:22 karolherbst: it is the same object
22:22 imirkin_: right, so you need to fix it up in that crazy array
22:22 imirkin_: each node has an edge array
22:22 imirkin_: so you need to "patch" the array
22:22 karolherbst: okay
22:22 imirkin_: it's an array of arrays where [0] is incoming and [1] is outgoing
22:22 imirkin_: for maximal confusion
22:23 karolherbst: bb->parent->cfg.outgoing() find the right edge and patch the target
22:23 karolherbst: bb->cfg.incident() remove that^^ edge from the list
22:23 imirkin_: mmmmmm
22:23 imirkin_: no
22:23 imirkin_: you can't *remove* things from the list
22:23 imirkin_: that will end the universe
22:23 karolherbst: :/
22:23 imirkin_: and you probably don't want that to happen
22:23 imirkin_: so you need to patch things in-place
22:24 karolherbst: yeah, but the bb->cfg.incident() list will point to the edge which has nothing to do with the BB anymore
22:29 karolherbst: imirkin_: so basically two nodes are connected through the same edge object where edge->origin and edge->target point to the nodes
22:29 karolherbst: and there is only one edge object
22:29 karolherbst: so how can I change the target of that edge without messing with one of the two nodes
22:29 imirkin_: right
22:29 imirkin_: you can't
22:29 imirkin_: er
22:29 imirkin_: you take the source node's object
22:29 imirkin_: you find the edge
22:29 imirkin_: you adjust the target
22:29 imirkin_: and then you go into the target's edge list
22:29 imirkin_: and swap out the old edge for the "new" edge
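The steps imirkin_ lists can be sketched on a deliberately simplified graph. Everything below is an assumption for illustration: `Node`, `Edge`, and `bypassBlock` are hypothetical and use plain vectors, whereas the real nv50_ir graph keeps its edges in a different structure that has to be patched in place.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified CFG model; NOT the real nv50_ir Graph classes.
struct Node;

struct Edge {
    Node *origin;
    Node *target;
};

struct Node {
    std::vector<Edge *> in;   // incident (incoming) edges
    std::vector<Edge *> out;  // outgoing edges
};

// Given `e` (src -> mid) and `dead` (mid -> next), make `e` the surviving
// edge src -> next. Returns false if `dead` is not registered as an
// incoming edge of `next`, in which case nothing is modified.
inline bool bypassBlock(Edge *e, Edge *dead)
{
    Node *mid = e->target;
    Node *next = dead->target;
    assert(dead->origin == mid);

    auto slot = std::find(next->in.begin(), next->in.end(), dead);
    if (slot == next->in.end())
        return false;

    e->target = next;  // adjust the target on the source node's edge object
    *slot = e;         // swap the old edge for the surviving one, in place

    // The bypassed block is unreachable now; drop its stale edge references.
    mid->in.erase(std::remove(mid->in.begin(), mid->in.end(), e), mid->in.end());
    mid->out.erase(std::remove(mid->out.begin(), mid->out.end(), dead), mid->out.end());
    return true;
}
```

The source node's outgoing list needs no change because it already holds `e`; only the object it points to was updated. The branch instructions in the source block still have to be retargeted separately.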
22:30 karolherbst: and let the useless bb point to itself?
22:30 karolherbst: :D
22:31 karolherbst: mhh anyway, the node of that useless BB has then trashed incident and outgoing lists
22:31 karolherbst: because they still point to those edges
22:31 imirkin_: yeah but nobody cares
22:31 karolherbst: ahh okay
22:31 imirkin_: that BB is now unreachable
22:31 karolherbst: ohh okay
22:31 imirkin_: you can just clear out its inbound/outgoing edges, just in case
22:32 karolherbst: so I just throw those old edges into that bb
22:32 karolherbst: ohh yeah
22:32 imirkin_: (er hm, that might trigger an assert somewhere if there's a totally disconnected bb....)
22:32 karolherbst: that's what I meant with deleting though
22:32 karolherbst: :D
22:32 karolherbst: well
22:32 karolherbst: when I trigger that I see how severe that assert is
22:32 imirkin_: iirc it's in the RA
22:33 karolherbst: well if the bb is empty, we could not fail
22:33 karolherbst: because it is empty
22:33 imirkin_: i forget the details
22:33 karolherbst: empty unconnected BB, who cares..
22:33 imirkin_: it took me a long time to debug the failure, after which i threw the assert in
22:34 imirkin_: but i don't remember the precise conditions
22:34 karolherbst: well I will leave the assert there but add a bb->getFirst() != NULL &&
22:34 karolherbst: or something
22:34 karolherbst: ohh
22:34 imirkin_: or just cause that bb to not get processed in the first place or something
22:34 karolherbst: no, first should be fine
22:35 imirkin_: anyways, figure it out when you get to it
22:35 imirkin_: just be warned :)
22:35 karolherbst: yeah
22:36 karolherbst: anyway, retargeting the bra instructions already works :)
22:37 imirkin_: if you don't fix up the cfg, then fail is ahead
22:37 imirkin_: the cfg is used for lots of stuff, esp RA and code layout
22:40 karolherbst: yeah I know, I just did this first, because it is easier
22:50 karolherbst: imirkin_: so I just modify EdgeIterator.e in the bb->next?
22:50 karolherbst: or is there something else important
22:51 karolherbst: I also see Node.in and Node.out?
22:51 imirkin_: you need to modify the bb->cfg node's edge list
22:51 karolherbst: ohhh
22:51 karolherbst: those are arrays
22:51 imirkin_: arrays of arrays, even
22:52 imirkin_: the first array is incoming, the second outgoing
22:52 imirkin_: or something ridiculous like that
22:52 imirkin_: or maybe arrays of linked lists
22:52 imirkin_: it's crazy stuff, no matter which way you slice it
22:52 karolherbst: I think this was in the Edges
22:52 karolherbst: Edge has "Edge *next[2];"
22:52 karolherbst: // next edge outgoing/incident from/to origin/target
22:52 imirkin_: right
22:53 karolherbst: okay, so I just find the edge in Node.in
22:53 karolherbst: and change the array entry
22:53 imirkin_: next[0] is the next outgoing edge, next[1] is the next incoming edge. or something.
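The idea that one Edge object sits in two lists at once can be sketched like this. It is a loose illustration that assumes the index meaning from the line above (0 for outgoing, 1 for incoming) and uses null-terminated singly linked lists; `GNode`, `GEdge`, `attach`, and the two counters are hypothetical names, and the real nv50_ir layout may well differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: each edge is linked into two lists at the same time.
// next[0] chains the edges leaving the same origin,
// next[1] chains the edges entering the same target.
struct GNode;

struct GEdge {
    GNode *origin = nullptr;
    GNode *target = nullptr;
    GEdge *next[2] = {nullptr, nullptr};
};

struct GNode {
    GEdge *out = nullptr;  // head of the outgoing list (linked via next[0])
    GEdge *in = nullptr;   // head of the incoming list (linked via next[1])
};

// Link `e` as an edge from -> to by pushing it onto both lists.
inline void attach(GEdge *e, GNode *from, GNode *to)
{
    e->origin = from;
    e->target = to;
    e->next[0] = from->out;
    from->out = e;
    e->next[1] = to->in;
    to->in = e;
}

inline std::size_t countOutgoing(const GNode *n)
{
    std::size_t k = 0;
    for (const GEdge *e = n->out; e; e = e->next[0])
        ++k;
    return k;
}

inline std::size_t countIncoming(const GNode *n)
{
    std::size_t k = 0;
    for (const GEdge *e = n->in; e; e = e->next[1])
        ++k;
    return k;
}
```

Because the links live inside the edge itself, changing `target` alone leaves the edge chained into the old target's incoming list, which is why that list has to be patched as well.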
22:54 karolherbst: shit, that is just too messy :/
23:22 karolherbst: I made the code emitter really unhappy now :D
23:22 karolherbst: but it seems to work
23:42 karolherbst: oh my holy crap
23:51 RSpliet:hands karolherbst some sacred bog-roll
23:51 karolherbst: this stuff is just so insane
23:51 karolherbst: and super hard to deal with :/