04:54mardination: since the LSU, like an FPGA, is asynchronous whereas the dispatcher is synchronous, the hardware clock signal only carries meaning in the full pipeline mode, and then it is only a pointless heat generator too, which is to say reclocking on the correct pipeline takes effect only on first pixel rendering and queue filling
04:58mardination: and earlier Intel chips bundled with netbooks were already extremely powerful if run right, they were 16 4-VLIW-bundle wide chips, so 16 identifiers for the decode and 4 identifiers for the warp, in the shortened pipeline the VLIW bundle in question, out of 16, is targeted by the destination address, you are allowed to have LSU and ALU operations in flight at the same time, so you block the registers on the first VLIW slot
04:59mardination: as soon as the fetch queues fill up and no ready instruction is available, it can arbitrate extremely fast between those VLIWs
05:05mardination: so there is actually nothing to be done in the implementation, since it is very easy, and you are wasting the time of the world's computer graphics enthusiasts, it has a massive 64 queues, whichever opcode you load from the ARB spec into one slot, it is able to execute async very fast when you flesh out the scheduler header
05:12mardination: you have no clue what metals do, where a large number of Estonian metal businessmen buffered or served Russian metal to Europe's market, nickel and cobalt, you cannot even understand how the circuit is formed taking advantage of a signaller superior to the brain
05:21mardination: metal is an inorganic material working in the gigahertz range in switching speed, the human brain is an organic one working in the kilohertz range, I wonder when you will realize that the calculations do not match what you do
05:35mardination: and machine learning or neural networks is probably, according to Wikipedia, just a way to modify the input based on the output of another ALU, and replay it with new inputs taken from the output of another ALU
05:42mardination: it analyses some type of image or sensor data, and the robot will make a movement according to such source data, and that is some type of floating point calculation of which ALUs to execute and with what type of data
05:50mardination: the Finnish were involved a lot in that type of business some time ago, I've been told, manssinen sea container lifter, but robotics obviously can be used to build houses too
06:12mardination: anyhow it'd be best if you no longer waste others' time, well the driver might be ok when fixed a bit, but not running stuff according to hw capabilities is just a waste of time and also resources, I am a bit affected by this stuff myself but can manage my own stuff
06:13mardination: I'd rather wish others to also do something sane in a cooperative way even if I am not involved, faster progress
06:16mardination: and when I talked about mattst88 saying you're a smart guy, rambling often, we just wish you do well, for me it is likewise
06:17mardination: I have understood everything and likewise I wish others to do well too
06:21mardination: I have liked Daewoo Electronics the best, which is the richest company in the world, because they do not put the risk on only selling certain products, but have also managed to inject their money into various assets
06:21mardination: so their success would not depend on selling chips only etc.
06:22mardination: and so if there are similar brainsters at Intel, they have massive amounts of legal money, instead of trolling they may widen their range of products
06:28mardination: if there is lots of free legal money, swat can be done in such a way that there is a side manufacturing entity, in case such a dork like me comes along and the main mission takes a minor hit, then the other entities won't etc.
06:29mardination: sorry, swot I meant
06:29mardination: it should be strengths, weaknesses, opportunities and threats analysis
08:01mardination: and you should honestly give up on finding algorithms in DX asm or GLSL at a higher level, it is not going to work out when there is a dependency chain composed of different instruction latency types, or with a branch the code goes wrong right away
08:02mardination: you need to control precisely which register indexes are allocated, and this type of control is not possible from GLSL nor DirectX asm, but only in machine code
08:02mardination: or in the driver's register allocator.
09:34karolherbst: cyberpear: seems like the build went through :)
15:14cyberpear: karolherbst: the kernel build works :) but the laptop doesn't resume from sleep :(
15:14karolherbst: cyberpear: mhh, even with runpm=0?
15:14karolherbst: it would be really awesome if you would be able to retrieve a dmesg from it... but I kind of doubt it's that easy
15:14cyberpear: so, it's a big step in the right direction
15:14karolherbst: netconsole might be the only way
15:15cyberpear: can I do netconsole over wifi?
15:15karolherbst: well, generally yes
15:15karolherbst: but not for that issue
15:15cyberpear: Would journalctl store the required dmesg data somewhere?
15:15karolherbst: because some daemon has to reestablish the connection
15:16cyberpear: or is the failure too early?
15:16karolherbst: cyberpear: only if everything got flushed out to the disc before whatever freeze happened
15:17cyberpear: I don't know how frozen it is... might it be possible to force a sync with alt-sysrq magic?
15:17cyberpear: caps lock doesn't respond, so it might be pretty frozen, though
15:18cyberpear: I'll see if I can set up a wired connection later.
16:09cyberpear: karolherbst: I am able to 'systemctl hibernate' then resume from hibernate (by manually passing resume=/dev/mapper/fedora-swap as suggested: https://bugzilla.redhat.com/show_bug.cgi?id=1206936 )
16:10cyberpear: but looking at dmesg for resume, the last message I see in "previous" boot is "kernel: PM: suspend entry (deep)" -- you're right I'd need netconsole
16:19karolherbst: cyberpear: ohh, you mean you can't resume from hibernation?
16:19karolherbst: uff, I highly doubt that's something supported all that well generally
16:19karolherbst: normally you just want to do the "normal" suspend and resume stuff
17:17mardination: and about branches, I think I gave all the links a while ago, branches on AMD are stackless, both sides have sbranch instructions, and the compiler takes care of them, however on NVIDIA branches can also be based on a hw branch stack, I would not use such branching on NVIDIA, instead use something similar to AMD but implement that in sw based on pointers.
17:18mardination: stack based branching also means that they are divergent or unanimous, divergent branches use a convergence instruction or the convergence point is detected by hw
17:27mardination: I have not looked into the LLVM codegen, what parser semantics the GLSL compiler uses, maybe it is recursive descent, however LLVM is linear sweep.
17:32mardination: I do not know where the AMD folks transform the code of nested branches into consecutive if/else, maybe there is a GLSL pass for this
17:32mardination: or maybe they have a machine code pass or whatever
17:51mardination: aah ok, well i remember someone mentioning that nv50 IR is in ssa, so..prolly branches can be easily handled as wanted in that form
17:53mardination: where this translates into control flow pass over the edges of basic blocks
19:09cyberpear: karolherbst: I /can/ resume from hibernate (suspend-to-disk), but I /cannot/ resume from suspend-to-ram
19:11karolherbst: ohh, interesting
19:13cyberpear: I saw some complaints online that suspend-to-disk is broken, but it worked for me, though I did have to add the resume= kernel cmdline arg
19:38mardination: Maybe it is the same pdf that mattst88 already linked on dri-devel, however it shows there what SSA does in the phi-elimination phase. https://www.sciencedirect.com/science/article/pii/S1571066107005002
19:39mardination: the branch normalisation does the kind of transformation I was referring to, I think this should be quite the default way in LLVM too to lift stuff.
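For readers unfamiliar with the phi-elimination step mentioned above, here is a minimal Python sketch of the textbook idea: each phi node at a join block is replaced by a copy at the end of every predecessor block. The CFG layout, the `eliminate_phis` name and the string instructions are all hypothetical and are not taken from the linked paper, LLVM or the nv50 IR. The sketch is deliberately naive and ignores critical edges and the lost-copy and swap problems that a real compiler has to handle.

```python
def eliminate_phis(blocks):
    """Naive phi elimination on a toy CFG (hypothetical representation).

    blocks maps a block name to a dict with:
      "phis":  list of (dest, {predecessor_name: source_value})
      "body":  straight-line instructions as strings (no terminator)
      "succs": names of successor blocks
    Returns a new CFG without phi nodes; the input is left untouched.
    """
    lowered = {
        name: {"phis": [], "body": list(b["body"]), "succs": list(b["succs"])}
        for name, b in blocks.items()
    }
    for block in blocks.values():
        for dest, incoming in block["phis"]:
            for pred, src in incoming.items():
                # The copy runs at the end of the predecessor, just before
                # control transfers to the join block.
                lowered[pred]["body"].append(f"{dest} = {src}")
    return lowered


# if (a < b) x = a + 1; else x = b * 2; r = x + 0   (in SSA form)
cfg = {
    "entry": {"phis": [], "body": ["c = a < b"], "succs": ["then", "else"]},
    "then": {"phis": [], "body": ["x1 = a + 1"], "succs": ["join"]},
    "else": {"phis": [], "body": ["x2 = b * 2"], "succs": ["join"]},
    "join": {
        "phis": [("x3", {"then": "x1", "else": "x2"})],
        "body": ["r = x3 + 0"],
        "succs": [],
    },
}

lowered = eliminate_phis(cfg)
for name, block in lowered.items():
    print(name, block["body"])
```

After the pass both the `then` and `else` blocks end with a plain copy into `x3`, so the join block no longer needs to know which edge it was entered from.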
19:44mardination: and why you did not respect my talks on the SM spec, I do not know, since the register allocation phase is covered there, but not in that regard, since register allocation avoids recursion even under machine code, it does not allow using the whole 64 texture instructions which is the default in sm2.0
19:46mardination: since there are two operands in a texture lookup, address and destination in the vector register file, and the source cannot match the destination of any other instruction
19:46mardination: because this is a soft-clause which could trigger recursion, and that is the easiest way to avoid it
19:47mardination: so 64/2 = 32 instructions and possibly 4 dependent lookups on sm2.0 added
19:47mardination: if you were to change the register allocator, all 64 can be used too
19:56mardination: if you do a saner analysis of what I have told so far, then 1. the smarter heads should see the implementation right away inside the driver. 2. also saner guys should understand those facts are ruling out the possibility of any non-hw-specific HLSL or GLSL code optimization
19:56mardination: in other words, no existing code can get superior performance out of the graphics pipeline unless the driver is modified
19:57mardination: what the Swedes did on the OpenCL kernel was a very primitive kernel where mul and add had the same latency, hence this worked out
19:58mardination: once you have a div instead of one of the adds, with the same dependencies between instructions, it no longer works
20:02mardination: similarly if you branch in the code with true hw branching instructions, divergent or non-divergent, this breaks the scheduling right away too, hence this optimization cannot be done without machine code
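The latency argument above can be illustrated with a small Python sketch. It models an in-order, single-issue machine and counts the stall cycles of a fixed instruction order. The latencies, the instruction names and the `run_in_order` helper are all made up for illustration and do not describe any real GPU. The point is only that an interleaving which hides latency when every op takes the same time starts to stall once one op in a chain is slower.

```python
from dataclasses import dataclass

# Hypothetical latencies in cycles; real hardware numbers will differ.
LATENCY = {"mul": 2, "add": 2, "div": 8}


@dataclass(frozen=True)
class Instr:
    name: str
    op: str
    deps: tuple = ()


def run_in_order(program, latency=LATENCY):
    """Issue at most one instruction per cycle, in program order.

    An instruction waits until all of its dependencies have produced
    their results. Returns (cycle of last result, total stall cycles).
    """
    ready_at = {}    # instruction name -> cycle its result is available
    next_issue = 0   # earliest cycle the next instruction could issue
    stalls = 0
    for ins in program:
        issue = max([next_issue] + [ready_at[d] for d in ins.deps])
        stalls += issue - next_issue
        ready_at[ins.name] = issue + latency[ins.op]
        next_issue = issue + 1
    return max(ready_at.values()), stalls


def interleaved(a1_op, swap_tail=False):
    """Two independent chains (a*, b*) interleaved instruction by instruction."""
    tail = [Instr("a2", "add", ("a1",)), Instr("b2", "add", ("b1",))]
    if swap_tail:
        tail.reverse()
    return [
        Instr("a0", "mul"),
        Instr("b0", "mul"),
        Instr("a1", a1_op, ("a0",)),
        Instr("b1", "add", ("b0",)),
    ] + tail


uniform = run_in_order(interleaved("add"))                    # all ops same latency
mixed = run_in_order(interleaved("div"))                      # same order, one div
rescheduled = run_in_order(interleaved("div", swap_tail=True))
print(uniform, mixed, rescheduled)
```

With these made-up numbers the uniform case runs without any stall, the same order with a `div` in one chain stalls, and reordering the tail recovers part of the loss. A schedule therefore has to be computed with the actual per-instruction latencies in hand, which is the information a compiler backend or driver has and a high-level shader author usually does not.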
20:03cyberpear: karolherbst: to be more precise, hibernate (suspend-to-disk) reliably works for one hibernate/resume cycle. after resuming the first time, further attempts to hibernate either cause the machine to immediately re-awaken or never actually hibernate; just brings me to the lock screen after 10 seconds (no data loss, though)
20:06karolherbst: cyberpear: okay, so something is messed anyway
20:12cyberpear: the first hibernate cycle is reliable after a fresh boot, though. reboot + hibernate + resume + (attempt to suspend OR attempt to hibernate) fails.
20:13mardination: whatever NIR does I have not looked into, but I say even though the vertex shader is on the cpu on NV34, gme945 and r300, gma950 seems to have them on the gpu though
20:14mardination: all of those VLIW chips are already very very advanced big rocketish solutions, you just have to make them run properly
20:14mardination: they have a very advanced electrical circuit on the specialized programmable fragment shading pipeline
20:34mardination: I got a ban from intel-3d when talking about 16x64 opcodes, so I need to reread, 2x2 fragments with 4 pixels max in one clock may also mean a 4x64 size register file
20:35mardination: but a very long time ago I read that 4 cycles for pixels means 16-wide vector registers
20:35mardination: and this is more logical to me at least for some reason, since the process at 90nm is only 4 times larger compared to 22nm
21:32mardination: I cannot find those documents anymore, but listen, come on, it is not possible to have less than 16 if four pixels are rendered per clock
21:32mardination: remember that every pixel has 4 components
21:50mardination: so 8192/4 to be accurate, it has 2048, in today's world the equivalent rendering power of a 2CU APU on fragment shaders, because via registers you can render 2048 instruction slots ultra fast
22:17cyberpear: karolherbst: is there any chance of your patch getting accepted upstream? -- failure of suspend is a huge improvement over failure to boot -- otherwise, do you think it could be added to fedora?
22:28cyberpear: I see it was mentioned on the nouveau list: https://lists.freedesktop.org/archives/nouveau/2017-November/029249.html
22:30karolherbst: cyberpear: doubtful as the patch itself is quite hacky
22:30karolherbst: we have to work on a proper solution but we don't know yet what that would look like
22:32cyberpear: as a hotfix, would it be difficult to add the patch and special-case the cards it's known for?
22:33cyberpear: or worst case, have nouveau detect these cards and prevent itself from loading since it's completely broken w/o the patch
22:34karolherbst: well first issue: which cards are affected
22:34cyberpear: (I didn't try nouveau.modeset=0 because that seems to effectively disable nouveau based on the docs)
22:35cyberpear: seems to at least include gp107, but I don't know if it's ALL gp107 or just mine and yours
22:35cyberpear: or just cases where there's also an intel card
22:35cyberpear: so, your point is valid "which cards?"
22:37cyberpear: I only have the one card that I can test at the moment, though I can get access to a mobile 'Quadro M1200'
22:37cyberpear: not sure which chipset that maps to yet, though
22:45cyberpear: the M1200 I can get access to looks like a NV117 (GM107), so I could conceivably test it there if it would be helpful
22:48cyberpear:wonders how hard it would be to unload nouveau prior to suspend/hibernate
22:50karolherbst: I think it mainly hits gp107
22:50karolherbst: but.. not all afaik
22:52cyberpear: I guess testing it on the maxwell card could help verify it doesn't break things there, assuming the driver otherwise works there
22:55karolherbst: thing is, especially with that secure boot stuff, we can't be sure
22:55karolherbst: I'd rather not touch it if we don't know the implications
22:55karolherbst: it has to work on all cards, that's the biggest issue here
22:55karolherbst: and without understanding why it breaks, we don't know if that fix is actually a good one or not
22:57cyberpear: how can we find out why it breaks?
22:57cyberpear: I assume we're not holding breath for nv to release specs
22:58karolherbst: well, we can't know
22:58karolherbst: the hw won't tell us
22:59karolherbst: well, there might be ways to try to extract state from the parts which run in secure mode... but... well, that's quite hard as hw engineers tried their best so that it won't happen
23:05cyberpear: that's frustrating...
23:05cyberpear: is it possible to load nouveau after the system is booted?