00:31 ilios: Is there any information I can find about the VRAM physical memory map? For example, CUDA binary code will be uploaded at 0xABCD in VRAM, etc..
00:33 skeggsb: doubtful.. userspace wouldn't even know where that was
00:33 skeggsb: not to mention it's not even necessarily in a linear block (there are page tables)
00:34 ilios: oh yes, GPUs also have an MMU
00:35 ilios: But isn't there any linear region? Like the kernel's linear region in CPU system memory.
00:36 skeggsb: a lot of stuff is uploaded directly through the push buffer, other stuff is copied from system memory buffers.. it really depends, on loads of stuff
00:36 ilios: assuming i have kernel privilege to access pci region so i can use envytools like nvadownload, etc.
00:37 skeggsb: if you're trying to capture stuff nvidia does, valgrind-mmt is your best bet
00:37 skeggsb: it has magic to decode all sorts of stuff automatically
00:37 ilios: oh thank you for suggestion!
00:38 ilios: Do you mean push buffer for PFIFO?
00:43 skeggsb: in this case, yes
00:43 ilios: thanks
01:02 imirkin: skeggsb: btw, there were grumblings about MSI fail on nv47 as well
01:02 imirkin: skeggsb: i suspect the whole family is buggered in unpredictable ways
01:10 skeggsb: imirkin: i *suspect*, but haven't confirmed yet, that a whole lot of them probably need the same treatment as nv50
01:10 skeggsb: (ie. poke it via pci config space, rather than the mmio interface to it)
01:11 skeggsb: that would probably explain why hans didn't see anything when he traced nvidia doing msi..
01:12 imirkin: unrelatedly, here's a ddr3 doc: http://www.samsung.com/global/business/semiconductor/file/product/ddr3_device_operation_timing_diagram_rev121.pdf
01:12 imirkin: it has the timings that i "discovered" in my trace, and a couple of extra ones
01:12 skeggsb: not sure i have a nv47, but i'll plug some other nv4x in tomorrow and see what happens too.. my nv46 is definitely broken, i wasted hours looking for regressions in my rework before i remembered...
01:12 skeggsb: the joys of skipping sleep for a night ;)
01:13 imirkin: doh!
01:13 imirkin: it took me a bit to remember too that my nv17 only works in one but not the other pci slot, for some odd reason
01:13 skeggsb: technology is fun like that :)
01:14 skeggsb: my pcie test machine in the office occasionally likes to stop detecting pcie boards when you change them, until you reset the cmos...
01:14 skeggsb: that screwed me for ages too once upon a time
01:14 imirkin: that's a sad property
01:14 imirkin: i figured RH would be able to spring for some decent dev boxes for you guys
01:14 skeggsb: i think i just picked a bad mobo :P
01:15 imirkin: or at least shelves to put your GPUs on :)
01:15 skeggsb: that's more my fault for not asking :P the bne office isn't meant to have the likes of airlied and I in it
01:15 skeggsb: ... which reminds me.. i still need to do that
01:16 imirkin: which is why they put you guys together
01:25 imirkin: hrm, actually looking at that doc, the WR = 15 thing *is* odd... none of them have it, only 14 and 16
01:31 pq: I just recalled... maybe http://people.freedesktop.org/~pq/nouveau-drm/ has not been useful in years and should be put to rest finally?
01:41 pq: hm, there are still references to that in http://nouveau.freedesktop.org/wiki/InstallNouveau/
01:42 pq: and in http://nouveau.freedesktop.org/wiki/InstallDRM/
03:24 karolherbst: yeah
03:24 karolherbst: got it to work now
03:25 karolherbst: so does anybody know what is a good instruction ordering at least for my gk106? or should I just look what the blob produces and try to find a pattern?
03:42 karolherbst: imirkin: strange, after handling 731 instructions in a BB it works, with 732 it does not :/
03:44 karolherbst: I just flush my lists now and it kind of works. I checked the difference, but there were just a sat, two movs, a mad and a shr different, and their regs aren't important for the instructions they jumped over :/
04:50 karolherbst: nice found the source of the DRI_PRIME tearing issues
05:37 Eliasvan: karolherbst: hi
05:37 Eliasvan: karolherbst: I've got some questions if you don't mind:
05:37 Eliasvan: karolherbst: First a practical one: what setup do you use to test your nouveau kernel patches? Is it possible to only build the "nouveau.ko" module, and insmod it while running?
05:37 Eliasvan: karolherbst: 2. How do I use envytools to reclock the pci link speed?
05:37 Eliasvan: karolherbst: 3. What procedure did you use to reverse engineer mem reclocking on the 770M? (rough procedure)
05:37 Eliasvan: karolherbst: If you're interested, I once did an mmiotrace of power-level switching by the nvidia blob on my GTX760; it's in the same mail that contains the vbios.
05:40 karolherbst: Eliasvan: http://cgit.freedesktop.org/~darktama/nouveau/ this is the standalone nouveau repository
05:40 karolherbst: cd into drm
05:40 karolherbst: make
05:40 karolherbst: then you get nouveau/nouveau.ko
05:41 karolherbst: 2. nvapeek 0x08c040 => read the value and set 0x00001 with mask 0xc0001
05:42 karolherbst: 3. checked mmiotraces, looked into the pdaemon code and noticed the PLL values are completely different, then I investigated this
05:42 Eliasvan: 1. cool, thanks
05:42 Eliasvan: 2. how can I know that means that?
05:43 Eliasvan: 2. I mean, how do I know I should take 0x08c040?
05:44 karolherbst: 0x08c040 is the reg
05:45 karolherbst: it should contain something like 80089000
05:45 karolherbst: or 40489000
05:45 karolherbst: or 80889000
05:45 Eliasvan: ah, it's probably in the docs
05:45 karolherbst: or something else
05:45 karolherbst: which docs?
05:45 Eliasvan: docs about the registers
05:46 karolherbst: which docs? :D
05:46 karolherbst: I would be surprised if there are any
05:46 karolherbst: you mean like from nvidia?
05:47 karolherbst: Eliasvan: but you should simply do a nvapeek 0x08c040
05:47 karolherbst: it doesn't do much
05:47 karolherbst: only reads the values out
05:47 Eliasvan: 3. thanks! do you think you can see some patterns if you got my mmiotrace? (just asking ;) )
05:47 karolherbst: for some registers the gpu might get upset, but not for this one
05:47 karolherbst: 3. for which problem?
05:48 karolherbst: 3. skeggsb said something about an isohub issue, but it looked like he wants to find that out. I am not that good with memory reclocking in general, it was just a lucky guess of mine
05:48 Eliasvan: no problems, just a mmiotrace for reclocking using the nvidia-settings
05:49 Eliasvan: *nvidia-settings tool from nvidia
05:51 karolherbst: yeah, I don't think this will be of much help, because I have made one myself
05:51 Eliasvan: karolherbst: 2. so the exact location of the register (0x08c040 in this case) was probably found by reverse engineering?
05:51 karolherbst: yeah
05:51 Eliasvan: karolherbst: yeah, makes sense
05:51 karolherbst: actually it was known for a long time already
05:52 karolherbst: but I guess nobody really thought that it may improve performance at all
05:52 Eliasvan: 2. and is it documented in the envytools repo somewhere?
05:52 karolherbst: and actually it really does not. Some benchmarks are fine with that
05:52 karolherbst: and the talos principle gets like a 20% boost
05:52 karolherbst: but I doubt there is much more
05:52 karolherbst: "lookup 0x08c040"
05:52 karolherbst: as a command
05:53 karolherbst: you can also give your value with it
05:53 karolherbst: try "lookup 0x08c040 $(nvapeek 0x08c040 | cut -d' ' -f2)"
05:54 Eliasvan: ok, thank you very much for your help, maybe I'll manage to make the mem reclocking work on my card
05:54 karolherbst: mhhhh
05:54 karolherbst: its difficult
05:54 karolherbst: the thing I found out was just simply something wrongly done before
05:55 Eliasvan: because if your patch works for you, maybe I can make a similar one/edit the one you made
05:55 karolherbst: it also works for you
05:55 karolherbst: there are several issues with gddr5 reclocking
05:55 karolherbst: and my patch only fixes one of them
05:55 karolherbst: I still get some hangs
05:55 karolherbst: although it's pretty rare
05:55 karolherbst: and maybe because I have a hybrid gpu laptop
05:56 karolherbst: and the intel card usually renders everything
05:56 Eliasvan: as you saw in dmesg, the pstate setting to 0e 0f was not effective on my card
05:56 karolherbst: this is another problem
05:56 Eliasvan: ah, ok
05:56 karolherbst: it has nothing to do with gddr5
05:56 karolherbst: it just tries to set a voltage where nouveau said: no, you can't set "this" voltage
05:57 karolherbst: no idea how to handle this correctly
05:57 Eliasvan: hmm, ok
05:57 karolherbst: the memory is clocked up, so this part works
05:57 karolherbst: it's just around 250MHz core speed you are missing
05:58 Eliasvan: correct
05:58 karolherbst: which is no big deal in general
05:58 Eliasvan: but is 250MHz extra that important?
05:58 karolherbst: I think with nouveau more stuff is memory bottlenecked than with nvidia
05:58 Eliasvan: ah, ok
05:59 karolherbst: actually it's nearly 300MHz
05:59 Eliasvan: because of the pci link speed difference?
05:59 karolherbst: so you won't get more than +30%
05:59 karolherbst: and this is pretty optimistic
05:59 karolherbst: no, I think it has more to do with what nouveau feeds the card with
05:59 karolherbst: if the instructions aren't optimised in order, the gpu has to read from memory a lot
06:00 karolherbst: and this decreases speed
06:00 Eliasvan: yeah, probably because nouveau is mostly made through reverse engineering, so some commands might be suboptimal?
06:00 karolherbst: Eliasvan: you could try that patch out: https://github.com/karolherbst/nouveau/commit/5554a27415b61a59f1667074cd2162c9f2470cdf
06:01 karolherbst: but I doubt that a higher core clock will be much of a performance gain
06:01 Eliasvan: hmm, yeah
06:01 Eliasvan: would it apply on 4.1?
06:01 karolherbst: yeah
06:01 Eliasvan: nice
06:01 karolherbst: it just makes nouveau not be that picky with the voltage
06:01 karolherbst: you should add debug=debug to the module load then
06:02 karolherbst: just to check which voltage gets set
06:02 Eliasvan: oh, ok
06:03 karolherbst: if it gets too high, you may want to clock down again
06:04 karolherbst: but its only a heat problem
06:04 Eliasvan: "too high": how do I know?
06:04 karolherbst: something above 1.5V is pretty high
06:05 karolherbst: but it will print it in uv
06:05 karolherbst: so 1500000uv
06:06 Eliasvan: ah thanks, good to know :)
06:25 karolherbst: Eliasvan: any results yet?
06:28 karolherbst: nice, DRI_PRIME gives better performance than bumblebee-nouveau
06:31 Eliasvan: karolherbst: yes, I'll attach you a dmesg.txt
06:33 Eliasvan: http://pastebin.com/2tXLfnZP
06:34 Eliasvan: V < 1.2375V at all times
06:34 karolherbst: okay nice
06:34 Eliasvan: and V(0e) == V(0f)
06:34 karolherbst: this value seems reasonable
06:34 karolherbst: yeah of course
06:35 karolherbst: the core needs a specific voltage to run stable at a given clock
06:35 Eliasvan: actually, what is the diff between 0e and 0f?
06:35 karolherbst: and if the clock is same, the voltage can be the same too
06:35 karolherbst: mhhh
06:35 karolherbst: not quite sure
06:35 karolherbst: might be gpu boost related
06:35 Eliasvan: is nvidia trying to trick us here, somewhere?
06:35 Eliasvan: oh, ok
06:36 karolherbst: but I really don't know
06:36 karolherbst: GPU boost is more of a software thing anyway. It just runs at higher clocks until a given temperature is reached
06:36 karolherbst: then it clocks down
06:37 karolherbst: usually above the pstate boundaries
06:37 karolherbst: also this is Windows-only for now
06:37 Eliasvan: ah, interesting
06:38 Eliasvan: hmm, a bit weird why the Linux nvidia blob doesn't have it...
06:40 karolherbst: not worth it in general
06:41 karolherbst: if you really want higher clocks you can also just enable coolbits
07:25 tobijk: imirkin: have you ever had time to constant fold OP_CVT yet?
07:45 karolherbst: Eliasvan: any difference in performance on 0e/0f?
07:48 Eliasvan: karolherbst: Just to be sure, when you say "nvapeek 0x08c040 => read the value and set 0x00001 with mask 0xc0001", do you mean:
07:49 Eliasvan: karolherbst: write(address=0x08c040, value=((read(address=0x08c040) & ~0xc0001) | (0x00001 & 0xc0001)))
07:49 Eliasvan: karolherbst: such that, since read(address=0x08c040) == 80089000, I have to write (with nvpoke): 80009001 ?
07:49 karolherbst: yes
07:49 karolherbst: the 8 has to change to 0
07:49 karolherbst: 4 would be 5.0GT/s
07:49 karolherbst: 0 is 8.0GT/s
07:49 karolherbst: and the 1 is the commit bit
07:49 Eliasvan: the first 8?
07:49 karolherbst: the blob does this: write 80009000, read, write 80009001
07:50 karolherbst: no, the second
07:50 Eliasvan: oh, ok, so I got it right
07:50 karolherbst: yes
07:50 Eliasvan: thnks
07:50 karolherbst: it seems to work with one read though
07:50 karolherbst: you can verify with lspci
07:50 karolherbst: lspci -vv -s 01:00.0
07:50 Eliasvan: I have to reboot my machine, lockup again ;)
07:50 karolherbst: LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
07:51 karolherbst: you should get something like that then
07:51 Eliasvan: oh, thanks, that's useful!
07:55 Airwave: I have a early 2008 Mac Pro with an 8800 GT running Fedora 22. If I try using nouveau, I get a kernel panic, usually within a few hours after boot. The proprietary driver runs fine.
07:57 Airwave: I usually try nouveau a couple of times a year to see if it's gotten better, but it's stayed the same for the last couple of years. Would it be useful if I send bug reports?
07:57 Eliasvan: karolherbst: after writing to the register, I don't see any difference in lspci: http://pastebin.com/Awk48zCh
08:05 Eliasvan: karolherbst: I don't see any change in performance in glxspheres when nvpoking 80009001 on 0e/0f
08:08 Eliasvan: karolherbst: but since lspci doesn't mention 8GT/s, I think the register value had no effect (although nvpeeking did confirm 80009000)
08:10 karolherbst: Eliasvan: please use -vv
08:10 karolherbst: ohh
08:10 karolherbst: you have to run lspci as root
08:10 karolherbst: for this
08:15 pmoreau: Airwave: You should: we don't know about all possible issues as we don't have every possible hardware.
08:16 karolherbst: Airwave: which kernel are you using ?
08:17 karolherbst: but a kernel panic after some hours is not that bad, it could have been worse
08:18 pmoreau: Except you might end up losing some work, whereas if it happens during boot or shortly after you're more on the *safe* side
08:20 karolherbst: would be nice to see dmesg output
08:27 Eliasvan: karolherbst: http://pastebin.com/RDUtCBVf
08:28 Eliasvan: karolherbst: from 2.5 to LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
08:28 karolherbst: mhh okay
08:28 karolherbst: maybe your board doesn't support it
08:29 Eliasvan: weird, my motherboard should support x16...
08:29 karolherbst: x16 is the width
08:29 karolherbst: not the speed
08:29 karolherbst: "Width x16"
08:29 Eliasvan: oh yeh
08:29 karolherbst: do you have a pcie v3 board?
08:29 Eliasvan: my gpu?
08:29 Eliasvan: no, I think a v2
08:31 Eliasvan: however, what I don't understand is the following line in the non-nvpoked state:
08:31 Eliasvan: LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
08:32 Eliasvan: why would the target link speed be 8GT/s??
08:32 karolherbst: for me it's always 8.0
08:34 Eliasvan: anyway, you have your answer: for 5GT/s, there is no perceivable performance difference for the 0e/0f states over 0a
08:38 karolherbst: yeah
08:38 karolherbst: it's only for some special cases
08:38 karolherbst: benchmarks usually get a boost
08:39 karolherbst: or some of them
08:39 karolherbst: I get double fps with glxspheres for example
08:46 karolherbst: Eliasvan: could you do a nvapeek 0x02241c
08:47 karolherbst: we still have to figure out how to determine if we can go to 5.0 or 8.0
08:48 karolherbst: also nvapeek 0x088088 would help
08:56 Eliasvan: karolherbst: Reading the specs on my motherboard: a PCI Express 2.0 x16 slot
08:57 karolherbst: okay
08:57 karolherbst: yeah, but maybe the gpu tells us this somewhere
08:59 Eliasvan: nvapeek 0x02241c => 00000081
09:02 Eliasvan: nvapeek 0x088088 => 11010040
09:03 Eliasvan: so :
09:03 Eliasvan: PUNITS.PCI => { PCIE_VERSION = 2 | PCI_CLASS = DISPLAY | PCIE_SPEED = FULL }
09:03 Eliasvan: PPCI.EXP_LNK_CMD_STA => { CMD = { ASPMC = 0 | CCC } | STA = { SPEED = 2_5GT | WIDTH = 16 | SL_CLK } }
09:03 Eliasvan: and I'll now do a link change...
09:04 karolherbst: mhh no its fine
09:04 karolherbst: there should be something which tells us that 5.0 is the board maximum
09:05 karolherbst: or what the card can do maximum currently
09:05 Eliasvan: hmm yeah; I guess the card can do 8GT/s, but the motherboard is kinda old
09:06 Airwave: karolherbst: 4.1.2-libre.200.fc22.gnu.x86_64
09:06 karolherbst: Airwave: yeah then dmesg would help a lot
09:06 karolherbst: I mean, what is the actual crash
09:07 Airwave: The whole system freezes.
09:07 karolherbst: yeah, but there should be some kind of message somewhere
09:07 Airwave: When I reboot and check the log, it says that it had a kernel panic.
09:07 karolherbst: there are tons of reasons why a system can freeze
09:07 karolherbst: yeah, but where
09:07 karolherbst: what kind of panic
09:07 karolherbst: what is the stacktrace
09:07 karolherbst: what went wrong exactly
09:07 Airwave: I will change back to nouveau and get you more info in probably a pretty short amount of time.
09:08 karolherbst: okay, thanks
09:12 Airwave: karolherbst: Okay, I'm back on nouveau. I'll let you know when it crashes.
09:13 Airwave: I'm on 4.1.4 now by the way. There was an upgrade since I last rebooted.
09:58 imirkin: tobijk: yeah, i improved it, but with my latest changes it'll need minor fixing: https://github.com/imirkin/mesa/commit/d9901d21426d2fac8161229870d791d13b3bcc77
10:28 tobijk: imirkin: ah interesting, i just found my old test patch for it :) seems to work fine as well ;-)
10:36 karolherbst: imirkin: what would be a good reordering goal? reduce the distance between a reg write and its last read?
10:36 karolherbst: so that on average fewer registers are "in use"
10:41 tobijk: karolherbst: yeah using less registers for the same task is a start :) makes us swap less in/out of vram
10:41 tobijk: (spilling) :)
10:47 tobijk: karolherbst: you could try to spill in a more clever way (which i failed at doing properly :/) http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp?id=2438e2fe326d7cb9f9d003f6edf77821e41ef22c#n1576
10:48 tobijk: *unspill
10:52 imirkin: karolherbst: a good goal would be to use up to N registers, e.g. 8 or 16
10:52 imirkin: karolherbst: also when scheduling, make sure that tex ops are as far away as (reasonably) possible from their uses
10:52 imirkin: so subject to those same limits
10:53 imirkin: but when there are other things that will consumer registers, use those first, before using the tex's values
10:53 martm: i suppose more regs used the better, since regs are lots faster than vram of course:)
10:53 imirkin: there are also latency values that you can get from the target. i think that they're moderately accurate
10:53 imirkin: basically it's the number of cycles (aka instructions) for a value to become available after the instruction runs
10:55 tobijk: imirkin: do we have a clue about how long the instructions take?
10:55 imirkin: (i.e. target->getLatency(i) or something)
10:56 imirkin: er sorry. s/latency/throughput/. or something.
10:56 karolherbst: mhh
10:56 karolherbst: okay
10:56 karolherbst: but I think I will really do one thing at a time
10:56 imirkin: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp#n511
10:56 karolherbst: imirkin: okay so tex instructions at the start always make sense?
10:57 tobijk: mhm in the end its a compromise between register usage and early instruction scheduling
10:57 imirkin: karolherbst: well... you can't *just* do that. they each produce 4 registers worth of values. so if you have too many, it'll suck a lot.
10:57 karolherbst: mhh
10:57 karolherbst: what is the reason tex should go first, a lot of cycles used?
10:57 martm: if one goes with perfectly tiling vram, it should be like 2x slower than reg, i mean when it goes through cache
10:58 imirkin: it's an async instruction
10:58 imirkin: in order to use its values, you have to insert a texbar opcode
10:58 karolherbst: ahhh
10:58 imirkin: which will just sit there and wait until the tex completes.
10:58 karolherbst: so tex* early and texbar late?
10:58 imirkin: so you want to give the tex values a chance to come back on their own before just waiting around
10:58 imirkin: you'll never see texbar
10:59 imirkin: it's inserted in a post-ra pass
10:59 karolherbst: I see
10:59 karolherbst: when the reg is read
10:59 karolherbst: then this is inserted before
10:59 imirkin: exactly.
10:59 karolherbst: okay
10:59 imirkin: there's a bit more subtlety to it, but essentially yes.
10:59 karolherbst: at least my really bad order isn't worse than stock nouveau
10:59 martm: but mesa memory manager seems good for tiling, but the reg allocator perhaps could be faster prolly including the scheduler
10:59 imirkin: i.e. you only want one texbar per tex. also it takes a "depth" value, i.e. if you do
10:59 imirkin: a = tex; b = tex; c = tex; use(a);
10:59 imirkin: then you insert "texbar 2"
11:00 imirkin: which means that there can still be 2 tex's outstanding
11:00 karolherbst: yeah okay
11:00 imirkin: this is loads-o-fun when you have a tex in a loop :)
11:01 imirkin: not to mention WaW hazards, e.g. a.xyzw = tex; a.y = 1.0;
11:02 imirkin: you have to make sure the tex completes before the a.y write because otherwise it'll end up overwriting that register later on
11:02 karolherbst: :D
11:02 karolherbst: this is my current version btw: https://github.com/karolherbst/mesa/commit/6c583b48b4d9009e712032b7f1c65ee7f6163efa
11:02 karolherbst: ohhh I changed the flushing stuff
11:02 karolherbst: wait a sec
11:02 imirkin: er hm, robclark -- just realized this is also an issue on freedreno -- do you add the (sy) flag in that case?
11:03 robclark: imirkin, using result of sam instr.. yeah, would need (sy)..
11:03 robclark: (sry, didn't read scrollback)
11:04 imirkin: robclark: not using, overwriting
11:04 imirkin: e.g. if you have like a.xyzw = tex; a.y = 1.0;
11:04 robclark: hmm..
11:04 karolherbst: okay, current working version: https://github.com/karolherbst/mesa/commit/ed8ac47586f1b982051eedfd007694ec4b766fca
11:04 robclark: so WAW..
11:05 imirkin: right
11:06 robclark: I think we only do it for WAR.. need to look at the code more carefully, but possible we have WAW issue..
11:07 imirkin: karolherbst: split the loop into 3 parts
11:07 imirkin: karolherbst: phis, main, exit
11:07 tobijk: karolherbst: where does the magic 731 come from? :D
11:07 karolherbst: don't know
11:07 karolherbst: a unigine shader needs it
11:07 karolherbst: crash with 732
11:07 karolherbst: works with 731
11:08 martm: karolherbst: i did not read the stuff too, what is your goal changing the codegen?
11:08 martm: and sorry for interrupting the convo
11:09 karolherbst: imirkin: is this bad? 34: tex 2D $r1 $s0 f32 $r6d $r6d (8) 35: texbar - # $r6d (8)
11:09 karolherbst: :D
11:09 martm: aah yeah one of the reclocking patches made the stuff crash
11:09 karolherbst: I think this was one thing you were talking about
11:11 tobijk: karolherbst: so you have a tex insn and next the texbar? :/
11:11 tobijk: as far as i have understood him the tex insn should be far away from the texbar
11:11 imirkin: then you lose all the idiotic conditions all over
11:11 imirkin: i won't even ask about the flush == 731 bit...
11:11 imirkin: karolherbst: more generally, this approach can't work
11:11 imirkin: karolherbst: you really do have to do it the way i said ;)
11:11 imirkin: or make a very strong argument to convince me otherwise
11:11 karolherbst: mhh strange, there is something odd
11:11 imirkin: this is a nice toy, got you to understand some of the IR and issues/etc, but it's just not an approach that would ever lead to improvement
11:12 karolherbst: I know
11:12 karolherbst: I first have to know how to not break stuff
11:14 imirkin: yeah, you don't want tex; texbar. you want stuff in the middle.
11:14 imirkin: not always possible of course, but that's the preference.
11:18 karolherbst: imirkin: okay, splitting into main does make sense, so I just collect all instructions not in phi/exit and go over them again with some sane ideas
11:19 karolherbst: currently I just wanted to order all "nodeps" instructions at the top to see if that kind of works without generating crashes or something
11:23 imirkin: yeah, i get what you were doing ;)
12:05 karolherbst: imirkin: like this? https://github.com/karolherbst/mesa/commit/311d0e4752d31ef240ea67470af5c569c21f783b
12:05 karolherbst: only that I have to do useful stuff in "doMain" now
12:06 karolherbst: I should clean up that loop :/
12:06 imirkin_: karolherbst:
12:06 imirkin_: er, there was something missing there
12:07 karolherbst: ?
12:07 karolherbst: whats missing
12:07 imirkin_: karolherbst: for (bb->getFirst; != bb->getEntry; ...) {} for (getEntry(); !exit op) {}; for (getExit(); ...) {}
12:07 imirkin_: my comment to you was missing ;)
12:07 karolherbst: I see
12:07 karolherbst: ohhh
12:08 karolherbst: yeah seems much easier
12:11 martm: Vasco: are you from Portugal, is it already determined that you guys also like cold weather and we should count that we are conquered?
12:16 karolherbst: imirkin_: https://github.com/karolherbst/mesa/commit/a69ef7cbbd753de5091a4daad1176df9ddcb428a
12:16 imirkin_: karolherbst: great. and now you lose all the "what state are we in" logic
12:17 imirkin_: much clearer imo
12:17 karolherbst: yeah
12:17 karolherbst: I did a lot of stuff myself
12:17 karolherbst: because I wasn't aware of that ordering
12:17 karolherbst: this 731 bugs me though
12:17 imirkin_: yeah, no worries
12:17 imirkin_: compilers are tricky
12:17 karolherbst: with 732 I get Error compiling program: -4
12:17 imirkin_: feel free to add docs as you go, btw
12:18 karolherbst: okay, now I could move the tex* instructions up
12:19 glennk: btw quick drive by, beware of iostream usage in drivers, see ecde4b
12:20 martm: Vasco: it isn't like you get more estonian girls, this was just a major slut you got, it's fair that portuguese receive that crap
12:20 karolherbst: glennk: that's for debugging
12:21 karolherbst: I don't intend to leave it there
12:31 martm: hehe i would not say, that we'd make a kebab roll from you, it would be a girls choice no matter what, i doubt most or even some would pick to live with portuguese or chipsy
12:32 martm: at least my bigger seats are better then your ronaldos
12:44 RSpliet: martm: wrong window
12:46 martm: aah yeah, that is the benefit of being top-end sportsmen, while some come because of slut , a lot of real dudes just know what they are doing, high end seats all over the history that i have seen
12:46 martm: Roy did you get a job?
12:48 martm: i mean no irony, or sarcasm, well i do not work, but i am glad you got some work if you did!
12:51 martm: mupuf_ stuff went very interesting with carbon based substances that we inspected with glennk:) turns out that fermentation process of cellulose when freeze dried with dry ice and heated with hydrogen forms graphene
12:52 marcheu: joss...
12:56 imirkin_: pq: the nouveau/linux-2.6 tree is in somewhat of a disrepair
12:56 imirkin_: pq: given that skeggsb's tree builds an out-of-tree module, not sure if your thing is needed
12:56 imirkin_: pq: i'd update the various wikis and move on
13:04 karolherbst: imirkin_: are you sure ra is smart about texbar?
13:05 imirkin_: texbar is inserted after RA
13:05 imirkin_: it's not infinitely smart, but it's pretty smart
13:06 imirkin_: do you have an example where the texbar gets inserted earlier than it needs to be?
13:06 karolherbst: ohh, I currently see that I don't reorder the tex instructions, because they depend on something inside the BB :/
13:06 imirkin_: yeah, that will usually be the case
13:07 karolherbst: okay with "texfetch 2D $r0 $s0 f32 $r44d $r4t"
13:07 karolherbst: what is the dest?
13:07 imirkin_: $r44 and $r45
13:07 karolherbst: okay
13:07 imirkin_: the first 2 things are irrelevant, ignore them
13:08 karolherbst: and always the given one + 1?
13:08 imirkin_: [they're actually not registers at all]
13:08 imirkin_: no, $r44d = double-wide
13:08 karolherbst: I see
13:08 imirkin_: there's also t = three, and q = quad
13:08 karolherbst: for tex 2D $r1 $s0 f32 $r6d $r6d it would be $r6 and $r7?
13:08 imirkin_: yes
13:09 karolherbst: wow, nouveau isn't smart about tex at all :/
13:09 karolherbst: always the same pattern with that shader
13:09 karolherbst: tex* texbar usage
13:09 karolherbst: never something in between
13:10 karolherbst: and always bs like "and u32 $r16 $r1 0x000000ff"
13:10 karolherbst: ...
13:10 imirkin_: i mean... that's what the shader code calls for
13:10 imirkin_: you can't always insert random things there
13:10 karolherbst: yeah I know
13:10 imirkin_: however one of my opts should hopefully improve some of that bs, but obviously not the scheduling of it
13:11 karolherbst: nice :)
13:11 karolherbst: now I have to check how far I could move those tex thingies up
13:11 imirkin_: i.e. instead of shift + and + i2f, it'll just be a single i2f
13:11 karolherbst: loop over ->prev until usage?
13:12 imirkin_: huh?
13:12 imirkin_: you know the usage
13:12 imirkin_: src->getInsn() :)
13:12 karolherbst: ohhh
13:12 karolherbst: so I could simply insert it after that?
13:12 imirkin_: which will refer to a merge instruction
13:12 imirkin_: as i said earlier
13:12 imirkin_: you really need to keep a list of schedulable instructions
13:13 imirkin_: everything you do until you have that will be totally buggy by definition
13:13 imirkin_: s/buggy/horrible/
13:14 karolherbst: mhh but isn't everything kind of schedulable somehow? (except obvious exceptions)
13:14 imirkin_: no
13:15 imirkin_: you can't schedule an instruction before the instruction it depends on has been scheduled
13:15 karolherbst: ahh
13:15 imirkin_: so you need to keep a list (or set or some other data structure) of all the instructions that are ready to be inserted into the instruction stream *right now*
13:15 imirkin_: and then pick the "best" one, insert it, repeat
13:15 karolherbst: so I kind of loop over a list and find all instructions without a dep inside that list
13:15 karolherbst: then I schedule those and repeat until the list is empty
13:15 imirkin_: for now your definition of "best" can be "pick random instruction from list"
13:16 imirkin_: and we can improve from there
13:16 imirkin_: however the most important thing
13:16 imirkin_: is that you build up the list
13:16 imirkin_: ideally in an efficient manner
13:16 imirkin_: so that this doesn't become O(n^2) in instructions in the BB (or higher order)
13:16 karolherbst: ohh
13:16 karolherbst: I usually go only linear over the list I have
13:16 imirkin_: sure, but if you do that N times
13:16 imirkin_: then...
13:16 imirkin_: ;)
13:17 karolherbst: yeah I know
13:17 karolherbst: but is my idea okay or not so okay?
13:17 imirkin_: so you want to be a bit cleverer
13:17 imirkin_: if your idea is the exact same thing i said, then it's ok
13:17 imirkin_: if it's in any way different, then it's not
13:17 RSpliet: imirkin_: got some literature on that? and... I guess there should be a dependency graph already since it does RA
13:17 karolherbst: yeah that's why I'm asking, because maybe it's not ;)
13:18 imirkin_: karolherbst: build up the list of schedulable instructions, and maintain it as you schedule instructions.
13:18 imirkin_: karolherbst: once you have that, we can worry about how to pick which schedulable instruction to pick
13:18 imirkin_: RSpliet: nothing great, unfortunately.
13:18 RSpliet: hmm, better bug Colin then, I think he has mountains of literature behind what he used for NIR
13:18 imirkin_: RSpliet: but we have all the uses, so as you schedule an instruction, look at its uses and see if they've suddenly become schedulable
13:18 karolherbst: so idea: iterate over a list of instructions and find those who are schedulable. schedule all of them, then find new ones that become schedulable
13:19 imirkin_: karolherbst: no. schedule ONE of them
13:19 imirkin_: karolherbst: and see if any new ones have become schedulable as a result
13:19 karolherbst: was thinking about that, but I thought I shouldn't be that smart now
13:19 imirkin_: step 1 is creating that logic
13:19 imirkin_: the cleverness comes in step 2
13:19 karolherbst: okay
13:19 imirkin_: which is around *which* instruction to schedule
13:19 imirkin_: for now if you do "random", it should be fine
13:20 imirkin_: (or "first one", wtvr)
13:20 imirkin_: the important thing is to have that list available to you
13:20 karolherbst: so currently I get a list of instructions between phi, fixed and exit nodes
13:20 karolherbst: so I already have a list of instructions without those
13:20 karolherbst: now it's only about dependencies? or something else?
13:20 imirkin_: but you can't just go around scheduling any one of them
13:20 imirkin_: they're not all schedulable
13:20 karolherbst: I know
13:21 imirkin_: to start with, the only schedulable ones are the ones that don't have any deps in that bb
13:21 karolherbst: but I want to figure out when an instruction is schedulable
13:21 karolherbst: okay, that's clear
13:21 imirkin_: so that's your starting list
13:21 karolherbst: some other constraints?
13:21 imirkin_: but then you schedule ONE instruction
13:21 imirkin_: and see which other instructions in that bb use that instruction
13:21 imirkin_: and if all the instructions that one depends on have been scheduled already, then all of a sudden that instr becomes schedulable
13:22 imirkin_: make sense?
13:22 karolherbst: bb or current list?
13:22 imirkin_: bb
13:22 imirkin_: since after a fixed instruction
13:22 imirkin_: it's likely that ALL instructions will depend on something that happened earlier
13:22 karolherbst: yeah, but these come later
13:23 karolherbst: and aren't part of the list
13:23 imirkin_: heh
13:23 imirkin_: sure
13:23 imirkin_: but let's say that you got to them
13:23 imirkin_: now the list is just those instructions
13:23 imirkin_: and they all depend on stuff already scheduled in the bb
13:23 imirkin_: that situation needs to work ;)
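A minimal sketch of what "that situation needs to work" implies, assuming a hypothetical toy `Insn` whose `srcDefs` holds the producing instruction of each source (NULL for immediates/inputs). Only producers that are still unscheduled inside the current segment count as blocking; phis, earlier segments of the same BB, and other BBs are already placed, so a segment whose instructions all depend on earlier work starts out fully schedulable.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy instruction: only the producers of its sources matter here.
struct Insn {
    std::vector<const Insn *> srcDefs;  // producer of each source (NULL = immediate/input)
};

// Number of sources still waiting on an unscheduled instruction of the
// *current* segment. Producers outside it (phis, earlier segments of the BB,
// other BBs) are already placed, so they never block scheduling.
inline int pendingDepsInSegment(const Insn &insn,
                                const std::set<const Insn *> &unscheduledInSegment)
{
    int pending = 0;
    for (std::size_t i = 0; i < insn.srcDefs.size(); ++i) {
        const Insn *def = insn.srcDefs[i];
        if (def != NULL && unscheduledInSegment.count(def))
            ++pending;
    }
    return pending;
}
```

An instruction is schedulable exactly when this returns 0; the count shrinks as producers leave the unscheduled set.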
13:23 karolherbst: I have stuff like this now: phi mainList1 fixed mainList2 fixed mainList3 exit
13:23 karolherbst: and I only look at one mainList at a time
13:24 imirkin_: right
13:24 imirkin_: i think you get what i'm saying
13:24 imirkin_: however you implement it is fine
13:24 karolherbst: yeah but why do I have to check in the bb then?
13:24 imirkin_: but you absolutely must have a list of schedulable instructions available to you.
13:24 imirkin_: schedule one of them
13:24 imirkin_: make sure the schedulable list is up-to-date
13:24 imirkin_: repeat
13:24 imirkin_: however you implement that logic is fine.
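The loop described above can be sketched as list scheduling with a per-node count of unscheduled dependencies, which keeps the pass O(nodes + edges) instead of rescanning the whole list after every pick. `Node`, `addDep`, and `listSchedule` are hypothetical names, not nv50_ir API, and the "best" pick is simply the first ready node.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical toy node; nv50_ir's real Instruction is far richer.
struct Node {
    int id;
    std::vector<Node *> deps;   // nodes in this segment that must come first
    std::vector<Node *> users;  // nodes in this segment that consume this one
    int pendingDeps;            // deps not yet scheduled
};

// Records that 'user' depends on 'def'.
inline void addDep(Node &user, Node &def)
{
    user.deps.push_back(&def);
    def.users.push_back(&user);
}

// Keep a ready list, take ONE node from it, update the ready list from that
// node's users, repeat.
inline std::vector<Node *> listSchedule(std::vector<Node *> &segment)
{
    std::deque<Node *> ready;
    for (std::size_t i = 0; i < segment.size(); ++i) {
        Node *n = segment[i];
        n->pendingDeps = static_cast<int>(n->deps.size());
        if (n->pendingDeps == 0)
            ready.push_back(n);              // no deps: schedulable from the start
    }

    std::vector<Node *> order;
    while (!ready.empty()) {
        Node *n = ready.front();             // "best" = first one, for now
        ready.pop_front();
        order.push_back(n);
        for (std::size_t u = 0; u < n->users.size(); ++u) {
            Node *user = n->users[u];
            if (--user->pendingDeps == 0)    // its last dep was just scheduled
                ready.push_back(user);
        }
    }
    // Any node missing from 'order' sits on a dependency cycle (i.e. a bug).
    return order;
}
```

Replacing `ready.front()` with a heuristic is where the step-2 cleverness would go; the bookkeeping stays the same.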
14:29 karolherbst: imirkin_: I see that ArrayList::remove is a pretty messy thing to do :/
14:30 karolherbst: maybe I'll just use a std container and be happy with that, don't know
14:30 imirkin_: karolherbst: just use std::
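For the removal problem, a sketch with `std::list` (taken here as the drop-in replacement; it says nothing about nv50_ir's own `ArrayList`): `erase()` returns the iterator following the removed element, so a single walk can delete as it goes without touching an invalidated iterator, and this works in C++98. `eraseAll` is a hypothetical helper; the member `std::list::remove(value)` does the same in one call.

```cpp
#include <cassert>
#include <list>

// Removes every element equal to 'victim' in one pass. list::erase returns
// the iterator after the erased element, and iterators to other elements
// stay valid, so the loop never uses a dangling iterator.
template <typename T>
void eraseAll(std::list<T> &lst, const T &victim)
{
    typename std::list<T>::iterator it = lst.begin();
    while (it != lst.end()) {
        if (*it == victim)
            it = lst.erase(it);
        else
            ++it;
    }
}
```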
14:32 karolherbst: k
15:00 karolherbst: what's the status on c++11 in mesa by the way?
15:00 imirkin_: lowest gcc allowed is GCC 4.2
15:00 imirkin_: so... no c++11. but i use tr1 every so often
15:00 karolherbst: okay
15:01 karolherbst: because there are better lists
15:01 karolherbst: like forward_list
15:01 imirkin_: i'm not really up on all the latest c++11 stuff tbh
15:01 karolherbst: std::list is doubly linked
15:01 imirkin_: there's always slist
15:01 karolherbst: forward_list is singly linked
15:01 imirkin_: which is single-linked as well
15:01 imirkin_: no clue what the diff is
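The difference, as a C++11 sketch: `std::list` is doubly linked, with bidirectional iterators, `push_back`, and `size()`; `std::forward_list` is singly linked, iterates forward only, has neither `size()` nor `push_back`, and inserts/erases *after* a given position. The SGI `slist` extension discussed in this log fills a similar role outside the standard. `buildForward` is a hypothetical helper.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <list>

// std::list: doubly linked, bidirectional iterators, push_back, O(1) size()
// (guaranteed since C++11).
// std::forward_list: singly linked, forward iterators only, no push_back and
// no size(); modifications happen *after* a position.
inline std::forward_list<int> buildForward()
{
    std::forward_list<int> fl;           // {}
    fl.push_front(3);                    // {3}
    fl.push_front(1);                    // {1, 3}
    fl.insert_after(fl.begin(), 2);      // {1, 2, 3}
    fl.erase_after(fl.before_begin());   // drops the head: {2, 3}
    return fl;
}
```

Counting a `forward_list` means walking it (for example with `std::distance`), which is the price of the one-pointer-per-node layout.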
15:02 karolherbst: slist is C or C++?
15:02 imirkin_: i just know that c++11 has initializer_list
15:02 imirkin_: std::slist
15:02 imirkin_: c++ :)
15:02 karolherbst: ahh
15:02 karolherbst: mhhh
15:02 imirkin_: https://www.sgi.com/tech/stl/Slist.html
15:02 karolherbst: I don't think it's part of the c++ standard though
15:03 imirkin_: errrr dunno
15:03 imirkin_: it's always been there
15:03 imirkin_: i don't see it on cplusplus.com which is odd
15:03 karolherbst: yeah
15:03 imirkin_: but it's there in gcc and everything else i've seen
15:03 karolherbst: because it's not part of the standard
15:03 imirkin_: i certainly wouldn't feel bad about using it
15:04 karolherbst: mhh
15:04 karolherbst: there is a reason there is forward_list in c++11 ;)
15:04 imirkin_: if you say so
15:04 karolherbst: they could call it slist though then :/
15:04 imirkin_: like i said, i haven't kept up on all the latest
15:04 imirkin_: c++11 came out after i stopped using c++ "professionally"
15:05 imirkin_: so i've only really investigated a few of its (awesome) features
15:05 imirkin_: like initializer lists :)
15:05 karolherbst: yeah
15:05 karolherbst: that's nice
15:05 karolherbst: std::thread
15:05 karolherbst: also nice
15:05 imirkin_: and it has bind now, finally
15:06 karolherbst: and user defined literals
15:06 imirkin_: and the whole foo&& thing is neat
15:06 imirkin_: bleh, those seem dumb :p
15:06 karolherbst: with c++14 you can actuall do: std::this_thread::sleep_for(1h+30min);
15:06 imirkin_: i started writing a c++11 compiler as part of...
15:06 karolherbst: *Actually
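A sketch of the C++14 chrono literals behind that line. The standard suffix for minutes is `min`; `m` is not defined in `std::chrono_literals`. These suffixes are themselves user-defined literals, and `_kib` below is a hypothetical example of declaring one (C++11 syntax).

```cpp
#include <cassert>
#include <chrono>

using namespace std::chrono_literals;   // C++14: h, min, s, ms, us, ns suffixes

// Mixed units combine into their common type (minutes here).
constexpr auto kNap = 1h + 30min;
static_assert(kNap == 90min, "durations of different units compare by value");

// A hypothetical user-defined literal: 4_kib == 4096.
constexpr unsigned long long operator""_kib(unsigned long long v)
{
    return v * 1024;
}
static_assert(4_kib == 4096, "user-defined literal evaluates at compile time");
```

`std::this_thread::sleep_for` (from `<thread>`) takes any `std::chrono::duration`, so `kNap` could be passed to it directly.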
15:06 specing: C++ gets threading support in 2011...
15:06 imirkin_: i forget what the thing was called, some internet class
15:06 specing: Ada had it back in '95
15:06 imirkin_: which i ended up dropping coz it took them 6 months to release the next assignment and i lost interest
15:07 karolherbst: ... specing ... I hope you know why c++ didn't have it before
15:07 karolherbst: really I hope it
15:07 imirkin_: specing: and yet everyone didn't switch to ada in 95. how odd.
15:07 specing: karolherbst: sure http://www-users.cs.york.ac.uk/susan/joke/cpp.htm
15:07 karolherbst: nope
15:07 karolherbst: that's not the reason c++ hadn't threads
15:08 specing: so enlighten me
15:08 karolherbst: because when the last c++ standard was done, C didn't have thread support
15:08 karolherbst: and they didn't want to add something that C might then do differently, breaking the ABI
15:09 karolherbst: but now C11 has threads and the next C++ could finally add it too
15:09 specing: ok
15:10 karolherbst: there is some other stuff C++ couldn't do because of C, but threading was like the biggest one
15:10 karolherbst: imirkin_: with c++17 there will be most likely filesystem stuff too
15:10 karolherbst: finally :D
15:11 karolherbst: imirkin_: on your link "Defined in the header slist, and in the backward-compatibility header slist.h. The slist class, and the slist header, are an SGI extension; they are not part of the C++ standard."
15:12 imirkin_: karolherbst: ah makes sense.
15:12 imirkin_: well, gcc has it :)
15:12 karolherbst: yeah, it makes sense to add something like that
15:12 karolherbst: but you can't rely on internals :/
15:12 imirkin_: i've always used it when i've needed an slist
15:12 karolherbst: and stuff may be different across compilers
15:12 imirkin_: and never had issues
15:12 imirkin_: admittedly it's a rare day i need a linkedlist
15:12 imirkin_: singly or doubly
15:13 karolherbst: well I need it now :D
15:13 imirkin_: you can also suck it up and just use a doubly-linked list and move on with life
15:13 specing: C++'s biggest weakness is its so-called backward compatibility
15:13 imirkin_: specing: i'd say it's c++'s biggest strength
15:13 karolherbst: java *cough*
15:14 imirkin_: specing: and also the reason i refuse to touch python3
15:14 specing: imirkin_: at the start when you are migrating, yes
15:14 karolherbst: it obviously doesn't have it, and all PMs are scared of moving to the next version
15:14 karolherbst: => which weakens security
15:14 specing: imirkin_: not when you want to start something new in it
15:14 imirkin_: and use python2 on all my projects, new and old
15:14 specing: I dislike python completely
15:15 karolherbst: yeah, perl is a lot nicer than python, a LOT
15:19 specing: < karolherbst> $yeah, $perl$ $is$$ $a $lot$ $nicer$ than$$ p$y$t$h$o$n, a $$$$LOT$$$$ FTFY
15:20 specing: so far I've used Bash for pretty much everything in script form
15:20 specing: but slowly migrating towards AdaScript
15:20 karolherbst: yeah, that makes sense
15:20 specing: Bash gets really hard to maintain, really fast
15:20 karolherbst: by the way, what kind of compiler do you use
15:21 specing: GNAT gcc
15:21 karolherbst: ohh right, this still exists
15:21 specing: it is the de-facto Ada compiler
15:23 specing: got a cortex-m0 to do Ada on. I have the toolchain built to support AVR as well
15:26 karolherbst: oh man :/ iterating over containers in pre c++11 is annoying when you did a lot of c++11 stuff :D
15:27 specing: yes, hello 3-line for loops
15:27 specing: and then you say fsck it and do it the old [] way
15:27 specing: and then you get a segfault
15:28 imirkin_: karolherbst: oh yeah, auto is the other great thing about c++11 ;)
15:28 imirkin_: karolherbst: just typedef your way out of it
15:28 karolherbst: well it's even easier
15:29 karolherbst: for (auto & stuff : list)
15:29 imirkin_: karolherbst: right
15:29 karolherbst: I meant this begin() end() insanity :/
15:29 imirkin_: good times
15:29 karolherbst: :D
15:29 imirkin_: but you get stuff like rbegin/rend ;)
15:29 karolherbst: awesome
15:30 imirkin_: which are useful every 100000000 times you write a loop
15:30 karolherbst: there is also std::for_each
15:30 imirkin_: oh right
15:30 imirkin_: i never use that stuff
15:30 karolherbst: so why use a loop at all
15:30 karolherbst: :D
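The loop styles from this exchange side by side, as a sketch over a hypothetical `std::list<int>`: the typedef'd `begin()`/`end()` loop that works in C++98, the C++11 range-based `for` with `auto`, a `rbegin()`/`rend()` reverse walk, and `std::for_each` with a functor (the pre-C++11 stand-in for a lambda). All function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

typedef std::list<int> IntList;              // "typedef your way out of it"
typedef IntList::const_iterator IntIter;

// Functor for std::for_each; accumulates into 'total'.
struct Summer {
    int total;
    Summer() : total(0) {}
    void operator()(int v) { total += v; }
};

// Pre-C++11: explicit begin()/end() loop, kept short by the typedef.
inline int sumOld(const IntList &l)
{
    int total = 0;
    for (IntIter it = l.begin(); it != l.end(); ++it)
        total += *it;
    return total;
}

// C++11: range-based for with auto.
inline int sumNew(const IntList &l)
{
    int total = 0;
    for (const auto &v : l)
        total += v;
    return total;
}

// rbegin()/rend() walk back to front: returns the last negative element, or 0.
inline int lastNegative(const IntList &l)
{
    for (IntList::const_reverse_iterator it = l.rbegin(); it != l.rend(); ++it)
        if (*it < 0)
            return *it;
    return 0;
}

// std::for_each returns a copy of the functor, carrying the accumulated state.
inline int sumForEach(const IntList &l)
{
    return std::for_each(l.begin(), l.end(), Summer()).total;
}
```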
15:35 RSpliet: ahem... maybe I should be embarrassed, but I tend to use C++ as C with objects, and try to steer clear of overly fancy constructions unless I decide I really need them
15:36 RSpliet: case in point: in a C++ constructor I regularly use "throw -ENOSYS" on failure rather than trying to construct exception classes and objects
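A sketch of the `throw -ENOSYS` pattern, assuming a hypothetical `Device` class and `openDevice` helper: the thrown object is a plain `int`, so callers need `catch (int)`; a `catch (const std::exception &)` handler would not match it.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical wrapper whose constructor reports failure as a negative errno.
class Device {
public:
    explicit Device(bool supported)
    {
        if (!supported)
            throw -ENOSYS;               // the thrown type is int
    }
};

// Returns 0 on success, or the negative errno the constructor threw.
inline int openDevice(bool supported)
{
    try {
        Device dev(supported);
        (void)dev;
        return 0;
    } catch (int err) {                  // catch (std::exception&) would not see this
        return err;
    }
}
```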
15:36 RSpliet: *hides*
15:36 karolherbst: ohhh I forgot
15:36 karolherbst: std::list orders
15:36 karolherbst: but not always
15:37 karolherbst: mhh
15:37 karolherbst: how was that again
15:37 imirkin_: RSpliet: you must love clover code :)
15:37 imirkin_: one might read it as "C Lover", but that is clearly not the case.
15:37 imirkin_: perhaps it should be renamed to cpplover
15:38 RSpliet: hah, clover does *all* the fancy things?
15:38 RSpliet: like, including a few they made up themselves, like garbage collection?
15:40 imirkin_: RSpliet: i dunno about *all*
15:40 imirkin_: but definitely most
15:41 RSpliet: oh I don't mind if others do it
15:41 RSpliet: but some constructions are just too much faff for my taste, esp. in the interest of "making life easier by not reinventing the wheel"
15:42 RSpliet: with an API that has so many restrictions that it takes more time studying them than to write it yourself.
15:58 karolherbst: mhhh
15:59 karolherbst: status update: no gpu hang today, although I only use it at 0f now
16:55 karolherbst: imirkin_: I need some help deciding when I can call an instruction schedulable: currently I have something like Instruction *source = insn->getSrc(i)->getInsn(); if (source != NULL && source->bb == bb) notschedulable
16:55 karolherbst: but I don't think I can leave it like that while I work through a list of instructions
16:56 karolherbst: because at some point instructions are part of a blob but already scheduled
16:56 karolherbst: *block
16:57 karolherbst: ohhh, found my thinking mistake
16:58 karolherbst: although...
16:58 karolherbst: no, the source is either in my list of schedulable instructions, in the current list of not-yet-scheduled instructions, or already scheduled
16:59 karolherbst: mhhh
16:59 karolherbst: but I lost an instruction somehow anyway
17:13 imirkin: where'd it go? :)
17:13 imirkin: anyways, that logic only works for the first time
17:13 imirkin: after you hit a fixed instruction
17:13 imirkin: it's likely that the subsequent instructions will only have ops that have deps
17:13 imirkin: and even ops in the first list might all depend on the phi nodes
17:13 imirkin: which are fake deps
17:14 karolherbst: mhhh
17:15 karolherbst: currently my logic doesn't change even after I've scheduled one instruction, so maybe I just need a totally new one after I schedule one :D
17:17 karolherbst: or can I pretty simply tell from one instruction which ones depend on it?
17:17 imirkin: sounds like you're not getting the algo yourself
17:17 imirkin: so let me spell it out a bit more
17:17 imirkin: (a) keep an unordered_set<Instruction *> of the scheduled instructions
17:18 imirkin: (b) when you schedule an instruction, look at its def's uses
17:18 imirkin: for each use which is in your "overall" list, check whether its sources have all been scheduled already
17:18 imirkin: if so, add it to your "schedulable list"
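Steps (a) and (b) can be sketched as follows, assuming a hypothetical `Insn` whose `srcs` stands in for `getSrc(i)->getInsn()` and whose `uses` stands in for the users of its defs (nv50_ir's real accessors differ). The sketch uses C++11 `std::unordered_set`; under the GCC 4.2 floor mentioned in this log it would be the tr1 variant. Treating sources outside the overall list (phis, earlier segments) as already satisfied is an assumption, in line with the remarks about fixed instructions and phi nodes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for nv50_ir's Instruction.
struct Insn {
    std::vector<Insn *> srcs;   // producers of this instruction's sources
    std::vector<Insn *> uses;   // consumers of this instruction's defs
};

typedef std::unordered_set<Insn *> InsnSet;

// A source is satisfied if it was already scheduled, or if it is not part of
// the list being scheduled at all (phis, earlier segments, other BBs).
static bool allSourcesReady(const Insn *insn, const InsnSet &overall,
                            const InsnSet &scheduled)
{
    for (std::size_t s = 0; s < insn->srcs.size(); ++s) {
        Insn *src = insn->srcs[s];
        if (src && overall.count(src) && !scheduled.count(src))
            return false;
    }
    return true;
}

// Step (b): record 'insn' as scheduled, then move each of its uses whose
// sources are now all scheduled onto the schedulable list.
static void markScheduled(Insn *insn, const InsnSet &overall,
                          InsnSet &scheduled, std::vector<Insn *> &schedulable)
{
    scheduled.insert(insn);
    for (std::size_t u = 0; u < insn->uses.size(); ++u) {
        Insn *use = insn->uses[u];
        if (!overall.count(use) || scheduled.count(use))
            continue;
        if (std::find(schedulable.begin(), schedulable.end(), use) != schedulable.end())
            continue;                            // already queued via another source
        if (allSourcesReady(use, overall, scheduled))
            schedulable.push_back(use);
    }
}
```

In this sketch the caller removes the picked instruction from `schedulable` before calling `markScheduled`, so each instruction is emitted once.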
17:18 karolherbst: ahh okay
17:18 karolherbst: didn't know that def part
17:18 imirkin: hope that makes more sense
17:21 karolherbst: sadly unordered_set is c++11 :p or am I allowed to use the tr1?
17:22 imirkin: i already use it
17:22 imirkin: there are some typedefs
17:22 imirkin: to get around various idiocy
17:22 karolherbst: which header should I use?
17:23 karolherbst: ohh
17:23 karolherbst: you use tr1 directly?
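One way to hide the tr1/C++11 split, as a sketch: choose a namespace alias based on `__cplusplus`. C++98 has no alias templates, but namespace aliases work in both modes. `hashns`, `InsnSet`, and the stand-in `Instruction` are hypothetical; the typedefs Mesa actually carries may look different.

```cpp
#include <cassert>

// Pick the hash-set implementation by language level.
#if __cplusplus >= 201103L
#include <unordered_set>
namespace hashns = std;
#else
#include <tr1/unordered_set>   // GCC's TR1 header
namespace hashns = std::tr1;
#endif

struct Instruction { int id; };   // hypothetical stand-in

typedef hashns::unordered_set<Instruction *> InsnSet;
```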
17:24 imirkin: iirc there were patches to normalize some of it
17:24 imirkin: and make it work on android
17:24 imirkin: did i not push those?
17:25 karolherbst: nv50_ir.h just used it
17:25 karolherbst: *uses
17:25 imirkin: looks like not.
17:25 imirkin: that's shitty of me =/
17:25 karolherbst: well, there is worse
17:25 karolherbst: depending on gcc4.2, well
17:25 karolherbst: is there any reason?
17:27 imirkin: that's the min mesa requirement
17:27 imirkin: i wasn't about to bump it up
17:30 karolherbst: is there some getIndirect stuff I have to take care of with the def instructions?
17:31 imirkin: mmmmmmmmmmmmmmmm
17:31 imirkin: yes.
17:31 imirkin: only 1-d though
17:31 imirkin: and extremely rarely :)
17:31 karolherbst: mhhh
17:31 karolherbst: too bad
17:32 imirkin: you'd be hard-pressed to write a shader that invoked such logic
17:32 karolherbst: rarely is something I don't want to hear :D
17:32 imirkin: well, i can point you at a shader test that triggers it
17:32 karolherbst: how do I get this indirect one then?
17:32 imirkin: in practice, it's ~never
17:32 imirkin: for a lot of reasons
17:33 karolherbst: if I iterate over the devcount, how do I get the indirect instructions for an index? getIndirect I use for the source
17:33 imirkin: bin/shader_runner tests/spec/arb_tessellation_shader/execution/variable-indexing/tcs-output-array-vec4-index-wr.shader_test
17:34 imirkin: that one does it: 166: join export b128 # o[$r4+0x180] $r0q (8)
17:34 imirkin: oh crap, actually that doesn't even count
17:34 imirkin: since it's a source
17:34 imirkin: ok yeah, you won't be able to hit the condition.
17:34 karolherbst: :/
17:34 imirkin: stick an assert in and move on
17:36 karolherbst: yeah well, but how do I get the indirect instr. do I have to iterate over all sources of the def instructions?
17:36 imirkin: same as for source
17:37 imirkin: def->getIndirect(0, 0)
17:37 karolherbst: ohh
17:37 karolherbst: mhhh
17:37 karolherbst: makes sense somehow
17:38 karolherbst: have to improve the indirect logic anyway
17:39 karolherbst: I am sure off that in the end I scheduled all schedulable instructions and end up with a list of nonschedulable instructions...
17:39 imirkin: yeah, even a store takes its "dest" as a source. so i think there are never any indirect defs
17:41 karolherbst: anyway, when I do getIndirect(), how do I know it's a def?
17:41 imirkin: you don't...
17:41 imirkin: it's just a value
17:41 imirkin: oh wait, yeah, getIndirect takes a source number
17:41 imirkin: further confirming that defs can't have indirect things
17:41 karolherbst: okay
17:41 karolherbst: so I can ignore that
17:41 imirkin: the indirectness is stored in the ValueRef i think
17:42 imirkin: and there just isn't one in ValueDef
17:42 karolherbst: ohhhh
17:42 karolherbst: now I see that
17:43 imirkin: there's a *lot* of stuff to take in here
17:43 imirkin: it took me like a year to get familiar with all of it
17:43 imirkin: and there are still parts i don't touch
17:43 imirkin: like RA
18:10 karolherbst: yeah, never use value = list.front() or *list.begin()
18:10 karolherbst: ...
18:10 karolherbst: got libc malloc/free verification error
18:11 karolherbst: imirkin: guess why: front() and *begin() return bs (undefined behavior) if the list is empty
18:11 karolherbst: ...
18:11 karolherbst: never use std container with pointer, never ...
18:14 tobijk: isn't there some isEmpty() thing? :D
18:16 karolherbst: empty() yes
18:16 karolherbst: but look at this:
18:16 karolherbst: Instruction *insn = schedulable.front();
18:16 karolherbst: you might think that insn == NULL if schedulable.empty()
18:16 karolherbst: but no
18:16 karolherbst: its ot
18:16 karolherbst: *not
18:17 tobijk: "Calling this function on an empty container causes undefined behavior."
18:17 tobijk: :D
18:17 karolherbst: ...
18:17 tobijk: for front()
18:18 karolherbst: I bet in c++11 it's not undefined
18:18 karolherbst: oh wait, it does
18:18 karolherbst: *is
18:18 karolherbst: :/
18:19 karolherbst: even pop_front() is undefined
18:19 karolherbst: ...
18:19 tobijk: at least cplusplus.com does say so
18:19 karolherbst: yeah
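The fix, sketched with a hypothetical `takeFirst` helper over `std::list<Instruction *>`: check `empty()` before calling `front()`/`pop_front()`, since both are undefined on an empty list and nothing guarantees a NULL comes back.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct Instruction { int id; };   // hypothetical stand-in

// front()/pop_front() on an empty std::list are undefined behavior, so the
// emptiness check has to come first; NULL is returned explicitly.
inline Instruction *takeFirst(std::list<Instruction *> &schedulable)
{
    if (schedulable.empty())
        return NULL;
    Instruction *insn = schedulable.front();
    schedulable.pop_front();
    return insn;
}
```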
18:22 tobijk: imirkin: any idea how to push clip/cull through as one thing to gallium in a sane way, so we dont have to sort these in gallium or the driver?
18:22 tobijk: i'd really like to do it in the glsl if possible
18:23 tobijk: manipulating the index seems a good idea at first sight, but on a second thought i'm not that sure
18:53 karolherbst: strange
18:54 karolherbst: somewhere in this function is something not right https://github.com/karolherbst/mesa/commit/4552d54c9b6e821538f9079ca9a458083786d08d#diff-bb3cc04dda7921a13da7e4e48cc6166dR478
18:55 karolherbst: I never hit "else std::cout << "found dep in output" << std::endl;"
18:55 karolherbst: but I should
18:57 karolherbst: ahh uses not src :(
19:15 imirkin: more like don't use .front() unless you're sure the list isn't empty
19:15 imirkin: tobijk: you could create a CLIPCULLDIST semantic
19:15 imirkin: tobijk: and pass that in, along with a mask, as i had suggested
19:16 imirkin: that'll make brian less-than-happy... probably.
19:16 tobijk: well what would make him happ(ier) then?
19:16 imirkin: separate
19:16 tobijk: just do the sorting on the gallium level?
19:17 imirkin: i dunno, i'm sort of ambivalent
19:17 tobijk: or separate + mask :D
19:17 imirkin: you should bring up any questions on the list though
19:17 imirkin: rather than here
19:17 imirkin: since you'll get competent opinions from both brian and marek
19:17 imirkin: instead of idiot opinions from me
19:18 tobijk: heh, well i dk if my opinions are worth discussing really :/ (maybe they don't even work)
19:20 imirkin: which is why you should be like "what is the best way to do this"
19:20 imirkin: and they will provide you with something
19:20 imirkin: instead of coming in posing as an expert when you're clearly not :)
19:20 tobijk: i should just push this to the ml as is and enable intel ;-)
19:23 imirkin: sure
19:24 tobijk: didn't mean that too seriously...
19:24 imirkin: i did :p
21:32 SolarAquarion: i'm having issues with nouveau and lightdm
21:32 SolarAquarion: it doesn't load
21:32 SolarAquarion: and i need to use gdm
21:32 SolarAquarion: i'm having issues also logging out for some reason
21:32 SolarAquarion: on most DM's
21:32 SolarAquarion: and WMs
22:45 pq: imirkin, cool, once I figure out how to update the wiki again, I can remove all reference to my snapshot, but documenting any new ways would be left for someone else.
22:46 imirkin: pq: i can probably take care of it... what were the pages again?
22:49 pq: imirkin, http://nouveau.freedesktop.org/wiki/InstallNouveau/ and http://nouveau.freedesktop.org/wiki/InstallDRM/
22:49 imirkin: cool thanks
22:50 pq: imirkin, I wonder if you find most of the content on those pages just misleading nowadays :-)
22:51 imirkin: pq: i find most content on most pages misleading :)
22:51 imirkin: pq: ever tried going to like cnn.com? :)
22:54 pq: I'd consider technical sites should be better than the general crap :-p
23:07 imirkin: i'm just going to nuke the InstallDRM page
23:07 imirkin: nouveau wiki is not the place for a git tutorial
23:07 pq: good riddance
23:08 pq: don't forget translated pages if such still exist
23:08 imirkin: i killed those a long time ago
23:09 imirkin: replaced with the google translate widget
23:10 pq: nice
23:10 pq: oh, there's a http://nouveau.freedesktop.org/wiki/InstallNouveau-old/ too
23:11 imirkin: yeahhhh
23:11 imirkin: wtvr
23:11 imirkin: i think mupuf was queasy about deleting it
23:11 imirkin: nothing references it... kill i think
23:11 pq: *shrug*
23:12 pq: InstallDRM referenced it :-)
23:12 imirkin: other way around
23:12 imirkin: it references InstallDRM
23:12 pq: oh
23:12 imirkin: ... and it's gone
23:21 pq: imirkin, thanks, looks like I can now delete my cron job from people.fdo and the tar-ball
23:23 pq: done.
23:25 pq: it had been running there daily for almost 6 years
23:28 mupuf: pq: ah ah