00:13pmoreau: Almost done with structure support, only need to better handle OpUndef, and figure out why combineLd fails (somehow sizeRc ends up being negative, and so it tries to access defs that do not exist).
03:39koz_: Seems my issues with games randomly powering down my machine had nothing to do with nouveau, and everything to do with me trying to pull nearly 500W of power from a 400W PSU.
03:50Moondhum: I want to start 'DRI_PRIME=1 0ad' instead of just '0ad' so I edited my .desktop file. Now I get an error "Failed to execute child process "DRI_PRIME=1" (No such file or directory)". How can I successfully add DRI_PRIME=1?
03:53Moondhum: solved :)
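Moondhum did not say what the fix was. One common cause of this error is that the Exec= line of a .desktop file is not run through a shell, so its first word is taken as the program name; prefixing the command with `env` is a typical workaround. A minimal Python sketch of that word splitting follows (the two Exec strings and the helper name `program_name` are illustrative assumptions, not Moondhum's actual file):

```python
import shlex


def program_name(exec_line: str) -> str:
    """Return the first word of an Exec-style command line.

    A launcher that does not go through a shell treats this word as the
    executable to start; it does not interpret VAR=value prefixes.
    """
    return shlex.split(exec_line)[0]


# Hypothetical Exec= values, for illustration only.
broken = "DRI_PRIME=1 0ad"
fixed = "env DRI_PRIME=1 0ad"

print(program_name(broken))  # the launcher would look for a program with this name
print(program_name(fixed))   # env sets the variable, then starts 0ad
```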
06:01moongazer: pmoreau, I was able to compile it yesterday with your advice.
06:34moongazer: "WARNING: these tools *can* and *will* hang your machine if you don't know
06:34moongazer: what you're doing. Hardware destruction is likely also possible, although
06:34moongazer: no incidents are known to have happened yet. In most cases it's also not
06:34moongazer: recommended to use these tools while a driver is active for a given card."
06:34moongazer: What do you mean by 'while a driver is active for a given card?'
06:49gnarface: moongazer: i think they mean just don't be using the card for anything at the same time as you're poking at it
06:49gnarface: moongazer: like for display or whatever (don't load the driver on it)
06:49moongazer: gnarface, I don't get you.
06:50moongazer: gnarface, You need to analyze while you are using it, right? Only that way you know what is what
06:50gnarface: moongazer: pretend you have more than one video card in a machine and you don't HAVE the driver for the second one, so you can't use it for display, and you're not mining bitcoins with it or anything. no driver loaded, nada.
06:51gnarface: moongazer: in that case, they're saying the second card is safe to poke at, but not the first
06:51moongazer: gnarface, ok...
06:51moongazer: gnarface, what development do you do here?
06:52gnarface: moongazer: absolutely none. i'm just watching from the sidelines.
06:52moongazer: gnarface, um...so how much do you know
06:52moongazer: like the code
06:52moongazer: I had a few doubts
06:54gnarface: moongazer: i know nothing about the driver at a code level, sorry. you're well on your way to surpassing me in understanding, and it's been somewhat educational for me to watch. but i've been using linux and nvidia hardware for *decades* now, that's why i pitch in when it seems like you have a conceptual question that nobody else is interested in answering for you
06:55gnarface: moongazer: my best guess is that if you have doubts about the code, your doubts are probably valid, but there's also probably a good excuse
06:55gnarface: (something like obstruction from Nvidia)
06:56gnarface: make no mistake, i salute your bravery. you've jumped into the deep end and you didn't even look back once.
06:56moongazer: gnarface, o_0 what are you saying are you asking me to stop or something
06:57gnarface: moongazer: no, no, far from it. is english your first language?
06:58gnarface: moongazer: what i'm saying is that i realize the task you've undertaken would be easier for me than you (though still damn near impossible) but i've been simply too lazy to try.
06:59moongazer: gnarface, no english isn't my first language, but I understand it as much as you do. I have been learning it since I was three years old. I do get what you mean, but you seem to be somewhat...pessimistic
06:59moongazer: gnarface, so tell me about the hardware as much as you know then
07:00gnarface: moongazer: sorry, i'm trying to be encouraging. it's fair to say i'm pessimistic. lots of bad things happened to me.
07:01gnarface: moongazer: i don't think i know enough to be useful just lecturing about it... i will pitch in with answers when i think it's something i know for sure you're missing though.
07:02moongazer: okay umm
07:02moongazer: Do you know how to use envytools?
07:02moongazer: gnarface, Lookie here: https://github.com/envytools/envytools/tree/master/nva
07:02gnarface: nope, i don't know envytools at all. your struggle with it has been educational for me.
07:02moongazer: I am trying out these commands
07:03moongazer: So far what I have read in the documentation indicates that MMIO registers are the only way to communicate with GPU...
07:03moongazer: gnarface, >.>
07:03gnarface: i believe it
07:03moongazer: gnarface, like were you laughing all the way long
07:04gnarface: no no, not laughing. you came in here asking where to start. someone told you, and you actually started on it. but then you had follow-up questions that i didn't ask. this helps me understand where to start too.
07:06gnarface: where i may be able to help that many of the others here won't bother with is if you have questions about expected behavior that's more general to Linux, that their documentation may forget to mention because it assumes too much about what you already know
07:09gnarface: see, i came in here to watch discussions with a sort of vague long-term goal of participating too. but in just a few days you've already progressed more to that end than i have in a year. it's encouraging actually. but i sense that you've talked to some people who are somewhat frustrated that you don't seem to have the prerequisite knowledge, and they don't have the patience to teach you that stuff. i hate that.
07:09gnarface: i always hated that when people did that to me. so now i try to have patience with people who are actually trying to learn, regardless of their starting knowledge.
07:10gnarface: no sarcasm.
07:10gnarface: sorry it's not much help though.
07:11moongazer: gnarface, how old are you?
07:11gnarface: i am 1000 years old
07:12gnarface: but i have mostly squandered the time
07:13mgottschlag: moongazer: MMIO is not necessarily the only way to communicate with the GPU... I'd consider command submission channels (polled by the GPU) not to be MMIO
07:14mgottschlag: more like shared memory
07:14moongazer: mgottschlag, oh ok. Maybe I will encounter them later. Is there any doc where I could read how a driver was developed? That would be helpful.
07:14moongazer: gnarface, ~.~
07:15mgottschlag: "how a driver was developed" - what do you mean?
07:15mgottschlag: I haven't worked with nvidia GPUs for a long time, and back then, I did not work with nouveau directly, and it was only for a short research project
07:16moongazer: oh ok
07:16mgottschlag: https://os.itec.kit.edu/downloads/publ_2013_kehne-ua_gpumigration.pdf <- my work on the matter... there is a very short text on the theoretical basics in section II
07:17mgottschlag: on *some* theoretical basics
07:18mgottschlag: most of the driver is actually implemented in Mesa
07:18mgottschlag: communicating with the GPU through those command submission channels
07:18mgottschlag: anyways, off to work, bbl
07:28mgottschlag: really off to work now, bbl
07:50jkliemann: I have a question about mmiotrace. Is it possible to filter by address ranges, especially from the grub cmdline? or is there a better way to trace drivers that load on boot?
07:54jkliemann: thanks, but how can I filter this at runtime? Grepping is no option. The driver I try to trace (pinctrl-baytrail) loads at boot time, but mmiotracing everything at boot time isn't possible
07:58moongazer: So basically does this mean that we have to figure out where the decoding engine sits in memory?
07:58jkliemann: are we talking about the same thing?
08:00moongazer: jkliemann, errr....no
08:00jkliemann: oh ok, I started wondering ^^
08:00moongazer: *sits at which address
08:00moongazer: jkliemann, what are you doing?
08:03jkliemann: I try to trace the intel gpio mem range for an atom core. It's used by the pinctrl-baytrail driver that cannot be built as a module so it always does its ioremap at boot time. But when I enable mmiotrace at boot time (in grub) it slows down the boot so horribly that linux won't boot at all
08:04jkliemann: Is there a way to set a filter (like ftrace function filters) for specific memory ranges (I know which addresses I need) or to trigger a second ioremap after boot for a non-module driver?
08:05moongazer: jkliemann, Sorry, I am just a beginner here; and know a lot less than you
08:07jkliemann: ah, no problem ;) I can give you one tip, don't enable mmiotrace on boot, it just won't boot anymore
08:19karolherbst: jkliemann: there is a bug somewhere I am not able to track down really. It's something silly for sure, but maybe you can help out debugging it. What are you trying to mmiotrace, nvidia or nouveau or something else?
08:19karolherbst: ohh intel gpio
08:20jkliemann: karolherbst yep, there's some undocumented gpio range in the atom cores, but intel only gives commercial support for this range and that's no option for me
08:20karolherbst: jkliemann: okay, I won't have time for this right now (work), but I could send you a patch you could apply to debug the mmiotracer a little more in depth, which might help
08:21karolherbst: jkliemann: weekend would be perfect though to debug this
08:21karolherbst: sadly I can't trigger this issue on my machine for whatever reason
08:22jkliemann: karolherbst: do you mean a bug in mmiotrace? I think it won't boot because it creates MMIO accesses when the logs are written, which creates more logs by mmiotrace. this loop is going to lock up at some point. Also tracing slows down the kernel so much at boot that I get numerous stack traces caused by interrupt timeouts
08:23karolherbst: jkliemann: yes
08:24karolherbst: but maybe you just step into a recursive thing indeed
08:24karolherbst: okay, here is the deal
08:24karolherbst: jkliemann: can you compile that intel stuff as a module?
08:24karolherbst: and load it later?
08:24karolherbst: or does anything depend on it
08:24jkliemann: I don't think this is a bug but a conceptual problem: you can't trace the whole address range at boot. I think creating a filter for mem ranges could solve this problem
08:24karolherbst: jkliemann: but we could also add options to the mmiotracer to just trace certain devices or so
08:25jkliemann: I can't build it as a module, I tried to force it in the kernel config but it either gets built into the kernel or not at all
08:25karolherbst: jkliemann: usually the mmiotracer only traces devices which get their module loaded later
08:25karolherbst: okay I see
08:25jkliemann: that could be an option too, especially to exclude mmc would be helpful as I think this is what causes the lockup
08:26karolherbst: jkliemann: well the mmiotracer is there to RE certain devices/drivers, so having the option to whitelist stuff and disable everything else makes sense
08:26karolherbst: I wouldn't do it on memory ranges
08:26karolherbst: especially because those aren't always the same
08:27jkliemann: that's true, I wonder how fine-grained those device listings could be. The atom Z core has three different mmio ranges (where two are documented but not the third I use)
08:28karolherbst: jkliemann: I was thinking about PCI devices listed in lspci, but maybe that's not fine enough for you
08:29karolherbst: jkliemann: but couldn't you just read the source code? to figure out how it works?
08:30jkliemann: Well having all gpio ranges would be ok, but Idk what is also on that core, there isn't much PCI on this device, most things are done by the core itself
08:31jkliemann: I'm already doing this, but reading the source is quite hard (I think especially on intel drivers) and it doesn't always give you the correct understanding of what is done (I noticed that after I traced the i2c bus with mmiotrace)
08:31karolherbst: I see
08:35jkliemann: karolherbst: are you the developer of mmiotrace?
08:41karolherbst: jkliemann: no, I just fixed a bug
08:42karolherbst: jkliemann: but the developer doesn't want to develop it anymore :p
08:49pmoreau: koz_: loool :-D Good thing that you found out the issue before destroying your PSU!
08:53jkliemann: karolherbst: ah ok, well that doesn't make it easier. and how are you going to upstream your bugfix then? is there still some kind of "official" repository?
08:53karolherbst: jkliemann: no there isn't. It will go through the tracer maintainer
08:53karolherbst: I already landed a patch, so this won't be a problem
08:54pq: jkliemann, if you can modify the driver, how about hacking out the normal ioremap hookup and making your driver call the mmiotrace stuff instead of ioremap directly?
08:55pq: or some shim to achieve the same
08:56jkliemann: karolherbst: ah ok, thanks for the info
08:58jkliemann: pq: thanks for the idea, I'm going to try if I can make the driver call mmiotrace directly
14:23moongazer: karolherbst, hi. Well, the compilation is done now. I read about nva tools as well and tried some of them
14:26karolherbst: moongazer: nice :) were you able to run your self compiled out of tree nouveau module?
14:27moongazer: karolherbst, yes I did that in the morning.
14:28moongazer: karolherbst, Look at the link above. Now, the card that I aim to work on; I have to figure out at which address the video decoding engine resides, right?
14:29moongazer: karolherbst, What do you suggest are the next steps for me?
14:32karolherbst: moongazer: write a small test application decoding 1-2 frames against the NVidia hardware and use valgrind-mmt and mmiotrace to see what is going on
14:33moongazer: karolherbst, duly noted. Any hints?
14:33karolherbst: use VDPAU
14:34karolherbst: or use an application which uses vdpau
14:34karolherbst: but it makes sense to use the VDPAU API directly. imirkin has a repository with stuff as well
14:34moongazer: But we don't know where the video decoding engine is
14:34karolherbst: here is some stuff he did for VP2: https://github.com/imirkin/re-vp2
14:34mwk: moongazer: of course we know
14:35mwk: which card are you working on?
14:35karolherbst: mwk: maxwell
14:35mwk: 0x84000-0x84fff then
14:36moongazer: mwk, https://envytools.readthedocs.io/en/latest/hw/mmio.html#gf100-mmio-map
14:36moongazer: mwk, That says :GM107 so it is not included right
14:36karolherbst: moongazer: ignore those docs and check inside rnndb directly
14:36mwk: yeah, well, fixing that now
14:38karolherbst: but I think there is little known about the maxwell engine right now
14:38mwk: pretty much only its MMIO address
14:38mwk: well, and if you look at the MMIO scan, you can see registers similar to VP5
14:38moongazer: mwk, still you got 2d and 3d rendering to work
14:38mwk: but rearranged
14:39karolherbst: moongazer: that's not video decoding
14:39karolherbst: we talked about video decoding
14:39karolherbst: and there is little known right now
14:39moongazer: karolherbst, I know it's not video decoding
14:39karolherbst: so you basically start from scratch, only having knowledge about the previous generation
14:39moongazer: I was just saying that you people figured out that part
14:40karolherbst: well, sure, but those things don't change as much generally
14:40moongazer: karolherbst, I seee
14:40karolherbst: maybe the maxwell video decoder is also basically the same
14:40karolherbst: nobody knows
14:43moongazer: karolherbst, I want to work on it
14:44mwk: it appears I've managed to misplace my register scans... again...
14:47karolherbst: moongazer: nice
14:55karolherbst: moongazer: basically this is how such projects work: you work on the project yourself and can always ask for help, but nobody here is responsible for getting your project done; that's your responsibility. The mentor isn't responsible for your time management either, nor for checking how far you have got or figuring out whether you lack something (knowledge/ability/etc...). The mentor is
14:55karolherbst: basically there to help you with community stuff, not with technical issues you encounter, although he can do both, but the latter only in his role as a member of the community, not as a mentor
14:57karolherbst: moongazer: also you need an upstream patch before you can attend the EVoC program. This can be anything and should only verify that you are able to deal with mailing lists/git/upstreaming patches
14:58moongazer: karolherbst, Okay. So can you suggest some beginner ones for me so that I can begin?
14:59karolherbst: moongazer: it can be anything. Depends on your interest and what you think you can already do. Most basic thing would be whitespace/documentation fixes, but you could also search bugzilla for bugs you may want to fix (which can already take a month if you get unlucky).
15:00moongazer: karolherbst, a month why
15:00moongazer: Ok wait
15:00karolherbst: well bugs can be complicated
15:01karolherbst: also you might need too much time to figure things out
15:01karolherbst: that's why most suggest trivial whitespace/documentation fixes
15:01karolherbst: it really depends on how familiar you are with stuff
15:01pmoreau: Or way more than a month, for some. :-)
15:01karolherbst: pmoreau: well, I was only thinking about bugs some might think are trivial
15:02karolherbst: those are usually not bigger than one month
15:02pmoreau: Of course :-)
15:02karolherbst: but yeah, that could backfire as well
15:26moongazer: karolherbst, does it have to be nouveau or can it be envytools
15:27karolherbst: moongazer: anything xorg related
15:28moongazer: and envytools is x-org related
15:30karolherbst: well it isn't included in any software
16:47moongazer: I really need help discovering a simple enough bug
17:03john_cephalopoda: pmoreau: Do you know anything new yet about the freezes I experienced?
17:04john_cephalopoda: pmoreau: Also, you asked for a free (as in beer?) game to test it. Here is one: https://kabiscube.itch.io/corpseboxracers
17:07moongazer: pmoreau, can you help me in this regard
17:12karolherbst: imirkin: seems like I was sloppy while rebasing my patches. For fixing those CTS precise issues I don't need the mad splitting patch anymore....
17:13karolherbst: I will remove that splitting patch then, because we apparently won't need it, because using fma to implement mad is totally fine
17:30pmoreau: john_cephalopoda: Thanks for the link, I’ll try it on my system; it has a GK107, which shouldn’t be too different.
17:34pmoreau: moongazer: Mmh… Maybe https://trello.com/c/lpudRntI/179-gm107-use-getreadlatency-for-reducing-stall-counts ? It’s not a bug, and is not related to video encoding/decoding, but it shouldn’t be too hard.
17:34pmoreau: What do you think hakzsam? -^
17:50pmoreau: john_cephalopoda: That game is running fine on my GK107 :-(
18:26moongazer: Guess I must look at other projects and study this one over time slowly
18:43pmoreau: john_cephalopoda: Do you reclock the GPU before playing the games?
19:09imirkin_: karolherbst: we should do that anyways, irrespective of any precise thing.
19:09karolherbst: imirkin_: there is really no reason though
19:10karolherbst: if TGSI_MAD is considered being fused or unfused mad, then we can legally do this
19:10karolherbst: except there is a different valid reason why we shouldn't
19:10imirkin_: well, you lose out on CSE opportunities
19:10karolherbst: I ran shader-db and the overall results were worse
19:11imirkin_: and i think that we need to stop doing IMAD, since it's slower than mul + add.
19:11karolherbst: yeah, that makes sense
19:11karolherbst: but I would keep it separate from the precise series then because it is unrelated
19:11imirkin_: separately, someone needs to figure out how to operate XMAD on maxwell+
19:11imirkin_: sure, that's fine.
19:11karolherbst: anyway, sent out v3 of the series
19:12RSpliet: imirkin_: is imad somehow dual-issuable with another mul or add?
19:12imirkin_: RSpliet: no clue
19:12imirkin_: on maxwell it requires a *barrier*
19:12karolherbst: RSpliet: usually yes
19:12karolherbst: but with more constraints
19:13karolherbst: RSpliet: ohh wait, no, you are right
19:13karolherbst: it can't be
19:13karolherbst: only int ADDs can be dual issued with other arithmetic instructions
19:14karolherbst: that might explain why IMAD is that slow
19:14RSpliet: I'm starting to become really curious about which insn is executed by what piece of hw
19:14karolherbst: RSpliet: usually you can dual issue all float arithmetic instructions
19:14karolherbst: but yeah, integer maths is a different kind of thing
19:15karolherbst: RSpliet: I think every float math instruction can be executed on the SFU as well
19:15RSpliet: karolherbst: on kepler that made sense given how there's 1.5 "FPU" per SM
19:15karolherbst: on gt200 something like that was written
19:16RSpliet: ehh... well, you get what I meant
19:16karolherbst: according to nvidia you could execute float mad on a SFU/FPU and dual issue with another mul on the FPU or something like this, let me dig it up again
19:17karolherbst: hol up
19:17karolherbst: fermi whitepaper: "Most instructions can be dual issued; two integer instructions, two floating instructions, or a mix of integer, floating point, load, store, and SFU instructions can be issued concurrently. Double precision instructions do not support dual dispatch with any other operation."
19:18imirkin_: there's FMA.FMA and FMA.FMA2 apparently
19:18imirkin_: perhaps for that reason
19:20karolherbst: might explain why I didn't get it to work on tesla
19:20karolherbst: if it has to be done explicitly
19:20karolherbst: I should read more of nvidia's white paper
19:21karolherbst: maybe I should implement dual issuing on kepler just like that sentence above and see what the result is
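The whitepaper sentence quoted above can be written down as a small predicate. This is only a toy Python model of that one sentence: the class names and the function `may_dual_issue` are made up for illustration, it ignores the "most instructions" caveat, and it is not nouveau's actual dual-issue check, which has further constraints.

```python
from enum import Enum, auto


class InsnClass(Enum):
    # Hypothetical instruction classes, named after the quote.
    INT = auto()
    FLOAT = auto()
    DOUBLE = auto()
    LOAD = auto()
    STORE = auto()
    SFU = auto()


def may_dual_issue(a: InsnClass, b: InsnClass) -> bool:
    """Encode the quoted rule: double precision never pairs with anything;
    the other listed classes may pair with each other."""
    return InsnClass.DOUBLE not in (a, b)


print(may_dual_issue(InsnClass.INT, InsnClass.INT))
print(may_dual_issue(InsnClass.DOUBLE, InsnClass.FLOAT))
```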
19:25Lyude: Anyone know how long I should expect it to take for git to pull in all of https://github.com/skeggsb/linux ? git remote update seems to keep hanging for me, but it looks like wireshark says there's some data trickling to it from github and I'm curious if anyone else is having issues with fetching this taking forever
19:25karolherbst: Lyude: it's a full linux kernel, what do you expect? :p
19:26Lyude: i guess so, I just don't remember it ever taking this long on other people's branches since most of the repo objects should already be on my machine (I have kernel.org 's repo cloned already)
19:26Lyude: wonder if github just takes a very long time to process a whole repo like this
19:26karolherbst: Lyude: do you have a cloned kernel tree already?
19:26Lyude: (it doesn't seem to have gotten to the actually counting remote objects part)
19:26karolherbst: add another remote and fetch from it ;)
19:27Lyude: it's this specific one I need though to check whether or not anyone fixed a bug I just found
19:27Lyude: something with vblank is breaking suspend/resume on my machine
19:27airlied: make sure you have latest ddz
19:28Lyude: airlied: no this is without any kind of GUI running
19:28Lyude: i've already gotten kdump to spit out a stack trace, which is why i know it's vblank related
19:29karolherbst: Lyude: yeah, add skeggsb tree as another remote
19:29Lyude: that's what I've done, yeah
19:29karolherbst: mhh, then fetching from it shouldn't take that long
19:31airlied: if skeggsb has pushed any tags from Linus, git can take ages
19:32airlied: at least the next person asks me to resync my tags to avoid remote update slowness
19:32karolherbst: maybe I should refetch and check
19:33karolherbst: mhh, goes fast enough
19:33karolherbst: 117k objects
19:34karolherbst: and no new tags
19:37Lyude: ok, using git pull --verbose instead of git remote update seems to have shown that it is actually doing something, just very very slowly because of the remote...
19:43karolherbst: imirkin_: can I tell nvdisasm to print sched opcodes on SM30?
19:48imirkin_: Lyude: usually i just add remotes on a regular linux tree
20:04airlied: hasn't pushed tags i meant
20:07imirkin_: i just have a single kernel tree with 25 different remotes
20:07imirkin_: and i use git-newworkspace as necessary
20:07imirkin_: or whatever that thing is called
20:08karolherbst: that "if (clA == OPCLASS_TEXTURE || clA == OPCLASS_FLOW) return false" check is wrong
20:09karolherbst: +1.5% dual issuing in pixmark_piano if that OPCLASS_FLOW check is removed
20:11imirkin_: how can you dual-issue a flow op?
20:11karolherbst: no clue
20:12karolherbst: ohh wait
20:12imirkin_: i guess it'd be fine for a OP_JOINAT
20:12karolherbst: those are always executed aren't they?
20:12karolherbst: just split in the warp or something
20:12karolherbst: I guess this could improve some branched code
20:14karolherbst: imirkin_: joinats are only responsible for around 15% of that improvement
20:16snkcld: any idea when the GTX 1050 will be supported by nouveau?
20:17Lyude: snkcld: it is, however your kernel may not be up to date enough to actually know how to load the display firmware onto it
20:18snkcld: im running 4.11.5
20:18Lyude: I think you need v4.12+?
20:18imirkin_: and updated linux-firmware.
20:18imirkin_: as the original gp107 firmware that went into linux-firmware was wrong.
20:18snkcld: ok, well im still on 4.11 ... hmm, ill wait i guess
20:18karolherbst: imirkin_: it's OP_BREAK
20:19imirkin_: karolherbst: yeah dunno
20:19karolherbst: I don't either
20:19snkcld: does the master branch of linux-firmware contain the up to date firmware?
20:19Lyude: snkcld: yes
20:20snkcld: which file is the one in question?
20:20snkcld: i did a git pull and noticed only files in tegra changed, as far as nvidia goes
20:22Lyude: it might be that you only need a newer kernel then
20:25snkcld: is it possible to use cuda via nouveau?
20:25imirkin_: snkcld: */gp107/*
20:25imirkin_: snkcld: no.
20:26snkcld: is there any like... GPGPU hardware that has working open source drivers?
20:26snkcld: is the lack of cuda capability a legal thing?
20:27snkcld: or it just hasnt been reverse engineered yet?
20:28snkcld: sounds like a fun project
20:28imirkin_: if you want working open-source GPU drivers, use AMD
20:28imirkin_: their hw is well supported, and you can get access to compute functionality via OpenGL compute shaders
20:28Lyude: yeah, and not financially supporting nvidia also definitely helps us out...
20:29Lyude: karolherbst: hm?
20:29karolherbst: Lyude: unrelated to what you said
20:30Lyude: ooh ok
20:33snkcld: is the compute-via-shaders access a hack or is that actually how they implement their compute stuff?
20:35karolherbst: snkcld: it's part of OpenGL
20:35snkcld: yae i know
20:35snkcld: but i mean, as opposed to a direct interface to the compute stuff
20:36karolherbst: OpenCL is also no direct interface
20:36karolherbst: same goes for CUDA
20:36karolherbst: they all have different scopes and are all APIs having big runtimes managing the hardware and so on
20:37karolherbst: compute shaders just have their scope as being used within OpenGL applications
20:37karolherbst: and OpenCL has its scope as being used in headless environments mainly where you can even mix hardware doing the same stuff
20:38karolherbst: and on a normal desktop compute shaders are totally fine, because most applications these days use OpenGL anyway (GUI toolkits mainly)
20:38karolherbst: but most just use OpenCL, because it's way older
20:39imirkin_: yeah, it's all APIs
20:39pmoreau: And it all runs on the same units anyway IIRC
20:39karolherbst: the main focus for nouveau is the desktop, so OpenCL/CUDA is just less important to us
20:40imirkin_: there's some kind of program, it gets compiled into a shader, that shader gets executed along with the requisite data for that shader to do its work.
20:40imirkin_: the program is specified slightly differently with the different APIs, but it's all basically the same
20:41karolherbst: well OpenCL contains a lot more synchronisation stuff between hardware I figure
20:41pmoreau: imirkin_: If I load an 8-bit or 16-bit value into a 32-bit register, and Nouveau tries to coalesce those, sadness happens: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#n2488
20:41imirkin_: what would you like me to do about it? :)
20:41pmoreau: imirkin_: Should I add a check for `sizeRc > 0` in the loop, or avoid loading an 8-bit value in a 32-bit reg?
20:43imirkin_:is too tired to really tell
20:43imirkin_: do whatever you think is right
20:43pmoreau: Cause right now, sizeRc goes from 1 to -3, and will continue looping for some time, until it wraps around to positive values again, and hits 0 one day
20:44pmoreau: Ok, I’ll think some more about it, and send a patch to the ML some day.
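The failure pmoreau describes can be modelled in a few lines. The sketch below is a Python stand-in, not the Mesa code: the function name `defs_consumed` and the `def_sizes` list are assumptions, and the only thing it takes from the discussion is a byte counter that is decremented by each def's size and expected to land exactly on 0.

```python
def defs_consumed(size_rc: int, def_sizes: list, guard: bool) -> int:
    """Count how many defs are walked before the counter runs out.

    size_rc   -- remaining bytes of the record (hypothetical name)
    def_sizes -- register size in bytes of each def
    guard     -- if True, loop on size_rc > 0 instead of size_rc != 0
    """
    j = 0
    while (size_rc > 0) if guard else (size_rc != 0):
        if j >= len(def_sizes):
            # This is where a def that does not exist would be accessed.
            raise IndexError(f"def {j} does not exist (size_rc={size_rc})")
        size_rc -= def_sizes[j]
        j += 1
    return j


# Two 4-byte defs covering an 8-byte record: the counter lands on 0.
print(defs_consumed(8, [4, 4], guard=False))

# A 1-byte record held in a 4-byte register: 1 - 4 = -3, never exactly 0.
try:
    defs_consumed(1, [4], guard=False)
except IndexError as err:
    print("unguarded loop:", err)

# With the `> 0` style check the walk stops after the first def.
print(defs_consumed(1, [4], guard=True))
```

Whether the guard or avoiding the narrow load in the first place is the better fix is left open in the discussion above; the sketch only shows why the counter skips past zero.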
21:36karolherbst: :) https://gist.github.com/karolherbst/2c8adcfe47555a7dfd2348c640dc36ab
21:39karolherbst: although valgrind basically says the same
22:27john_cephalopoda: pmoreau: Sorry, was afk for a while. I don't reclock before playing.
22:28john_cephalopoda: pmoreau: I should enable reclocking support and try another frequency, maybe it has problems when it's too low. Got to look up where to do that in the kernel config.